Authors: Ben Packer and Carlota Soto
You launched your product, you got some users, and now... Things are getting slower. Queries that used to run in 100 ms are now taking one second, and your team (which has spent the last six months writing application code) faces a whole new set of questions around PostgreSQL performance tuning and optimization.
Scaling PostgreSQL can be challenging, but you don’t need to panic. There are plenty of ways to scale a PostgreSQL database. You may need:
More CPU power or memory
A partitioning strategy
Changes to PostgreSQL settings like shared_buffers or max_parallel_workers
Optimized indexes
An updated database schema
First, congrats! These are great problems to have: they mean you’re scaling your PostgreSQL database.
Second, don't panic—and definitely don't start frantically Googling for a different database to use. PostgreSQL is the most loved database out there and the best foundation for your application, but if you’re scaling, it may need a little extra love.
In this article, we will cover CPU and memory sizing for your database. As often happens with databases, there are no foolproof guidelines we can guarantee will work without direct experimentation and observation: there are very few true invariants, and there are always caveats. Fortunately, there are principles you can follow systematically, and we’ll explore them in this series.
Check out other articles in this series that cover partitioning strategy, PostgreSQL parameters, index optimization, and schema design.
It may be tempting to fix every performance issue by throwing more resources (a.k.a. money) at the problem—don’t do this. It’s an unsustainable fix that will send your bill through the roof. So, let’s start by breaking down the relationship between your available database resources (CPU and memory) and performance.
To properly size your database CPU resources, you need to estimate requirements and monitor your system once it is in place. Let’s start with a useful rough estimate of how CPU resources affect database throughput.
There are three main components of the (usually virtual) machine that runs your database: memory (RAM), CPU, and disk I/O. This post will focus on CPU and memory.
Your CPU is the core work unit in your database. Operations such as data transfer between the database and client, index scanning, data joining, and the evaluation of WHERE clauses all rely on the CPU. Generally, given the absence of memory or disk constraints, PostgreSQL's read throughput scales in direct proportion to the number of available cores.
Your CPU also handles concurrency in your database. PostgreSQL can run more than one query per core. Still, in general, you should use the following formula as a rough estimate for queries per second:
💡Queries per Second (QPS) ≈ (1 / Average Query Runtime in Seconds) × Number of Cores
This estimate relates your average query time, queries per second, and the number of available CPU cores. The formula assumes each core can handle one query at a time and that other factors, like memory or disk access, are not bottlenecks. You can use it to estimate target CPU capacities or available throughput.
For example:
If you anticipate that your system needs to handle 320 queries per second and know the average query takes 50 ms, you'd need roughly 16 cores (given the linear scaling assumption).
Conversely, if you have a fixed number of cores available, like eight, and know your average query time, you can estimate the load your system can handle efficiently.
Remember, this is a back-of-the-napkin estimate. Real-world performance will vary due to many other factors. In most cases, you can expect a higher QPS than what this formula provides.
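If you want to plug real numbers into this estimate, the pg_stat_statements extension can give you your observed average query time. A minimal sketch, assuming pg_stat_statements is installed and enabled and you're on PostgreSQL 13 or later (where the timing column is total_exec_time; older versions call it total_time), and assuming a hypothetical 8-core machine:

-- Average query time in ms across all tracked statements, and the rough QPS
-- an 8-core machine could sustain according to the formula above
SELECT sum(total_exec_time) / sum(calls) AS avg_query_ms,
       8 * 1000.0 / (sum(total_exec_time) / sum(calls)) AS rough_qps_on_8_cores
FROM pg_stat_statements;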
Once you have enough CPU power to handle your throughput, you need to monitor the system to ensure it works properly.
If you’ve sized your CPU properly and yet your database is slow, here are some signs that you might need to increase the number of cores on your machine:
Observed CPU usage remains consistently high. You should always keep an eye on current and historical CPU utilization information to correlate that utilization with application activity. If you’re using Timescale, you can access this information via the Metrics panel in the console. Remember—increasing your CPU is not always the best solution to counter low performance. Your CPU usage may be maxed out because you’re using resources inefficiently, for example, due to misconfigured partitioning or missing indexes.
Queries slow down when there is higher activity, even without more data. If most of your queries are fast during low-volume periods but many slow down once you reach a certain number of queries per second, that may be a sign that you need more cores.
You are using analytical queries heavily (aka aggregates). All queries use CPU, but big joins, aggregates, and sequential scans use more (they may use multiple cores) and for longer. If you have queries that compute aggregates over large amounts of data, make sure that you have enough CPU to still serve your application’s requests. Timescale’s continuous aggregates also solve this problem—they will reduce your CPU requirements considerably while keeping your analytical queries very fast. Consider using them if you're a heavy user of aggregates.
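To illustrate that last point, here is what a continuous aggregate looks like on a hypothetical hypertable called conditions(time, device_id, temperature); the names are made up, but the pattern follows TimescaleDB's continuous aggregate syntax:

-- Precompute hourly averages so analytical queries stop re-scanning raw rows
CREATE MATERIALIZED VIEW conditions_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       device_id,
       avg(temperature) AS avg_temp
FROM conditions
GROUP BY bucket, device_id;

Queries that read from conditions_hourly touch far fewer rows than the equivalent GROUP BY over the raw table, which translates directly into lower CPU usage.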
To catch these issues, you must monitor system performance and adjust resources accordingly. Here are a few tips on CPU monitoring:
Use pg_stat_activity to monitor wait_event: pg_stat_activity is a PostgreSQL view that provides information about the current activities of processes. The wait_event column tells you what a process is waiting for. If the value in this column is null, it generally indicates that the process isn't waiting on any I/O or lock operations but is actively processing and using the CPU. In simpler terms, a null value suggests that the query or operation is CPU-bound.
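One quick way to eyeball this is a snapshot of active backends grouped by wait event; a large share of rows with a null wait_event under load suggests the workload is CPU-bound:

-- Snapshot of what active queries are currently waiting on (null = busy on CPU)
SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE state = 'active'
GROUP BY wait_event_type, wait_event
ORDER BY count(*) DESC;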
Use EXPLAIN ANALYZE to monitor parallelism: The EXPLAIN ANALYZE command in PostgreSQL shows a detailed execution plan of a query, along with the actual runtime statistics. This includes information about which parts of the query are executed in parallel, how many workers are used, and the efficiency of the parallel processing. This information can help you get the most out of a multi-core setup.
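For example, on a hypothetical large table you could check how well an aggregate parallelizes; the table and columns are placeholders, but the Gather / Parallel Seq Scan nodes and the "Workers Planned/Launched" lines in the output are what to look for:

-- Does this aggregate actually use parallel workers?
EXPLAIN (ANALYZE)
SELECT device_id, avg(temperature)
FROM conditions
GROUP BY device_id;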
Use Pgwatch2 to monitor the whole system: Pgwatch2 is a popular open-source monitoring tool designed specifically for PostgreSQL databases. One of its standout features is its detailed breakdown of CPU usage for different components and operations within your PostgreSQL instance. By using Pgwatch2, you can gain insights into which parts of your workload are consuming the most CPU and where optimizations can potentially be made.
Now that we’ve looked at sizing your CPU, let's look at sizing memory for your database.
A PostgreSQL database utilizes memory in two primary ways, each governed by distinct parameters:
Caching data and indexes. PostgreSQL maintains a cache of both raw data and indexes in memory. The size of this cache is determined by the shared_buffers parameter. Data is stored in "pages," which are essentially chunks of data. This applies to both data files and indexes. The presence or absence of a page in memory determines whether it's cached. While there are ample resources to delve into the intricacies of cache eviction algorithms and page structures, the key thing to remember is that PostgreSQL uses an LRU (Least Recently Used) system for cache eviction.
Working memory for intermediate data structures. When processing queries, PostgreSQL often needs to create temporary data structures to help deliver results, as part of join operations, sorting, or aggregation. The amount of memory that can be allocated for these structures is managed by the work_mem parameter. For instance, consider the various join algorithms employed by PostgreSQL: many require memory for operations like sorting or building a hash table. Similarly, operations like GROUP BY and DISTINCT, or queries using ORDER BY that can't use an index for sorting, may also create temporary data structures that occupy memory. If a join or hash needs more memory than your work_mem setting allows, it will spill to disk, which is much slower.
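If you want to peek at what is actually sitting in shared_buffers, the pg_buffercache extension (shipped with PostgreSQL's contrib modules) exposes it. A minimal sketch:

-- Requires: CREATE EXTENSION pg_buffercache;
-- Which relations occupy the most 8 kB pages in the shared buffer cache?
SELECT c.relname, count(*) AS buffered_pages
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP BY c.relname
ORDER BY buffered_pages DESC
LIMIT 10;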
These two parameters (shared_buffers and work_mem) are especially important. We look at them in detail later in this series, as they're key to understanding memory-related performance issues in PostgreSQL:
The cache maintained in shared_buffers is crucial for speeding up data access. A low cache hit ratio indicates that the database frequently has to read from disk rather than from the faster memory cache. Increasing shared_buffers might help, but ensure it's balanced with the OS cache.
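A rough way to check that ratio for your tables is the cumulative statistics in pg_statio_user_tables; on a read-heavy workload you generally want this well above 0.99:

-- Share of table block reads served from the buffer cache rather than requested from disk
SELECT sum(heap_blks_hit)::numeric
       / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS cache_hit_ratio
FROM pg_statio_user_tables;

Keep in mind that a "read" here may still be served by the operating system cache rather than actual disk I/O.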
If complex operations (like sorts or joins) frequently spill to disk because they exceed the configured work_mem, performance will suffer. Increasing work_mem may help, but you must be cautious here, too, as setting it too high could eventually consume all your memory, especially with many concurrent connections.
It's worth noting that there are two layers of caching: the PostgreSQL memory cache (shared_buffers), which is directly managed by the database and is the primary level where data and indexes are held, and the operating system cache, an auxiliary layer managed by the OS. While PostgreSQL isn't directly aware of what's cached there, this layer significantly accelerates file-reading operations by holding onto files recently read from disk.
Generally, if your running database has more memory, it has to read from (and sometimes even write to) disk less, which is a good thing. But extra memory is expensive. How can you know that you have to increase the amount of RAM on your machine?
Queries have unpredictable performance. If a query is slow when run once and then fast on the second go, it could be a sign that the data it needs cannot stay in the cache consistently. However, there are other potential causes for this type of speed pattern (such as high query planning time, which is cached), so it's worth directly observing this difference by comparing shared read vs. shared hit in your query plan.
You have many big indexes. If all of your queries can be satisfied with index scans, and all of your indexes fit in memory, you will likely get good enough performance even if the database has to go to disk to fetch the full returned rows. Many users are aware that more indexes on a table mean more work for inserts, updates, and deletes, but it's also true that to take full advantage of additional indexes, you may need to increase the memory on your machine so they can stay in the cache.
Your sorts are spilling to disk. As we’ve mentioned before, if you see an External Sort or External Merge node while debugging slow queries, this means that the memory required for the operation exceeds the available work_mem. As a result, PostgreSQL had to write its intermediate data structure to disk and then clean up at the end.
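A quick way to see this for a given query (the table and column below are placeholders) is to check the Sort Method line in EXPLAIN ANALYZE output: "quicksort" means the sort fit in work_mem, while "external merge Disk: ...kB" means it spilled:

-- Temporarily lower work_mem in this session to make spills easy to reproduce
SET work_mem = '4MB';
EXPLAIN (ANALYZE)
SELECT * FROM conditions ORDER BY temperature;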
You see a lot of DataFileRead wait events in pg_stat_activity. Anytime a query reads from disk, a wait event is triggered. Directly observing these in pg_stat_activity or comparing the number of events over time can give you a clue that you may need to fiddle with your memory. A high number of these wait events can also be a sign that sequential scans are happening, so make sure to rule out the possibility that you're simply missing an index (or have written the query in a way that prevents the index from being used) before spending more money.
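A simple snapshot query (run periodically or via your monitoring tool) shows how many backends are waiting on disk reads right now:

-- Backends currently blocked on reading data files from disk
SELECT count(*)
FROM pg_stat_activity
WHERE wait_event_type = 'IO'
  AND wait_event = 'DataFileRead';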
Lastly, here are a few extra tips on monitoring memory usage in your database.
Use pg_relation_size() to measure your indexes: Indexes, particularly large ones, can consume a significant portion of memory when loaded into the cache. Knowing the size of your indexes helps you understand how much memory they might occupy and decide which indexes are most crucial. To check, use the pg_relation_size() function along with the index name:
SELECT pg_size_pretty(pg_relation_size('index_name_here'));
Use pg_stat_user_indexes to monitor index usage: An unused index not only consumes disk space but can also lead to unnecessary overhead during write operations (although only the index blocks actually used by queries take up space in memory when loaded into the cache). You can regularly monitor which indexes are being accessed and which aren't via the pg_stat_user_indexes view, which contains statistics about user-defined indexes. The idx_scan column indicates how many times an index was scanned; a value at or near zero over an extended period suggests the index is unused and may be worth removing.
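For example, this query (against the current database's statistics, which reset when you call pg_stat_reset()) lists indexes that have never been scanned, together with their size:

-- Candidate indexes for removal: never scanned since stats were last reset
SELECT schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;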
Use EXPLAIN ANALYZE to tell if something was fetched from disk: To get a detailed view of memory interactions, use EXPLAIN (ANALYZE, VERBOSE, BUFFERS). This combination of flags provides insights into memory usage, buffer hits, and disk reads. In the output, you will see shared hit and shared read. A shared hit means the required data was found in the cache, while a shared read means the data had to be fetched from disk. A higher ratio of hits to reads usually indicates better memory optimization.
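As a small illustration (the query and table are placeholders), the Buffers line under each plan node is where the hit/read split shows up:

-- Look for "Buffers: shared hit=... read=..." in the output
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT * FROM conditions WHERE device_id = 42;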
Use pg_stat_activity to check memory: Querying the pg_stat_activity view can show which queries are waiting on memory-related events or are consuming significant resources. This view is especially useful for spotting queries that are consistently causing issues.
In this article, we have seen how to size your CPU and memory to keep your PostgreSQL database in top condition. Take a look at our other articles in this series covering partitioning strategy, PostgreSQL parameters, index optimization, and schema design.