Nov 25, 2024
Posted by
Ryan Booz
pg_stat_statements allows you to quickly identify problematic or slow Postgres queries, providing instant visibility into your database performance. Today, we're announcing that we've enabled pg_stat_statements by default in all Timescale services. This is part of our #AlwaysBeLaunching Cloud Week with MOAR features! 🐯☁️
PostgreSQL is one of the fastest-growing databases in terms of usage and community size, being backed by many dedicated developers and supported by a broad ecosystem of tooling, connectors, libraries, and visualization applications. PostgreSQL is also extensible: using PostgreSQL extensions, users can add extra functionality to PostgreSQL’s core. Indeed, TimescaleDB itself is packaged as a PostgreSQL extension, which also plays nicely with the broad set of other PostgreSQL extensions, as we’ll see today.
Today, we’re excited to share that pg_stat_statements, one of the most popular and widely used PostgreSQL extensions, is now enabled by default in all Timescale services. If you’re new to Timescale, start a free trial (100% free for 30 days, no credit card required).
pg_stat_statements is a PostgreSQL extension that records information about your running queries. Identifying performance bottlenecks in your database can often feel like a cat-and-mouse game. Quickly written queries, index changes, or complicated ORM query generators can (and often do) negatively impact your database and application performance.
As we will show you in this post, pg_stat_statements is an invaluable tool for identifying which queries are slow or performing poorly, and why. For example, you can query pg_stat_statements to learn how many times a query has been called, its execution time, its hit cache ratio (how much of the data needed to satisfy the query was available in memory vs. read from disk), and other helpful statistics such as the standard deviation of its execution time.
Keep reading to learn how to query pg_stat_statements to identify slow PostgreSQL queries and other performance bottlenecks in your Timescale database.
A huge thank you to Lukas Bernert, Monae Payne, and Charis Lam for taking care of all things pg_stat_statements in Timescale.
pg_stat_statements in Timescale
Querying statistics data for your Timescale database from the pg_stat_statements view is straightforward once you're connected to the database.
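For example, a simple first look at the view might be (the ordering and limit here are just illustrative):

```sql
SELECT *
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```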
The view returns many columns of data (more than 30!), but if you run a query like the one above, one value immediately sticks out in the results: <insufficient privilege>.
pg_stat_statements collects data on all databases and users, which presents a security challenge if any user is allowed to query performance data. Therefore, although any user can query data from the views, only superusers and those specifically granted the pg_read_all_stats permission can see all user-level details, including the queryid and query text.
This includes the tsdbadmin user, which is created by default for all Timescale services. Although this user owns the database and has the most privileges, it is not a superuser account and cannot see the details of all other queries within the service cluster.
Therefore, it's best to filter pg_stat_statements data by userid for any queries you want to perform.
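One way to do that (a sketch; pg_roles and current_user are standard PostgreSQL, so adapt the filter if you connect with a different role) is to join the view against pg_roles and match on the connected user:

```sql
SELECT pss.*
FROM pg_stat_statements pss
JOIN pg_roles pr ON pr.oid = pss.userid
WHERE pr.rolname = current_user
ORDER BY pss.total_exec_time DESC
LIMIT 10;
```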
When you add the filter, only data that you have access to is displayed. If you have created additional accounts in your service for specific applications, you could also filter to those accounts.
To make the rest of our example queries easier to work with, we recommend that you use this base query with a common table expression (CTE). This query form will return the same data but make the rest of the query a little easier to write.
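A sketch of such a base query, assuming the same pg_roles join as above (the CTE name statements is only a convention used in the examples that follow):

```sql
WITH statements AS (
    SELECT pss.*
    FROM pg_stat_statements pss
    JOIN pg_roles pr ON pr.oid = pss.userid
    WHERE pr.rolname = current_user
)
SELECT queryid, calls, total_exec_time, rows, query
FROM statements
ORDER BY total_exec_time DESC
LIMIT 10;
```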
Now that we know how to query only the data we have access to, let's review a few of the columns that will be the most useful for spotting potential problems with your queries.
- calls: the number of times this query has been called.
- total_exec_time: the total time spent executing the query, in milliseconds.
- rows: the total number of rows retrieved by this query.
- shared_blks_hit: the number of blocks that were already cached in memory when read for the query.
- shared_blks_read: the number of blocks that had to be read from disk to satisfy all calls for this query form.
Two quick reminders about the data columns above:
- The values are cumulative: they keep growing from the time the statistics were last reset (or the service was started).
- Statistics are grouped by queryid, so each row represents a normalized query form rather than an individual statement with its specific parameter values.
Using these columns of data, let's look at a few common queries that can help you narrow in on the problematic queries.
One of the quickest ways to find slow Postgres queries that merit your attention is to look at each query’s average total time. This is not a time-weighted average since the data is cumulative, but it still helps frame a relevant context for where to start.
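A sketch of that kind of query, reusing the statements CTE from above (the calls threshold of 500 is arbitrary; pick a value that matches your workload):

```sql
WITH statements AS (
    SELECT pss.*
    FROM pg_stat_statements pss
    JOIN pg_roles pr ON pr.oid = pss.userid
    WHERE pr.rolname = current_user
)
SELECT
    calls,
    mean_exec_time,      -- average execution time per call, in milliseconds
    total_exec_time,
    rows,
    query
FROM statements
WHERE calls > 500        -- adjust to your workload
ORDER BY mean_exec_time DESC
LIMIT 10;
```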
Adjust the calls value to fit your specific application needs. Querying for a higher (or lower) total number of calls can help you identify queries that aren't run often but are very expensive, or queries that are run much more often than you expect and take longer to run than they should.
The sample database we're using for these queries is based on the NFT starter kit, which allows you to ingest data on a schedule from the OpenSea API and query NFT sales data. As part of the normal process, you can see that a TEMPORARY TABLE is created to ingest new data and update existing records as part of a lightweight extract-transform-load process.
That query has been called 60,725 times since this service started and has taken around 4.5 minutes of total execution time to create the table. By contrast, the first query shown takes the longest, on average, to execute (around 350 milliseconds each time). It retrieves the oldest timestamp in the nft_sales table and has used more than 12 minutes of execution time since the server was started.
From a work perspective, finding a way to improve the performance of the first query will have a more significant impact on the overall server workload.
Like nearly everything in computing, databases tend to perform best when data can be queried in memory rather than going to external disk storage. If PostgreSQL has to retrieve data from storage to satisfy a query, it will typically be slower than if all of the needed data was already loaded into the reserved memory space of PostgreSQL. We can measure how often a query has to do this through a value known as Hit Cache Ratio.
Hit Cache Ratio is a measurement of how often the data needed to satisfy a query was available in memory. A higher percentage means that the data was already available and didn't have to be read from disk, while a lower value can indicate that the server is under memory pressure and isn't able to keep up with the current workload.
If PostgreSQL has to constantly read data from disk to satisfy the same query, it means that other operations and data are taking precedence and "pushing" the data your query needs back out to disk each time.
This is a common scenario for time-series workloads because newer data is written to memory first, and if there isn't enough free buffer space, data that is used less will be evicted. If your application queries a lot of historical data, older hypertable chunks might not be loaded into memory and ready to quickly serve the query.
A good place to start is with queries that run often and have a Hit Cache Ratio of less than 98%. Do these queries tend to pull data from long periods of time? If so, that could be an indication that there's not enough RAM to efficiently store this data long enough before it is evicted for newer data.
Depending on the application query pattern, you could improve the Hit Cache Ratio by increasing server resources, tuning indexes to reduce table storage, or using TimescaleDB compression on older chunks that are queried regularly.
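A sketch of a query for spotting these, again assuming the statements CTE from earlier (the ratio is computed from shared_blks_hit and shared_blks_read; the calls threshold is illustrative):

```sql
WITH statements AS (
    SELECT pss.*
    FROM pg_stat_statements pss
    JOIN pg_roles pr ON pr.oid = pss.userid
    WHERE pr.rolname = current_user
)
SELECT
    calls,
    shared_blks_hit,
    shared_blks_read,
    -- share of block reads that were already cached in memory
    round(shared_blks_hit * 100.0 /
          nullif(shared_blks_hit + shared_blks_read, 0), 2) AS hit_cache_ratio,
    query
FROM statements
WHERE calls > 500                    -- adjust to your workload
ORDER BY hit_cache_ratio ASC NULLS LAST
LIMIT 10;
```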
This sample database isn't very active, so the overall query counts are not very high compared to what a traditional application would probably show. In our example data above, a query called more than 500 times is a "frequently used query."
We can see above that one of the most expensive queries also happens to have the lowest Hit Cache Ratio of 93.75%. This means that roughly 6% of the time, PostgreSQL has to retrieve data from disk to satisfy the query. While that might not seem like a lot, your most frequently called queries should have a ratio of 99% or more in most cases.
If you look closely, notice that this is the same query that stood out in our first example that showed how to find long-running queries. It's quickly becoming apparent that we can probably tune this query in some way to perform better. As it stands now, it's the slowest query per call, and it consistently has to read some data from disk rather than from memory.
For a final example, let's consider another way to judge which queries often have the greatest opportunity for improvement: using the standard deviation of a query execution time.
Finding the slowest queries is a good place to start. However, as discussed in the blog post What Time-Weighted Averages Are and Why You Should Care, averages are only part of the story. Although pg_stat_statements doesn't provide a method for tracking time-weighted averages, it does track the standard deviation of execution time across all calls.
Standard deviation measures how widely individual execution times vary from the overall mean. If the standard deviation value is small, all executions of the query take a similar amount of time. If the standard deviation value is large, the execution time of the query varies significantly from request to request.
Determining how good or bad the standard deviation is for a particular query requires more data than just the mean and standard deviation values. To make the most sense of these numbers, we at least need to add the minimum and maximum execution times to the query. By doing this, we can start to form a mental model of the overall span of execution times that the query takes.
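A sketch of such a query using the statements CTE (the column names below are the PostgreSQL 13+ variants; older versions use min_time, max_time, and so on):

```sql
WITH statements AS (
    SELECT pss.*
    FROM pg_stat_statements pss
    JOIN pg_roles pr ON pr.oid = pss.userid
    WHERE pr.rolname = current_user
)
SELECT
    calls,
    min_exec_time,
    max_exec_time,
    mean_exec_time,
    stddev_exec_time,
    query
FROM statements
WHERE calls > 500            -- adjust to your workload
ORDER BY stddev_exec_time DESC
LIMIT 10;
```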
In the example result below, we're only showing the data for one query to make it easier to read: the same ORDER BY time LIMIT 1 query we've seen in our previous example output.
In this case, the minimum and maximum execution times, taken together with the mean and standard deviation, tell us that this query's execution time varies widely from call to call.
When I examine the output of these three queries together, this specific ORDER BY time LIMIT 1 query seems to stick out. It's slower per call than most other queries, it often requires the database to retrieve data from disk, and the execution times seem to vary dramatically over time.
As long as I understood where this query was used and how the application could be impacted, I would certainly put this "first point" query on my list of things to improve.
The pg_stat_statements extension is an invaluable monitoring tool, especially when you understand how statistical data can be used in the database and application context.
For example, an expensive query called only a few times a day or month might not be worth the effort to tune right now. A moderately slow query called hundreds of times an hour (or more), on the other hand, is probably a better use of your query-tuning effort.
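One rough way to weigh that trade-off is to rank queries by the total time they have consumed rather than by their per-call speed; a minimal sketch, again reusing the statements CTE:

```sql
WITH statements AS (
    SELECT pss.*
    FROM pg_stat_statements pss
    JOIN pg_roles pr ON pr.oid = pss.userid
    WHERE pr.rolname = current_user
)
SELECT
    calls,
    mean_exec_time,
    total_exec_time,     -- overall time this query form has consumed
    query
FROM statements
ORDER BY total_exec_time DESC
LIMIT 10;
```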
If you want to learn how to store metrics snapshots regularly and move from static, cumulative information to time-series data for more efficient database monitoring, check out the blog post Point-in-Time PostgreSQL Database and Query Monitoring With pg_stat_statements.
pg_stat_statements is automatically enabled in all Timescale services. If you’re not a user yet, you can try out Timescale for free (no credit card required) to get access to a modern cloud-native database platform with TimescaleDB's top performance, one-click database replication, forking, and VPC peering.