Using pg_stat_statements to Optimize Queries

pg_stat_statements allows you to quickly identify problematic or slow Postgres queries, providing instant visibility into your database performance. Today, we're announcing that we've enabled pg_stat_statements by default in all Timescale services. This is part of our #AlwaysBeLaunching Cloud Week with MOAR features! 🐯☁️

PostgreSQL is one of the fastest-growing databases in terms of usage and community size, backed by many dedicated developers and supported by a broad ecosystem of tooling, connectors, libraries, and visualization applications. PostgreSQL is also extensible: using PostgreSQL extensions, users can add extra functionality to PostgreSQL’s core. Indeed, TimescaleDB itself is packaged as a PostgreSQL extension, and it plays nicely with the broad set of other PostgreSQL extensions, as we’ll see today.

Today, we’re excited to share that pg_stat_statements, one of the most popular and widely used PostgreSQL extensions, is now enabled by default in all Timescale services. If you’re new to Timescale, start a free trial (100% free for 30 days, no credit card required).

What is pg_stat_statements?

Identifying performance bottlenecks in your database can often feel like a cat-and-mouse game. Hastily written queries, index changes, or complicated ORM query generators can (and often do) negatively impact your database and application performance. pg_stat_statements is a PostgreSQL extension that records information about the queries running against your database, giving you the data you need to track those problems down.
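On Timescale, this is now enabled out of the box. For reference, on a self-managed PostgreSQL instance, enabling the extension is typically a two-step process (a rough sketch; details vary by PostgreSQL version and hosting setup):

-- 1. In postgresql.conf, preload the module (requires a server restart):
--      shared_preload_libraries = 'pg_stat_statements'

-- 2. Then, in each database where you want the view available:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;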

How to use pg_stat_statements

As we will show you in this post, pg_stat_statements is an invaluable tool for identifying which queries are performing poorly and why. For example, you can query pg_stat_statements to learn how many times a query has been called, its execution time, its hit cache ratio (how much of the data needed to satisfy the query was available in memory versus on disk), and other helpful statistics such as the standard deviation of its execution time.

Keep reading to learn how to query pg_stat_statements to identify PostgreSQL slow queries and other performance bottlenecks in your Timescale database.

A huge thank you to Lukas Bernert, Monae Payne, and Charis Lam for taking care of all things pg_stat_statements in Timescale.

How to Query pg_stat_statements in Timescale

Querying statistics data for your Timescale database from the pg_stat_statements view is straightforward once you're connected to the database.

SELECT * FROM pg_stat_statements;

userid|dbid |queryid             |query                         
------+-----+--------------------+------------------------------
 16422|16434| 8157083652167883764|SELECT pg_size_pretty(total_by
    10|13445|                    |<insufficient privilege>      
 16422|16434|-5803236267637064108|SELECT game, author_handle, gu
 16422|16434|-8694415320949103613|SELECT c.oid,c.*,d.description
    10|16434|                    |<insufficient privilege>      
    10|13445|                    |<insufficient privilege>   
 ...  |...  |...                 |...  

For queries that the tsdbadmin user does not have access to, the query text and identifier are hidden

The view returns many columns of data (more than 30!), but if you look at the results above, one value immediately sticks out: <insufficient privilege>.

pg_stat_statements collects data across all databases and users, which presents a security challenge if any user could see all of it. Therefore, although any user can query the view, only superusers and roles that have been granted the pg_read_all_stats role can see the full details for every statement, including the queryid and query text.

This restriction applies to the tsdbadmin user, which is created by default for all Timescale services. Although this user owns the database and has the most privileges, it is not a superuser account and cannot see the details of queries run by other users in the service.
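On self-managed PostgreSQL, where you do control a superuser account, you could widen visibility for a dedicated monitoring role by granting it the predefined pg_read_all_stats role (monitoring_user is a hypothetical role name here):

-- run as a superuser on a self-managed PostgreSQL instance
GRANT pg_read_all_stats TO monitoring_user;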

Therefore, it's best to filter pg_stat_statements data by userid for any queries you want to perform.

-- current_user will provide the rolname of the authenticated user
SELECT * FROM pg_stat_statements pss
	JOIN pg_roles pr ON (userid=oid)
WHERE rolname = current_user;


userid|dbid |queryid             |query                         
------+-----+--------------------+------------------------------
 16422|16434| 8157083652167883764|SELECT pg_size_pretty(total_by
 16422|16434|-5803236267637064108|SELECT game, author_handle, gu
 16422|16434|-8694415320949103613|SELECT c.oid,c.*,d.description
 ...  |...  |...                 |...  		 

Queries for only the tsdbadmin user, showing all details and statistics

When you add the filter, only data that you have access to is displayed. If you have created additional accounts in your service for specific applications, you could also filter to those accounts.
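For example, to look only at statements issued by a hypothetical application role named app_user (assuming you're connected as that role, or have been granted enough privileges to see its query text), you could swap the current_user filter for an explicit role name:

-- app_user is a placeholder; substitute the name of your application role
SELECT pss.* FROM pg_stat_statements pss
	JOIN pg_roles pr ON (userid=oid)
WHERE rolname = 'app_user';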

To make the rest of our example queries easier to work with, we recommend that you use this base query with a common table expression (CTE). This query form will return the same data but make the rest of the query a little easier to write.

-- current_user will provide the rolname of the authenticated user
WITH statements AS (
SELECT * FROM pg_stat_statements pss
		JOIN pg_roles pr ON (userid=oid)
WHERE rolname = current_user
)
SELECT * FROM statements;

userid|dbid |queryid             |query                         
------+-----+--------------------+------------------------------
 16422|16434| 8157083652167883764|SELECT pg_size_pretty(total_by
 16422|16434|-5803236267637064108|SELECT game, author_handle, gu
 16422|16434|-8694415320949103613|SELECT c.oid,c.*,d.description
 ...  |...  |...                 |...   		 

Query that shows the same results as before, but this time with the base query in a CTE for more concise queries later

Now that we know how to query only the data we have access to, let's review a few of the columns that will be the most useful for spotting potential problems with your queries.

  • calls: the number of times this query has been called.
  • total_exec_time: the total time spent executing the query, in milliseconds.
  • rows: the total number of rows retrieved by this query.
  • shared_blks_hit: the number of blocks that were already in PostgreSQL's shared buffer cache when the query needed them.
  • shared_blks_read: the number of blocks that had to be read from disk to satisfy all calls of this query form.

Two quick reminders about the data columns above:

  1. All values are cumulative from the last time the service was started or the statistics were last reset manually (see the example after this list).
  2. All values are aggregated per query form: queries are parameterized, and statistics are grouped by the resulting hashed queryid.
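On the first point: if you want to start a clean measurement window (say, before and after adding an index), the extension provides a reset function. By default only superusers may call it, so depending on how your service is configured, you may not be able to run it as tsdbadmin:

-- discards all statistics gathered so far; requires appropriate privileges
SELECT pg_stat_statements_reset();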

Using these columns of data, let's look at a few common queries that can help you zero in on problematic queries.

Long-Running PostgreSQL Queries

One of the quickest ways to find slow Postgres queries that merit your attention is to look at each query’s average execution time (mean_exec_time). This is not a time-weighted average, since the data is cumulative, but it still provides useful context for where to start.

Adjust the calls value to fit your specific application needs. Filtering for a higher (or lower) number of total calls can help you identify queries that aren't run often but are very expensive, as well as queries that run far more often than you expect and take longer than they should.

-- query the 10 longest running queries with more than 500 calls
WITH statements AS (
SELECT * FROM pg_stat_statements pss
		JOIN pg_roles pr ON (userid=oid)
WHERE rolname = current_user
)
SELECT calls, 
	mean_exec_time, 
	total_exec_time, 
	query
FROM statements
WHERE calls > 500
AND shared_blks_hit > 0
ORDER BY mean_exec_time DESC
LIMIT 10;


calls|mean_exec_time |total_exec_time | query
-----+---------------+----------------+-----------
 2094|        346.93 |      726479.51 | SELECT time FROM nft_sales ORDER BY time ASC LIMIT $1 |
 3993|         5.728 |       22873.52 | CREATE TEMPORARY TABLE temp_table ... |
 3141|          4.79 |       15051.06 | SELECT name, setting FROM pg_settings WHERE ... |
60725|          3.64 |      221240.88 | CREATE TEMPORARY TABLE temp_table ... |   
  801|          1.33 |        1070.61 | SELECT pp.oid, pp.* FROM pg_catalog.pg_proc p  ...|
 ... |...            |...                 |  		 

Queries that take the most time, on average, to execute

The sample database we're using for these queries is based on the NFT Starter Kit, which lets you ingest data on a schedule from the OpenSea API and query NFT sales data. As part of the normal ingestion process, you can see that a TEMPORARY TABLE is created to ingest new data and update existing records as part of a lightweight extract-transform-load (ETL) process.

That query has been called 60,725 times since this service started, adding up to roughly 3.7 minutes of total execution time spent creating the table. By contrast, the first query shown takes the longest, on average, to execute: around 350 milliseconds per call. It retrieves the oldest timestamp in the nft_sales table and has accumulated more than 12 minutes of execution time since the server was started.

From a work perspective, finding a way to improve the performance of the first query will have a more significant impact on the overall server workload.
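A natural next step for that first query is to look at its plan and buffer usage with EXPLAIN, substituting a concrete value for the $1 parameter recorded by pg_stat_statements:

-- inspect the plan, timing, and buffer usage of the slow query
EXPLAIN (ANALYZE, BUFFERS)
SELECT time FROM nft_sales ORDER BY time ASC LIMIT 1;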

Hit Cache Ratio

Like nearly everything in computing, databases tend to perform best when data can be queried in memory rather than read from external disk storage. If PostgreSQL has to retrieve data from storage to satisfy a query, the query will typically be slower than if all of the needed data were already loaded into PostgreSQL's shared memory buffers. We can measure how often a query finds its data in memory with a value known as the Hit Cache Ratio.

Hit Cache Ratio is a measurement of how often the data needed to satisfy a query was already available in memory. A higher percentage means the data was already cached and didn't have to be read from disk, while a lower value can indicate that the server is under memory pressure and can't keep up with the current workload.

If PostgreSQL has to constantly read data from disk to satisfy the same query, it means that other operations and data are taking precedence and "pushing" the data your query needs back out to disk each time.

This is a common scenario for time-series workloads because newer data is written to memory first, and if there isn't enough free buffer space, data that is used less will be evicted. If your application queries a lot of historical data, older hypertable chunks might not be loaded into memory and ready to quickly serve the query.

A good place to start is with queries that run often and have a Hit Cache Ratio of less than 98%. Do these queries tend to pull data from long time ranges? If so, that could be an indication that there isn't enough RAM to keep this data in memory before it is evicted in favor of newer data.

Depending on your application's query patterns, you could improve the Hit Cache Ratio by increasing server resources, tuning indexes to reduce table storage, or using TimescaleDB compression on older chunks that are still queried regularly.
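As a rough sketch of that last option, assuming nft_sales is a hypertable whose older chunks are rarely modified, enabling native compression with a policy might look like this (the 30-day threshold is an illustrative assumption):

-- enable TimescaleDB native compression on the hypertable
ALTER TABLE nft_sales SET (timescaledb.compress);

-- automatically compress chunks once they are more than 30 days old
SELECT add_compression_policy('nft_sales', INTERVAL '30 days');

The query below computes the Hit Cache Ratio for your most frequently called queries, which is also a handy way to check whether changes like this actually help.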

-- hit cache ratio for the 10 most frequently called queries
WITH statements AS (
SELECT * FROM pg_stat_statements pss
		JOIN pg_roles pr ON (userid=oid)
WHERE rolname = current_user
)
SELECT calls, 
	shared_blks_hit,
	shared_blks_read,
	shared_blks_hit/(shared_blks_hit+shared_blks_read)::NUMERIC*100 hit_cache_ratio,
	query
FROM statements
WHERE calls > 500
AND shared_blks_hit > 0
ORDER BY calls DESC, hit_cache_ratio ASC
LIMIT 10;


calls | shared_blks_hit | shared_blks_read | hit_cache_ratio |query
------+-----------------+------------------+-----------------+--------------
  118|            441126|                 0|           100.00| SELECT bucket, slug, volume AS "volume (count)", volume_eth...
  261|          62006272|             22678|            99.96| SELECT slug FROM streamlit_collections_daily cagg...
 2094|         107188031|           7148105|            93.75| SELECT time FROM nft_sales ORDER BY time ASC LIMIT $1...      
  152|          41733229|                 1|            99.99| SELECT slug FROM streamlit_collections_daily cagg...
  154|          36846841|             32338|            99.91| SELECT a.img_url, a.name, MAX(s.total_price) AS price, time...

 ... |...               |...               | ...             | ...

The query that shows the Hit Cache Ratio of each query, including the number of buffers that were read from disk or found in memory to satisfy the query

This sample database isn't very active, so the overall query counts are not very high compared to what a traditional application would probably show. In our example data above, a query called more than 500 times is a "frequently used query."

We can see above that one of the most expensive queries also happens to have the lowest Hit Cache Ratio of 93.75%. This means that roughly 6% of the time, PostgreSQL has to retrieve data from disk to satisfy the query. While that might not seem like a lot, your most frequently called queries should have a ratio of 99% or more in most cases.

If you look closely, notice that this is the same query that stood out in our first example that showed how to find long-running queries. It's quickly becoming apparent that we can probably tune this query in some way to perform better. As it stands now, it's the slowest query per call, and it consistently has to read some data from disk rather than from memory.
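If you also want a single headline number rather than a per-query breakdown, you can aggregate the same two columns across every statement you have access to, for example:

-- overall cache hit ratio across all statements recorded for the current user
WITH statements AS (
SELECT * FROM pg_stat_statements pss
		JOIN pg_roles pr ON (userid=oid)
WHERE rolname = current_user
)
SELECT sum(shared_blks_hit) /
	NULLIF(sum(shared_blks_hit) + sum(shared_blks_read), 0)::NUMERIC * 100 AS overall_hit_cache_ratio
FROM statements;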

Queries With High Standard Deviation

For a final example, let's consider another way to judge which queries may offer the greatest opportunity for improvement: the standard deviation of a query's execution time.

Finding the slowest queries is a good place to start. However, as discussed in the blog post What Time-Weighted Averages Are and Why You Should Care, averages are only part of the story. Although pg_stat_statements doesn't provide a method for tracking time-weighted averages, it does track the standard deviation of all calls and execution time.

How can this be helpful?

Standard deviation measures how widely individual execution times vary around the overall mean. If the standard deviation is small, most executions take a similar amount of time. If it is large, the execution time of the query varies significantly from call to call.

Determining how good or bad the standard deviation is for a particular query requires more context than the mean and standard deviation alone. To make the most sense of these numbers, we at least need to add the minimum and maximum execution times to the query. By doing this, we can start to form a mental model of the overall range of execution times for the query.

In the example result below, we're only showing the data for one query to make it easier to read: the same ORDER BY time LIMIT 1 query we've seen in the previous example output.

-- execution time variability for the 10 slowest queries by average execution time
WITH statements AS (
SELECT * FROM pg_stat_statements pss
		JOIN pg_roles pr ON (userid=oid)
WHERE rolname = current_user
)
SELECT calls, 
	min_exec_time,
	max_exec_time, 
	mean_exec_time,
	stddev_exec_time,
	(stddev_exec_time/mean_exec_time) AS coeff_of_variance,
	query
FROM statements
WHERE calls > 500
AND shared_blks_hit > 0
ORDER BY mean_exec_time DESC
LIMIT 10;


Name              |Value                                                |
------------------+-----------------------------------------------------+
calls             |2094                                                 |
min_exec_time     |0.060303                                             |
max_exec_time     |1468.401726                                          |
mean_exec_time    |346.9338636657108                                    |
stddev_exec_time  |212.3896857655582                                    |
coeff_of_variance |0.612190702635494                                    |
query             |SELECT time FROM nft_sales ORDER BY time ASC LIMIT $1|	 

Output showing the min, max, mean, and standard deviation of execution time for a single query

In this case, we can extrapolate a few things from these statistics:

  • For our application, this query is called frequently (remember, more than 500 calls is a lot for this sample database).
  • If we look at the full range of execution time in conjunction with the mean, we see that the mean is not centered. This could imply that there are execution time outliers or that the data is skewed. Both are good reasons to investigate this query’s execution times further.
  • Additionally, the coeff_of_variance column, the ratio between the standard deviation and the mean (the coefficient of variation), comes out to 0.612, which is fairly high. In general, if this ratio is above 0.3, the variation in your data is quite large. Since the data is quite varied, this suggests that rather than a few outliers skewing the mean, there are many execution times taking longer than they should. This provides further confirmation that the execution time for this query should be investigated further.

When I examine the output of these three queries together, this specific ORDER BY time LIMIT 1 query seems to stick out. It's slower per call than most other queries, it often requires the database to retrieve data from disk, and the execution times seem to vary dramatically over time.

As long as I understood where this query was used and how the application could be impacted, I would certainly put this "first point" query on my list of things to improve.

Speed Up Your PostgreSQL Queries

The pg_stat_statements extension is an invaluable monitoring tool, especially when you understand how statistical data can be used in the database and application context.

For example, an expensive query called a few times a day or month might not be worth the effort to tune right now. By contrast, a moderately slow query called hundreds of times an hour (or more) is probably a better use of your query-tuning effort.

If you want to learn how to store metrics snapshots regularly and move from static, cumulative information to time-series data for more efficient database monitoring, check out the blog post Point-in-Time PostgreSQL Database and Query Monitoring With pg_stat_statements.
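One minimal way to approach that idea is to copy the cumulative counters into a timestamped table on a schedule and then diff consecutive snapshots (the table name and column selection here are illustrative):

-- one-time setup: an empty table with the columns we want to track
CREATE TABLE IF NOT EXISTS stat_statements_snapshots AS
SELECT now() AS captured_at, queryid, calls, total_exec_time, rows
FROM pg_stat_statements
WITH NO DATA;

-- run periodically (e.g., from a cron job or a TimescaleDB user-defined action)
INSERT INTO stat_statements_snapshots
SELECT now(), queryid, calls, total_exec_time, rows
FROM pg_stat_statements;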

pg_stat_statements is automatically enabled in all Timescale services. If you’re not a user yet, you can try out Timescale for free (no credit card required) to get access to a modern cloud-native database platform with TimescaleDB's top performance, one-click database replication, forking, and VPC peering.