Written by Semab Tariq
Monitoring your PostgreSQL database is crucial for ensuring its optimal performance and reliability. By monitoring key parameters such as query execution times, resource utilization, and server availability, you can identify potential issues early and take proactive measures to maintain efficiency.
In this article, we'll delve into monitoring PostgreSQL using built-in tools like pg_stat_activity, pg_stat_statements, and pg_stat_all_tables, as well as Timescale's own query monitoring tool, Insights. Additionally, we'll explore integrating the PostgreSQL exporter, Prometheus, and Grafana to create a dashboard for monitoring server health.
pg_stat_statements is an extension that ships with PostgreSQL and keeps track of the SQL statements executed by the database server. It records details about query execution counts, total execution time, and I/O-related statistics.
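Because pg_stat_statements ships as a contrib module rather than being active out of the box, it has to be preloaded and created before it starts collecting data. Here is a minimal sketch, assuming you can change server settings and restart PostgreSQL:

-- Preload the module (requires a server restart; note that ALTER SYSTEM
-- overwrites any existing shared_preload_libraries value)
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';
-- After the restart, enable the extension in the database you want to track
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;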
Long-running queries are simply queries that take a significant amount of time to execute, potentially causing performance issues or delays in database operations. These queries may consume excessive resources, such as CPU or memory, and can impact the overall responsiveness of the database system. Identifying and optimizing long-running queries is essential for maintaining the efficiency and reliability of the PostgreSQL database.
You can use the following query to extract the top three longest-running queries in your PostgreSQL database:
SELECT
  userid::regrole,
  dbid,
  mean_exec_time / 1000 AS mean_exec_time_secs,
  max_exec_time / 1000 AS max_exec_time_secs,
  min_exec_time / 1000 AS min_exec_time_secs,
  stddev_exec_time,
  calls,
  query
FROM
  pg_stat_statements
ORDER BY
  mean_exec_time DESC
LIMIT 3;
  userid  | dbid |  mean_exec_time_secs   | max_exec_time_secs | min_exec_time_secs |   stddev_exec_time   | calls | query
----------+------+------------------------+--------------------+--------------------+----------------------+-------+-------
 semab    |    5 | 0.0018055465678913738  |        0.006023764 |        0.001152825 |   0.7903876933367537 |  3756 | SELECT name, setting, COALESCE(unit, $1), short_desc, vartype FROM pg_settings WHERE vartype IN ($2, $3, $4) AND name != $5
 postgres |    5 | 0.001178044            |        0.001178044 |        0.001178044 |                    0 |     1 | SELECT pg_stat_statements_reset()
 semab    |    5 | 0.0007018039854898849  |        0.001922814 |        0.000597347 | 0.034553571651097015 | 15024 | SELECT pg_database_size($1)
(3 rows)
userid::regrole: casts the userid column to the regrole data type, which represents the role (user) that executed the SQL statement. The result shows the actual name of the user instead of their numeric ID, enhancing readability.
dbid: the ID of the database in which the SQL statement was executed.
mean_exec_time / 1000 AS mean_exec_time_secs: the mean (average) execution time of the SQL statement, converted from milliseconds to seconds.
max_exec_time / 1000 AS max_exec_time_secs: the maximum execution time of the SQL statement in seconds.
min_exec_time / 1000 AS min_exec_time_secs: the minimum execution time of the SQL statement in seconds.
stddev_exec_time: the amount of variation or dispersion in the execution times of the query. A higher stddev_exec_time indicates more variability in the query execution times, while a lower value suggests more consistency. This metric helps assess the stability and predictability of query performance.
calls: the number of times the SQL statement has been executed.
query: the text of the SQL statement itself.
ORDER BY: sorts the result set by the mean_exec_time column in descending order, so the SQL statements with the longest mean execution times appear first.
LIMIT 3: restricts the result set to three rows, so only the top three SQL statements with the longest mean execution times are returned.
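Keep in mind that these counters accumulate from the moment the extension was installed (or last reset), so if you want to measure a specific time window, you can clear them first with the reset function the extension provides:

-- Discard all statistics collected by pg_stat_statements so far
SELECT pg_stat_statements_reset();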
I/O-intensive queries are database operations that heavily rely on input/output operations, typically involving frequent reads or writes to disk. These queries often require significant disk access, leading to high disk I/O usage and potential performance bottlenecks.
You can use the following query to identify queries that frequently access disk resources, indicating potential disk-intensive operations:
SELECT
  mean_exec_time / 1000 AS mean_exec_time_secs,
  calls,
  rows,
  shared_blks_hit,
  shared_blks_read,
  shared_blks_hit / (shared_blks_hit + shared_blks_read)::NUMERIC * 100 AS hit_ratio,
  (blk_read_time + blk_write_time) / calls AS average_io_time_ms,
  query
FROM
  pg_stat_statements
WHERE
  shared_blks_hit > 0
ORDER BY
  (blk_read_time + blk_write_time) / calls DESC;
shared_blks_hit: the number of shared blocks the statement read from the buffer cache.
shared_blks_read: the number of shared blocks the statement read from disk.
shared_blks_hit / (shared_blks_hit + shared_blks_read)::NUMERIC * 100 AS hit_ratio: the hit ratio, which represents the percentage of shared blocks found in the buffer cache compared to the total number of shared blocks accessed (both from cache and disk).
(blk_read_time + blk_write_time) / calls AS average_io_time_ms: the average I/O time per call in milliseconds, which is the sum of block read and block write time divided by the number of calls. Note that blk_read_time and blk_write_time are only populated when the track_io_timing setting is enabled.
WHERE shared_blks_hit > 0: keeps only statements with at least one shared-block hit, which also prevents the hit-ratio calculation from dividing by zero.
ORDER BY (blk_read_time + blk_write_time) / calls DESC: sorts the result in descending order of average I/O time per call, so the most I/O-intensive statements appear first.
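The hit ratio above is per statement. If you also want a database-wide view of how often PostgreSQL finds blocks in the buffer cache instead of reading them from disk, a rough sketch using the standard pg_stat_database view looks like this:

-- Buffer cache hit ratio per database; NULLIF guards against division by zero
SELECT
  datname,
  blks_hit,
  blks_read,
  round(blks_hit::numeric / NULLIF(blks_hit + blks_read, 0) * 100, 2) AS cache_hit_ratio
FROM
  pg_stat_database
WHERE
  datname IS NOT NULL
ORDER BY
  cache_hit_ratio;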
pg_stat_all_tables is a system view in PostgreSQL that provides statistical information about all tables in the current database. It includes various metrics related to table access and usage, such as the number of sequential and index scans performed on each table, the number of tuples inserted, updated, and deleted, as well as information about vacuum and analysis operations.
A sequential scan refers to the process of scanning all the rows in a table sequentially, usually without using an index. It reads each row one by one from start to finish, which can be less efficient for large tables compared to using an index to access specific rows directly.
You can use the following query to retrieve the top three tables with the most sequential scans:
SELECT
  schemaname,
  relname,
  seq_scan,
  idx_scan,
  seq_tup_read,
  seq_tup_read / seq_scan AS avg_seq_read
FROM
  pg_stat_all_tables
WHERE
  seq_scan > 0 AND schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY
  avg_seq_read DESC
LIMIT 3;
schemaname: the name of the schema containing the table.
relname: the name of the table.
seq_scan: the number of sequential scans initiated on the table.
idx_scan: the number of index scans initiated on the table.
seq_tup_read: the number of live rows fetched by sequential scans.
seq_tup_read / seq_scan AS avg_seq_read: the average number of rows read per sequential scan.
seq_scan > 0: selects only tables that have been sequentially scanned at least once (this also avoids dividing by zero in avg_seq_read).
schemaname NOT IN ('pg_catalog', 'information_schema'): filters out tables from the pg_catalog and information_schema schemas. These schemas contain system tables and views created automatically by PostgreSQL and are not typically user-created or user-managed.
ORDER BY: sorts the result set by the calculated avg_seq_read column in descending order, so tables with the highest average sequential read rate appear first.
LIMIT 3: restricts the result set to the top three tables with the highest average sequential read rates.
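Once a table shows up in this list, it is worth confirming with EXPLAIN whether a typical query against it really falls back to a sequential scan and whether an index would help. In the sketch below, the table orders and the column customer_id are placeholders for your own schema:

-- "orders" and "customer_id" are hypothetical; substitute your own table and filter
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;
-- If the plan shows a Seq Scan for a selective filter, an index may help:
-- CREATE INDEX ON orders (customer_id);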
Infrequently accessed tables in a database are those that are not frequently queried or manipulated. These tables typically have low activity and are accessed less frequently compared to other tables in the database. They may contain historical data, archival data, or data that is rarely used in day-to-day operations.
The following query retrieves tables with low total scan counts (both sequential and index scans combined):
SELECT
  schemaname,
  relname,
  seq_scan,
  idx_scan,
  (COALESCE(seq_scan, 0) + COALESCE(idx_scan, 0)) AS total_scans_performed
FROM
  pg_stat_all_tables
WHERE
  (COALESCE(seq_scan, 0) + COALESCE(idx_scan, 0)) < 10
  AND schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY
  5 DESC;
seq_scan: the number of sequential scans performed on the table.
idx_scan: the number of index scans performed on the table.
(COALESCE(seq_scan, 0) + COALESCE(idx_scan, 0)) AS total_scans_performed: the total number of scans performed on the table, calculated by adding the sequential scans and index scans. The COALESCE function handles NULL values by replacing them with 0.
(COALESCE(seq_scan, 0) + COALESCE(idx_scan, 0)) < 10: filters the tables by total scan count, keeping only those where the combined number of sequential and index scans is less than 10.
ORDER BY 5 DESC: orders the result set by the fifth output column (total_scans_performed) in descending order, so tables with the highest total scan counts appear first.
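Before archiving or dropping a rarely used table, it can help to know how much space it actually occupies. As a sketch, the same view can be combined with pg_total_relation_size (the threshold of 10 scans mirrors the query above):

-- Disk footprint of rarely scanned tables
SELECT
  schemaname,
  relname,
  pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM
  pg_stat_all_tables
WHERE
  (COALESCE(seq_scan, 0) + COALESCE(idx_scan, 0)) < 10
  AND schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY
  pg_total_relation_size(relid) DESC;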
pg_stat_activity is a system view in PostgreSQL that provides information about the current activity of database connections. It includes one row per server process, showing details such as the username of the connected user, the database being accessed, the state of the connection (idle, active, waiting, etc.), the current query being executed, and more.
The following SQL query retrieves information about currently running (active) database sessions in PostgreSQL that have been executing for longer than 10 seconds.
SELECT
datname AS database_name,
usename AS user_name,
application_name,
client_addr AS client_address,
client_hostname,
query AS current_query,
state,
query_start,
now() - query_start AS query_duration
FROM
pg_stat_activity
WHERE
state = 'active' AND now() - query_start > INTERVAL '10 sec'
ORDER BY
query_start DESC;
now() - query_start AS query_duration: the duration of the query so far, calculated by subtracting the query start time from the current time.
state = 'active': filters the results to include only queries that are currently executing.
now() - query_start > INTERVAL '10 sec': filters the results to include only queries that have been running for more than 10 seconds.
query_start DESC: orders the results by query start time in descending order, showing the most recently started queries first.
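If one of these sessions turns out to be a runaway query, its pid (which you can add to the SELECT list of the query above, or read directly from pg_stat_activity) can be passed to PostgreSQL's built-in cancellation functions. The pid 12345 below is a placeholder:

-- Cancel only the currently running query of backend 12345 (placeholder pid)
SELECT pg_cancel_backend(12345);
-- Or terminate the whole session if cancelling is not enough
SELECT pg_terminate_backend(12345);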
The PostgreSQL exporter (postgres_exporter) is a specialized tool designed to extract metrics and statistics from PostgreSQL database servers. It collects various performance-related data points and makes them available for monitoring and analysis through monitoring systems like Prometheus.
Prometheus is an open-source application for event monitoring and alerting. It records metrics in a time-series database, collects them over HTTP using a pull model, and provides flexible queries and real-time alerting.
Grafana is a versatile open-source analytics and interactive visualization tool accessible across multiple platforms. It enables the creation of charts, graphs, and alerts on the web, offering comprehensive insights when linked with compatible data sources.
For this integration, we are using an EC2 Ubuntu 22.04 instance, where we first set up PostgreSQL and the Postgres exporter and then set up Prometheus and Grafana. The components listen on the following ports:
PostgreSQL: 5432
Postgres exporter: 9100
Prometheus: 9090
Grafana: 3000
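If a host firewall such as ufw is active on the instance (and depending on your EC2 security group rules), you may also need to open the ports you plan to reach from the browser, for example Grafana and Prometheus:

# Only needed if a host firewall is enabled; adjust to your environment
sudo ufw allow 3000/tcp
sudo ufw allow 9090/tcp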
Install the latest version of PostgreSQL:
sudo sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get -y install postgresql
Modify the authentication method in the pg_hba.conf file to use scram-sha-256 instead of peer for local connections. This will allow you to use password-based authentication to the server.
local   all   all   scram-sha-256
Restart the server:
sudo systemctl restart postgresql
Create a database user for monitoring and set a password for this user:
CREATE USER monitoring_user WITH PASSWORD 'test@1234' SUPERUSER;
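SUPERUSER keeps the example simple, but it grants far more than the exporter needs. A more restrictive sketch, assuming PostgreSQL 10 or later, relies on the built-in pg_monitor role, which covers the statistics views the exporter reads:

-- Less privileged alternative to SUPERUSER (PostgreSQL 10+)
CREATE USER monitoring_user WITH PASSWORD 'test@1234';
GRANT pg_monitor TO monitoring_user;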
Download the latest release of the postgres_exporter binary:
wget https://github.com/prometheus-community/postgres_exporter/releases/download/v0.14.0/postgres_exporter-0.14.0.linux-amd64.tar.gz
Unzip the binary:
tar xzf postgres_exporter-0.14.0.linux-amd64.tar.gz
Move postgres_exporter binary to /usr/local/bin:
sudo cp postgres_exporter-0.14.0.linux-amd64/postgres_exporter /usr/local/bin
Create a new directory under /opt to store connection information for the PostgreSQL server. Note that the @ in the password must be percent-encoded as %40 inside the connection URI:
sudo mkdir /opt/postgres_exporter
echo DATA_SOURCE_NAME="postgresql://monitoring_user:test%401234@localhost:5432/?sslmode=disable" | sudo tee /opt/postgres_exporter/postgres_exporter.env
Create a service file for the postgres_exporter:
echo '[Unit]
Description=Postgres exporter for Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=postgres
Group=postgres
WorkingDirectory=/opt/postgres_exporter
EnvironmentFile=/opt/postgres_exporter/postgres_exporter.env
ExecStart=/usr/local/bin/postgres_exporter --web.listen-address=localhost:9100 --web.telemetry-path=/metrics
Restart=always
[Install]
WantedBy=multi-user.target' | sudo tee /etc/systemd/system/postgres_exporter.service
Since we created a new service file, it is better to reload the systemd daemon once so it recognizes the new file:
sudo systemctl daemon-reload
Start and enable the postgres_exporter service:
sudo systemctl start postgres_exporter
sudo systemctl enable postgres_exporter
Check the service status:
sudo systemctl status postgres_exporter
postgres_exporter.service - Postgres exporter for Prometheus
Loaded: loaded (/etc/systemd/system/postgres_exporter.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2024-03-05 13:52:56 UTC; 2h 15min ago
Main PID: 9438 (postgres_export)
Tasks: 6 (limit: 9498)
Verify the postgres_exporter setup from the browser by opening the /metrics endpoint on port 9100:
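If the instance is headless, you can also check from the command line. Assuming the exporter is bound to localhost:9100 as in the service file above, the pg_up metric should report 1 when the exporter can reach PostgreSQL:

# Fetch the exporter's metrics and look for the pg_up gauge
curl -s http://localhost:9100/metrics | grep pg_up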
Create a system group named prometheus:
sudo groupadd --system prometheus
Create a system user named prometheus in the prometheus group without an interactive login shell:
sudo useradd -s /sbin/nologin --system -g prometheus prometheus
Create the required directory structure:
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
Download the Prometheus release archive:
wget https://github.com/prometheus/prometheus/releases/download/v2.43.0/prometheus-2.43.0.linux-amd64.tar.gz
Decompress the archive:
tar vxf prometheus*.tar.gz
Move the binaries and configuration files into place and set the proper ownership:
cd prometheus*/
sudo mv prometheus /usr/local/bin
sudo mv promtool /usr/local/bin
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
sudo mv consoles /etc/prometheus
sudo mv console_libraries /etc/prometheus
sudo mv prometheus.yml /etc/prometheus
sudo chown prometheus:prometheus /etc/prometheus
sudo chown -R prometheus:prometheus /etc/prometheus/consoles
sudo chown -R prometheus:prometheus /etc/prometheus/console_libraries
sudo chown -R prometheus:prometheus /var/lib/prometheus
Add the PostgreSQL Exporter configurations inside the prometheus.yml file at the following location:
/etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: "Postgres exporter"
    scrape_interval: 5s
    static_configs:
      - targets: ["localhost:9100"]
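Since promtool was installed alongside the Prometheus binary earlier, you can validate the edited configuration before starting the service:

# Check the configuration file for syntax errors
promtool check config /etc/prometheus/prometheus.yml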
Create a new service file for Prometheus at the following location:
/etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
--config.file /etc/prometheus/prometheus.yml \
--storage.tsdb.path /var/lib/prometheus/ \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries
[Install]
WantedBy=multi-user.target
Reload systemd manager:
sudo systemctl daemon-reload
Enable and start the Prometheus service:
sudo systemctl enable prometheus
sudo systemctl start prometheus
Verify the Prometheus service:
sudo systemctl status prometheus
prometheus.service - Prometheus
Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2024-03-05 13:53:51 UTC; 2h 27min ago
Main PID: 9470 (prometheus)
Tasks: 8 (limit: 9498)
Verify the Prometheus setup from the browser on port 9090:
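As with the exporter, you can also verify from the command line. Prometheus exposes simple health and readiness endpoints on its HTTP port:

# Both endpoints should return a short success message
curl -s http://localhost:9090/-/healthy
curl -s http://localhost:9090/-/ready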
Install the prerequisite packages:
sudo apt install -y apt-transport-https software-properties-common
Add the Grafana GPG key:
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
Add Grafana’s APT repository:
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
sudo apt update
Install the Grafana package:
sudo apt install grafana
Start the Grafana service:
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
Verify the Grafana service:
sudo systemctl status grafana-server
● grafana-server.service - Grafana instance
Loaded: loaded (/lib/systemd/system/grafana-server.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2024-03-12 05:45:23 UTC; 1h 29min ago
Docs: http://docs.grafana.org
Main PID: 719 (grafana)
Verify your Grafana setup from the browser on port 3000:
Note: if you see the login screen, log in with the default credentials (username: admin, password: admin).
Add a new data source for Prometheus:
Configure the data source, then click Save & test.
Find a PostgreSQL Grafana dashboard online and copy the URL (or ID) of the dashboard that suits your preferences. Paste it into the import screen and select Load.
Fill out the required fields. Select Prometheus from the data source and click Import.
Your dashboard is ready!
You can see the following properties straight away: average CPU usage, average memory usage, and a couple of important postgresql.conf properties:
Shared buffers
Effective cache
Maintenance work memory
Work memory, etc.
If you scroll down, you can see a more detailed version of database stats, including active/idle sessions, cache hit ratio, buffers, etc.
If you want to save yourself all of this work when it comes to database monitoring, Timescale (built on PostgreSQL) has a metrics dashboard that significantly improves the monitoring capabilities of your database services by providing detailed, service-level insights such as CPU, memory, and query-level statistics.
It offers support for various time ranges to view metrics, including the last hour, 24 hours, seven days, and 30 days, each with specific granularity. Furthermore, you can continuously monitor the health and resource consumption of your database services, enabling proactive management and optimization of database performance.
This feature facilitates the identification of trends, diagnosis of issues, and optimization of configurations to maintain optimal service health. It also includes pg_stat_statements by default.
However, we believe that pg_stat_statements has some limitations. So, to overcome them and provide a more in-depth query monitoring experience, you can utilize Insights, a tool we developed last year. Insights empowers you to closely examine the queries executed in your database within a specific timeframe, providing valuable statistics on timing/latency, memory usage, and more.
Upon identifying specific queries for further investigation, you can access additional information through a drill-down view to better understand the situation, such as whether your database performance has been deteriorating or improving (e.g., due to adding a new index).
Insights offers an unmatched level of granularity in database observability, complementing and enhancing the monitoring and optimization capabilities provided by pg_stat_statements for PostgreSQL. This combination enables a comprehensive approach to query optimization and performance monitoring within the Postgres environment.
“Another amazing feature of your cloud service compared to running it yourself is the Insights section [in the Timescale UI], which gives you insights into what queries are impacting performance. This is a super valuable tool for us because we are currently monitoring at the application level, trying to measure some things there. However, seeing it on the database level, with the actual number of executions and total rows scanned, is super valuable for optimizing our queries. Since we have a super customizable application where customers can misconfigure some things, this tool is crucial to spot those changes and optimize where needed.”
Lukas Klein, CTO at Datacake
Effective database monitoring is crucial for maintaining optimal performance and reliability. By leveraging the right tools and techniques, developers can proactively identify and address issues, ensuring uninterrupted operations and maximizing the value of their database investments.
We walked you through five different ways of monitoring your PostgreSQL database. For easy query monitoring, try Timescale’s Insights: not only will you quickly figure out which queries are slowing down your performance, but you will also be able to further accelerate them with Timescale’s automatic partitioning, columnar compression, and automatically updated materialized views. Try Timescale today—create a free account.