Category: All posts
Nov 06, 2023
Posted by
Brian Muckian
Are you looking for an easy way to increase the number of client connections you can establish to your Timescale database? Would you like to reduce the compute (CPU/RAM) overhead of managing (starting-stopping-maintaining) all of those connections?
If you’re nodding your head as you read these questions, you may want to consider enabling connection pooling via PgBouncer for your service—or perhaps you’re using it already. As a support engineer at Timescale, I assist our customers daily with their database configuration. This includes connection pooling, specifically PgBouncer, which is the base of our own connection pooler. So I’ve learned one thing or two about it!
In this blog post, I’ll lay out some tips on implementing pgBouncer while avoiding some common mistakes we’ve seen among our customers.
At Timescale, we built our implementation of connection pooling on PgBouncer, a self-described “lightweight connection pooler for PostgreSQL,” since it is a widely used tool by many in our community and performs its function well.
To start using PgBouncer in Timescale, you first need to enable the feature in the web console. Once you’ve done this, you'll be provided an additional service URI with a specified port (for PgBouncer) and will be able to connect to either the "session pool" (named "tsdb") or the "transaction pool" (called "tsdb_transaction"). You will also still be able to make direct connections (via the original service URI and port) to your database server when or if you need to.
PgBouncer maintains up to a specified number of dedicated "server" connections to PostgreSQL for each pool, as a fluctuating number of clients comes and goes, gets a connection from the pool, runs its queries, and gathers results.
When the client disconnects from the connection pool, instead of terminating and deallocating the server connection (which can be very expensive in terms of compute utilization when it happens frequently), it is returned to the pool to be used by another client.
On a sufficiently busy system, you can see a substantial benefit of PgBouncer performing this work, along with helping to manage traffic to avoid problems on your database server and free up its resources for query and storage operations.
You'll likely see the greatest improvement (reduction) in resource utilization if you are able to take full advantage of the transaction pool (tsdb_transaction). In transaction mode, server connections are added back to the pool after each transaction is completed or rolled back. Almost immediately made available to another client that may be waiting, this allows you to serve a very large number of clients with a relatively small number of dedicated server connections. 💪
Now, let’s get into the advice. PgBouncer is an awesome tool that will give you so many benefits, but like all powerful things, it must be used with great responsibility. There are some common implementation mistakes that we see people making often—you can avoid them by following these tips:
First things first: when setting up a connection pool, it is highly recommended to monitor behavior.
To do so, you can access the PgBouncer administrative console and just specify "pgbouncer" as the pool (or database) name, like this:
psql postgres://tsdbadmin@HOST:PORT/pgbouncer?sslmode=require
Then, you can run many of the (read-only) SHOW
commands:
pgbouncer=# show help;
NOTICE: Console usage
DETAIL:
SHOW HELP|CONFIG|DATABASES|POOLS|CLIENTS|SERVERS|USERS|VERSION
SHOW PEERS|PEER_POOLS
SHOW FDS|SOCKETS|ACTIVE_SOCKETS|LISTS|MEM|STATE
SHOW DNS_HOSTS|DNS_ZONES
SHOW STATS|STATS_TOTALS|STATS_AVERAGES|TOTALS
It can be helpful to run some of these commands interactively to get a better understanding of how the pools behave under current conditions. For example, you can see how many clients and servers are active, waiting, or being canceled with the SHOW POOLS
command:
pgbouncer> show pools;
database | user | cl_active | cl_waiting |
------------------+-----------+-----------+------------+
pgbouncer | pgbouncer | 1 | 0 |
postgres | postgres | 0 | 0 |
tsdb | tsdbadmin | 2 | 0 |
tsdb_transaction| tsdbadmin | 11 | 0 |
Finally, here is a simple bash script you can use to periodically (default every 90 seconds) collect CSV results of those SHOW
commands up to a specified (default 20 minutes) time. This is a rudimentary—but effective—tool to observe some of those statistics change over a period of time.
#!/usr/bin/env bash
export PGBOUNCER_URL="${PGBOUNCER_URL:-empty}"
export EXEC_INTERVAL="${EXEC_INTERVAL:-90}"
export END_AFTER_MIN="${END_AFTER_MIN:-20}"
echo "gathering pgbouncer metrics"
echo "pgbouncer url: $PGBOUNCER_URL"
echo "run every $EXEC_INTERVAL seconds"
echo "end after $END_AFTER_MIN minutes"
declare -a arr=(databases pools clients servers users sockets active_sockets lists dns_hosts dns_zones stats stats_totals stats_averages stats_totals)
if [ $PGBOUNCER_URL != "empty" ]; then
psql --csv $PGBOUNCER_URL -c "show config;" | sed "s/$/,dt`date +%Y%m%d%H%M%S`/" > ./pb_config.csv
for key in "${arr[@]}"; do
psql --csv $PGBOUNCER_URL -c "show ${key}" | sed "s/$/,at_date_time/" | head -n1 > ./pb_${key}.csv
done
touch .get_pb_metrics;
while [ -f ".get_pb_metrics" ]; do
for key in "${arr[@]}"; do
psql --csv -t $PGBOUNCER_URL -c "show ${key}" | sed "s/$/,`date +%Y%m%d%H%M%S`/" >> ./pb_${key}.csv
done
sleep $EXEC_INTERVAL
find .get_pb_metrics -mmin +$END_AFTER_MIN -type f -exec rm -fv {} \;
done
tar czvf ./pb_metrics.tar.gz ./pb*.csv
rm ./pb*.csv
else
echo "you must set PGBOUNCER_URL"
fi
You can save that to a file (make it executable), set a proper PGBOUNCER_URL
environment variable, run it, and watch the tail of your favorite CSV. When it's done, collect them into tables for query if you like.
I know, I know, this all sounds awesome (because it is). But before you change all of your connection strings, keep reading for more best practices to ensure you get the absolute most out of this feature.
One of the mistakes we often see people making with PgBouncer is using session-based features (such as prepared statements, temporary tables, and SET commands) that can fail or produce unexpected results for customers using the transaction pool.
The PgBouncer FAQ briefly discusses this and then links to how to disable this feature in JDBC and PHP/PDO. Tip: You may have to configure your framework/library to prevent the use of prepared statements.
You will find that some tools, frameworks, and some "thick" client software may expect or require a connection to the database in "session" mode. For example, if you point pg_activity to your transaction pool, it may run for a short while, but you'll probably see errors similar to this:
prepared statement "_pg3_9" does not exist
Similarly, pgcli seems to work fine for most things, but if you use the \watch
command, you may see something such as:
prepared statement "_pg3_0" already exists
For another example to demonstrate why you should avoid session-based features while using the transaction pool, connect psql and do this:
home:/> psql postgres://tsdbadmin@HOST:PORT/tsdb_transaction?sslmode=require
tsdb_transaction> show search_path;
+-----------------+
| search_path |
|-----------------|
| "$user", public |
+-----------------+
tsdb_transaction> set search_path = "$user", custom, public;
tsdb_transaction> show search_path;
+-------------------------+
| search_path |
|-------------------------|
| "$user", custom, public |
+-------------------------+
tsdb_transaction> select * from temp_table;
relation "temp_table" does not exist
LINE 1: select * from temp_table
^
tsdb_transaction> create temporary table temp_table as select 'foo' as bar;
tsdb_transaction> select * from temp_table;
+-----+
| bar |
|-----|
| foo |
+-----+
tsdb_transaction> \q
Goodbye!
home:/> psql postgres://tsdbadmin@HOST:PORT/tsdb_transaction?sslmode=require
...
tsdb_transaction> select * from temp_table;
+-----+
| bar |
|-----|
| foo |
+-----+
tsdb_transaction> show search_path;
+-------------------------+
| search_path |
|-------------------------|
| "$user", custom, public |
+-------------------------+
Notice how, in this example, after disconnecting and reconnecting (terminating the "client" and creating a new "client" connection) to the transaction pool, the previously created temporary table and setting are visible to this connection (which obviously retrieved the same "server" connection which had been added back to the pool).
Our experience suggests that most users will get the greatest benefit from using connection pooling when you can take greater advantage of the transaction mode pool. If your use case does not require session-based features, and if most of your transactions are very short, we recommend using the transaction pool.
Large volumes of short-lived clients, things like event systems, IoT and sensor networks, and microservices architectures can all benefit from transaction pools. Meanwhile, for those “thick clients” (like your long-running sessions in your robust BI Suite), these connections may better be served by the session pool.
Our general guidelines would be: If you do not need and can avoid using any session-based features, use the transaction pool. If you do need those features, use the session pool. These two pools should meet most of your needs, but if you do encounter something that prevents you from using either pool available, you can still directly connect to your service.
As always, we’re happy to discuss this further, but we hope these simple tips will save you some precious time.
With connection pooling via PpgBouncer, we aim to provide a simple way for Timescale users to increase the number of database operations, manage system resources more efficiently, and improve database reliability.
We expect to keep learning more about the optimal implementation of pgBouncer as we move forward. We’ll make sure to share them with the community—in the meantime, please let us know in our community Slack, Forum, or reach out for support if you have any feedback or run into any obstacles at all. Also, if you have further advice on how to avoid pgBouncer pitfalls, we’d love to hear it!
Excelsior!