Nov 22, 2024
Posted by
Evan Davis
One advantage of being a Timescale customer is that you get direct assistance from the Timescale Team. As a data architect, I help our customers set up daily to take their TimescaleDB instances (which are 100 percent PostgreSQL under the hood) to the next level. In my everyday work, one of the most asked questions is how to optimize PostgreSQL ingest rate.
Why keep all this knowledge to ourselves? In this blog post, I'll give you tips on how to optimize your ingest rate, so you can get more row inserts per second in your PostgreSQL or TimescaleDB database.
It’s worth mentioning that the answer to most of our customers' questions ultimately depends on their use case: in the database world, it’s impossible to offer perfect, one-size-fits-all advice. That’s why there’s nothing like asking for help from the team when needed. Still, going through our general recommendations and best practices will certainly help you have a better understanding of the product you have in hand, helping you optimize your experience and troubleshoot autonomously.
Even if you are not a Timescale customer (yet!), we hope you will find this advice helpful and apply it in your own environment. If you’re curious, you can try Timescale completely for free (you won’t need a credit card to start the trial).
Time-series data workloads are relentless, and optimizing your ingest rate is a necessity. Since we supercharge PostgreSQL to handle time-series data and other demanding workloads effortlessly, almost every Timescale customer is interested in getting as many row inserts per second as possible into their TimescaleDB service.
How can you ensure you’re getting the best write performance? First, take a look at this blog post we published a while ago: 13 Tips to Improve PostgreSQL Insert Performance. Timescale is built on PostgreSQL, so the general best practices for maximizing your inserts into Postgres will also apply to Timescale.
However, we often find that what’s holding our Timescale customers back is “cloud-native best practices.” When dealing with a Timescale service, it’s not only essential to optimize your practices on the database level but also at the cloud infrastructure level.
In our experience, this is the advice that our customers find more useful:
As a general guideline, you should aim for ingesting 50-100k rows per second per ingest process. If you want to write more than 50k-100k rows each second, you will need to set up multiple ingest processes.
The volume of writes you can get will be highly influenced by the number of resources you allocate to this process: this is true from the Timescale side, as well as from your client side.
Especially if you set up multiple processes, make sure that you have enough CPU to cover them from the client side. For example, if you’re writing to Timescale from your AWS EC2 instance, ensure that your instance size is large enough to cover your expectations. If only one ingest process is needed, then maybe a 1-CPU EC2 instance might be adequate. However, if using multiple ingest processes, adjust the CPU count to at least the number of ingest processes (e.g., four ingest processes equals four CPU instances).
Just like on your client side, your Timescale service needs enough resources to handle the writes. We recommend having as many CPUs in the Timescale service as in your client.
If you’re not sure that you have enough resources in your Timescale service, you can use forks to make a copy of your database with more CPU and test if your ingest performance improves. Our customers find it very useful (and also cost-effective).
One of the great things about using a hosted database service is that you don't have to think about your infrastructure too much. But for certain things, it’s still worth remembering that your database is running on a server somewhere. If you want to really maximize the volume of data you’re writing, you will get better results if your data doesn’t have to cross the world to get to your Timescale server.
This (network throughput) is another major component to consider. To achieve optimal ingest rates, we highly recommend using a client located in the same AWS region as your Timescale service. Example: if you are writing massive volumes of data to a cloud service in us-east-1, try using an EC2 instance also located in us-east-1.
Finally, let’s cover a tip that’s also included in the blog post we referenced above but often arises as a problem.
To maximize your ingest rate in PostgreSQL or TimescaleDB, you should insert your data with many rows per INSERT
. When writing your data to TimescaleDB, we strongly recommend using a single INSERT
statement with multiple tuples: don't insert your data row-by-row. Instead, try to write hundreds (or thousands) of rows per INSERT
, following the syntax below.
INSERT INTO mytable (‘2022-06-01 12:00:00’ 1, 1.11, 1.12),
(‘2022-06-01 13:00:00’ 2, 2.21, 2.22)
We hope this post has given you some ideas on what to look for when trying to maximize your inserts per row in your Timescale database. Remember to look up this blog post for more specific tips on PostgreSQL and TimescaleDB.
If you still haven’t tried Timescale, you can use it for free for 30 days. It is the most convenient way to use TimescaleDB in production. You’ll get automatic backups and recovery, high availability via replicas in multiple availability zones, flexible resizing with autoscaling, and one-click database forking for testing and creating developer environments. And, of course, the advice of the Timescale Team. 😎
Stay tuned for the next post of the series!