Jul 09, 2024
Posted by
James Blackwood-Sewell
Since we launched Timescale, our cloud-hosted PostgreSQL service for time-series data and event and analytics workloads, we have seen large numbers of customers migrating onto it from the general-purpose Amazon RDS for PostgreSQL. These developers usually struggle with performance issues on ingest, sluggish real-time or historical queries, and spiraling storage costs.
They need a solution that will let them keep using PostgreSQL while not blocking them from getting value out of their time-series data. Timescale fits them perfectly, and this article will present benchmarks that help explain why.
When we talk to these customers, we often see a pattern:
Does it sound familiar? It’s usually at this stage when developers realize that Amazon RDS for PostgreSQL is no longer a good choice for their applications, start seeking alternatives, and come across Timescale.
Timescale runs on AWS, offering hosted PostgreSQL with added time-series superpowers. Since Timescale is still PostgreSQL and already in AWS, the transition from RDS is swift: Timescale integrates with your PostgreSQL-based application directly and plays nicely with your AWS infrastructure.
Timescale has always strived to enhance PostgreSQL with the ingestion, query performance, and cost-efficiency boosts that developers need to run their data-intensive applications, all while providing a seamless developer experience with advanced features to ease working with time-series data.
But don’t take our word for it—let the numbers speak for themselves. In this blog post, we share a benchmark comparing the performance of Timescale to Amazon RDS for PostgreSQL. You will find all the details of our comparison and all the information required to run the benchmark yourself using the Time-Series Benchmarking Suite (TSBS).
For those who can’t wait, here’s a summary: for a 160 GB dataset with almost 1 billion rows stored on a 1 TB volume, Timescale outperforms Amazon RDS for PostgreSQL with up to 44 % higher ingest rates, queries running up to 350x faster, and a 95 % smaller data footprint.
When we ingested data in both Timescale and Amazon RDS for PostgreSQL (using gp3 EBS volumes for both), Timescale was 34 % faster than RDS for 4 vCPU and 44 % for 8 vCPU configurations.
When we ran a variety of time-based queries on both databases, ranging from simple aggregates to more complex rollups through to last-point queries, Timescale consistently outperformed Amazon RDS for PostgreSQL in every query category, sometimes by as much as 350x (you can see all of the results in the Benchmarking section).
Timescale used 95 % less disk than Amazon RDS for PostgreSQL, thanks to Timescale’s native columnar compression, which reduced the size of the test database from 159 GB to 8.6 GB. Timescale's compression uses best-in-class algorithms, including Gorilla and delta-of-delta, to dramatically reduce the storage footprint.
And the storage savings above don’t even consider the effect of the object store built on Amazon S3 that we just announced for Timescale. This feature is available for testing via private beta at the time of writing but is not yet ready for production use.
Still, by running one SQL command, this novel functionality will allow you to tier an unlimited amount of data to the S3 object storage layer that’s now an integral part of Timescale. This layer is columnar (it’s based on Apache Parquet), elastic (you can increase and reduce your usage), consumption-based (you pay only for what you store), and one order of magnitude cheaper than our EBS storage, with no extra charges for queries or usage. This feature will make scalability even more cost-efficient in Timescale, so stay tuned for some exciting benchmarks!
In the remainder of this post, we’ll deep dive into our performance benchmark comparing Amazon RDS for PostgreSQL with Timescale, detailing our methods and results for comparing ingest rates, query speed, and storage footprint. We’ll also offer insight into why Timescale puts up the numbers it does, with a short introduction to its vital advantages for handling time-series, events, and analytics data.
If you’d like to see how Timescale performs for your workload, sign up for Timescale today: it’s free for 30 days, no credit card is required, and you can spin up your first database in minutes.
As with our previous Timescale benchmarks, we used the open-source Time-Series Benchmarking Suite (TSBS) to run our tests. Feel free to download and run it for yourself using the settings below. Suggestions for improvements are also welcome: comment on Twitter or Timescale Slack to join the conversation.
We used the following TSBS configuration across all runs:
Setting | Timescale | Amazon RDS for PostgreSQL |
PostgreSQL version | 14.5 | 14.4 (latest available) |
Configuration changes | No changes | synchronous_commit=off (to match Timescale) |
Partitioning system | TimescaleDB (partitions automatically configured) | pg_partman (partitions manually configured) |
Compression into columnar | Yes, for older partitions | Not supported |
Partition size | 4h (each system ended up with 26 non-default partitions) | |
Scale (number of devices) | 25,000 | |
Ingest workers | 16 | |
Rows ingested | 868,000,000 | |
TSBS profile | DevOps | |
Instance type | M5 series (4 vCPU+16 GB memory and 8 vCPU+32 GB memory) | |
Disk type | gp3 (16,000 IOPS, 1,000 MiB/s throughput) | |
Volume size | 1 TB | |
Hypertables are the base abstraction of Timescale's time-series magic. While they work just like regular PostgreSQL tables, they boost performance and the user experience with time-series data by automatically partitioning it (large tables become smaller chunks or data partitions within a table) and allowing it to be queried more efficiently.
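As a rough sketch of what this looks like in practice, the statements below turn a regular table into a hypertable with 4-hour chunks, matching the partition size used in this benchmark. The `cpu` table and its columns are a simplified, assumed stand-in for the TSBS DevOps schema, not the exact DDL the benchmark generates.

```sql
-- Simplified stand-in for the TSBS DevOps CPU table (assumed schema).
CREATE TABLE cpu (
    time         TIMESTAMPTZ NOT NULL,
    tags_id      INTEGER,
    usage_user   DOUBLE PRECISION,
    usage_system DOUBLE PRECISION
);

-- Convert it into a hypertable partitioned on "time" in 4-hour chunks.
SELECT create_hypertable('cpu', 'time', chunk_time_interval => INTERVAL '4 hours');

-- Inserts and queries keep using the table name; chunks are created on demand.
INSERT INTO cpu VALUES (now(), 1, 42.0, 7.5);
```

From the application's point of view nothing changes: it still reads from and writes to `cpu`.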
If you’re familiar with PostgreSQL, you may be asking questions about partitioning in RDS. In the past, we have benchmarked TimescaleDB against unpartitioned PostgreSQL simply because that’s the journey most of our customers follow. However, we are inevitably asked why we didn’t compare against PostgreSQL partitioned with pg_partman.
pg_partman is another PostgreSQL extension that provides partition creation, but it doesn’t seamlessly create partitions on the fly: if someone inserts data outside the currently created partitions, it either goes into a catch-all partition, degrading performance, or, worse still, fails. It also doesn’t provide any additional time-series functionality, planner enhancements, or compression.
We listen to these comments, so we decided to highlight Timescale's performance (and convenience) by enabling pg_partman on the RDS systems in this benchmark. After all, the extension is considered a best practice for partitioned tables in Amazon RDS for PostgreSQL, so it was only fair we’d use it.
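For contrast, manually managed range partitioning in plain PostgreSQL looks roughly like the sketch below. pg_partman automates creating partitions like these ahead of time. The table name `cpu_partitioned`, the columns, and the dates are illustrative assumptions, not the benchmark's actual setup.

```sql
-- Parent table using PostgreSQL's declarative range partitioning.
CREATE TABLE cpu_partitioned (
    time       TIMESTAMPTZ NOT NULL,
    tags_id    INTEGER,
    usage_user DOUBLE PRECISION
) PARTITION BY RANGE (time);

-- One 4-hour partition, created ahead of time.
CREATE TABLE cpu_p2023_01_01_00 PARTITION OF cpu_partitioned
    FOR VALUES FROM ('2023-01-01 00:00+00') TO ('2023-01-01 04:00+00');

-- Catch-all partition: rows outside every defined range land here.
-- Without it, such an insert fails with an error instead.
CREATE TABLE cpu_p_default PARTITION OF cpu_partitioned DEFAULT;
```

Every additional time range needs its own `CREATE TABLE ... PARTITION OF` statement before data for that range arrives.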
On our end, we enabled native compression on Timescale, compressing everything but the most recent chunk of data. To do so, we segmented by the tags_id column and ordered by the time (descending) and usage_user columns. This is something we couldn’t reproduce in RDS since it doesn’t offer any equivalent functionality.
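A compression setup along these lines could look like the following sketch. It assumes the hypertable is named `cpu`, and the `compress_chunk` call over chunks older than four hours is one plausible way to leave the newest chunk uncompressed, not necessarily the exact command used in the benchmark.

```sql
-- Enable native compression: segment by tags_id, order by time DESC, usage_user.
ALTER TABLE cpu SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'tags_id',
    timescaledb.compress_orderby   = 'time DESC, usage_user'
);

-- Compress the chunks that only hold data older than 4 hours (assumed cutoff).
SELECT compress_chunk(c)
FROM show_chunks('cpu', older_than => INTERVAL '4 hours') AS c;
```

Segmenting by `tags_id` keeps each device's rows together, while the ordering places similar consecutive values next to each other so they compress well.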
Almost everything else was exactly the same for both databases. We used the same data, indexes, and queries: almost one billion rows of data, on which we ran a set of queries 100 times each using 16 threads. The only difference is that the Timescale queries use the time_bucket() function for arbitrary interval bucketing, whereas the PostgreSQL queries use extract (which performs equally well but is much less flexible).
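To make that difference concrete, here is a minimal sketch of the same hourly rollup written both ways. The `cpu` table and `usage_user` column are assumed from the TSBS DevOps profile, and the exact SQL that TSBS generates may differ from this.

```sql
-- Timescale: bucket rows into arbitrary intervals with time_bucket().
SELECT time_bucket('1 hour', time) AS bucket,
       max(usage_user) AS max_usage_user
FROM cpu
GROUP BY bucket
ORDER BY bucket;

-- Plain PostgreSQL: derive the same hourly bucket from the epoch with extract().
SELECT to_timestamp(
           (floor(extract(epoch FROM time) / 3600) * 3600)::double precision
       ) AS bucket,
       max(usage_user) AS max_usage_user
FROM cpu
GROUP BY bucket
ORDER BY bucket;
```

Changing the interval in the first query means editing one string literal. In the second, the bucket width in seconds has to be recomputed and substituted in two places.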
We have split the performance data extracted from the benchmark into three sections: ingest, query, and storage footprint.
As we started to run Timescale and RDS through our 16-thread ingestion benchmark to insert almost 1 billion rows of data, we began to see some amazing wins. Timescale beat RDS by 32 % with 4 vCPUs and 44 % with 8 vCPUs. Both systems had the same I/O performance configured on their gp3 disk, so we kept looking to get to the bottom of why we were winning on busy systems.
To test the outcome without any disk I/O involvement, we used pgbench to run the following CPU-hungry SQL statement on 8 vCPU machines (using a scale of 1,000 and 16 jobs) and had some more interesting results straight away.
SELECT count(*) FROM (SELECT generate_series(1,10000000)) a
Timescale was roughly 1.75x as fast, returning an average query latency of 518 ms, while RDS returned 904 ms (about 43 % lower latency on Timescale). A similar gap showed up on both 4 vCPU and 8 vCPU instances.
Unfortunately, we can’t look inside the black box that is RDS to see what’s happening here. One hypothesis is that a large part of this difference is because Timescale gives you the exact amount of vCPU you provision for PostgreSQL (thanks, Kubernetes!), while Amazon RDS provides you a host with that many vCPUs.
This means that we (Timescale) pay for the operating overhead on Timescale, while on RDS, you (as the user) pay for this. As instances get very busy and processes fight with the operating system for CPU (like for an ingest benchmark or when you’re crunching a lot of data), this becomes a much bigger advantage for Timescale than we had anticipated. As usual, if anybody has other possible explanations for this difference, please reach out. We’d love to hear from you.
Our benchmark shows Timescale not only ingests data faster across the board but also provides more predictable and faster results under heavy CPU load. Not a bad feature when you want to get the most out of your instances.
Query performance is something that needs to be optimized in a time-series database. When you ask for data, you often need to have it as quickly as possible—especially when you’re powering a real-time dashboard. TSBS has a wide range of queries, each with its own somewhat hard-to-decode description (you can find a quick primer here). We ran each query 100 times on the 4 vCPU instance types (which wasn’t quick in some cases) and recorded the results.
When we look at the table of query runtimes, we can see a clear story. Timescale is consistently faster than Amazon RDS, often by more than 100x. In some cases, Timescale performs over 350x better, and it doesn’t perform worse for any query type. The table below shows the data for 4 vCPU instances, but results are similar across all the CPU types we tested (and of course, if your instance is very busy, you could get even better results).
When we examine the amount of data loaded and processed by some of the queries with the larger differences, the reason behind these improvements becomes clear. Timescale compresses data into a columnar format, which has several impacts on performance:
single-groupby- query types).
And just as a reminder, RDS had pg_partman configured for this test. This shows that while Timescale provides efficient partitioning via hypertables, we also provide a lot more than that (353x more in some instances).
Total storage size is measured at the end of the TSBS ingest cycle, looking at the size of the database which TSBS has been ingesting data into. For this benchmark on Timescale, all but the most recent partition of data is compressed into our native columnar format, which uses best-in-class algorithms, including Gorilla and delta-of-delta, to reduce the storage footprint for the CPU table dramatically.
After compression, you can still access the data as usual, but you get the benefits of it being smaller and the benefits of it being columnar.
Using less storage can mean smaller volumes, lower cost, and faster access (as we saw in the query results above). In the case of this benchmark, we saved 95 %, reducing our database from 159 GB to 8.6 GB. And this isn’t an outlier: we often see these numbers for production workloads at real customers.
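If you want to check the footprint on your own system, queries like the ones below are one way to do it. This is a sketch that assumes a hypertable named `cpu`. It is not necessarily how the figures above were collected.

```sql
-- Total on-disk size of the current database (works on any PostgreSQL).
SELECT pg_size_pretty(pg_database_size(current_database())) AS database_size;

-- Timescale only: before/after sizes for the compressed chunks of a hypertable.
SELECT pg_size_pretty(before_compression_total_bytes) AS before_size,
       pg_size_pretty(after_compression_total_bytes)  AS after_size
FROM hypertable_compression_stats('cpu');
```

The second query only covers chunks that have already been compressed, so the most recent, uncompressed chunk is not part of its totals.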
Now that we’ve examined the results of the benchmark, let’s briefly explore some of the features that make these results possible. This section aims to offer insight into the performance comparison above and highlight some other aspects of Timescale that will improve your developer experience when working with time-series data.
Timescale is purpose-built to provide features that handle the unique demands of time-series, analytics, and event workloads—and as we’ve seen earlier in this post, performance at scale is one of the most challenging aspects to achieve with a vanilla PostgreSQL solution.
To make PostgreSQL more scalable, we built features like hypertables and added query planner improvements allowing you to seamlessly partition tables into high-performance chunks, ensuring that you can load and query data quickly.
While some other solutions force you to think about creating and maintaining data partitions, Timescale does this for you under the hood, as queries come in with no performance impact. In fact, some of Timescale’s improvements work on tables that don’t even hold time-series data, like SkipScan, which dramatically improves DISTINCT queries on any PostgreSQL table with a matching B-tree index.
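As an illustration, a query shape that SkipScan can help with is sketched below. The `cpu` table, its columns, and the index name are assumptions for the example.

```sql
-- SkipScan needs a B-tree index whose leading column is the DISTINCT column.
CREATE INDEX IF NOT EXISTS cpu_tags_id_time_idx ON cpu (tags_id, time DESC);

-- Most recent row per device: one index "hop" per distinct tags_id.
SELECT DISTINCT ON (tags_id) tags_id, time, usage_user
FROM cpu
ORDER BY tags_id, time DESC;
```

Running the query under `EXPLAIN` shows whether the planner actually chose a SkipScan node for it.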
Another problem that comes with time-series data at scale is slow aggregate queries as you analyze or present data. Continuous aggregates let you take an often run or costly time-series query and incrementally materialize it in the background, providing real-time, up-to-date results in seconds or milliseconds rather than minutes or hours.
While this might sound similar to a materialized view, it not only reduces the load on your database but also takes into account the most recent inserts and doesn’t require any management once it’s configured.
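A minimal sketch of a continuous aggregate, assuming a hypertable named `cpu` with `tags_id` and `usage_user` columns (the view name and the intervals are illustrative):

```sql
-- Hourly rollup that is materialized incrementally in the background.
-- materialized_only = false also merges in rows not yet materialized.
CREATE MATERIALIZED VIEW cpu_hourly
WITH (timescaledb.continuous, timescaledb.materialized_only = false) AS
SELECT time_bucket('1 hour', time) AS bucket,
       tags_id,
       avg(usage_user) AS avg_usage_user,
       max(usage_user) AS max_usage_user
FROM cpu
GROUP BY bucket, tags_id;

-- Refresh the last day of buckets every hour, skipping the newest hour.
SELECT add_continuous_aggregate_policy('cpu_hourly',
    start_offset      => INTERVAL '1 day',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');
```

Dashboards can then query `cpu_hourly` like any other view instead of re-aggregating the raw rows on every request.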
Once you have time-series data loaded, Timescale also gives you the tools to work with it, offering over 100 built-in hyperfunctions: custom SQL functions that simplify complex time-series analysis, such as time-weighted averages, last observation carried forward, downsampling with the LTTB or ASAP algorithms, and bucketing by hour, minute, month, and timezone with time_bucket() and time_bucket_gapfill().
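For example, gap filling combined with last observation carried forward could be sketched like this. The `cpu` table and its columns are assumed, and the one-day window is arbitrary.

```sql
-- Hourly averages per device over the last day. Hours with no data still
-- produce a row, filled with the previous hour's value via locf().
SELECT time_bucket_gapfill('1 hour', time) AS bucket,
       tags_id,
       locf(avg(usage_user)) AS avg_usage_user
FROM cpu
WHERE time >= now() - INTERVAL '1 day'
  AND time <  now()
GROUP BY bucket, tags_id
ORDER BY tags_id, bucket;
```

The explicit time bounds in the `WHERE` clause matter here: `time_bucket_gapfill()` uses them to know which buckets to generate.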
We also provide a built-in job scheduler, which saves the effort of installing and managing another PostgreSQL extension and lets you schedule and monitor any SQL snippet or database function.
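A small sketch of a user-defined job, assuming a hypertable named `cpu` (the procedure name and what it does are hypothetical):

```sql
-- Procedures run by the scheduler take a job id and a JSONB config.
CREATE OR REPLACE PROCEDURE report_cpu_rows(job_id INT, config JSONB)
LANGUAGE plpgsql AS $$
DECLARE
    n BIGINT;
BEGIN
    SELECT count(*) INTO n FROM cpu WHERE time > now() - INTERVAL '1 hour';
    RAISE NOTICE 'job %: % rows ingested in the last hour', job_id, n;
END
$$;

-- Run it every hour.
SELECT add_job('report_cpu_rows', '1 hour');

-- Inspect the registered jobs.
SELECT job_id, proc_name, schedule_interval
FROM timescaledb_information.jobs;
```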
If you’re running your database in production, having direct access to a team of database experts will lift a heavy weight off your shoulders. Timescale gives all customers access to a world-class team of technical support engineers at no extra cost, encouraging discussion on any time-series topic, even if it’s not directly related to Timescale operations. You might want some help with ingest performance, tuning advice for a tricky SQL query, or best practices on setting up your schema—we are here to help.
As a comparison, deeply consultative support, general guidance, and best practices start at over $5,000 per month in Amazon RDS for PostgreSQL. Lower tiers have only a community forum or receive general advice. So this means that you need to pay an extra $60,000 a year just for such support on AWS, while you get this for free on Timescale.
Cost is one of the major factors when choosing any cloud database platform, and Timescale provides multiple ways to keep your spending under control.
Timescale's best-in-class native compression allows you to compress time-series data in place while still retaining the ability to query it as normal. Compressing data in Timescale often results in savings of 90 % or more (take another look at our benchmark results, which actually saw a 95 % storage footprint reduction).
Timescale also includes built-in features to manage data retention, making it easy to implement data lifecycle policies, which remove data you don’t care about quickly, easily, and without impacting your application. You can combine data retention policies with continuous aggregates to automatically downsample your data according to a schedule.
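As a sketch, a retention policy on an assumed `cpu` hypertable takes a single call (the 30-day window is an arbitrary example):

```sql
-- Drop chunks once all of their data is older than 30 days.
SELECT add_retention_policy('cpu', INTERVAL '30 days');

-- The policy can be removed again if requirements change.
-- SELECT remove_retention_policy('cpu');
```

Because whole chunks are dropped rather than individual rows deleted, old data is removed without a large `DELETE` and the cleanup that follows it.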
To help reduce costs even further, Timescale offers bottomless, consumption-based object storage built on Amazon S3 (currently in private beta). Providing access to an object storage layer from within the database itself enables you to seamlessly tier data from the database to S3, store an unlimited amount of data, and pay only for what you store. All the while you retain the ability to query data in S3 from within the database via standard SQL.
Last but not least, Timescale is just PostgreSQL under the hood. Timescale supports full SQL (not SQL-like or SQL-ish). You can leverage the full breadth of drivers, connectors, and extensions in the vibrant PostgreSQL ecosystem—if it works with PostgreSQL, it works with Timescale!
If you switch from Amazon RDS for PostgreSQL to Timescale, you won’t lose any compatibility: your application will operate the same as before (but it will probably be faster, as we’ve shown).
When you have time-series data, you need a database that can handle time-series workloads. While Amazon RDS for PostgreSQL provides a great cloud PostgreSQL experience, our benchmarks have shown that even when paired with the pg_partman extension to provide partition management, it can’t compete with Timescale. According to our tests, Timescale can be over 40 % faster to ingest data, up to 350x faster for queries, and takes 95 % less space to store data when compressed.
On top of these findings, we offer a rich collection of time-series features that weren’t used in the benchmark. You can speed queries up even further by incrementally pre-computing responses with continuous aggregates, benefit from our job scheduler, configure retention policies, use analytical hyperfunctions, speed up your non-time-series queries with features like SkipScan, and so much more.
If you have time-series data, don’t wait until you hit that performance wall to give us a go. Spin up an account now: you can use it for free for 30 days; no credit card required.