Nov 25, 2024
Posted by
Nenad Malic
Continuous aggregates (or “caggs,” as we like to call them) were developed by Timescale to solve one of the biggest challenges application developers and data engineers faced: aggregating massive amounts of data efficiently without always having to query billions (or trillions) of raw data rows.
Continuous aggregates solve this problem by using PostgreSQL materialized views to continuously and incrementally refresh a query in the background, so that when you run said query, the only data that needs to be computed is the one that changed, instead of the entire dataset.
In the previous version of TimescaleDB, we upgraded continuous aggregates, making them up to 44,000x faster in some queries than in previous versions.
Sounds impressive, right? But it gets even better.
Now, with TimescaleDB 2.8.1, you don’t have to worry about migrating from the old continuous aggregates to the new. Say hello to our frictionless migration, an in-place upgrade that avoids disrupting queries over continuous aggregates in applications and dashboards and every time the data is not in the original hypertable.
Also, we should mention that the migration to the new version of continuous aggregates is not available in TimescaleDB versions before 2.8.1. If you are planning to migrate a continuous aggregate to the new format, make sure to upgrade to TimescaleDB 2.8.1 first.
Now that you know that this is going to be a fuss-free move, here are three reasons why you should upgrade to the new version of continuous aggregates.
Enjoy faster queries for the following common queries:
COUNT(*) FROM
is now up to 15x faster.SELECT COUNT(*)...
based on the value of a column is now up to 1,336x faster.ORDER BY
is now up to 15x to 44,000x faster.time_bucket()
is now up to 160x faster.FILTER
is now 5x faster.HAVING
clause is now 4.8x faster.Fewer data rows mean faster queries. Continuous aggregates increase query speed by decreasing the overall row count by at least 20x. The new continuous aggregates reduce the previously required second grouping with SELECT COUNT(*) FROM…
. That means TimescaleDB can query the data normally, leading to faster queries.
Another performance improvement shows us 1,336x faster queries when counting the number of records where the aggregate value is within a certain range. The new continuous aggregates allow TimescaleDB to search for values directly and add an index to two specific columns for increased query performance.
When querying data within a range of time to get the top rows, performance improved 45,000x. How do continuous aggregates achieve such an impressive boost? By only searching through the materialized data. They also allow hyperfunctions to re-aggregate into higher time buckets.
Continuous aggregates were previously incompatible with common SQL features, such as FILTER
, requiring workarounds that used more rows. Now, the aggregated data can be stored using a FILTER
clause to achieve the results without additional effort or rows. This saves storage, cuts rows by 50 percent in our sample data set, and saves CPU to finalize the COUNT(*)
and then filter results each time.
The new continuous aggregates also let you apply the HAVING
clause at materialization time, only storing rows that meet certain criteria and thus reducing the overall data footprint.
The new ability to use common SQL features provides not only storage savings but more flexibility. These are the aggregate queries now compatible with continuous aggregates: DISTINCT
, FILTER
, FILTER
in the HAVING
clause, aggregates without combine function, ordered-set aggregates, and hypothetical-set aggregates.
New to Timescale? Start a free 30-day trial, no credit card required, and get your TimescaleDB 2.8.1 database journey started in five minutes.
If you are using Timescale, upgrades are automatic, and you’ll be upgraded to TimescaleDB 2.8.1 in your next maintenance window.
If you’re self-hosting TimescaleDB, follow the upgrade instructions in our documentation.