Continuous aggregates vs materialized views

carlota · March 3, 2022, 7:51pm

PostgreSQL materialized views and continuous aggregates may seem very similar, and indeed, they are - but there are also important differences between them.

Plan materialized views allow you to make the results of an aggregate query persist in table form. When a materialized view is referenced in a query, the data is returned directly from the materialized view, making it faster to access the data.

But materialized views have a design limitation (at least in the moment of writing these lines): every time a user refreshes the materialized view to include the new data added since the last refresh, all the data is read again to compute the aggregate, even if only a small portion of the dataset is changed. Plus, with materialized views you may not see up-to-date results if you query them between refreshes. If you’ve set up a policy that refreshes the materialized view every X period of time, for example, when querying the view you will be missing the data that came in after the latest refresh, unless you also refresh the view manually.

We originally introduced continuous aggregates to improve materialized views. On one hand, continuous aggregatses are incrementally updated with a built-in refresh policy, so they can stay up-to-date as new data is added. When querying a continuous aggregate, the query engine will combine the data that is already pre-computed in the materialized view with the newest raw data in the underlying hypertable. In other words, when querying a continuous aggregate, you will always see up-to-date results. (We call this property of continuous aggregates “real-time aggregation”.)

Another important difference between continuous aggregates and materialized views is that TimescaleDB is able to refresh specific portions of a continuous aggregate without affecting the rest of the continuous aggregate. TimescaleDB uses internal invalidation records to determine which data has been changed or added in the underlying hypertable since the last refresh, and when you refresh your continuous aggregate, only the new or updated data is computed.

This implies two things:

TimescaleDB doesn’t need to look at the whole table every time a continuous aggregate gets refreshed. This saves tons of computation resources, speeding up the incremental maintenance of continuous aggregate, and allowing you to get faster results in your aggregate queries.
You can effectively use continuous aggregates for downsampling. In TimescaleDB, you are able to retain the continuous aggregate even after the original raw data has been dropped. When doing a refresh, you won’t lose the portions of the continuous aggregate that correspond to data that has been dropped from the underlying hypertable, as it would be the case with plain materialized views. To reduce the granularity of your time series data, you can simply define a continuous aggregate and delete your original data once it gets old, significantly reducing your storage costs while still keeping the ability to do analytics on your data.

If you want more info on how to use continuous aggregates for downsampling,

Check out this blogpost (you can jump in to the “How to use continuous aggregates with compression for downsampling” section)
Check out this demo!