OK, thanks for the follow-up.
The first two things I’ll mention should provide some pretty quick improvement, although I realize the first point (upgrading TimescaleDB) will take a little effort.
Upgrade to TimescaleDB 2.7
If you had a chance to read both blog posts above, you’ll remember that prior to TimescaleDB 2.7, the materialized results are not stored in their “finalized” form. Therefore, whenever you query data with something like SELECT * FROM [cagg]...
without a time predicate, the underlying view query must be run in its entirety to generate the finalized data before it can LIMIT the result. TimescaleDB 2.7 changed that architecture and results in much better performance and works the way that most users expect (even with most SELECT * FROM...
queries).
One caveat: to gain the advantage of the new finalized materialized data, you will need to create a new CAGG first. In your case, you can upgrade to 2.7+, drop the test table and CAGG, and then re-run your script.
Update the refresh policy
By default, TimescaleDB continuous aggregates are real-time by default. This means that whenever you query a continuous aggregate, it grabs all of the data from the materialized view that matches the query and then appends (with a UNION
) any data that has been INSERTED into the hypertable after the last materialized bucket.
Your example refresh policy currently looks like this in the script:
SELECT
add_continuous_aggregate_policy ('test_cagg_1min', start_offset := INTERVAL '1 day', end_offset := INTERVAL '6 hours', schedule_interval := INTERVAL '6 hours');
This policy will run every 6 hours and only materialize/update data in the CAGG from between 1 day ago and 6 hours ago (~18 hours of time). Therefore, the most recent 6 hours of time will never have any data materialized, which will require TimescaleDB to UNION
the CAGG data and a real-time query over the last six hours and then sort all fo the data before it can determine the five rows to return. With a GROUP BY
of ~20 columns, that’s just going to be a slower process.
Without knowing what you’re trying to test, my suggestion would be to materialize the data more frequently given your small bucket size. You’ll have to test the settings a little bit to find the optimal values for this policy. The example below tries to increase the refresh rate so that the CAGG is only 10 minutes behind the hypertable, while still allowing data that might get inserted/updated over the last 24 hours to be updated as necessary. If you don’t have much late arriving data (ie. an append-only application), then there’s often no reason to calculate the CAGG over the last 24 hours every time the process runs.
SELECT
add_continuous_aggregate_policy ('test_cagg_1min', start_offset := INTERVAL '1 day', end_offset := INTERVAL '10 minutes', schedule_interval := INTERVAL '10 minutes');
Disable real-time aggregation
Totally dependent on your application needs, you don’t have to use continuous aggregates in real-time mode. If you keep the materialized data up-to-date within a few minutes, turning off real-time aggregation will prevent TimescaleDB from doing the UNION
and simplifies the queries in most cases.
Again, this is very dependent on your application needs and not an indication that something is broken with real-time aggregation. The materialized data will continue to be updated with the refresh policy, it just won’t append the most recent set of raw data (based on your start/end offset intervals).
To turn real-time aggregates on/off, just modify the materialized_only
setting in the following SQL command.
ALTER MATERIALIZED VIEW test_cagg_1min SET (materialized_only=true);