Hi! I've run into a situation where a CAgg takes a long time to recalculate aggregates when it should be fast. Assume a user changed data on only two days: 2022-06-05 and 2022-09-05. That equals two 1-day buckets to refresh. But in fact a single entry covering a 92-day period is written into "_timescaledb_catalog".continuous_aggs_hypertable_invalidation_log, instead of two records, one for each bucket.
Script to reproduce:
CREATE TABLE public.my_hyper(
id NUMERIC,
ts TIMESTAMP,
val1 NUMERIC
);
CREATE INDEX my_hyper_idx ON public.my_hyper USING BTREE(id, ts);
SELECT create_hypertable('public.my_hyper', 'ts', chunk_time_interval => INTERVAL '1 day');
INSERT INTO public.my_hyper
SELECT DISTINCT ROUND(random() * 1000), a1.col1, RANDOM() * 10 FROM generate_series('2022-06-01'::TIMESTAMP, '2022-09-30', INTERVAL '12 hours') AS a1(col1);
CREATE MATERIALIZED VIEW my_hyper_cagg
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 day', ts) AS bucket,
SUM(val1) AS val1
FROM
public.my_hyper
GROUP BY 1;
INSERT INTO public.my_hyper VALUES
(2000, TO_TIMESTAMP('2022-06-05', 'yyyy-mm-dd'), RANDOM() * 10),
(2001, TO_TIMESTAMP('2022-09-05', 'yyyy-mm-dd'), RANDOM() * 10);
SELECT
hypertable_id,
"_timescaledb_internal".to_timestamp(lowest_modified_value) AS lowest_modified,
"_timescaledb_internal".to_timestamp(greatest_modified_value) AS greatest_modified,
"_timescaledb_internal".to_timestamp(greatest_modified_value) - "_timescaledb_internal".to_timestamp(lowest_modified_value) AS invalidation_span
FROM "_timescaledb_catalog".continuous_aggs_hypertable_invalidation_log
WHERE hypertable_id = (SELECT raw_hypertable_id FROM "_timescaledb_catalog".continuous_agg WHERE user_view_name = 'my_hyper_cagg')
ORDER BY lowest_modified_value;
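As a workaround sketch (my assumption, not a fix for the merged log entry itself): since the invalidation log collapses the two changes into one 92-day range, you can limit the work by refreshing only the narrow windows that were actually modified with the standard refresh_continuous_aggregate procedure, instead of letting a policy refresh the whole merged range:

```sql
-- Refresh only the two modified 1-day buckets instead of the merged 92-day span.
-- Assumes the my_hyper_cagg view from the script above and known modified dates.
CALL refresh_continuous_aggregate('my_hyper_cagg', '2022-06-05', '2022-06-06');
CALL refresh_continuous_aggregate('my_hyper_cagg', '2022-09-05', '2022-09-06');
```

This only helps when the modified buckets are known to the caller; the underlying issue of the single wide invalidation entry remains.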
TimescaleDB version affected
2.8.1
PostgreSQL version used
PostgreSQL 12.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit