Continuous aggregates on top of another continuous aggregate without losing data

kimdre · August 11, 2024, 3:56pm

Hi,

I’m currently trying to aggregate data from one continuous aggregate to another.
The “parent” continuous aggregate has a relatively short retention time, which I noticed, also affects the “child” continuous aggregate. So when the retention policy cleans up old data in the “parent”, the aggregated data in the “child” (which has a much longer retention time) is also gone. Is there a way to prevent this behaviour?

These are the SQL statements I used to creating them:

-- First/"Parent" continuous aggregate

CREATE MATERIALIZED view event_graph_10mins
    WITH (timescaledb.continuous) AS
    SELECT
        time_bucket('10 minutes', timestamp) AS mins,
        COUNT(entity_id) AS event_count,
        type_id,
        studio_id
    FROM events
    GROUP BY mins, type_id, studio_id
WITH NO DATA;

SELECT add_continuous_aggregate_policy('event_graph_10mins',
    start_offset => INTERVAL '1 week',
    end_offset => INTERVAL '11 minutes',
    schedule_interval => INTERVAL '10 minutes'
);

SELECT add_retention_policy('event_graph_10mins', INTERVAL '2 weeks');

-- Second/"Child" continuous aggregate

CREATE MATERIALIZED view event_graph_hourly
WITH (timescaledb.continuous) AS
    SELECT
        time_bucket('1 hour', mins) AS hour,
        SUM(event_count) AS event_count,
        type_id,
        studio_id
    FROM event_graph_10mins
    GROUP BY hour, type_id, studio_id
WITH NO DATA;

SELECT add_continuous_aggregate_policy('event_graph_hourly',
    start_offset => INTERVAL '1 month',
    end_offset => INTERVAL '2 hours',
    schedule_interval => INTERVAL '1 hour'
);
 
SELECT add_retention_policy('event_graph_hourly', INTERVAL '1 year');

mike · August 12, 2024, 3:21am

Yes, you should use the settings of your continuous aggregate’s refresh policy to manage this. Basically, you should only refresh data (start_offset) that’s “newer” than the data retention policy’s drop_after.

Timescale Documentation | Refresh continuous aggregates

If you want to keep data in the continuous aggregate even if it is removed from the underlying hypertable, you can set the start_offset to match the data retention policy on the source hypertable. For example, if you have a retention policy that removes data older than one month, set start_offset to one month or less. This sets your policy so that it does not refresh the dropped data.

Also, see this whole discussion in the docs about: