Hierarchical continuous aggregates hourly and daily with averages

Steeve · May 1, 2024, 12:17pm

Hi, I use timescale for a personal application, and I recently upgraded it from 2.6 to 2.14.
I want to setup the new feature Hierarchical Continuous aggregates coupled with hyperfunctions Toolkit for computing performances metrics.

I want to check if I understood how to use the two step aggregation, here is the two material view I created, one for the average hourly, and the second for daily:

CREATE MATERIALIZED VIEW hourly_perf_per_id
WITH (timescaledb.continuous) AS
    SELECT    
        time_bucket(INTERVAL '1 hour', time)          as time_hourly,
        stats_agg("renderTime"::double precision)  as "renderTime_hourly",
        average(stats_agg("renderTime"::double precision))  as "renderTime_hourly_avg",
        "id"
    FROM "Credits"
GROUP BY time_hourly, "id"
WITH NO DATA;

SELECT add_continuous_aggregate_policy('hourly_perf_per_id',
  start_offset => INTERVAL '3 hours',
  end_offset   => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');

CREATE MATERIALIZED VIEW daily_perf_per_id
WITH (timescaledb.continuous)
AS SELECT
    time_bucket('1 day', time_hourly) as time_daily,
    rollup("renderTime_hourly") as "renderTime_daily",
    average(rollup("renderTime_hourly")) as "renderTime_daily_avg",
    "id"
FROM hourly_perf_per_id
GROUP BY time_daily, "id"
WITH NO DATA;

SELECT add_continuous_aggregate_policy('daily_perf_per_id',
  start_offset => INTERVAL '3 days',
  end_offset   => INTERVAL '1 day',
  schedule_interval => INTERVAL '1 day');

Details:

I wrote the renderTime_hourly column only to compute the average renderTime_daily_avg.
The renderTime is cast as double because the column is a INTEGER

Is it th correct way to compute the average with hierarchical continuous aggregates? Is there a shorter syntax? Because the syntax looks a bit redundant:

stats_agg("renderTime"::double precision)  as "renderTime_hourly",
average(stats_agg("renderTime"::double precision))  as "renderTime_hourly_avg",

Finally, I want to add the average of 12 metrics hourly and daily: should I create two columns for each metrics ? one for the average average(stats_agg()) and another one stats_agg for the daily view ? I would mean I would have 24 columns ?

Timescale is awesome, congrats to the team
Thanks in advance and have a good day!

Steeve · May 14, 2024, 11:59am

Hi @jonatasdp,
I didn’t get answers and I would like to use Timescale for production usage with millions of records per months, and I want to make sure I’m following the best practice. Can you take a look ?

jonatasdp · June 2, 2024, 6:35pm

Hey Steve, the only part that I see improvement is split the final call of average or other functions and just materialize the rollup or the stats_agg instead of going to a full aggregation directly in the materialized view. The idea of the functions suffixed with _agg is build an intermediate component that you can extract several information. If you persist it in the view that calculus cannot be reused to a bigger frame using rolling or rollup functions. It will need to rebuild the processing data from scratch.

Then, the average call over the stats_agg component can be on a simple view not materialized and you can get other info like stddev and so on.