Hi,
I have a hypertable with time-series data at a 1-second interval, and I would like to create continuous aggregates on it. Queries work fine when the CAgg is created directly on the hypertable. However, when I create a CAgg on top of another CAgg (which was itself created from the hypertable), queries against the newly created CAgg are very slow.
Is this by design, or am I missing something?
Environment
I’m using the Timescale Docker image ‘timescale/timescaledb:2.10.3-pg15’.
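In case it helps, the extension version inside the container can be confirmed with a standard catalog query (just a sanity check, not specific to this issue):

SELECT extname, extversion FROM pg_extension WHERE extname = 'timescaledb';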
How to reproduce
- Create the hypertable
CREATE TABLE my_hypertable (
time TIMESTAMPTZ NOT NULL,
data1 INTEGER,
data2 INTEGER,
category INTEGER
);
SELECT create_hypertable('my_hypertable', 'time');
- Insert random data into it
INSERT INTO my_hypertable (time, data1, data2, category)
SELECT generate_series(NOW(), NOW() + INTERVAL '1 day', '1 second'), random(), random(), random();
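This generates one row per second for 24 hours (both endpoints of the series included), so a quick sanity check should report 86,401 rows, matching the row counts in the plans below:

SELECT count(*) FROM my_hypertable;
-- expected: 86401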
- Create a CAgg on my_hypertable
CREATE MATERIALIZED VIEW IF NOT EXISTS ca1
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 second', time) as t,
sum(data1) * 8 AS total1,
sum(data2) * 8 AS total2,
category
FROM my_hypertable
GROUP BY t, category
WITH NO DATA;
- Create one more CAgg on top of the CAgg created in the previous step
CREATE MATERIALIZED VIEW IF NOT EXISTS ca2
WITH (timescaledb.continuous) AS
SELECT
time_bucket('30 minute', t) as t2,
max(total1) as max1,
max(total2) as max2
FROM ca1
GROUP BY t2
WITH NO DATA;
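Note that both CAggs are created WITH NO DATA and never refreshed, so the query below relies entirely on real-time aggregation. For reference, this is how ca1 could be materialized over its full range (a sketch only, not part of the reproduction):

CALL refresh_continuous_aggregate('ca1', NULL, NULL);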
- Run EXPLAIN ANALYZE on a SELECT against ca2
explain analyze select * from ca2;
Output
The query took 105398.398 ms
# explain analyze select * from ca2;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=5494.03..5496.53 rows=200 width=24) (actual time=105385.730..105385.740 rows=49 loops=1)
Group Key: time_bucket('00:30:00'::interval, "*SELECT* 2".t)
Batches: 1 Memory Usage: 40kB
-> Result (cost=4126.03..5278.03 rows=28800 width=24) (actual time=105334.260..105371.736 rows=86401 loops=1)
-> Subquery Scan on "*SELECT* 2" (cost=4126.03..4918.03 rows=28800 width=24) (actual time=105334.259..105361.643 rows=86401 loops=1)
-> HashAggregate (cost=4126.03..4630.03 rows=28800 width=28) (actual time=105334.257..105355.015 rows=86401 loops=1)
Group Key: time_bucket('00:00:01'::interval, my_hypertable."time"), my_hypertable.category
Batches: 1 Memory Usage: 11281kB
-> Result (cost=0.00..3838.03 rows=28800 width=20) (actual time=1.534..105124.739 rows=86401 loops=1)
-> Custom Scan (ChunkAppend) on my_hypertable (cost=0.00..3478.03 rows=28800 width=20) (actual time=1.532..105040.150 rows=86401 loops=1)
Chunks excluded during startup: 0
-> Seq Scan on _hyper_10_1_chunk (cost=0.00..3478.03 rows=28800 width=20) (actual time=1.531..104999.193 rows=86401 loops=1)
Filter: (("time" >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(15)), '-infinity'::timestamp with time zone)) AND (time_bucket('00:00:01'::interval, "time") >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(16)), '-infinity'::timestamp with time zone)))
Planning Time: 7.851 ms
Execution Time: 105389.417 ms
(15 rows)
Time: 105398.398 ms (01:45.398)
FYI
- Running the same aggregation query directly against ca1 is fast.
postgres=# explain analyze SELECT
time_bucket('30 minute', t) as t2,
max(total1) as max1,
max(total2) as max2
FROM ca1
GROUP BY t2;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=4417.44..4419.94 rows=200 width=24) (actual time=114.685..114.694 rows=49 loops=1)
Group Key: time_bucket('00:30:00'::interval, "*SELECT* 2".t)
Batches: 1 Memory Usage: 40kB
-> Result (cost=4007.04..4352.64 rows=8640 width=24) (actual time=66.558..100.683 rows=86401 loops=1)
-> Subquery Scan on "*SELECT* 2" (cost=4007.04..4244.64 rows=8640 width=24) (actual time=66.557..90.462 rows=86401 loops=1)
-> HashAggregate (cost=4007.04..4158.24 rows=8640 width=28) (actual time=66.556..83.503 rows=86401 loops=1)
Group Key: time_bucket('00:00:01'::interval, my_hypertable."time"), my_hypertable.category
Batches: 1 Memory Usage: 11281kB
-> Result (cost=0.00..3143.03 rows=86401 width=20) (actual time=0.011..29.196 rows=86401 loops=1)
-> Custom Scan (ChunkAppend) on my_hypertable (cost=0.00..2063.02 rows=86401 width=20) (actual time=0.010..17.620 rows=86401 loops=1)
Chunks excluded during startup: 0
-> Seq Scan on _hyper_1_1_chunk (cost=0.00..2063.02 rows=86401 width=20) (actual time=0.010..12.489 rows=86401 loops=1)
Filter: ("time" >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(2)), '-infinity'::timestamp with time zone))
Planning Time: 1.506 ms
Execution Time: 116.609 ms
(15 rows)
Time: 119.064 ms