Hi, I’m not very familiar with PostgreSQL, but I want to know why the query for continuous aggregation in my deployment is very slow.
My TimescaleDB deployment is as follows:
- I have 1 access node (AN) and 3 data nodes (DNs); each has 8 CPU cores and 32 GB of memory.
- I have a hypertable like this:
CREATE TABLE sample
(
    smpl_time   TIMESTAMPTZ NOT NULL,
    nanosecs    INTEGER NOT NULL,
    channel_id  BIGINT NOT NULL,
    severity_id SMALLINT NOT NULL,
    status_id   SMALLINT NOT NULL,
    float_val   DOUBLE PRECISION NULL
);
SELECT create_distributed_hypertable('sample', 'smpl_time', 'channel_id',
data_nodes => '{ "dn1", "dn2", "dn3"}');
SELECT set_chunk_time_interval('sample', INTERVAL '2 hours');
SELECT set_replication_factor('sample', 2);
ALTER TABLE sample SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'channel_id'
);
SELECT add_compression_policy('sample', INTERVAL '2 hours');
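For reference, this is how I understand chunk and compression status can be inspected, assuming the standard timescaledb_information views (I have not double-checked the column names on my version):

-- list the most recent chunks of 'sample' and whether each is compressed
SELECT chunk_name, range_start, range_end, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sample'
ORDER BY range_start DESC
LIMIT 10;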
- I want continuous aggregates at three granularities: 1 minute, 1 hour, and 1 day. I set them up like this:
CREATE MATERIALIZED VIEW sample_summary_minute
WITH (timescaledb.continuous, timescaledb.create_group_indexes=true) AS
SELECT channel_id,
time_bucket(INTERVAL '1 minute', smpl_time) AS bucket,
COUNT(1) AS total_nums,
AVG(float_val) AS avg_value,
MAX(float_val) AS max_value,
MIN(float_val) AS min_value
FROM sample
GROUP BY channel_id, bucket;
-- --
SELECT add_continuous_aggregate_policy('sample_summary_minute',
start_offset => INTERVAL '1 hour',
end_offset => INTERVAL '10 minutes',
schedule_interval => INTERVAL '10 minutes');
-- --
ALTER MATERIALIZED VIEW sample_summary_minute SET (timescaledb.compress = true);
-- --
-- --
CREATE MATERIALIZED VIEW sample_summary_hour
WITH (timescaledb.continuous, timescaledb.create_group_indexes=true) AS
SELECT channel_id,
time_bucket(INTERVAL '1 hour', smpl_time) AS bucket,
COUNT(1) AS total_nums,
AVG(float_val) AS avg_value,
MAX(float_val) AS max_value,
MIN(float_val) AS min_value
FROM sample
GROUP BY channel_id, bucket;
-- --
SELECT add_continuous_aggregate_policy('sample_summary_hour',
start_offset => INTERVAL '24 hours',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour');
-- --
ALTER MATERIALIZED VIEW sample_summary_hour SET (timescaledb.compress = true);
-- --
-- --
CREATE MATERIALIZED VIEW sample_summary_day
WITH (timescaledb.continuous, timescaledb.create_group_indexes=true) AS
SELECT channel_id,
time_bucket(INTERVAL '1 day', smpl_time) AS bucket,
COUNT(1) AS total_nums,
AVG(float_val) AS avg_value,
MAX(float_val) AS max_value,
MIN(float_val) AS min_value
FROM sample
GROUP BY channel_id, bucket;
-- --
SELECT add_continuous_aggregate_policy('sample_summary_day',
start_offset => INTERVAL '3 days',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 day');
-- --
ALTER MATERIALIZED VIEW sample_summary_day SET (timescaledb.compress = true);
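If it helps with diagnosis, my understanding is that the per-view settings (including whether real-time aggregation is enabled) can be listed like this, assuming the standard information view:

-- show each continuous aggregate and whether it serves materialized data only
SELECT view_name, materialized_only, compression_enabled
FROM timescaledb_information.continuous_aggregates;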
My data volume is about 18,000 rows per second, written in batches every 30 seconds. However, querying any of the three continuous aggregates is very slow, even when the result is only a few rows.
Here’s the query plan for one of the queries:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM sample_summary_day
WHERE channel_id = 110
  AND bucket >= '2023-07-28 20:00:00.000000+08'
  AND bucket <= '2023-08-01 23:00:00.000000+08';
The execution results are as follows:
GroupAggregate  (cost=79510824.54..79511135.29 rows=1636 width=48) (actual time=1520.780..1966.610 rows=3 loops=1)
  Group Key: sample.channel_id, (time_bucket('1 day'::interval, sample.smpl_time))
  Buffers: temp read=8336 written=8353
  ->  Sort  (cost=79510824.54..79510865.43 rows=16355 width=24) (actual time=1407.248..1638.489 rows=2000644 loops=1)
        Sort Key: (time_bucket('1 day'::interval, sample.smpl_time))
        Sort Method: external merge  Disk: 66688kB
        Buffers: temp read=8336 written=8353
        ->  Result  (cost=10000.00..79509679.90 rows=16355 width=24) (actual time=52.508..1004.774 rows=2000644 loops=1)
              ->  Custom Scan (DataNodeScan) on sample  (cost=10000.00..79509475.46 rows=16355 width=24) (actual time=52.505..740.500 rows=2000644 loops=1)
                    Filter: (smpl_time >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(123)), '-infinity'::timestamp with time zone))
Planning:
  Buffers: shared hit=1678
Planning Time: 15.020 ms
Execution Time: 1981.185 ms
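From the Filter line above, it looks to me like the query is not just reading materialized buckets but also re-aggregating about 2 million raw rows past the cagg_watermark on the fly, i.e. real-time aggregation. My untested guess is that disabling real-time aggregation on the view would avoid that raw-data scan:

-- my guess at a fix, not yet tried: serve only already-materialized buckets
ALTER MATERIALIZED VIEW sample_summary_day SET (timescaledb.materialized_only = true);

Would that be the right change, or would it just hide recent data that the refresh policy has not materialized yet?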
I would like to know where I am going wrong and how I can change the setup to improve performance.