I was running a simple SELECT query:
SELECT
time_bucket('5m', time) AS bucket,
AVG(value)
FROM
data_points
WHERE
run_id IN (1884, 1885, 1886, 1887, 1888, 1889, 1890, 1891, 1892)
AND time >= NOW() - INTERVAL '60m'
AND time <= NOW() - INTERVAL '30m'
GROUP BY
    bucket;
The query took over three minutes just to return a handful of rows, but when I reduce the number of values in the run_id IN list to fewer than 8, it completes in a split second. The EXPLAIN ANALYZE results for both cases are below.
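To be concrete, the fast case is the same query with only the seven run_ids that show up in the fast plan's Index Cond:

SELECT
    time_bucket('5m', time) AS bucket,
    AVG(value)
FROM
    data_points
WHERE
    run_id IN (1884, 1885, 1886, 1887, 1888, 1889, 1890)
    AND time >= NOW() - INTERVAL '60m'
    AND time <= NOW() - INTERVAL '30m'
GROUP BY
    bucket;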
Fast:
HashAggregate  (cost=354234.98..363680.07 rows=381379 width=44) (actual time=1.249..1.251 rows=0 loops=1)
  Group Key: time_bucket('00:05:00'::interval, d."time"), d.run_id
  Planned Partitions: 4  Batches: 1  Memory Usage: 3097kB
  ->  Result  (cost=0.58..339650.21 rows=381379 width=16) (actual time=0.323..0.325 rows=0 loops=1)
        ->  Custom Scan (ChunkAppend) on data_points d  (cost=0.58..334882.97 rows=381379 width=16) (actual time=0.322..0.323 rows=0 loops=1)
              Chunks excluded during startup: 0
              ->  Index Scan using _hyper_1_143_chunk_ix_device_time on _hyper_1_143_chunk d_1  (cost=0.58..334882.97 rows=381379 width=16) (actual time=0.317..0.317 rows=0 loops=1)
                    Index Cond: ((run_id = ANY ('{1884,1885,1886,1887,1888,1889,1890}'::integer[])) AND ("time" >= (now() - '01:00:00'::interval)) AND ("time" <= (now() - '00:30:00'::interval)))
Planning Time: 1.811 ms
Execution Time: 4.623 ms
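The slow case adds just one more value, 1891, matching the Filter line in its plan:

    run_id IN (1884, 1885, 1886, 1887, 1888, 1889, 1890, 1891)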
Slow:
Finalize GroupAggregate  (cost=345807.08..387824.05 rows=425172 width=44) (actual time=208425.622..208532.798 rows=0 loops=1)
  Group Key: (time_bucket('00:05:00'::interval, d."time")), d.run_id
  ->  Gather Merge  (cost=345807.08..379570.72 rows=250101 width=44) (actual time=208425.605..208532.779 rows=0 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        ->  Partial GroupAggregate  (cost=344807.07..350434.34 rows=250101 width=44) (actual time=208201.992..208201.997 rows=0 loops=2)
              Group Key: (time_bucket('00:05:00'::interval, d."time")), d.run_id
              ->  Sort  (cost=344807.07..345432.32 rows=250101 width=16) (actual time=208201.987..208201.991 rows=0 loops=2)
                    Sort Key: (time_bucket('00:05:00'::interval, d."time")), d.run_id
                    Sort Method: quicksort  Memory: 25kB
                    Worker 0:  Sort Method: quicksort  Memory: 25kB
                    ->  Result  (cost=0.58..322382.83 rows=250101 width=16) (actual time=208201.862..208201.865 rows=0 loops=2)
                          ->  Parallel Custom Scan (ChunkAppend) on data_points d  (cost=0.58..319256.56 rows=250101 width=16) (actual time=208201.859..208201.862 rows=0 loops=2)
                                Chunks excluded during startup: 0
                                ->  Parallel Index Scan using _hyper_1_143_chunk_data_points_time_idx on _hyper_1_143_chunk d_1  (cost=0.58..319256.56 rows=250101 width=16) (actual time=208201.823..208201.824 rows=0 loops=2)
                                      Index Cond: (("time" >= (now() - '01:00:00'::interval)) AND ("time" >= '2025-01-03 11:51:27.887799+07'::timestamp with time zone) AND ("time" <= (now() - '00:30:00'::interval)))
                                      Filter: (run_id = ANY ('{1884,1885,1886,1887,1888,1889,1890,1891}'::integer[]))
                                      Rows Removed by Filter: 4907252
Planning Time: 1.254 ms
Execution Time: 208533.089 ms
Why does adding one more value to the IN list make the query engine do so much more work? From the plans, the fast version uses the _hyper_1_143_chunk_ix_device_time index with run_id in the Index Cond, while the slow version switches to the time-only index and filters on run_id afterwards, discarding roughly 4.9 million rows. What makes the planner flip strategies at 8 values? Any help is appreciated.