TimescaleDB continuous aggregate with materialized_only = true is about 100 times faster than materialized_only = false
I expected the continuous aggregate to be faster than calculating from the real-time data, but the real-time calculation is faster than the continuous aggregate (when materialized_only = false).
The continuous aggregate is one hour and a refresh policy is set.
I have real-time crypto data, and on top of that I built a continuous aggregate which refreshes every minute, so for the real-time part I only need the last one minute of data from the real-time table. There should not be any difference between materialized_only = true and materialized_only = false, but there is a huge difference: materialized_only = false is 100 times slower.
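For reference, a minimal sketch of the kind of setup described above. The view and table names match the ones seen later in the plans, but the columns (price, open/high/low/close) and the policy intervals are assumptions, not the poster's actual schema:

-- Hypothetical 1-minute continuous aggregate over a raw 1-second hypertable;
-- column names are illustrative only.
CREATE MATERIALIZED VIEW history_data_crypto_1m
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 minute', "time") AS time_1m,
       slug,
       first(price, "time") AS open,
       max(price)           AS high,
       min(price)           AS low,
       last(price, "time")  AS close
FROM history_data_crypto_1s
GROUP BY 1, 2;

-- Refresh the aggregate every minute, leaving the most recent minute un-materialized.
SELECT add_continuous_aggregate_policy('history_data_crypto_1m',
  start_offset      => INTERVAL '1 hour',
  end_offset        => INTERVAL '1 minute',
  schedule_interval => INTERVAL '1 minute');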
Do you use a distributed hypertable by any chance? If not, can you provide a few hints like the query, the execution plan (EXPLAIN (ANALYZE, BUFFERS, VERBOSE)), and specs of the data? Otherwise it's all guessing.
According to the provided execution plans, there isn't a massive difference. Yes, the real-time CAGG spends 57 ms in planning and 115 ms in execution, but that's not 100x more than 6 ms and 78 ms.
What happens when you leave out the analyze parameter for explain (using only the existing statistics)? Maybe the stats are out of date.
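In other words, something along these lines (a sketch, using the query from the thread):

-- Plain EXPLAIN only plans the query from the existing statistics and never runs it,
-- so it shows what the planner expects without execution timings.
EXPLAIN
SELECT * FROM history_data_crypto_1m
WHERE slug = 'cardano'
  AND time_1m > '2023-02-01' AND time_1m < '2023-11-02'
ORDER BY time_1m DESC
LIMIT 100;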
Is the result for both queries actually what you expect them to be?
The results of both queries are what I expect. There is no massive difference in explain analyze, but in an actual query run, materialized_only = false takes about 100 times longer to return the result.
I don't understand what you mean by this: "What happens when you leave out the analyze parameter for explain (using the existing statistics)? Maybe the stats are out of date."
Note: when I stress test my website, with materialized_only = true I can handle about 1000 requests, but with materialized_only = false I can barely handle more than 150-200.
When I run it with materialized_only = true the query runs in 4.618 ms,
and when I run it with materialized_only = false the query runs in 5783.558 ms the first time (609.657 ms the second time).
Query: explain analyze select * from history_data_crypto_1m where slug='cardano' and time_1m > '2023-02-01' and time_1m < '2023-11-02' order by time_1m desc limit 100;
materialized_only = true
Limit (cost=0.56..101.05 rows=100 width=58) (actual time=0.134..1.442 rows=100 loops=1)
-> Custom Scan (ChunkAppend) on _materialized_hypertable_103 srt (cost=0.56..82273.33 rows=81871 width=58) (actual time=0.131..1.418 rows=100 loops=1)
Order: srt.time_1m DESC
-> Index Scan Backward using _hyper_103_2229_chunk_crypto_slug_time_1m_idx on _hyper_103_2229_chunk srt_1 (cost=0.56..82273.33 rows=81871 width=58) (actual time=0.129..1.389 rows=100 loops=1)
Index Cond: ((slug = 'cardano'::text) AND (time_1m > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (time_1m < '2023-11-02 00:00:00+00'::timestamp with time zone))
-> Custom Scan (DecompressChunk) on _hyper_103_2208_chunk srt_2 (cost=0.51..50.62 rows=100000 width=58) (never executed)
Filter: ((time_1m > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (time_1m < '2023-11-02 00:00:00+00'::timestamp with time zone))
-> Index Scan Backward using compress_hyper_108_2558_chunk__compressed_hypertable_108_slug__ on compress_hyper_108_2558_chunk (cost=0.29..50.62 rows=100 width=225) (never executed)
Index Cond: (slug = 'cardano'::text)
Filter: ((_ts_meta_max_1 > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (_ts_meta_min_1 < '2023-11-02 00:00:00+00'::timestamp with time zone))
-> Custom Scan (DecompressChunk) on _hyper_103_2207_chunk srt_3 (cost=25.56..51.12 rows=2000 width=58) (never executed)
Filter: ((time_1m > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (time_1m < '2023-11-02 00:00:00+00'::timestamp with time zone))
-> Index Scan Backward using compress_hyper_108_2401_chunk__compressed_hypertable_108_slug__ on compress_hyper_108_2401_chunk (cost=0.29..51.12 rows=2 width=225) (never executed)
Index Cond: (slug = 'cardano'::text)
Filter: ((_ts_meta_max_1 > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (_ts_meta_min_1 < '2023-11-02 00:00:00+00'::timestamp with time zone))
Planning Time: 7.185 ms
Execution Time: 1.707 ms
materialized_only = false
Limit (cost=88777.99..88778.24 rows=100 width=58) (actual time=901.865..901.888 rows=100 loops=1)
-> Sort (cost=88777.99..89237.69 rows=183881 width=58) (actual time=901.863..901.877 rows=100 loops=1)
Sort Key: srt.time_1m DESC
Sort Method: top-N heapsort Memory: 53kB
-> Append (cost=25.56..81750.19 rows=183881 width=58) (actual time=0.566..817.439 rows=187862 loops=1)
-> Custom Scan (ChunkAppend) on _materialized_hypertable_103 srt (cost=25.56..78975.75 rows=183875 width=58) (actual time=0.565..797.310 rows=187861 loops=1)
Chunks excluded during startup: 0
-> Custom Scan (DecompressChunk) on _hyper_103_2207_chunk srt_1 (cost=25.56..51.12 rows=2000 width=58) (actual time=0.564..1.312 rows=1439 loops=1)
Filter: ((time_1m > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (time_1m < '2023-11-02 00:00:00+00'::timestamp with time zone) AND (time_1m < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(103)), '-infinity'::timestamp with time zone)))
Rows Removed by Filter: 361
-> Index Scan using compress_hyper_108_2401_chunk__compressed_hypertable_108_slug__ on compress_hyper_108_2401_chunk (cost=0.29..51.12 rows=2 width=221) (actual time=0.124..0.130 rows=2 loops=1)
Index Cond: (slug = 'cardano'::text)
Filter: ((_ts_meta_max_1 > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (_ts_meta_min_1 < '2023-11-02 00:00:00+00'::timestamp with time zone))
Rows Removed by Filter: 99
-> Custom Scan (DecompressChunk) on _hyper_103_2208_chunk srt_2 (cost=0.51..50.62 rows=100000 width=58) (actual time=0.353..54.405 rows=100715 loops=1)
Filter: ((time_1m > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (time_1m < '2023-11-02 00:00:00+00'::timestamp with time zone) AND (time_1m < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(103)), '-infinity'::timestamp with time zone)))
-> Index Scan using compress_hyper_108_2558_chunk__compressed_hypertable_108_slug__ on compress_hyper_108_2558_chunk (cost=0.29..50.62 rows=100 width=221) (actual time=0.056..0.380 rows=101 loops=1)
Index Cond: (slug = 'cardano'::text)
Filter: ((_ts_meta_max_1 > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (_ts_meta_min_1 < '2023-11-02 00:00:00+00'::timestamp with time zone))
-> Bitmap Heap Scan on _hyper_103_2229_chunk srt_3 (cost=943.09..78874.01 rows=81875 width=58) (actual time=51.909..715.434 rows=85707 loops=1)
Recheck Cond: (slug = 'cardano'::text)
Filter: ((time_1m > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (time_1m < '2023-11-02 00:00:00+00'::timestamp with time zone) AND (time_1m < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(103)), '-infinity'::timestamp with time zone)))
Heap Blocks: exact=85381
-> Bitmap Index Scan on _hyper_103_2229_chunk_crypto_slug_1m_idx (cost=0.00..922.62 rows=81875 width=0) (actual time=30.187..30.187 rows=85816 loops=1)
Index Cond: (slug = 'cardano'::text)
-> HashAggregate (cost=16.12..16.23 rows=6 width=58) (actual time=0.284..0.285 rows=1 loops=1)
Group Key: time_bucket('00:01:00'::interval, srt_4."time"), srt_4.slug
-> Custom Scan (ChunkAppend) on history_data_crypto_1s srt_4 (cost=0.43..16.02 rows=6 width=66) (actual time=0.089..0.223 rows=8 loops=1)
Chunks excluded during startup: 5
-> Index Scan using _hyper_19_15927_chunk_idx_unique_slug_time on _hyper_19_15927_chunk srt_5 (cost=0.43..2.67 rows=1 width=58) (actual time=0.086..0.215 rows=8 loops=1)
Index Cond: ((slug = 'cardano'::text) AND ("time" >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(103)), '-infinity'::timestamp with time zone)) AND ("time" > '2023-02-01 00:00:00+00'::timestamp with time zone) AND ("time" < '2023-11-02 00:01:00+00'::timestamp with time zone))
Filter: ((time_bucket('00:01:00'::interval, "time") > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (time_bucket('00:01:00'::interval, "time") < '2023-11-02 00:00:00+00'::timestamp with time zone))
Planning Time: 51.654 ms
Execution Time: 902.408 ms
The issue is the change in index scanning, especially for _hyper_103_2229_chunk_crypto_slug_1m_idx. Can you try to flip the order of the where clause?
explain analyze select * from history_data_crypto_1m where time_1m > '2023-02-01' and time_1m < '2023-11-02' and slug='cardano' order by time_1m desc limit 100;
edit: Also, how did you set up the compression? Any segmentby configuration?
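For comparison, a typical compression setup with a segmentby column might look like the following. This is only a sketch: whether the poster compressed the raw hypertable, the continuous aggregate's materialized hypertable, or both, and with which settings, is unknown:

-- Segment compressed batches by slug and order them by time, so that
-- per-slug, time-ordered scans can make use of the compressed chunks.
ALTER TABLE history_data_crypto_1s SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'slug',
  timescaledb.compress_orderby   = '"time" DESC'
);

-- Compress chunks once they are older than a week (interval is an assumption).
SELECT add_compression_policy('history_data_crypto_1s', INTERVAL '7 days');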
For some reason materialized_only=false seems to evaluate the chunks in the opposite order. With materialized_only=true the compressed chunks never even get evaluated (Index Scan Backward using compress_hyper_108_2558_chunk__compressed_hypertable_108_slug__ on compress_hyper_108_2558_chunk (cost=0.29..50.62 rows=100 width=225) (never executed) – note the "never executed"), whereas with materialized_only=false they are scanned in full. In addition, materialized_only=false uses a different index scan (not sure why). You may want to try updating the statistics using analyze; whether that helps, I honestly don't know.
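If you want to try that, something like this should refresh the statistics. Targeting the continuous aggregate's internal materialized hypertable is an assumption on my part; a plain database-wide ANALYZE would also do it:

-- Refresh planner statistics for the raw hypertable and the CAGG's
-- materialized hypertable (names taken from the plans above).
ANALYZE history_data_crypto_1s;
ANALYZE _timescaledb_internal._materialized_hypertable_103;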
If you are ok with only seeing materialized values, yes, go ahead and just enable it.
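The switch itself is a one-liner, assuming the continuous aggregate is the history_data_crypto_1m view from the query above:

-- Only return materialized data; skip the real-time union with the raw table.
ALTER MATERIALIZED VIEW history_data_crypto_1m
  SET (timescaledb.materialized_only = true);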
One last potential fix: you may be able to add a composite index on slug and time_1m and see if the planner wants to use it to speed up the uncompressed chunk reading.
It certainly is using the "wrong" index crypto_slug_1m_idx, which I think is because the time filter wouldn't help here, since the chunk is fully contained in the requested time range. Anyway, I'd try to drop that one. In general, are you sure you need all those different indexes? You have almost every permutation possible. I'd recommend going with fewer permutations and adjusting the queries accordingly (e.g. using the same order for where clauses and order bys). The more indexes you have, the less reliable the query planner gets (as there are more options to go wrong).
No, I surely don't need all those indexes, but I'm stuck with the continuous aggregate being slow and I've had this problem for a very long time. I created them just to see if I could make any progress, but I'm completely stuck. My continuous aggregate being 100 times slower seems so wrong.
I'd drop all custom indexes and start with an index on (slug, time_1m desc) and see if that helps. If it doesn't, delete the index and try the next one. Just piling on more indexes will probably make it worse. Every potential index has influence on the planner's decision, and depending on the statistics you may get very bad results.
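A sketch of that starting point. Creating the index on the continuous aggregate's internal materialized hypertable is an assumption about where the planner needs it; TimescaleDB propagates a hypertable index to all of its chunks:

-- Composite index matching the query's WHERE clause and ORDER BY time_1m DESC LIMIT,
-- created on the CAGG's materialized hypertable seen in the plans above.
CREATE INDEX ON _timescaledb_internal._materialized_hypertable_103 (slug, time_1m DESC);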
Anyhow, looking at the query plan, I am not sure why reading the uncompressed (but materialized) chunk _hyper_103_2229_chunk yields all 85k rows, instead of stopping at 100 like the already compressed chunks before it.