Hi, I am seeing some surprisingly slow query times when trying to select the latest events on my continuous aggregate.
These time series run at about 0.1 Hz, so one event every 10 seconds, and new data is coming in all the time. The DB in question is running TimescaleDB 2.6.0, self-hosted in AKS.
This query to get the latest 2 events takes 88 seconds:
SELECT * FROM vector_events_agg_5m WHERE vector_stream_id = 12194 ORDER BY agg_interval DESC limit 2;
https://explain.depesz.com/s/VRBi7
So the query seems to cause the planner to choose a plan that fetches hundreds of thousands of aggregate rows before returning the 2 that are needed.
By comparison, this similar query, which just adds a filter that the events must be in the last 10 days, takes 2.5 seconds (still not great, but a lot better):
SELECT * FROM vector_events_agg_5m WHERE vector_stream_id = 10615 AND agg_interval > '20230101' ORDER BY agg_interval DESC limit 2;
https://explain.depesz.com/s/m4ci
I have also tested on another server running 2.9.0, where querying without the extra time filter is clearly better than on 2.6.0:
https://explain.depesz.com/s/pAdj
While I don’t think 4.5 sec is all that fast, it is certainly a huge improvement over timescale 2.6.0
The db servers are generally healthy, policies are running as they should, and the continuous agg high watermark is always about 10-15 min in the past.
The continuous aggregate policy has a 10 minute end_offset, 30 day start_offset, 10 minute schedule_interval.
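For reference, the policy was created along these lines (a sketch reconstructed from the parameters above; the exact call on our servers may differ slightly):

SELECT add_continuous_aggregate_policy('vector_events_agg_5m',
    start_offset => INTERVAL '30 days',
    end_offset => INTERVAL '10 minutes',
    schedule_interval => INTERVAL '10 minutes');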
There is an ongoing ingest of data, so in most cases the latest 2 events will be within the 10 minutes that have not yet been materialized. Some ingest will be late-arriving but never older than 30 days.
The definition is as follows:
CREATE MATERIALIZED VIEW vector_events_agg_5m
WITH (timescaledb.continuous) AS
SELECT ev.vector_stream_id,
       time_bucket('5 minute', ev.event_time) AS agg_interval,
       SUM(ev.event_data0) AS sum0,
       MIN(ev.event_data0) AS min0,
       MAX(ev.event_data0) AS max0,
       COUNT(ev.event_data0) AS count
FROM vector_events ev
WHERE ev.event_data0 != 'NaN'
GROUP BY vector_stream_id, agg_interval;
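In case it is relevant: this is how I look up the materialized hypertable behind the cagg, and what I assume an explicit index on it would look like. The _materialized_hypertable_NN name is a placeholder for the actual id, and my understanding is that TimescaleDB already creates a similar group index by default, so this may be redundant:

SELECT materialization_hypertable_schema, materialization_hypertable_name
FROM timescaledb_information.continuous_aggregates
WHERE view_name = 'vector_events_agg_5m';

-- hypothetical explicit index on the materialized hypertable (NN is a placeholder)
CREATE INDEX ON _timescaledb_internal._materialized_hypertable_NN
    (vector_stream_id, agg_interval DESC);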
While in most cases the latest events will be very recent, there will be a few streams where the latest aggregate is some weeks or months in the past - so it's not possible to always include an 'after now - 1 week' filter.
If I test on another DB running 2.6.0 where there is no live ingest of data, then the query response when getting the latest events is quite good, even without the lower bound:
https://explain.depesz.com/s/63lp
I suppose that in this case, there are stats to guide the planner to look in the latest materialized chunk.
So my question is, how can I best improve my query, or continuous agg definition, or even my planner settings to improve my results? Taking 90 seconds for a query like this is worse than useless, as the query puts a big load on the server, while the dashboards relying on it will timeout or cause users to spam refresh.
I hope our servers will all be running 2.9.0 in the next couple of weeks, but even then, 4.5 sec is quite a lot worse than I'd hoped for. Intuitively I feel like, if the query planner made better choices, it should usually be possible to return in under 1 sec.
I don't know quite how to formulate this, but can I write a select query that says:
SELECT * FROM vector_events_agg_5m WHERE vector_stream_id = 1234 AND agg_interval > *cagg_highwatermark* ORDER BY agg_interval DESC LIMIT 2;
*and, if this does not yield 2 rows, union with*
SELECT * FROM vector_events_agg_5m WHERE vector_stream_id = 1234 AND agg_interval <= *cagg_highwatermark* ORDER BY agg_interval DESC LIMIT 2;
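My best concrete attempt at this so far is below — a sketch only, since the watermark function and catalog table are internals I found in the _timescaledb_internal and _timescaledb_catalog schemas on 2.6, and may not be a supported API:

-- look up the materialized hypertable id for the cagg
SELECT mat_hypertable_id FROM _timescaledb_catalog.continuous_agg
WHERE user_view_name = 'vector_events_agg_5m';

-- split the top-2 query at the watermark (assumes mat_hypertable_id = 42)
WITH wm AS (
    SELECT _timescaledb_internal.to_timestamp(
               _timescaledb_internal.cagg_watermark(42)) AS watermark
)
(SELECT a.* FROM vector_events_agg_5m a, wm
 WHERE a.vector_stream_id = 12194 AND a.agg_interval >= wm.watermark
 ORDER BY a.agg_interval DESC LIMIT 2)
UNION ALL
(SELECT a.* FROM vector_events_agg_5m a, wm
 WHERE a.vector_stream_id = 12194 AND a.agg_interval < wm.watermark
 ORDER BY a.agg_interval DESC LIMIT 2)
ORDER BY agg_interval DESC
LIMIT 2;

Is something like this safe and sensible, or is there a better-supported way to get at the watermark?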