Hi, I was hoping to get some help with some slow query performance. I need to support a query that selects items within a time range, plus a further n items beyond the time range (which end to take the extra items from depends on the sort order).
Right now I am solving this by doing a union of 2 queries - for example:
(SELECT event_time, event_data FROM test_events
WHERE series_id = 1
AND event_time >= '20200101T00:00:01Z'
AND event_time < '20200101T00:00:05Z'
ORDER BY event_time)
UNION ALL
(SELECT event_time, event_data FROM test_events
WHERE series_id = 1
AND event_time >= '20200101T00:00:05Z'
ORDER BY event_time
LIMIT 3); -- could be any number, usually 1-10
But my performance is poor and gets even worse when there are many millions of rows.
(My only index comes from my primary key definition:
CONSTRAINT vector_events_pkey PRIMARY KEY (series_id, event_time)
which gives me a btree index as well as the uniqueness constraint needed for upsert.)
I have a (non-Timescale) DB fiddle showing the schema, indexes and an example query here: DB Fiddle - SQL Database Playground
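In case the fiddle can't be opened, a simplified version of the schema looks roughly like this (the event_data type and the hypertable call are just my shorthand here; the fiddle has the exact definitions):

-- Simplified sketch; event_data's type is a placeholder, the exact DDL is in the fiddle
CREATE TABLE test_events (
series_id integer NOT NULL,
event_time timestamptz NOT NULL,
event_data jsonb,
CONSTRAINT vector_events_pkey PRIMARY KEY (series_id, event_time)
);
-- on the real system this table is a hypertable, created along the lines of:
SELECT create_hypertable('test_events', 'event_time');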
Taken alone, the first part of the above query performs pretty well; the second part is not so good. So I have a couple of questions:
- Is there a better way of writing a query like this? Perhaps there is a Timescale feature that can be used? I see that both parts of the union are doing their own scan of the index. Intuitively, I wish that the index scan of the bounded query could just continue a bit further to take the extra rows, but I don't know if or how that can be done. I have heard of cursors but never used them; I wonder if that might be an approach to take (rough sketch of what I mean after this list)? Or maybe there is something smarter to be done with the sorting?
- I generally see in EXPLAIN ANALYZE that querying a hypertable with no upper bound on time, such as
SELECT * FROM some_hypertable WHERE event_time > @lowerBound ORDER BY event_time LIMIT 5
seems to do a parallel index scan on all chunks after the lower bound. For our queries the LIMIT is low and the number of chunks very high, so this is quite costly. Is there some way of giving Timescale a hint to scan chunks sequentially (see the note after this list)? I think that in almost all cases we should reach our result limit within 1-2 chunks of the start timestamp.
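To illustrate the cursor idea from the first question, this is roughly what I was imagining; it's only a sketch I haven't actually tried, and the fetch sizes are made up:

-- untested sketch of the cursor idea; fetch counts are only examples
BEGIN;
DECLARE ev_cur CURSOR FOR
SELECT event_time, event_data FROM test_events
WHERE series_id = 1
AND event_time >= '20200101T00:00:01Z'
ORDER BY event_time;
-- fetch in batches while rows are still inside the time range...
FETCH 100 FROM ev_cur;
-- ...then, once a row at or past the upper bound comes back, take the extra n rows (here 3) and stop
FETCH 3 FROM ev_cur;
CLOSE ev_cur;
COMMIT;

For the second question, the only knob I know of myself is turning off parallelism for the session, e.g.

SET max_parallel_workers_per_gather = 0; -- my assumption: this only disables parallel workers

but as far as I understand that doesn't make the chunks be visited one after another in time order, which is what I'm really after.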
I really hope that this can be done better, as this query is letting down what's otherwise a pretty good solution.
By the way, I'm using PostgreSQL 13 and Timescale 2.5, but am working to upgrade to PostgreSQL 14 and Timescale 2.6. If any more info is needed, I'll be glad to provide it.