Glacial query speeds, looking for guidance [beginner]

Dear community,

We are using TimescaleDB self-hosted inside a Docker container. Generally, we are very happy with the performance and features, but we recently stumbled into a problem that I’m unable to track down.

The container is running on an AWS EC2 instance with 2 vCPUs (3.1 GHz) and 4 GB RAM.
So definitely not the strongest machine out there … memory is not restricted, so the container can use all 4 GB of RAM and all CPU resources.

We have a table, let’s call it device_state, containing 200 million rows. We write approx. 2000 rows per minute.

It was created using these statements:

CREATE TABLE device_state (
	"time" timestamptz NOT NULL,
	id_part_1 uuid NOT NULL,
	id_part_2 varchar NOT NULL,
	id_part_3 uuid NOT NULL,
	... 16 more columns ...
);
SELECT create_hypertable('device_state', 'time', chunk_time_interval => INTERVAL '1 day');
CREATE UNIQUE INDEX ON device_state(id_part_1, id_part_2, id_part_3, time DESC);
CREATE INDEX ON device_state(id_part_1, id_part_2, time DESC);
ALTER TABLE device_state SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'id_part_1, id_part_2'
);
SELECT add_compression_policy('device_state', compress_after => INTERVAL '90d');

So, we have 2 indexes on the hypertable, and compression is enabled after 90 days. Chunks are created every day and are around 900 MB in size.
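In case it helps, this is roughly how I checked the chunk sizes and compression status (a sketch assuming the TimescaleDB 2.x informational views; adjust if you are on an older version):

```sql
-- List chunks of the hypertable with their time ranges and compression status
SELECT chunk_name, range_start, range_end, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'device_state'
ORDER BY range_start DESC;

-- On-disk size per chunk (table + indexes + toast)
SELECT chunk_name, pg_size_pretty(total_bytes) AS total_size
FROM chunks_detailed_size('device_state')
ORDER BY total_bytes DESC;
```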

Now, queries that go narrow and deep, for example:

SELECT avg(some_column) as avg
FROM device_state
WHERE time < (now() - INTERVAL '5 days')
	  AND id_part_1 = 'abc' 
	  AND id_part_2 = 'def' 
	  AND id_part_3 = 'ghi' 
GROUP BY id_part_1, id_part_2, id_part_3

This takes very long to complete (2 minutes 30 seconds) for the 120’000 rows that need to be found and averaged.

My question is: why? How can I check what takes so long, and how can I fix it?
I would have expected the index to be used and the query to be fast.
Is my hardware (massively) under-sized for a table of this size, or is something else at work here?

Thank you all in advance!

Hi Fabian, do you know if it’s toasting your data?

You may have an issue with too much data per row, so that it needs to use the TOAST mechanism from PostgreSQL.

Can you share what EXPLAIN ANALYZE says about it?
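If you haven’t run it before, a minimal sketch against your query (the BUFFERS option is optional, but it shows how much data is actually read):

```sql
-- Runs the query and reports actual timings, row counts, and buffer usage
EXPLAIN (ANALYZE, BUFFERS)
SELECT avg(some_column) AS avg
FROM device_state
WHERE time < (now() - INTERVAL '5 days')
  AND id_part_1 = 'abc'
  AND id_part_2 = 'def'
  AND id_part_3 = 'ghi'
GROUP BY id_part_1, id_part_2, id_part_3;
```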

Also, on the compression details, I see you’re not adding id_part_3 to the segmentby. Any reason to skip it? And do you see any difference if you query only old data that is already compressed?

Thank you Jônatas !

You know, this is the first time I hear about the TOAST mechanism, so that’s already nice :sweat_smile:
I don’t think it applies here, as I cannot imagine one row having more than 8 kB of data in it; the rows only contain

  • 11 float4 entries
  • 4 varchar
  • 1 timestamptz
  • 1 bool

The varchars are not huge, just single words. But I’ll post the EXPLAIN ANALYZE output here, I hope we see something. :slightly_smiling_face:

But first, about the missing id_part_3 column in the compression: I should have chosen better names, sorry. The table contains device data. We have a building_id, a floor_id and a device_id, and then a bunch of data for that device. The building_id is unique, the floor_id is unique only within a building, and the device_id is unique only within the floor.
For business reasons, the data on individual devices loses value over time; after 90 days we would generally query only for the floor or building, but not for individual devices.
The compression documentation says you should segment by the columns you query and no more, to avoid ending up with small compressed batches, so I chose to omit the third part, the device_id, or, as I originally named it, ‘id_part_3’.

HERE is the EXPLAIN ANALYZE output, I already adjusted the column names to building_id, floor_id and device_id to make it a bit clearer what we talk about.

The output is too big for a post here. I uploaded it as a .txt file to OneDrive; I hope that’s ok, if not I can split it into 2 posts or do something different.

Anyway, thank you already in advance !
Greetings
Fabian

Thanks Fabian! That’s good enough for this!

You can also use services like depesz and share a temporary link here :nerd_face:

Let’s check which part needs to be improved here:

Custom Scan (DecompressChunk) on _hyper_25_11580_chunk  (cost=1.09..1.09 rows=1000 width=61) (actual time=0.005..0.006 rows=0 loops=1)
  Filter: ((device_id = '713c7a86-8bee-4375-ba9f-c49e72cab875'::uuid) AND ("time" < (now() - '5 days'::interval)))
  ->  Seq Scan on compress_hyper_30_13221_chunk  (cost=0.00..1.09 rows=1 width=150) (actual time=0.005..0.005 rows=0 loops=1)
        Filter: ((floor_id = 'c160942e-1113-4a1b-b2c6-71ac7ca4d4a0'::uuid) AND ((building_id)::text = '657b1792b9a2970019fbcb79'::text))
        Rows Removed by Filter: 6

When it says Seq Scan, it means Postgres has to go through the rows one by one instead of using an index.

Also, I can’t understand why, here:

((building_id)::text = '657b1792b9a2970019fbcb79'::text)

it’s converting both sides to text instead of typecasting to uuid. You should probably find a way to typecast your parameter up front and avoid this casting during execution.
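For example, a sketch with the parameters cast explicitly to the columns’ declared types (I’m assuming floor_id and device_id are uuid and building_id is the varchar column from your original schema; adjust the casts if your types differ):

```sql
-- Cast each literal/parameter to the column's declared type,
-- so the planner does not have to cast the column itself
SELECT avg(some_column) AS avg
FROM device_state
WHERE time < (now() - INTERVAL '5 days')
  AND building_id = '657b1792b9a2970019fbcb79'::varchar
  AND floor_id    = 'c160942e-1113-4a1b-b2c6-71ac7ca4d4a0'::uuid
  AND device_id   = '713c7a86-8bee-4375-ba9f-c49e72cab875'::uuid;
```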

I’d even recommend you create a simple lookup table like:

create table installation (
  id bigserial primary key,
  device_id uuid references devices,
  building_id varchar references building,
  floor_id uuid references floor
);

and then you can refer to installation.id in your hypertable, which brings all three identifiers together and is much easier to index.
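A sketch of how the hypertable and a typical query would then look (names are hypothetical; a hypertable can have outgoing foreign keys to regular tables, just not the other way around):

```sql
-- The hypertable now carries a single surrogate key
CREATE TABLE device_state (
  "time" timestamptz NOT NULL,
  installation_id bigint NOT NULL REFERENCES installation (id),
  some_column float4
  -- ... remaining data columns ...
);
SELECT create_hypertable('device_state', 'time', chunk_time_interval => INTERVAL '1 day');
CREATE INDEX ON device_state (installation_id, "time" DESC);

-- Narrow-and-deep query via a join on the lookup table
SELECT avg(ds.some_column) AS avg
FROM device_state ds
JOIN installation i ON i.id = ds.installation_id
WHERE ds."time" < (now() - INTERVAL '5 days')
  AND i.building_id = '657b1792b9a2970019fbcb79'
  AND i.floor_id    = 'c160942e-1113-4a1b-b2c6-71ac7ca4d4a0'::uuid
  AND i.device_id   = '713c7a86-8bee-4375-ba9f-c49e72cab875'::uuid;
```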