Hi All,
I'm having a couple of issues with the performance of compressed chunks; maybe my expectations are too high, or (hopefully) I'm doing something wrong here.
Situation/setup:
- Hypertable with a chunk interval of one day and (currently) around 200M records per chunk (day)
- Each chunk is around 27-30 GB uncompressed
- Index on timestamp and device_id (different names in reality, but the principle is very much the same)
- There are around 2M devices, each producing about 100 records a day
- Compression is set up with timestamp DESC as the orderby and device_id as the segmentby column
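For completeness, the compression settings look roughly like this (using the simplified table/column names from the UPDATE examples further down):

ALTER TABLE timeseries_compressed_2 SET (
    timescaledb.compress,
    timescaledb.compress_orderby = 'read_ts DESC',
    timescaledb.compress_segmentby = 'register_id'
);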
Now when I compress the table, I see the stats go from ~27 GB to around 1 GB per chunk, which is very good; I'm happy with that!
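In case it matters, these before/after sizes can be checked with the TimescaleDB 2.x stats function, something like:

SELECT chunk_name,
       pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes)  AS after
FROM chunk_compression_stats('timeseries_compressed_2');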
However, I have a couple of concerns:
- Compressing a chunk (doing it manually for now) takes around 5 minutes per chunk, which is OK
- When compressing multiple chunks using the command found here: it doesn't take days * 5 minutes, it takes more like forever (and yes, memory fills up at some point, of course). Is this expected? Shouldn't it commit per chunk? (See the sketch at the bottom of this post for what I mean.)
- Inserts seem to be OK
- Updates on an individual device_id are OK: they take a couple of ms, regardless of whether the filter is an exact timestamp or a > timestamp
- Updates on more than a single device_id (WHERE device_id IN (1, 2, 3) or WHERE device_id = 1 OR device_id = 2), whether with an exact timestamp or a > and < range, take horribly long, and it seems the whole chunk is decompressed first before the update is actually done. I can see this in the hardware statistics, but the TimescaleDB views/functions still claim the chunk is compressed.
update timeseries_compressed_2 set val=60001000 where (register_id = 836434) and read_ts > '2020-01-01 00:00:00+00';
-- takes a couple of ms for 96 updates
update timeseries_compressed_2 set val=60001000 where (register_id = 836434 or register_id = 9999999) and read_ts > '2020-01-01 00:00:00+00';
-- takes 5 to 8 minutes
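To clarify the multi-chunk compression point above: what I would expect (one transaction per chunk) is something along these lines, a psql sketch with autocommit on, so each generated compress_chunk() call commits on its own:

-- generate one compress_chunk() statement per chunk; with autocommit on,
-- each statement executed by \gexec runs in its own transaction
SELECT format('SELECT compress_chunk(%L, if_not_compressed => true);', c)
FROM show_chunks('timeseries_compressed_2', older_than => INTERVAL '1 day') AS c
\gexec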