Hi @jonatasdp, I have tried your suggestion. However, it doesn’t compress in parallel.
I started a psql session and tried to compress a chunk:
=> select compress_chunk('_timescaledb_internal._hyper_148_3030_chunk');
Then I started another psql session and tried to compress another chunk:
=> select compress_chunk('_timescaledb_internal._hyper_148_3031_chunk');
I observed that only a single core was used at a time, and the 2nd compress_chunk appeared to start only after the 1st one completed. As a result, the 2nd compress_chunk took about twice as long to finish.
It looks like there is a table- or index-level lock when compressing chunks in TimescaleDB (2.13.1, PostgreSQL 15.3). Could you confirm whether this behavior is expected?
My current use case takes 1.5 hours to compress a day of data. With a multi-core machine, I hope we can reduce this time significantly.
So I re-tested it with the latest TimescaleDB 2.14.1.
I can confirm that 2.14.1 can run compress_chunk() on two different chunks in parallel across 2 connections.
The caveat is that I can't simply run the following command in 2 connections to get 2 parallel compressions:
select compress_chunk(i, if_not_compressed => true) from show_chunks('table') i;
The reason is that if I run the above in both connections, they both attempt to compress the same outstanding uncompressed chunk, so the work effectively serializes again.
My current thinking is that we can implement parallel compression by opening multiple connections, with each one compressing a different set of chunks (see the sketch below). Ideally, I hope TimescaleDB will make this easier.
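For reference, here is a minimal sketch of that workaround, assuming 2 connections and splitting the chunks by row number modulo the worker count. The table name 'table' and the 2-way split are my placeholders, not anything TimescaleDB provides:

-- connection 1 takes the even-numbered chunks;
-- run the same query with "rn % 2 = 1" in connection 2
select compress_chunk(chunk, if_not_compressed => true)
from (
  select c as chunk, row_number() over (order by c::text) as rn
  from show_chunks('table') c
) s
where rn % 2 = 0;

Each connection then works on a disjoint set of chunks, so the per-chunk locks should no longer serialize the work.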
Can we add a skip_if_already_compressing or max_outstanding_compression parameter to compress_chunk()? What do you think?