How to improve compression?

Hi all,

We have a 31 GB hypertable that we want to compress.

This table has about 102 million rows/month, in daily chunks of ~3.2 million rows each.

We use the following query to check the table size:

SELECT
        pg_size_pretty(total_bytes) as "Total",
        pg_size_pretty(table_bytes) as "Table",
        pg_size_pretty(index_bytes) as "Index",
        pg_size_pretty(toast_bytes) as "Toast"
FROM
        hypertable_detailed_size('table');

Before compression we get these results:

Total   Table   Index    Toast
31 GB   30 GB   630 MB   48 kB

And after compression (segmented by one of our columns) we get this:

Total   Table     Index    Toast
30 GB   3005 MB   129 MB   27 GB

So we save just 1 GB out of 30 GB total, since the data that used to live in the table (30 GB) has mostly moved into the “Toast” part (27 GB).
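For reference, our compression setup is along these lines (the table and segmentby column names here are placeholders, not our real schema):

ALTER TABLE metrics SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'device_id'  -- the column we segment by
);

-- compress chunks older than 7 days (example interval)
SELECT add_compression_policy('metrics', INTERVAL '7 days');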

We think this is due to the nature of the columns involved, which have wild changes between consecutive rows, so the compression algorithms are not very effective on them.

Questions are:

  1. Is there any way to compress the “Toast” part even further? Any ideas?
  2. Is a chunk with 3.2 million rows too large for compression? If we create smaller chunks (sketched just below this list), will compression improve?
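For question 2, this is roughly how we would test smaller chunks (it only affects newly created chunks, and the interval here is just an example):

-- halve the chunk interval from 1 day to 12 hours for future chunks
SELECT set_chunk_time_interval('table', INTERVAL '12 hours');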

Thanks in advance!

Hi @labs! Compression achieves a better compression ratio on larger datasets, although it takes longer to run. The problem is more that the data domain you’re compressing is not well suited to the current algorithms. Please see how the compression algorithms work here.
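The column-wise algorithms work best when consecutive values within a compressed batch are similar, so it is often worth experimenting with compress_orderby so that related rows end up next to each other. A minimal sketch, assuming a hypothetical metrics table with a device_id segment column and a ts time column:

ALTER TABLE metrics SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'device_id',
        timescaledb.compress_orderby   = 'ts DESC'
);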

Can you share your data structure and an anonymized sample of the data you have?

Hi @jonatasdp, thanks for your quick reply.

It’s difficult to share the exact data structure/anonymized table that reproduces the problem.

Here’s a first try: https://file.io/zYctYPx3t9XV (this is a “test.sql” with a sample table structure and hypertable settings and a “test.csv” file with some data)

With this simple test we get an initial size of:

Total   Table     Index    Toast
10 MB   9600 kB   776 kB   0 bytes

After the compression job runs, we actually get larger sizes instead:

Total    Table   Index     Toast
138 MB   73 MB   2120 kB   63 MB

I’ll try to build a sample that reproduces the problem, but it’s complicated.
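Something like this synthetic setup might be a starting point (a minimal sketch with made-up names and random values, not our real schema):

CREATE TABLE comp_test (
        ts     timestamptz NOT NULL,
        tag_id int         NOT NULL,
        value  double precision
);

SELECT create_hypertable('comp_test', 'ts', chunk_time_interval => INTERVAL '1 day');

-- values that change wildly between consecutive rows, similar to our real data
INSERT INTO comp_test
SELECT ts, tag_id, random() * 1e6
FROM generate_series(now() - INTERVAL '7 days', now(), INTERVAL '5 seconds') AS ts,
     generate_series(1, 5) AS tag_id;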

Meanwhile, are there any alternative compression techniques you could suggest we consider?

Hi @labs! Have you solved the issue of table size increasing after compression? If so, could you share the solution?

Hi @LuxCore. I don’t think it’s solvable: this is the way Timescale compression works, and some data will not compress at all, so the size will increase.
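One thing worth checking (a sketch, assuming the hypertable is called 'table' as in the earlier query): compare per-chunk sizes before and after compression, and decompress any chunks that actually grew:

-- per-chunk sizes before and after compression
SELECT chunk_name,
       pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes)  AS after
FROM chunk_compression_stats('table');

-- undo compression for a chunk that got larger (chunk name is an example)
SELECT decompress_chunk('_timescaledb_internal._hyper_1_2_chunk');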
