How to improve compression?

Hi all,

We have a 31 GB hypertable that we want to compress.

This table has about 102 million rows/month, in daily chunks of ~3.2 million rows each.

We use the following query to check the table size:

SELECT
        pg_size_pretty(total_bytes) as "Total",
        pg_size_pretty(table_bytes) as "Table",
        pg_size_pretty(index_bytes) as "Index",
        pg_size_pretty(toast_bytes) as "Toast"
FROM
        hypertable_detailed_size('table');

Before compression we get these results:

Total   Table   Index    Toast
31 GB   30 GB   630 MB   48 kB

And after compression (segmented by one of our columns) we get this:

Total   Table     Index    Toast
30 GB   3005 MB   129 MB   27 GB

So we save just 1 GB out of 30 GB total, since the data that used to live in the table (30 GB) has mostly moved into the “Toast” part (27 GB).
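For reference, our compression setup is along these lines (the table and segmentby column names here are placeholders, not our real schema):

ALTER TABLE metrics SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'device_id'  -- the column we segment by
);

-- compress chunks older than 7 days (example interval)
SELECT add_compression_policy('metrics', INTERVAL '7 days');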

We think this is due to the nature of the columns involved, which have wild changes between consecutive rows, so the compression algorithms are not very effective on them.

Questions are:

  1. Is there any way to compress the “Toast” part even further? Any ideas?
  2. Is a chunk with 3.2 million rows too large for compression? If we create smaller chunks (sketched just below this list), will compression improve?
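For question 2, this is roughly how we would test smaller chunks (it only affects newly created chunks, and the interval here is just an example):

-- halve the chunk interval from 1 day to 12 hours for future chunks
SELECT set_chunk_time_interval('table', INTERVAL '12 hours');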

Thanks in advance!

Hi @labs! Compression achieves a better compression ratio on larger datasets, although it takes longer to run. The problem is more that the data domain you’re compressing is not well suited to the current algorithms. Please see how the compression algorithms work here.
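The column-wise algorithms work best when consecutive values within a compressed batch are similar, so it is often worth experimenting with compress_orderby so that related rows end up next to each other. A minimal sketch, assuming a hypothetical metrics table with a device_id segment column and a ts time column:

ALTER TABLE metrics SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'device_id',
        timescaledb.compress_orderby   = 'ts DESC'
);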

Can you share your data structure and an anonymized sample of the data you have?

Hi @jonatasdp, thanks for your quick reply.

It’s difficult to share the exact data structure/anonymized table that reproduces the problem.

Here’s a first try: https://file.io/zYctYPx3t9XV (this is a “test.sql” with a sample table structure and hypertable settings and a “test.csv” file with some data)

With this simple test we get an initial size of:

Total   Table     Index    Toast
10 MB   9600 kB   776 kB   0 bytes

After the compression job runs, we actually get larger sizes instead:

Total    Table   Index     Toast
138 MB   73 MB   2120 kB   63 MB

I’ll try to build a sample that reproduces the problem, but it’s complicated.
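Something like this synthetic setup might be a starting point (a minimal sketch with made-up names and random values, not our real schema):

CREATE TABLE comp_test (
        ts     timestamptz NOT NULL,
        tag_id int         NOT NULL,
        value  double precision
);

SELECT create_hypertable('comp_test', 'ts', chunk_time_interval => INTERVAL '1 day');

-- values that change wildly between consecutive rows, similar to our real data
INSERT INTO comp_test
SELECT ts, tag_id, random() * 1e6
FROM generate_series(now() - INTERVAL '7 days', now(), INTERVAL '5 seconds') AS ts,
     generate_series(1, 5) AS tag_id;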

Meanwhile, are there any alternative compression techniques you could suggest we consider?

Hi @labs! Have you solved the issue of table size increasing after compression? If so, could you share the solution?

Hi @LuxCore. I don’t think it’s solvable: this is the way Timescale compression works, and some data will not compress at all, so the size will increase.
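One thing worth checking (a sketch, assuming the hypertable is called 'table' as in the earlier query): compare per-chunk sizes before and after compression, and decompress any chunks that actually grew:

-- per-chunk sizes before and after compression
SELECT chunk_name,
       pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes)  AS after
FROM chunk_compression_stats('table');

-- undo compression for a chunk that got larger (chunk name is an example)
SELECT decompress_chunk('_timescaledb_internal._hyper_1_2_chunk');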
