We have a 31 GB hypertable that we want to compress.
This table holds about 102 million rows/month, with daily chunks of ~3.2 million rows each.
We use the following query to check the table size:
SELECT
    pg_size_pretty(total_bytes) AS "Total",
    pg_size_pretty(table_bytes) AS "Table",
    pg_size_pretty(index_bytes) AS "Index",
    pg_size_pretty(toast_bytes) AS "Toast"
FROM
    hypertable_detailed_size('table');
Before compression we get these results:
Total    Table    Index    Toast
31 GB    30 GB    630 MB   48 kB
And after compression (segmented by one of our columns) we get this:
Total    Table     Index    Toast
30 GB    3005 MB   129 MB   27 GB
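For reference, the before/after numbers can also be cross-checked with TimescaleDB's built-in compression statistics. This is a minimal sketch, assuming TimescaleDB 2.x and the same hypertable name used in the query above:

-- Compare total size before and after compression, plus how many chunks are compressed
SELECT
    pg_size_pretty(before_compression_total_bytes) AS "Before",
    pg_size_pretty(after_compression_total_bytes) AS "After",
    total_chunks,
    number_compressed_chunks
FROM
    hypertable_compression_stats('table');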
So we only save about 1 GB out of the ~31 GB total, since the data that was in the table (30 GB) has mostly moved to the TOAST part (27 GB).
We think this is due to the nature of the columns involved, which have wild changes between consecutive rows, making the compression algorithms not very effective.
Questions are:
Is there any way to compress the “Toast” part even further? Any ideas?
Is a chunk with 3.2 million rows too large for compression? If we created smaller chunks, would compression improve?
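For context, this is the general shape of the compression settings and chunk sizing we are experimenting with. It is only a sketch; the column names (device_id, ts) are placeholders, not our real schema:

-- Enable compression, grouping rows by a segment column and ordering by time
-- so that consecutive values within a segment are as similar as possible.
-- "device_id" and "ts" are placeholder column names.
ALTER TABLE "table" SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_orderby = 'ts DESC'
);

-- Optionally create smaller chunks going forward (existing chunks keep their size).
SELECT set_chunk_time_interval('table', INTERVAL '12 hours');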
Hi @labs! Compression achieves a better compression ratio on large datasets, although it takes longer to run. The problem here is more likely that the data domain you're compressing is not well suited to the current algorithms. Please see how the compression algorithms work here.
Can you share your data structure and an anonymized sample of the data you have?
It’s difficult to share the exact data structure/anonymized table that reproduces the problem.
Here’s a first try: https://file.io/zYctYPx3t9XV (this is a “test.sql” file with a sample table structure and hypertable settings, plus a “test.csv” file with some data)
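Roughly, a minimal reproduction of this kind looks like the sketch below. The table name, columns, and chunk interval are placeholders for illustration only, not the actual contents of test.sql:

-- Hypothetical minimal reproduction; names and types are placeholders.
CREATE TABLE sample (
    ts    timestamptz NOT NULL,
    tag   text        NOT NULL,
    value double precision
);

SELECT create_hypertable('sample', 'ts', chunk_time_interval => INTERVAL '1 day');

ALTER TABLE sample SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'tag'
);

-- Load the sample data (psql meta-command)
\copy sample FROM 'test.csv' WITH (FORMAT csv, HEADER true);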
With this simple test we get an initial size of:
Total    Table     Index    Toast
10 MB    9600 kB   776 kB   0 bytes
After the compression job runs, we're getting larger sizes instead:
Total    Table    Index     Toast
138 MB   73 MB    2120 kB   63 MB
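For reference, per-chunk before/after sizes can be inspected like this (a minimal sketch, assuming TimescaleDB 2.x; the table name is the placeholder from the sketch above):

-- Per-chunk before/after sizes, to see whether compression shrinks any chunk at all
SELECT
    chunk_name,
    pg_size_pretty(before_compression_total_bytes) AS "Before",
    pg_size_pretty(after_compression_total_bytes) AS "After"
FROM
    chunk_compression_stats('sample')
ORDER BY
    chunk_name;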
I’ll try to build a sample that reproduces the problem, but it’s complicated.
Meanwhile, are there any alternative compression techniques you could suggest that we consider?