Hello everyone,
We recently installed a TimescaleDB instance in our AWS account to evaluate adopting it for our signal data analysis, instead of Athena or other tools.
The challenge is that we have thousands of sensors sampling at 10 Hz, which produces a huge amount of data (GBs of compressed Parquet files that become TBs of uncompressed data over days or weeks).
The data originally arrives in wide format, with columns timestamp, signal_1, …, signal_N (where N can range from 500 to 10,000, for example).
After reviewing TimescaleDB's compression capabilities, I thought it would be better to store the data in long format ("timestamp", "signal", "value") so that it can be compressed segmented by the signal field.
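For example, a few rows of the long format would look like this (the values are made up):
time                    | signal   | value
------------------------+----------+-------
2024-01-01 00:00:00.000 | signal_1 | 0.123
2024-01-01 00:00:00.100 | signal_1 | 0.125
2024-01-01 00:00:00.000 | signal_2 | 7.001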
So we have a job that pivots the data this way and loads it into TimescaleDB using COPY statements, which of course means millions of rows for each hour of data.
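Each hourly batch is loaded with something like the following (the file path is just illustrative):
-- Load one pivoted hourly batch into the hypertable
COPY signals_numeric ("time", signal, value)
FROM '/data/signals/2024-01-01T00.csv'
WITH (FORMAT csv, HEADER true);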
The upside is that we can compress the data very well this way, and query performance also looks very promising.
Our problem now is that compressing the data with the compress_chunk function can take an extremely long time.
Some chunks take a few tens of seconds, but others can take tens of minutes for some reason!
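We invoke it roughly like this (the 2-hour cutoff is just an example):
-- Compress every not-yet-compressed chunk older than 2 hours
SELECT compress_chunk(c, if_not_compressed => TRUE)
FROM show_chunks('signals_numeric', older_than => INTERVAL '2 hours') AS c;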
To my understanding, each chunk should hold 1 hour of data, as per our hypertable definition.
I have no idea why this happens; do you have any suggestions?
Do we need to adopt distributed hypertables for data volumes this large?
Is there something off in our current table definition for this kind of data?
Currently we have a single table defined as:
-- Create table
CREATE TABLE public.signals_numeric
(
    "time"  timestamp without time zone NOT NULL,
    signal  text NOT NULL,
    value   double precision,
    PRIMARY KEY ("time", signal)
);

-- Create hypertable with 1-hour chunks
SELECT create_hypertable(
    'signals_numeric', 'time',
    migrate_data => TRUE,
    chunk_time_interval => INTERVAL '1 hour'
);

-- Activate compression for the hypertable
-- (compressed chunks are rolled up to cover 1 day)
ALTER TABLE signals_numeric SET (
    timescaledb.compress,
    timescaledb.compress_orderby = 'time DESC',
    timescaledb.compress_segmentby = 'signal',
    timescaledb.compress_chunk_time_interval = '1 day'
);
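In case it's relevant, this is roughly how we inspect the per-chunk sizes (just a sketch):
-- Show on-disk size of each chunk of the hypertable
SELECT chunk_name, pg_size_pretty(total_bytes) AS total_size
FROM chunks_detailed_size('signals_numeric')
ORDER BY total_bytes DESC;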
Thank you very much in advance for your support.