We are assessing a migration from InfluxDB to TimescaleDB and are testing different aspects of it. This question is about data size on disk.
So. Our data has ~1000 different “columns” or “measurements”, depending on how you would design the tables. We currently have 9 months of 1-second precision data and 5 years of 1-minute precision data (avg+min+max).
This amount of data usually takes <10 GB in InfluxDB (probably because there are a lot of similar values).
I am trying to measure the size with TimescaleDB. I am using timescale/timescaledb:2.11.2-pg15
as the Docker image.
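For reference, the ts schema used below has to exist before the table can be created (the Docker image already ships with the extension, but enabling it explicitly does no harm). A minimal one-time setup, assuming the default database:

-- one-time setup: enable the extension and create the schema
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE SCHEMA IF NOT EXISTS ts;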
I used a single table for all measurements:

CREATE TABLE ts.measurements (
    time TIMESTAMPTZ NOT NULL,
    measurement varchar,
    value double precision
);

and created a hypertable for it:

SELECT create_hypertable('ts.measurements', 'time', chunk_time_interval => INTERVAL '1 day');
Then I tested by inserting 12 months of values:
INSERT INTO ts.measurements SELECT time, 'SYMBOL', (random()*30)::int FROM generate_series(TIMESTAMP '2023-01-01 00:00:00', TIMESTAMP '2023-02-01 00:00:00' + INTERVAL '-1 second', INTERVAL '1 second') AS time;
INSERT INTO ts.measurements SELECT time, 'SYMBOL', (random()*30)::int FROM generate_series(TIMESTAMP '2023-02-01 00:00:00', TIMESTAMP '2023-04-01 00:00:00' + INTERVAL '-1 second', INTERVAL '1 second') AS time;
INSERT INTO ts.measurements SELECT time, 'SYMBOL', (random()*30)::int FROM generate_series(TIMESTAMP '2023-04-01 00:00:00', TIMESTAMP '2023-06-01 00:00:00' + INTERVAL '-1 second', INTERVAL '1 second') AS time;
INSERT INTO ts.measurements SELECT time, 'SYMBOL', (random()*30)::int FROM generate_series(TIMESTAMP '2023-06-01 00:00:00', TIMESTAMP '2023-08-01 00:00:00' + INTERVAL '-1 second', INTERVAL '1 second') AS time;
INSERT INTO ts.measurements SELECT time, 'SYMBOL', (random()*30)::int FROM generate_series(TIMESTAMP '2023-08-01 00:00:00', TIMESTAMP '2023-10-01 00:00:00' + INTERVAL '-1 second', INTERVAL '1 second') AS time;
INSERT INTO ts.measurements SELECT time, 'SYMBOL', (random()*30)::int FROM generate_series(TIMESTAMP '2023-10-01 00:00:00', TIMESTAMP '2023-12-01 00:00:00' + INTERVAL '-1 second', INTERVAL '1 second') AS time;
INSERT INTO ts.measurements SELECT time, 'SYMBOL', (random()*30)::int FROM generate_series(TIMESTAMP '2023-12-01 00:00:00', TIMESTAMP '2024-01-01 00:00:00' + INTERVAL '-1 second', INTERVAL '1 second') AS time;
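For reference, the batches above are equivalent to a single statement over the whole year (same pattern as above, nothing new assumed):

-- one-second values for a single measurement across all of 2023
INSERT INTO ts.measurements
SELECT time, 'SYMBOL', (random()*30)::int
FROM generate_series(TIMESTAMP '2023-01-01 00:00:00',
                     TIMESTAMP '2024-01-01 00:00:00' - INTERVAL '1 second',
                     INTERVAL '1 second') AS time;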
That is ~31.5M rows (365 days × 86,400 rows per day).
Next I added compression:
ALTER TABLE ts.measurements SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'measurement'
);
SELECT add_compression_policy('ts.measurements', INTERVAL '7 days');
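Note that the policy compresses chunks in a background job, so some chunks may still be uncompressed at the moment the size is measured. To force compression of everything right away, something like this should work (standard 2.x API):

-- compress any chunk that the background job has not handled yet
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('ts.measurements') AS c;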
Now the Postgres pgdata directory is 3.1 GB. After VACUUM FULL it is 2.1 GB.
SELECT hypertable_name, hypertable_size(format('%I.%I', hypertable_schema, hypertable_name)::regclass)
FROM timescaledb_information.hypertables;
hypertable_name | hypertable_size
-----------------+-----------------
measurements | 2170134528
(1 row)
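To confirm how much of that is actually compressed data, the per-hypertable compression stats can be checked as well:

-- before/after sizes as reported by TimescaleDB itself
SELECT pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes) AS after
FROM hypertable_compression_stats('ts.measurements');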
If this size scales linearly, we would need ~2,100 GB for just 1 year of data (2.1 GB × 1000 measurements). That would be orders of magnitude bigger than what InfluxDB needed.
Am I doing something wrong?