Hi,
Could you please help me choose the right chunking strategy for my conditions?
My initial conditions:
We have 10k devices sending data at a varying rate of 2 to 2k records per second (let's take 10 records per second as the average).
Saved data does not change over time (no update or delete operations).
We need to keep this data forever (no TTL usage for this data).
Read requests are very simple (no joins or groupings); a specific device, time period, sort order and number of rows are always specified:
select *
from devices_signals
where device_id = $1
and dtime between $2 and $3
order by dtime desc
limit 300
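For context, when I mention a BTree index later (questions 4 and 5), I mean something along these lines (I have omitted its exact definition, so the index name and column order here are just an illustration):
-- Illustrative composite BTree index matching the read pattern above;
-- the name and column order are placeholders, not part of my actual DDL.
CREATE INDEX devices_signals_device_dtime_idx ON devices_signals (device_id, dtime DESC);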
I am assuming the following structure and settings at the moment:
create table devices_signals (
    device_id int2 not null,
    dtime timestamp not null,
    a double precision,
    b double precision,
    c double precision,
    d double precision,
    e double precision,
    f double precision,
    g double precision
);
ALTER TABLE devices_signals SET (autovacuum_enabled = false);
SELECT create_hypertable('devices_signals', by_range('device_id', 1), create_default_indexes => False);
SELECT add_dimension('devices_signals', by_range('dtime', interval '1 week'));
ALTER TABLE devices_signals
SET (timescaledb.compress,
timescaledb.compress_orderby = 'dtime DESC',
timescaledb.compress_chunk_time_interval = '1 hours');
CREATE INDEX devices_signals_dtime_brin_idx ON devices_signals USING BRIN(dtime);
Several dilemmas confuse me.
The system is expected to be read-heavy, and since we always know the device_id when reading, it seems appropriate not to mix data from different devices in one chunk. From the writing point of view, however, it would be more economical to write everything into one chunk.
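For illustration, the "write everything into one chunk" variant would be a hypertable partitioned by time only, so that all devices share the same weekly chunks (a sketch, keeping the rest of my settings unchanged):
-- Sketch of the alternative layout: time is the only partitioning dimension,
-- so rows from all 10k devices land in the same weekly chunk.
SELECT create_hypertable('devices_signals', by_range('dtime', interval '1 week'), create_default_indexes => false);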
My questions about the best organization of tables and partitioning for these requirements:
1) Should I use 10k tables or one table for all devices? If I use one table, it is easy to insert data from the backend application, but then PG + TS have to decide which chunk is right for each row. If I use 10k tables, I need 10k inserts (one per table) for each moment, which does not look scalable or fast.
2) Would it be better to use one chunk for all devices and rely on timescaledb.compress_segmentby from the Timescale docs (see the sketch at the end of this post)? Will TS seek quickly to the right segment in the chunk and use it? If so, that should work only for compressed chunks, right? What should we do while the data is still uncompressed and sits in the same chunk for different devices?
3) After I load data from my sample data file and compress the chunks, I noticed that the BTree index takes 0 bytes in
SELECT *
FROM chunk_compression_stats('devices_signals')
Do I understand correctly that once a chunk is compressed, the index is no longer needed?
4) When I created the table, added the BRIN index and loaded the data, the BRIN index was not used for the select, but a BTree index works fine. Why?
5) Why does TS use a BTree index by default instead of BRIN?
6) Am I right that, since we don't have any update or delete operations, I can disable autovacuum for the table?
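For clarity, the compress_segmentby variant I am asking about in question 2 would look something like this (a sketch; the other compression settings stay as in my DDL above):
-- Sketch: segment compressed data by device_id so that all rows of one device
-- are stored together inside a compressed chunk.
ALTER TABLE devices_signals
SET (timescaledb.compress,
     timescaledb.compress_segmentby = 'device_id',
     timescaledb.compress_orderby = 'dtime DESC');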