Hey,
We are using a narrow DB schema to store arbitrary double values (change-based logging).
CREATE TABLE IF NOT EXISTS historian.Ms (
    timestamp timestamptz NOT NULL,  -- time column must be NOT NULL for a hypertable
    tag_id smallint NOT NULL,
    value double precision NULL
) WITH (OIDS = FALSE)
TABLESPACE pg_default;

ALTER TABLE historian.Ms OWNER TO dbadmin;

DROP INDEX IF EXISTS historian.tag_id_timestamp_idx;
CREATE INDEX IF NOT EXISTS Ms_tag_id_idx ON historian.Ms (tag_id);

SELECT create_hypertable('historian.Ms', 'timestamp', if_not_exists => TRUE);
SELECT set_chunk_time_interval('historian.Ms', INTERVAL '{LogDataTableType.Process_Data_MS.GetChunkIntervalInHours()} hours');
A new requirement has come up: we want each tag to have a configurable retention period. E.g. Tag1 should be kept for 3 months and Tag2 should be kept for 1 year. The typical way the TimescaleDB documentation recommends deleting data is by dropping chunks, but this is all or nothing.
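For reference, the documented retention call we mean looks like this (a sketch, assuming the TimescaleDB 2.x drop_chunks signature; the 3-month cutoff is just the Tag1 example):

-- Drops every chunk whose entire time range is older than the cutoff,
-- across ALL tags in the hypertable - hence "all or nothing".
SELECT drop_chunks('historian.Ms', older_than => INTERVAL '3 months');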
We have tried using space dimensions on tag_id, but we could not get the chunks to contain only one tag each.
You can absolutely do this with space partitioning:
a=# select add_dimension('historian.Ms', 'tag_id', chunk_time_interval=>1);
add_dimension
---------------------------
(2,historian,ms,tag_id,t)
(1 row)
a=# insert into historian.ms values(now(), 1, 1);
INSERT 0 1
a=# insert into historian.ms values(now(), 1, 2);
INSERT 0 1
a=# insert into historian.ms values(now() - interval '1 day', 1, 1);
INSERT 0 1
a=# insert into historian.ms values(now(), 2, 1);
INSERT 0 1
a=# insert into historian.ms values(now(), 5, 1);
INSERT 0 1
a=# \d+ historian.ms
Table "historian.ms"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
-----------+--------------------------+-----------+----------+---------+---------+-------------+--------------+-------------
timestamp | timestamp with time zone | | not null | | plain | | |
tag_id | smallint | | not null | | plain | | |
value | double precision | | | | plain | | |
Indexes:
"ms_tag_id_idx" btree (tag_id)
"ms_timestamp_idx" btree ("timestamp" DESC)
Triggers:
ts_insert_blocker BEFORE INSERT ON historian.ms FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Child tables: _timescaledb_internal._hyper_1_1_chunk,
_timescaledb_internal._hyper_1_2_chunk,
_timescaledb_internal._hyper_1_3_chunk
_timescaledb_internal._hyper_1_4_chunk
Access method: heap
Now, should you do this? That's a harder question. How many chunks are you likely to end up with if you enable space partitioning (worst case: total tags * total time chunks)?
We recommend extreme caution when choosing to use space partitioning.
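One quick way to keep an eye on that number (assuming TimescaleDB 2.x, where the timescaledb_information.chunks view is available):

-- Counts chunks for the hypertable; with a per-tag dimension this grows as
-- (distinct tag_ids per interval) * (number of time intervals).
SELECT count(*) AS chunk_count
FROM timescaledb_information.chunks
WHERE hypertable_schema = 'historian'
  AND hypertable_name = 'ms';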
Hi James.
Thank you very much for your response - very helpful.
I work with Jason, and have some follow-up questions I hope you can resolve:
We have circa 500 IDs (in our case tag_id) that we would need to space-partition, meaning we would get 500 chunks per time interval. Maybe that is too much to handle efficiently (with a one-day chunk interval, for example, that would be roughly 500 * 365 = 182,500 chunks per year), but I have the following questions:
Is it possible to set the chunk_time_interval individually for each ID? The data refresh rate varies greatly, so we would end up with some chunks containing only a couple of data points and others containing millions, which is not very query-efficient.
The chunk_time_interval parameter is set to 1 in your example. What does that mean? I assumed it means that the space partitioning uses the same chunk interval as the hypertable (which is set by the set_chunk_time_interval function). Is that correct?
Since we are using space partitioning, is there an efficient way to find out which tag_ids belong to which chunks? I can see there is a chunks table in the timescaledb_information schema, and I have also noticed the tableoid::regclass trick to get the chunk name from the hypertable, but that will run very slowly once the database fills up.
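For that last question, the data-scanning approach I mean is the first query below. I also wonder whether the internal catalog could answer this from metadata alone; the second query is only my guess at the catalog layout (table and column names may differ between TimescaleDB versions):

-- Scans the actual data, so it gets slow as the table grows.
SELECT DISTINCT tableoid::regclass AS chunk, tag_id
FROM historian.ms;

-- Metadata-only guess: each dimension slice on tag_id covers
-- [range_start, range_end), and with an interval of 1 that is one tag per slice.
SELECT c.schema_name || '.' || c.table_name AS chunk,
       ds.range_start AS tag_id
FROM _timescaledb_catalog.chunk c
JOIN _timescaledb_catalog.hypertable h ON h.id = c.hypertable_id
JOIN _timescaledb_catalog.chunk_constraint cc ON cc.chunk_id = c.id
JOIN _timescaledb_catalog.dimension_slice ds ON ds.id = cc.dimension_slice_id
JOIN _timescaledb_catalog.dimension d ON d.id = ds.dimension_id
WHERE h.schema_name = 'historian'
  AND h.table_name = 'ms'
  AND d.column_name = 'tag_id';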
I hope these questions make sense; if not, I would be happy to elaborate.
-Filip