As far as I understand, “chunks” are part of the overall database cluster. I can compress or delete a chunk, but I can’t separately back up, unlink, and later reload chunks. They’re not like individual files that you can back up, remove, and later restore from backup. Other databases have a “big file” approach that results in one file per “partition”, allowing you to add/remove partitions that way. Is there anything in Postgres or Timescale that can help to remove chunks and later add them back in?
Chunks are Postgres partitions at their core. We put a lot of automatic management around them - and then build out a ton of other features like Continuous Aggregates, Compression, and data retention.
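For example, assuming a hypertable named `channel_data` (a placeholder name), you can list the chunks behind it; each one is an ordinary Postgres table living in the `_timescaledb_internal` schema:

```sql
-- List the chunks (regular tables) that back the hypertable
SELECT show_chunks('channel_data');

-- Inspect one of them like any other table, e.g. in psql:
-- \d+ _timescaledb_internal._hyper_1_1_chunk
```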
To answer the specific question: there isn’t anything specifically available to back up a chunk separately and restore it as a chunk in TimescaleDB. Data retention is straightforward, but to “restore” that data you would have to export it first from the chunk (using the `range_start` and `range_end` of the chunk) with something like `COPY` to CSV, and then `drop_chunk()`.
If you need to “restore” the data, you could COPY it back in and TimescaleDB will create chunks for the data automatically.
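As a rough sketch of that round trip, something like the following could work. Table, column, and file names here are placeholders, and server-side `COPY` to a file needs filesystem access on the database host (psql’s `\copy` is the client-side alternative):

```sql
-- 1. Look up the time range covered by the chunk you want to offload
SELECT chunk_schema, chunk_name, range_start, range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'channel_data'
ORDER BY range_start
LIMIT 1;

-- 2. Export exactly that range to CSV
COPY (
    SELECT * FROM channel_data
    WHERE ts >= '2006-01-01' AND ts < '2006-02-01'  -- the chunk's range_start / range_end
) TO '/backups/channel_data_2006_01.csv' WITH (FORMAT csv, HEADER);

-- 3. Drop only chunks that fall entirely inside that range
SELECT drop_chunks('channel_data',
                   older_than => TIMESTAMPTZ '2006-02-01',
                   newer_than => TIMESTAMPTZ '2006-01-01');

-- 4. Later, restore: COPY the rows back in and TimescaleDB recreates chunks as needed
COPY channel_data FROM '/backups/channel_data_2006_01.csv' WITH (FORMAT csv, HEADER);
```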
That said, there might be other considerations. Can you maybe share a little more about what you’re trying to achieve? Many TimescaleDB users can live for a long time with compressed data and/or by downsampling the data long-term through Continuous Aggregates: the pre-aggregated data can stick around much longer while you drop the raw data.
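To illustrate the downsampling idea (again with placeholder table and column names), a continuous aggregate plus a retention policy on the raw hypertable might look like this; the hourly rollups stay around while raw rows older than a year get dropped:

```sql
-- Hourly rollup of the raw readings
CREATE MATERIALIZED VIEW channel_data_hourly
WITH (timescaledb.continuous) AS
SELECT channel_id,
       time_bucket('1 hour', ts) AS bucket,
       avg(value) AS avg_value,
       min(value) AS min_value,
       max(value) AS max_value
FROM channel_data
GROUP BY channel_id, time_bucket('1 hour', ts)
WITH NO DATA;

-- Keep the rollup refreshed as new data arrives
SELECT add_continuous_aggregate_policy('channel_data_hourly',
    start_offset      => INTERVAL '3 days',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');

-- Drop raw chunks after a year; the pre-aggregated data remains
SELECT add_retention_policy('channel_data', drop_after => INTERVAL '1 year');
```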
We’re storing control system data (basically channel ID, timestamp, and value) going back to ~2006. It would be nice to have chunks by, for example, month. As the disk fills up, you could then take older chunks offline, because in reality you seldom need that older data. But when you do, you’d like to quickly restore a chunk from backup and “link” it back into the database. Looks like that’s not directly possible based on chunks. I understand the option to “export” data into some other format, for example CSV. If that’s done for the exact time range of a chunk, you can then delete the original chunk. To restore, you can then “import” the data back from, for example, the CSV. I was just wondering if there’s a more direct way to unlink and later restore chunks directly, without export/import, that I had missed.
Right. If you need to retain the raw values (not aggregated) and storage is a limitation, you could consider using PostgreSQL tablespaces: create one on cheaper storage and move older, compressed chunks into it. Depending on what kind of storage you use, that might achieve something similar.
I realize there are a lot of other things to consider (full backups would still get the entire table for example) - but it’s an option.
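A sketch of that tablespace approach, with a made-up path and chunk name (the chunk name would come from `show_chunks()`); `move_chunk()` is the TimescaleDB (community edition) function for relocating a chunk and its indexes:

```sql
-- Tablespace on cheaper/slower storage (path is just an example)
CREATE TABLESPACE history_space LOCATION '/mnt/cheap_disk/pgdata';

-- Find chunks older than, say, five years
SELECT show_chunks('channel_data', older_than => INTERVAL '5 years');

-- Move one of them (data and indexes) to the cheaper tablespace
SELECT move_chunk(
    chunk                        => '_timescaledb_internal._hyper_1_12_chunk',
    destination_tablespace       => 'history_space',
    index_destination_tablespace => 'history_space'
);
```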
If you could “backup” the chunk to something like S3 or a Parquet file that still was “attached” in some way as part of the table (like through a foreign data wrapper), would that be interesting here? Mostly just curious as we’ve talked about exploring opportunities like that through the extension somehow.
I came here because I watched your video on using a stored proc to compress older chunks rather than a compression policy. In other Timescale documentation, we are warned to back up our data prior to compression, so I thought that if you were suggesting a way to compress each chunk individually, maybe you also had a suggestion for backing up each chunk as part of that stored proc.
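For context, my rough understanding of the per-chunk approach from the video is a procedure along these lines (hypertable name and age cutoff are placeholders, and it assumes compression is already enabled on the hypertable):

```sql
CREATE OR REPLACE PROCEDURE compress_old_chunks(
    hypertable REGCLASS,
    cutoff     INTERVAL
)
LANGUAGE plpgsql AS
$$
DECLARE
    chunk REGCLASS;
BEGIN
    FOR chunk IN
        SELECT show_chunks(hypertable, older_than => cutoff)
    LOOP
        -- if_not_compressed avoids errors on chunks that are already compressed
        PERFORM compress_chunk(chunk, if_not_compressed => true);
        COMMIT;  -- a procedure can commit per chunk, keeping each transaction small
    END LOOP;
END;
$$;

-- e.g. compress everything older than one month
CALL compress_old_chunks('channel_data', INTERVAL '1 month');
```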
I think what I’m hearing here is that continuous compression of data via a stored procedure is not a practice you’re suggesting for live production data, but for developer/QA environments under load?