In TimescaleDB, one of the primary configuration settings for a hypertable is the chunk_time_interval value. A TimescaleDB hypertable is an abstraction that helps maintain PostgreSQL table partitioning based on a time dimension and, optionally, space dimensions. Because time is the primary component of time-series data, chunks (partitions) are created based on the time interval specified for each hypertable.
TimescaleDB determines which chunk new data should be inserted into based on the time dimension value in each row of data. If that value falls outside the range_start and range_end boundaries of every existing chunk, a new chunk is created. It doesn't matter when the data is inserted (for example, in real time or via a large bulk COPY); the timestamp alone determines which chunk a row is inserted into, or whether a new chunk needs to be created to store it.
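If you want to see these boundaries for yourself, the timescaledb_information.chunks view exposes the range of each chunk. This is a sketch assuming TimescaleDB 2.x and a hypothetical hypertable named conditions:

-- List the time range covered by each chunk of a hypertable
SELECT chunk_name, range_start, range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'conditions'
ORDER BY range_start;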
As noted in the Hypertable Best Practices documentation, PostgreSQL (and therefore TimescaleDB) tends to perform best when the most recent, most frequently queried data can reside in the memory cache. General PostgreSQL server tuning guidance suggests that this active data should fit within about 25% of the configured server memory. Note that this guidance covers all active tables and hypertables combined.
With all of that said, here are some additional things to consider as you set (or update) the chunk_time_interval for a hypertable.
Default setting
By default, a hypertable has a 7-day chunk_time_interval, which may or may not be appropriate for your workload. If you have never specifically set this value, your hypertable creates chunks for 7-day intervals.
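For illustration, here is a minimal sketch of setting the interval explicitly at creation time and then checking the configured value (the table name, column names, and 1-day interval are hypothetical; the dimensions view assumes TimescaleDB 2.x):

-- Hypothetical metrics table partitioned on the "time" column
CREATE TABLE conditions (
    time        timestamptz NOT NULL,
    device_id   text        NOT NULL,
    temperature double precision
);

-- Create the hypertable with a 1-day chunk interval instead of the 7-day default
SELECT create_hypertable('conditions', 'time', chunk_time_interval => INTERVAL '1 day');

-- Confirm the configured interval for each hypertable time dimension
SELECT hypertable_name, column_name, time_interval
FROM timescaledb_information.dimensions;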
Changing the chunk time on existing hypertables
As noted in the documentation for set_chunk_time_interval(), modifying the interval on an existing hypertable does not modify existing chunks. Only newly created chunks will use the updated chunk interval.
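The call itself is straightforward; as a sketch against the hypothetical conditions hypertable:

-- Chunks created from this point on use a 24-hour interval;
-- existing chunks keep the interval they were created with
SELECT set_chunk_time_interval('conditions', INTERVAL '24 hours');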
This also means that chunk intervals are retained in the TimescaleDB configuration catalog unless chunks are dropped. Simply DELETE-ing data from a hypertable will not change the chunk interval of previously created chunks. If you want to reimport data into a hypertable with an updated chunk interval, the existing chunks must be dropped; TRUNCATE-ing the hypertable will also work. However, all data in dropped chunks or truncated tables is lost, so use these commands with caution.
Alternatively, you could create a new hypertable with the desired (updated) chunk_time_interval, insert the data into the new hypertable, drop the original, and then rename the new hypertable.
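A rough sketch of that approach is shown below (names are placeholders, and you would also need to recreate indexes, policies, and permissions on the new table):

-- 1. Create a new table with the same columns and make it a hypertable
--    with the desired chunk interval
CREATE TABLE conditions_new (LIKE conditions INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
SELECT create_hypertable('conditions_new', 'time', chunk_time_interval => INTERVAL '1 day');

-- 2. Copy the data; each row is routed to a chunk based on its timestamp
INSERT INTO conditions_new SELECT * FROM conditions;

-- 3. Swap the tables (the original data is gone once the old table is dropped)
DROP TABLE conditions;
ALTER TABLE conditions_new RENAME TO conditions;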
Key Takeaway: Chunk ranges are stored in the TimescaleDB catalog once a chunk is created. Deleting data will not change a pre-existing chunk interval. Dropping chunks or truncating the hypertable are currently the only ways to change the size of existing chunks on an existing hypertable. ALWAYS be cautious and mindful of any action that deletes data!
Many TimescaleDB features depend on chunks
If you use TimescaleDB compression, continuous aggregates, or data retention, all of these features depend on the range_start and range_end of a chunk. Therefore, there is likely a balancing act between the amount of data each chunk contains (its time interval) and the number of chunks that will fit within the 25% memory guidance.
For example, if you want to compress or drop data that is older than 1 month, these actions are taken on entire chunks. If a chunk's full interval (range_start/range_end) doesn't fall within the specified window (1 month in this example), the chunk will not be compressed or dropped, even if some of the data in the chunk is older than 1 month.
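As a concrete sketch (hypothetical hypertable and intervals), both the manual call and the background policy operate on whole chunks:

-- Drops only chunks whose entire range is older than one month;
-- a chunk that still spans the cutoff is left in place
SELECT drop_chunks('conditions', older_than => INTERVAL '1 month');

-- The equivalent background retention policy
SELECT add_retention_policy('conditions', drop_after => INTERVAL '1 month');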
Likewise, continuous aggregates store their data in a materialized hypertable. By default, a continuous aggregate's materialized hypertable has a chunk_time_interval that is 10 times the chunk_time_interval of the hypertable it is built on. If your hypertable has a 7-day chunk_time_interval, any continuous aggregate based on that hypertable will default to a 70-day chunk_time_interval. If you then set up a data retention policy on the continuous aggregate, realize that all data in a materialized hypertable chunk must be older than the specified interval before that chunk can be dropped.
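As a hedged sketch (hypothetical names and intervals), a continuous aggregate and a retention policy on it might look like the following; the materialized hypertable behind the view uses the larger default chunk interval described above, so its chunks are dropped in correspondingly larger pieces:

-- Hourly rollup stored in a materialized hypertable
CREATE MATERIALIZED VIEW conditions_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket(INTERVAL '1 hour', time) AS bucket,
       device_id,
       avg(temperature) AS avg_temp
FROM conditions
GROUP BY bucket, device_id;

-- Chunks of the materialized hypertable are only dropped once their
-- entire range is older than the retention interval
SELECT add_retention_policy('conditions_hourly', drop_after => INTERVAL '1 year');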
Key Takeaway: chunk_time_interval impacts the way other features interact with your hypertables and materialized hypertables (continuous aggregates).
Smaller chunks are not always the right answer
Depending on your use case, smaller chunks may negatively impact your use of TimescaleDB (and PostgreSQL) in a few ways. Consider these as you design your schema and determine the best chunk_time_interval for your application and data.
Query performance
The more chunks a table has, the more work the PostgreSQL query planner potentially has to do. If you create very small chunks (for example, 1 hour) and retain a year's worth of data, the hypertable will have nearly 9,000 chunks. If you filter a query with a time-based expression like now() - INTERVAL '1 week', the query planner still needs to consider all of those chunks, and planning time will increase. Although TimescaleDB will be able to exclude nearly a year of chunks during execution, the planner cannot use STABLE or VOLATILE predicates to filter out chunks during the planning stage.
If you believe you really need small chunk intervals, strongly consider finding ways to filter your time-series data with constified (constant) parameters rather than STABLE filters like now() - INTERVAL...
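To make the distinction concrete (hypothetical table and column names), the first query below filters on a constant timestamp that the planner can use to exclude chunks at plan time, while the second relies on the STABLE function now() and can only exclude chunks during execution:

-- Constified predicate: out-of-range chunks are excluded while planning
SELECT device_id, avg(temperature)
FROM conditions
WHERE time >= '2023-06-01 00:00:00+00'::timestamptz
GROUP BY device_id;

-- STABLE predicate: every chunk is considered at plan time,
-- exclusion only happens later during execution
SELECT device_id, avg(temperature)
FROM conditions
WHERE time >= now() - INTERVAL '1 week'
GROUP BY device_id;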
Key Takeaway: Many small chunks impact the planning phase of query execution in PostgreSQL/TimescaleDB. Weigh the benefits of smaller chunks (being able to compress or drop data sooner) against the number of chunks the planner may have to consider when planning a query.
You may see suboptimal compression
Depending on how you set up TimescaleDB compression (with or without a segmentby option), smaller chunks may not contain very many rows per segmentby value in each chunk. Compression works best when there are at least 100 rows per segment, per chunk, so creating chunks that are too small may mean you see less benefit from compression.
Key Takeaway: If you use compression with smaller chunks, consider how many rows per segment exist in each chunk, and consider adjusting the chunk_time_interval until you generally expect at least 100 rows of data per segment.
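As a sketch (hypothetical names and intervals, assuming a 1-day chunk interval), compression with a segmentby column is configured like this, and a simple count per segment over one chunk's worth of time gives a rough sense of whether each segment reaches roughly 100 rows per chunk:

-- Enable compression, segmenting by device_id
ALTER TABLE conditions SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id'
);
SELECT add_compression_policy('conditions', compress_after => INTERVAL '7 days');

-- Rough check: rows per device within one chunk interval (here, 1 day)
SELECT device_id, count(*) AS rows_per_segment
FROM conditions
WHERE time >= now() - INTERVAL '1 day'
GROUP BY device_id
ORDER BY rows_per_segment;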