Chunk skipping metadata in _timescaledb_catalog.chunk_column_stats for chunks with NULL values

kunjmehta · November 21, 2024, 4:21pm

Hi,

I am experimenting with a use case where a new integer column C may be added to my hypertable H at some point, and only newer data will have values for C. As a result, the older rows in H will have NULL in C.

If I then enable chunk skipping on C and call compress_chunk(show_chunks(H)), I notice that the default range recorded in _timescaledb_catalog.chunk_column_stats for the older chunks (where C is NULL) is the full 64-bit integer range, spanning (-9223372036854775808, 9223372036854775807).

This means these chunks will always be included in any query that filters on C, which largely defeats the purpose of enabling chunk skipping on C.
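To see why the full-range default prevents skipping, here is a minimal Python sketch of generic min/max chunk exclusion. It is an illustrative model, not TimescaleDB's implementation: the ChunkRange class, the chunk names, and the inclusive range check are assumptions. A chunk can be skipped only when the query range lies entirely outside its stored [range_start, range_end], and that can never happen for a chunk whose range covers every 64-bit value.

```python
from dataclasses import dataclass

# Bounds of a 64-bit signed integer: the range reported for the all-NULL chunks.
INT8_MIN = -9223372036854775808
INT8_MAX = 9223372036854775807


@dataclass
class ChunkRange:
    """Hypothetical per-chunk min/max stats for column C (treated as inclusive)."""
    name: str
    range_start: int
    range_end: int


def can_skip(chunk: ChunkRange, lo: int, hi: int) -> bool:
    """A chunk can be skipped only if [lo, hi] does not overlap its stored range."""
    return hi < chunk.range_start or lo > chunk.range_end


def chunks_to_scan(chunks, lo, hi):
    """Names of the chunks a query filtering on C BETWEEN lo AND hi must still read."""
    return [c.name for c in chunks if not can_skip(c, lo, hi)]


chunks = [
    ChunkRange("old_chunk_1", INT8_MIN, INT8_MAX),  # C is NULL in every row
    ChunkRange("old_chunk_2", INT8_MIN, INT8_MAX),  # C is NULL in every row
    ChunkRange("new_chunk_1", 1000, 1999),
    ChunkRange("new_chunk_2", 2000, 2999),
]

# Equivalent of WHERE C BETWEEN 2100 AND 2200
print(chunks_to_scan(chunks, 2100, 2200))
```

Only new_chunk_1 is excluded here; both all-NULL chunks are scanned even though they cannot contain a matching row, because their stored range overlaps every possible predicate on C.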

My question: is this expected behaviour, and are any improvements to chunk skipping in the pipeline? This seems suboptimal when dealing with NULL values.

jonatasdp · November 21, 2024, 5:47pm

That’s a great point, @kunjmehta. My understanding is that enabling this is useful for sequential values that correspond to a specific time period, so you can skip the chunks that fall outside the range of ids you are looking for.
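A minimal Python sketch of why min/max-based skipping works best for values that grow with time, such as sequential ids. It is a simplified model, not TimescaleDB internals: the helper names and the four-chunk layout are hypothetical. When ids are correlated with the time partitioning, each chunk covers a narrow, disjoint id range, so a point lookup touches one chunk; when the same ids are spread across every chunk, every range overlaps the lookup and nothing can be skipped.

```python
def chunk_ranges(chunks):
    """Return the (min, max) that min/max stats would record for each chunk."""
    return [(min(values), max(values)) for values in chunks]


def chunks_touched(ranges, value):
    """Count chunks whose [min, max] range could contain `value`."""
    return sum(1 for lo, hi in ranges if lo <= value <= hi)


# Sequential ids that follow time: chunk i holds ids 1000*i .. 1000*i + 999.
correlated = [list(range(1000 * i, 1000 * (i + 1))) for i in range(4)]

# The same ids 0..3999, but interleaved across chunks (id % 4 picks the chunk).
uncorrelated = [list(range(i, 4000, 4)) for i in range(4)]

print(chunks_touched(chunk_ranges(correlated), 2500))
print(chunks_touched(chunk_ranges(uncorrelated), 2500))
```

The first lookup touches 1 chunk and the second touches all 4, even though both layouts store exactly the same ids.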

Let’s invite @sven to join the conversation as he is actively improving this feature.

kunjmehta · December 5, 2024, 6:54pm

Hi @jonatasdp @sven, any updates on this, please?