The Timescale Toolkit team is proud to announce the 1.8.0 version of the TimescaleDB Toolkit extension! This version is now available on Timescale Cloud, as part of our HA docker image, as a DEB or RPM package, or as source code at our github repository.
New experimental count_min_sketch
for approximate count
One of the big features we’re particularly proud to introduce in the new toolkit version is the count_min_sketch
aggregate and associated approx_count
function. This aggregate, available under the toolkit_experimental
schema for now, uses the Count-min sketch algorithm to apply a series of hash functions to an input column. It can then use the resulting counts to estimate the count for a particular value with much lower risk of being defeated by a hash collision. Here’s an example:
WITH t AS (
SELECT toolkit_experimental.count_min_sketch(symbol, 0.01, 0.01) AS symbol_sketch
FROM stocks_real_time
)
SELECT toolkit_experimental.approx_count('AAPL', symbol_sketch)
FROM t;
New interpolating accessor functions to work with PostgreSQL windows functions
We’ve also added several interpolating functions that can be used to correctly compute results with PostgreSQL window functions. This problem arises when trying to compute a value over an interval where PostgreSQL has partitioned some of the data we need in another group. Consider a data stream grouped by day and then fed into a time weighted average to get a daily weighted average. Without the last point of the previous day or the first point of the next day, each average is going to fail to correctly account for the interval before its first data point and after its last data point of the day. With the interpolating accessor, we can work around this:
SELECT
time,
toolkit_experimental.interpolated_average(
time_weight,
time,
'1 day',
LAG(time_weight) OVER (ORDER BY time),
LEAD(time_weight) OVER (ORDER BY time)
)
FROM (
SELECT
time_bucket('1 day', time) as time,
time_weight(
'LOCF',
time,
value
)
FROM tbl
GROUP BY time_bucket('1 day', time)
) s
Interpolating accessors have been added for rate
and delta
in the counter_agg
and gauge_agg
aggregates, average
in the time_weight
aggregate, and duration_in
in the state_agg
aggregate.
Experimental saturating math functions
We’ve also introduced a new set of saturating math functions. These allow you to perform integer math but will snap to boundary values rather than overflow.
More convenient approximate count distinct
Further, we’ve also made improvements to our HyperLogLog functionality, improving the documentation and cleaning up some error messages. As a big improvement to usability, we’ve also introduced a new approx_count_distinct
aggregate which will create a HyperLogLog with a default sizing that should work well for most use cases.
1.8.0 also includes numerous fixes and quality of life adjustments for toolkit functionality. The most noteworthy among these are support for PostgreSQL environments with a missing default collation, preliminary support for the aarch64 platform, ability to have explicit NULL values in timevector objects, and upgrading our pgx and rust versions for the extension.
Conclusion
We encourage everyone to try out this new version of the toolkit and let us know what you think! We’d love to hear from you any improvements or adjustments you’d like to make to the experimental functions, or even if you like them the way they are and would like to see them moving out of our experimental schema. You can leave us a response here to this forum post, send us a note in our slack channel, or add an issue to the toolkit repository. Thank you!