Nightmares of Time Zone Downsampling: Why I’m Excited About the New time_bucket Capabilities in TimescaleDB
Time-series data is generally collected at a much higher rate than it will be displayed. That’s why we fall back on downsampling or rolling up the data into lower granularities (or time buckets), a technique that helps you manage the endless flows of time-series data you’re collecting. However, when resorting to downsampling data collected at high frequency, there is one important element to keep in mind: the user’s time zone.
Before I joined Timescale, I had the same issue in my own startup, as most customers were located in Central Europe (Europe/Berlin, to be precise) and not in the UTC+0 time zone. I can imagine that a lot of you are struggling with this too. If your project requires exact daily overviews and you need to define what midnight means in your customer’s time zone, I know you feel this pain!
Trust me, we are not alone. The support for time zones in time_bucket
was one of the most requested (and upvoted) features in the TimescaleDB GitHub repo and in polls conducted in Timescale Community Slack. But now, the days of hoop-jumping to fix this issue are over: in the last few months—and with the help of its community—Timescale re-implemented the time_bucket
functionality with full time zone support, which is now merged into the latest TimescaleDB 2.8 release.
Why is this such a big deal? Because it will save you time, no matter what time zone you or your customers are in.
Time Sucks, time_bucket, and Time Zones
As I shared earlier, while at my own startup, we had to force TimescaleDB to make time_bucket
“support” time zones. It was way too ugly, though.
The gist is to force the input timestamptz
to be converted into timestamp
without time zone information (but adjusting it accordingly to the requested time zone). Then you need to pass it to the time_bucket
function and have it bucketed. The resulting timestamptz
needs to be ripped off of its time zone information again and finally re-interpreted as a timestamp inside the requested time zone. Sounds complicated?
In terms of code, it looks like this:
time_bucket('86400s', TIMEZONE('Europe/Berlin', created)::timestamptz)::TIMESTAMP WITHOUT TIME ZONE AT TIME ZONE 'Europe/Berlin' AS bucket
Anyway, this workaround is now a thing of the past! The new TimescaleDB 2.8 release ships with a revamped version of time_bucket
, which offers full support for time zones. The new implementation was developed with the help of the community over the last couple of months (huge shout out to all of you who contributed!) and lived in Timescale’s experimental namespace as time_bucket_ng
. That means that many of you are probably familiar with it already.
The change in TimescaleDB 2.8 is that we graduated the new implementation out of the timescale_experimental
namespace and replaced the existing time_bucket
altogether with the implementation of time_bucket_ng
.
That said, we are happy to announce that we “fixed” the most requested feature to date.
New Functionality and Migration
While the new functionality is rather simple, its impact is immense and simplifies analytical requests requiring time zone information.
As I mentioned before, most of your customers are probably not located in a UTC time zone since there aren’t a lot of countries in that time zone anyway, or customers live in one of the many countries or economic areas spread across multiple time zones. That means you have to ensure that a user’s time zone is taken into account when rolling up over day, week, month, or year boundaries.
From a migration standpoint, nothing will change for users not using timezone
. The new implementation of time_bucket
behaves like the old version, as long as you do not provide a timezone
parameter. That means the default behavior is still bucketing on UTC boundaries, and the origin
parameter is unchanged.
When provided with a timezone
parameter, though, time_bucket
will adjust the origin
to the given time zone. That means that day boundaries, as well as months and years, are adjusted too. With that change, providing a time zone is as simple as the following query:
SELECT
time_bucket('1 day', created, 'Europe/Berlin') AS bucket,
avg(value)
FROM metrics
GROUP BY 1
The result is time buckets adjusted to midnight in the Central European Time area (represented by the Europe/Berlin time zone identifier), including changes due to summer or winter (CET/CEST).
When talking about the migration of existing code, all your current queries will work just as they did before the change. All parameter combinations of time_bucket
are available as implementations under the covers, and only extended versions of the method signatures are added to bring time zone support.
With time_bucket
graduating from the timescale_experimental
namespace, the first major step is done. The time_bucket_gapfill_ng
function is still experimental, and we will be adding time zone support to time_bucket_gapfill
in the very near future.
Getting Started With TimescaleDB 2.8
Timescale is happy to release the new version of time_bucket
after months of experimental status, and we want to thank everyone in the community for their help and effort in testing this feature, making it production-ready, and giving feedback!
Looking back, this functionality would’ve made things much simpler and more accurate in my startup (we never handled stuff like leap seconds), but you can and should use the new functionality immediately!
There is, however, more to the release of TimescaleDB 2.8 than just a new time_bucket
. Other features include experimental support for the new policy management for continuous aggregates, and speed improvements for distributed hypertables using the COPY
protocol for SELECTs
. For more details on these changes, see What’s New in TimescaleDB 2.8 (for a complete list of all changes, see the Release Notes.)
If you are using Timescale, upgrades are automatic, and you’ll be upgraded to TimescaleDB 2.8 in your next maintenance window.
New to Timescale? Start a free 30-day trial, no credit card required, and get your new database journey started in five minutes.
If you’re self-hosting TimescaleDB, follow the upgrade instructions in our documentation.