Nov 25, 2024
At Timescale, we have a track record of constant innovation, moving quickly to solve painful problems that developers who work with time-series data experience (For example, we recently shipped 12 launches in a single month!)
But on the other hand, we also value stability—and so do our users. We cannot change our API every time we release a new database version, nor can we expect TimescaleDB users to constantly make changes to their production code in order to keep their database working or risk breakages if they don’t. The potential damage caused by “moving fast and breaking things” could alienate the very users who rely on our product: developers who expect stability and reliability from their database.
At Timescale, we build critical infrastructure components, and our users expect and rely on stability (see our aptly titled “When boring is awesome” launch announcement). As the makers of TimescaleDB, the leading relational database for time-series data, we recognize that nearly three million active TimescaleDB databases are running mission-critical time-series workloads across industries.
Our prime directives are to safeguard data and ensure our customers’ applications do not need to be rewritten with every release and to continuously build features big and small to make developers’ lives easier.
We made the decision early in the design of TimescaleDB to build on top of PostgreSQL. We believed then, as we do now, that building on the world’s fastest-growing database would have numerous benefits for our customers.
Among these benefits is the rock-solid dependability developers have come to expect from a database with 20+ years of production usage behind it. Building on top of PostgreSQL enables us to innovate rapidly with solutions for ingesting and analyzing time-series data while also leveraging (and contributing to) a database with a reputation for stability.
Striking a balance between innovation and stability is not a problem unique to TimescaleDB, but it’s perhaps the most important question in our software development process. We strive to achieve the dependability developers expect from PostgreSQL, yet also aim to deliver rapid innovation and progress for customers building solutions in an industry that constantly changes.
So, how do we resolve the tension between innovation and stability so that we can continue to ship new functionality quickly while retaining the trust we’ve earned with our customers?
Today, to reinforce our commitment to moving fast and not breaking things, we are introducing a new experimental schema for TimescaleDB as part of our release of TimescaleDB 2.4.
The experimental schema is where we aim to develop, and ship features at an even faster pace than we normally do. The general objective is that these new functions will “graduate” out of the experimental schema when they reach full maturity for normal production usage.
TimescaleDB 2.4 introduces the following experimental functionality in the new experimental schema:
time_bucket
function, with support for bucketing using months and years and support for time-zone use in continuous aggregates. You asked, and we delivered.Read on for more about how we introduced experimental functionality in TimescaleDB, details about new experimental functionality, and examples of when and why to use them.
TimescaleDB 2.4 is available today for Timescale and self-managed users and will be available in the coming weeks for Managed Service for TimescaleDB users.
If you’re new to TimescaleDB and want to try out our new experimental functionality right away, create a free account to get started (30-day trial 🔥).
Join our Slack community to share your results, ask questions, get advice, and connect with 7K+ other developers (our co-founders, engineers, and passionate community members are active on all channels).
Visit our GitHub to learn more (and, as always, ⭐️ are appreciated!). And, if these are the types of challenges you’d like to help solve, we are hiring!
Shoutout to all the engineers who worked on these features: Mats Kindahl, Erik Nordström, Nikhil Sontakke, Aleksander Alekseev, Dmitry Simonenko, and the entire team of reviewers and testers!
We’d like to give a special shoutout to all community members who’ve asked for improvements to the time_bucket
function and participated in our discussions on GitHub, both of which informed our choice of experimental functionalities introduced today.
Experimental schemas enable us to validate new functionality, get feedback from real users, and enable developers to test the latest and greatest of what TimescaleDB has to offer.
Moreover, the schema’s explicit experimental name and the “experimental” component of API functions act as a “buyer beware” label and set expectations, telling users to anticipate breaking changes and to not use these features in production until they’ve “graduated” and are stable. This frees our team to make changes to existing features without breaking compatibility with functionality that users rely on and mature features through iterative improvements until they are ready for wider consumption.
The experimental schema is an important milestone for our users and us, demonstrating Timescale’s commitment to innovation, not just in the features and products we release but in the process of how we actually build and deliver our products to delight developers. It also gives us a sandbox where users can test the latest features and more actively shape the development of TimescaleDB.
We first released experimental functionality in TimescaleDB hyperfunctions, a series of SQL functions within TimescaleDB that make it easier to manipulate and analyze time-series data in PostgreSQL with fewer lines of code.
In addition to production-ready hyperfunctions for calculating time-weighted averages and percentile approximations, we released several hyperfunctions in public preview for developers to trial through our hyperfunctions experimental schema and offer feedback to improve the features so that they may be released for production usage soon. These experimental hyperfunctions included functions for downsampling, smoothing, approximate count-distinct, working with counters, and working with more advanced forms of averaging.
As noted previously, functionality under the experimental schema is experimental in nature and not recommended for production use at this time. You should expect that several of these functions may change their APIs, change their names, change views, or disappear. Additionally, when experimental features “graduate” from the experimental schema, they’re not guaranteed to allow moment-in-time upgrades.
Here’s how features graduate from the experimental schema to the main schema (and are thus deemed ready for production use):
Ultimately, customer feedback drives the pace at which features graduate through each step and into production.
The first new experimental feature we are releasing today is one of our most requested features and tops many of our community members' wishlists: an improved version of time_bucket
, with support for month, year, and time zones. Thanks to Jean-Francois Labbé for getting the ball rolling 3 years ago:
For those new to TimescaleDB, time_bucket
is among the most popular TimescaleDB hyperfunctions and is used for bucketing and analyzing data for arbitrary time intervals in SQL. For readers familiar with PostgreSQL, you can think of time_bucket
as a more powerful version of the PostgreSQL date_trunc
function. time_bucket
allows for arbitrary time intervals, rather than the standard day, minute, hour provided by date_trunc
, enabling more flexibility in doing analysis using the time periods which most matter to your use case.
For example, you could use time_bucket
to find the average temperature in 15-day buckets for a certain city, using the following simple query:
-- time_bucket
-- Average temp per 15 day period
-- for past 6 months, per city
-----------------------------------
SELECT time_bucket('15 days', time) as "bucket"
,city_name, avg(temp_c)
FROM weather_metrics
WHERE time > now() - (6* INTERVAL '1 month')
GROUP BY bucket, city_name
ORDER BY bucket DESC;
We love hearing from our customers and community members. Our entire team regularly reads and answers questions in our public TimescaleDB Slack community. Our Eon Bot (named after our mascot, Eon) helps us identify long-requested functionality, inviting members of our community to vote on feature requests.
We also opened up our GitHub discussions as a community exercise on what we should scope related to the time_bucket
feature. The most voted solution was:
I would like to specify a timezone like "UTC+3" for a given continuous aggregate so that any time from a source table will be converted to the given timezone and end up in a right bucket (in terms of when the next day starts, etc). Multiple continuous aggregates can be created, each with its own time zone, e.g “UTC” one and “UTC+3” one. I also would like to use timezones like "Europe/Berlin". Unlike simple “UTC+3”, such timezones account for daylight saving time and will automatically change time. This means that there can be days with 23 or 25 hours, which is not the case when specifying a specific time zone like CEST and CET.
As you can see, the most common requests we received are time_bucket
related, specifically: support for months/days/years in time_bucket
, support for timezones in time_bucket
, and being able to use both in continuous aggregates. The original issues are as old as three years old, with more than 65 upvotes on the original comment. Although providing the requested functionality may sound simple, it’s not trivial to implement it.
There are several reasons for this, but the most important one is that the time_bucket
function and continuous aggregates were originally implemented for buckets that are fixed in size, like 10 seconds, 3 hours, or 5 days. However, months and years are variable in size: months range between 28 and 31 days, years range between 365 and 366 days. Breaking this constraint requires many changes throughout a large part of the TimescaleDB code base, and you have to keep backward compatibility and performance in mind. All in all, this is doable, but it takes time.
Time zone support is also a case of variable-sized buckets: If there is a daylight saving time change in a given timezone, it means there are days that have between 23 and 25 hours.
We had to re-engineer time_bucket
for these kinds of variable-sized buckets.
Enter time_bucket_ng
(Next Generation) with implemented support for: Years, Months, Weeks, Days, Hours, Minutes, and Seconds. time_bucket_ng
also includes support for use with continuous aggregates. The new support for years and months enables a host of capabilities for analysis that looks over longer time periods (e.g., business dashboards showing month-over-month growth).
To make this a bit more concrete, we’ve included a few examples of time_bucket_ng
in action.
Here’s how to use time_bucket_ng
to create bucket data in 3-month intervals:
-- for our readers from North America: the date is in YYYY-MM-DD format
SELECT timescaledb_experimental.time_bucket_ng('3 month', date '2021-08-05');
time_bucket_ng
----------------
2021-07-01
(1 row)
Here’s how to use time_bucket_ng
to bucket data in 1-year intervals:
SELECT timescaledb_experimental.time_bucket_ng('1 year', date '2021-08-05');
time_bucket_ng
----------------
2021-01-01
(1 row)
Here’s how to use time_bucket_ng
when creating continuous aggregates. In this case, we track the temperature in Moscow over 7-day intervals:
CREATE TABLE conditions(
day DATE NOT NULL,
city text NOT NULL,
temperature INT NOT NULL);
SELECT create_hypertable(
'conditions', 'day',
chunk_time_interval => INTERVAL '1 day'
);
INSERT INTO conditions (day, city, temperature) VALUES
('2021-06-14', 'Moscow', 26),
('2021-06-15', 'Moscow', 22),
('2021-06-16', 'Moscow', 24),
('2021-06-17', 'Moscow', 24),
('2021-06-18', 'Moscow', 27),
('2021-06-19', 'Moscow', 28),
('2021-06-20', 'Moscow', 30),
('2021-06-21', 'Moscow', 31),
('2021-06-22', 'Moscow', 34),
('2021-06-23', 'Moscow', 34),
('2021-06-24', 'Moscow', 34),
('2021-06-25', 'Moscow', 32),
('2021-06-26', 'Moscow', 32),
('2021-06-27', 'Moscow', 31);
CREATE MATERIALIZED VIEW conditions_summary_weekly
WITH (timescaledb.continuous) AS
SELECT city,
timescaledb_experimental.time_bucket_ng('7 days', day) AS bucket,
MIN(temperature),
MAX(temperature)
FROM conditions
GROUP BY city, bucket;
SELECT to_char(bucket, 'YYYY-MM-DD'), city, min, max
FROM conditions_summary_weekly
ORDER BY bucket;
to_char | city | min | max
------------+--------+-----+-----
2021-06-12 | Moscow | 22 | 27
2021-06-19 | Moscow | 28 | 34
2021-06-26 | Moscow | 31 | 32
(3 rows)
(See our continuous aggregates docs for more information about continuous aggregates, and check out our getting started guide for tips on how to use them and other TimescaleDB advanced features.)
time_bucket_ng
demonstrates the power of using the experimental schema to solve long-standing community requests without introducing breaking changes. It also gives community members the opportunity to give feedback about the new functionality so we make sure our implementation solves real users’ problems. (Please let us know the results of your testing and new functionality you’d like on our GitHub issues page.)
In the next iterations of time_bucket_ng
, we will release support for one month in continuous aggregates; N-months and N-Years with continuous aggregates; and time zones.
The following chart indicates the differences in time_bucket
functionality:
We invite – and encourage – all community members to try the experimental schema in a test-safe environment and to tell us about your experience.
We want to hear it all: let us know about the glitches you encountered, anything that surprised and delighted you, and everything in between.
To share feedback, create a GitHub issue (using the experimental-schema label), describe what you found, and tell us the steps—or share a code snippet—to recreate it.
We’re committed to developing these and future experimental features—and to giving the community a chance to provide early feedback and influence the direction of TimescaleDB’s development. We’ll travel faster with your input, and we truly appreciate your contribution(s) (thank you in advance 🙌!).
If you’re new to Timescale and want to get started with our new experimental functionality today, you can self-host to get started. If you want to try our engineered PostgreSQL cloud platform, try our 30-day free trial (no credit card required).
If you are an existing user:
As a reminder, you can join our Slack community to share your results, ask questions, get advice, and connect with other developers (our co-founders, engineers, and passionate community members are active on all channels).
You can also visit our GitHub to learn more (and, as always, ⭐️ are appreciated!) And, if these are the types of challenges you’d like to help solve, we are hiring!