Nov 28, 2024
Posted by
Diana Hsieh
Time-series analytics refers to the process of analyzing and extracting insights from data that is organized and collected over time, typically at regular intervals—or, in other words, time-series data.
If you are working with time-series data, you need a way to be able to easily manipulate, query, and visualize that data to perform time-series analytics. What you may or may not already know is that TimescaleDB provides a number of time-oriented functions that aren't found in traditional relational databases.
These functions are meant to provide two key benefits: improved ease of use for time-series analytics and improved performance. The functions live alongside full SQL and can be viewed as extensions of the SQL language. Since many developers today already know SQL, the learning curve is greatly reduced.
Two critical TimescaleDB time-series functions are: time_bucket()
and time_bucket_gapfill()
. Time_bucket() is used for aggregating arbitrarily-sized time periods, and gapfill() is important when your time buckets have missing data or gaps, which is a very common occurrence when capturing thousands of time-series readings per second. Together, both of these are essential for analyzing and visualizing time-series data.
In this blog, we’ll discuss both of these capabilities and show them in action using Grafana.
time_bucket()
Essentially, time_bucket()
is a more powerful version of the standard PostgreSQL date_trunc()
function. date_trunc
“truncates” a TIMESTAMP or an INTERVAL value based on a specified date part (e.g., hour, week, or month) and returns the truncated timestamp or interval.
For example, date_trunc
can aggregate by one second, one hour, one day, or one week. However, users often want to see aggregates by five minutes or four hours, etc. This can get pretty complicated in SQL, but time_bucket()
makes it easy.
Time bucketing allows for arbitrary time intervals (e.g., five minutes, six hours, etc.), as well as flexible groupings and offsets, instead of just second, minute, hour, and so on.
In addition to allowing more flexible time-series queries, time_bucket()
also allows you to write these queries in a simpler way. In fact, it can infer the time range directly from the WHERE clause, which greatly simplifies the query syntax.
Here it is in action:
SELECT time_bucket('5 minutes', time) AS five_min, avg(cpu)
FROM metrics
GROUP BY five_min
ORDER BY five_min DESC LIMIT 12;
By the way, this time-series analytics feature has actually been a core function of TimescaleDB since its first release in April 2017, and our users love it, particularly for time-series analytics!
time_bucket()
for Time-Series AnalyticsAs you can imagine, time bucketing can be helpful for a number of scenarios. When it comes to creating dashboards or visualizations of time-series data, many rely on this function to turn their raw observations into fixed time intervals.
When graphing time-series data using a solution such as Grafana, aggregations can help identify trends over time by grouping raw data into higher-level aggregates. For example, you might want to average monthly raw data daily to achieve a smoother trend line or count the number of occurrences of non-numeric data.
One issue users often encounter when working with time-series data is recording measurements at irregular or mismatched intervals. This creates a time bucket interval with missing data or gaps.
Fortunately, TimescaleDB has a function called time_bucket_gapfill()
that allows you to aggregate your data into continuous time intervals. You can choose two different ways to fill in the gaps: locf()
which carries the last known value in the time range forward or interpolate()
which does a linear interpolation between gaps.
Time_bucket_gapfill
was introduced in TimescaleDB 1.2 as a Community feature. To learn more about gapfill, check out this blog post.
Now that you know what time_bucket()
and time_bucket_gapfill()
do, it’s time to get hands-on with a sample application for time-series analytics!
We’ve built a simple Python application that pulls data from the Open AQ Platform API (an open API air quality data). Essentially, the application reads the API to get air quality values for all cities in Great Britain over a period of time, parses the results, and stores all measurements collected from air quality sensors in TimescaleDB.
All the code for this application can be found on GitHub https://github.com/timescale/examples/tree/master/air-quality.
We also connected a Grafana instance to the TimescaleDB database that stores these results and wrote a step-by-step tutorial on how to use time_bucket()
and time_bucket_gapfill()
to visualize the data.
You can build your own dashboard using Timescale (which runs both TimescaleDB and Grafana)!
The time_bucket()
function is just one example of how TimescaleDB offers optimized SQL functions for working with time-series data and time-series analytics. To learn more about other TimescaleDB functions, check out our docs.
If you’re ready to get started, you can download TimescaleDB or sign up for Timescale to quickly get Timescale and Grafana instances up and running (free 30-day trial, no credit card required).
Like this post and are interested in learning more? Sign up for our mailing list below (right-hand side of the screen), or follow us on Twitter.