Aug 23, 2023
Posted by Ajay Kulkarni
(Update: Follow the discussion on this Hacker News thread.)
Today we are announcing the beta release of TimescaleDB, a new open-source time-series database optimized for fast ingest and complex queries, now available on GitHub under the Apache 2 license.
TimescaleDB is engineered up from PostgreSQL (packaged as an extension), which means it supports normal SQL and all of the features you expect from a relational database (JOINs, secondary indexes, complex predicates and aggregates, window functions, CTEs, etc.), and yet it scales out horizontally.
Key benefits:
(For more technical details, please refer to our documentation.)
In this age of new and shiny open source data projects, something as old as PostgreSQL can seem boring. But sometimes boring is awesome, especially when it’s your database. TimescaleDB is designed to just work, not wake you up at 3am.
If you have any kind of time-series data, and you like SQL/PostgreSQL, then please give TimescaleDB a whirl and let us know how it goes. We appreciate any feedback (and we’re pretty friendly folks).
You can install TimescaleDB via Homebrew, Docker, or from source. More information on GitHub.
But isn’t there already a glut of time-series databases? Did we really have to build yet another one?
Yes.
(Read on, padawan…)
Seems like time-series databases (i.e., databases optimized for data captured over time, for example, sensor data, financial data, DevOps data, etc.) are in vogue these days.
There have been a number of blog posts on the subject over the past few years (including these gems by Baron Schwartz (2014) and Jason Moiron (2015)), and a plethora of new open-source time-series databases, each with its own trade-offs.
We can’t read minds, but we imagine that the developers behind each of those projects built their own time-series database because traditional RDBMSs (e.g., PostgreSQL, MySQL) didn’t scale for their needs.
That was the same problem we faced a year ago, when we needed a database to store sensor data for the IoT platform we were building at the time. We loved PostgreSQL, but “knew” that it inherently wouldn’t scale for our needs.
We tested some of these databases, and saw that they all scaled pretty well, but sacrificed query power in exchange, and failed to support a number of key SQL capabilities.
So we had a choice: scalability or query power. That made us sad. We needed both.
In particular, we needed:
When we looked at this list, we realized that we needed something like PostgreSQL. In fact, if we could only solve the PostgreSQL scalability problem, we’d have the perfect time-series database: scalable, easy to use, and reliable. (PostgreSQL even has support for geospatial data types and queries, via PostGIS.)
Could this be possible? Being a group of computer science PhDs and academics (including one tenured professor), we decided to find out for ourselves, and determined that the nature of time-series workloads lends itself to a new database architecture that could offer both scale and SQL.
And then we built it.
And then we benchmarked it, and found that our database outperformed PostgreSQL by more than 15x on inserts on large datasets. In particular, we found that as the dataset size grows, the insert rate for PostgreSQL drops off dramatically, while our insert rate remains high:
Note:
Finally, scale and SQL. This made us happy.
(So why build yet another time-series database? Because we had to.)
Get ready for the world’s most boring database demo, because TimescaleDB’s query language is just normal SQL:
This obviously is just a sample. For the full documentation on what kinds of queries we support, please refer here.
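As a minimal, hedged illustration of what "just normal SQL" looks like for time-series data, the sketch below computes an hourly average per device with an ordinary GROUP BY. The `conditions` table, its columns, and the sample rows are hypothetical. Python's built-in `sqlite3` stands in for PostgreSQL only so the example runs without a server; it is not TimescaleDB's API, and SQLite's `strftime` takes the place of a PostgreSQL function such as `date_trunc`.

```python
import sqlite3

# Hypothetical schema and data, for illustration only. SQLite (from Python's
# standard library) stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE conditions (ts TEXT, device_id TEXT, temperature REAL)"
)
conn.executemany(
    "INSERT INTO conditions VALUES (?, ?, ?)",
    [
        ("2023-01-01 10:05:00", "dev1", 20.0),
        ("2023-01-01 10:35:00", "dev1", 22.0),
        ("2023-01-01 11:10:00", "dev1", 25.0),
        ("2023-01-01 10:20:00", "dev2", 30.0),
    ],
)

# Plain SQL: hourly average temperature per device, using ordinary
# GROUP BY and ORDER BY. strftime truncates each timestamp to the hour.
HOURLY_AVG = """
    SELECT strftime('%Y-%m-%d %H:00', ts) AS hour,
           device_id,
           AVG(temperature) AS avg_temp
    FROM conditions
    GROUP BY hour, device_id
    ORDER BY hour, device_id
"""

rows = conn.execute(HOURLY_AVG).fetchall()
for row in rows:
    print(row)
```

Each resulting tuple holds one (hour, device, average temperature) group. Nothing in the query is specific to a time-series engine, which is the point of the "boring" demo.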
“So what’s the big deal? This looks just like normal PostgreSQL…”
There are actually 5 key things happening behind the scenes:
For example, each of the queries above is running against a hypertable, allowing the database to hide the complexity of the system from the user.
But this just scratches the surface. For more on our technical architecture, take a look at our documentation.
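To make the hypertable idea a bit more concrete, here is a toy sketch of the general pattern: one logical table whose rows are stored in per-interval chunks, with the chunking hidden behind `insert` and `query`. It assumes time-based chunking and is not TimescaleDB's implementation; `ToyHypertable`, its methods, and the one-day chunk interval are all hypothetical names and values chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class ToyHypertable:
    """Toy model (hypothetical): one logical table, rows kept in time chunks."""

    def __init__(self, chunk_interval: timedelta):
        self.chunk_interval = chunk_interval
        self.chunks = defaultdict(list)  # chunk start time -> list of rows

    def _chunk_start(self, ts: datetime) -> datetime:
        # Round the timestamp down to the start of its chunk interval.
        n = (ts - EPOCH) // self.chunk_interval
        return EPOCH + n * self.chunk_interval

    def insert(self, ts: datetime, value: float) -> None:
        # The caller never chooses a chunk; routing happens here.
        self.chunks[self._chunk_start(ts)].append((ts, value))

    def query(self, start: datetime, end: datetime):
        """Return rows with start <= ts < end, skipping chunks that cannot match."""
        result = []
        for chunk_start, rows in self.chunks.items():
            chunk_end = chunk_start + self.chunk_interval
            if chunk_end <= start or chunk_start >= end:
                continue  # this chunk lies entirely outside the time range
            result.extend(r for r in rows if start <= r[0] < end)
        return sorted(result)


table = ToyHypertable(chunk_interval=timedelta(days=1))
base = datetime(2023, 1, 1, tzinfo=timezone.utc)
for h in (1, 13, 25, 49):
    table.insert(base + timedelta(hours=h), float(h))

# Hours 1 and 13 share a chunk; hours 25 and 49 each land in their own chunk.
day_two = table.query(base + timedelta(days=1), base + timedelta(days=2))
print(len(table.chunks), day_two)
```

The caller only ever sees a single table. Which chunk a row lands in, and which chunks a query can skip, are decided internally, which is the sense in which the abstraction hides complexity from the user.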
“Fine,” you might say, “you guys built something that only works for your weirdo IoT backend.”
That’s what we thought too. But as we made the rounds talking about our IoT platform, people would respond: “We’re building our own IoT platform, so we can’t use yours. But, tell us more about this time-series database you built?”
Then, they’d add, “You know, forget IoT, we have a lot of time-series data in general. Could your database help there too?”
And strangely, we heard the same thing from our friends in other industries: they had a growing amount of time-series data and needed something better than existing databases.
Eureka. Our time-series database was solving a bigger problem.
We realized that time-series data, which used to be this niche thing within finance and DevOps, was sprouting up everywhere. We realized that fundamental shifts in computing (more sources of data, fatter pipes, cheaper storage) were creating new currents of time-series data streams, and that analyzing these new datasets across time was powerful, enabling us to monitor the present, understand historical trends, troubleshoot the past, and predict the future.
We also noticed that even traditional time-series data applications were becoming more complex: e.g., in DevOps, needing to correlate application performance across microservices; in finance, needing to monitor payment transactions and other customer interaction data in real time.
There was also another trend at work: the resurgence of SQL. Recent posts like these from Percona (March 27, 2017), Baron Schwartz (March 19, 2017), and Paris Kasidiaris (March 13, 2017) capture the sentiment well. The pendulum is swinging back towards boring SQL. In fact, “NoSQL” databases seem to be rebranding themselves to mean “not only SQL”, rather than outright rejecting SQL.
We realized that our database, which sat at the intersection of the “rise of time-series data” and the “resurgence of SQL”, might actually be useful to other people.
That’s why we decided last fall to change direction and go all in on the database. After several months of heads-down work, we open-sourced it last month under the Apache 2 license.
That said, TimescaleDB can’t solve everyone’s problems. In particular, there are 3 time-series scenarios where there may be better alternatives:
TimescaleDB is the first open-source time-series database that offers normal SQL at scale. It acts like a relational database yet scales linearly for time-series data.
TimescaleDB is in active development by a team of PhDs based in New York City, Stockholm, and Los Angeles. A single-node version is currently available for download. A clustered version is in the works.
We scratched our own itch, and hope it now scratches yours. If it does, or you think it might and want to learn more, we’d love to hear from you at [email protected].
For more: