Oct 21, 2024
Posted by
Attila Toth
People create, capture, and consume more data than ever before. A large portion of this data volume is timestamped, a.k.a. time-series data.
So it’s not surprising that the time-series database (TSDB) category has seen a lot of growth in popularity in the past five years.
In this post, we’ll introduce these databases and explain how they can work for you.
A time-series database is a type of database specifically designed for handling time-stamped or time-series data. Time-series data are simply measurements or events that are tracked, monitored, downsampled, and aggregated over time.
Time-series data includes server metrics, application performance monitoring data, network data, sensor data, and more.
Since time-series data is time-centric, recent, and normally append-only, a time-series database (TSDB) leverages these foundational characteristics to store time-series data more simply and efficiently than general databases.
Time-series databases are optimized to handle data that arrives in time order. They are particularly designed to store, query, and analyze data points with a timestamp associated with them.
In traditional relational databases, time is often treated as just another column, but in a time-series database, time is fundamental. Every piece of data is associated with a timestamp, which enables the database to handle the special characteristics of time-series data—such as:
You might ask: Why can’t I just use a “normal” (i.e., non-time-series) database?
Well, you can, and some people do. However, once tables start getting big, relational databases like PostgreSQL get too slow for time-series data.
Time-series data accumulates very quickly, and relational databases are not designed to handle that scale (at least not in an automated way). Traditionally, relational databases fare poorly with vast datasets, and NoSQL databases are hailed as the best performers at scale.
But the truth is that, with some engineering fine-tuning, PostgreSQL can be turned into a time-series database—actually performing much better than other specialized solutions (as we've shown in benchmarks versus InfluxDB and MongoDB).
TSDBs—whether relational or NoSQL-based—introduce efficiencies that are only possible when the time element is treated as a first-class citizen.
These efficiencies allow them to offer massive scale, from performance improvements, including higher ingest rates and faster queries at scale (although some support more queries than others) to better data compression.
Even if you don't often refer to your data as time-series data, they might be. A few examples of specific time-series data types:
If you want to know if your data is time-series data, this is the litmus test: does your data have some kind of timestamp or time element related to it, even if that may not be its main dimension? If the answer is “yes,” you’re dealing with time-series data.
Let’s say you’re monitoring servers for CPU and memory usage. Here’s how you might create a hypertable for storing these metrics in TimescaleDB, insert data, and run a query to analyze recent trends.
CREATE TABLE server_metrics (
time TIMESTAMPTZ NOT NULL,
server_id TEXT NOT NULL,
cpu_usage DOUBLE PRECISION NOT NULL,
memory_usage DOUBLE PRECISION NOT NULL
);
-- Convert the table into a hypertable
SELECT create_hypertable('server_metrics', 'time');
INSERT INTO server_metrics (time, server_id, cpu_usage, memory_usage)
VALUES
(NOW() - INTERVAL '5 minutes', 'server1', 0.85, 0.75),
(NOW() - INTERVAL '4 minutes', 'server2', 0.70, 0.65),
(NOW() - INTERVAL '3 minutes', 'server1', 0.90, 0.78),
(NOW() - INTERVAL '2 minutes', 'server3', 0.95, 0.80),
(NOW() - INTERVAL '1 minute', 'server2', 0.72, 0.68);
SELECT time, server_id, cpu_usage
FROM server_metrics
WHERE time > NOW() - INTERVAL '10 minutes'
ORDER BY time DESC;
This query quickly retrieves the most recent CPU usage trends, taking advantage of TimescaleDB’s time-range optimization.
SELECT server_id,
time_bucket('1 minute', time) AS bucket,
AVG(cpu_usage) AS avg_cpu_usage
FROM server_metrics
WHERE time > NOW() - INTERVAL '1 hour'
GROUP BY server_id, bucket
ORDER BY bucket DESC;
This allows you to bucket and analyze CPU usage trends across servers over the past hour, giving you valuable insights into performance.
For financial use cases, time-series databases are vital for tracking stock prices or currency rates with millisecond precision. Here's how you might use TimescaleDB to store and analyze stock prices:
CREATE TABLE stock_prices (
time TIMESTAMPTZ NOT NULL,
stock_id TEXT NOT NULL,
price NUMERIC(10, 2) NOT NULL
);
-- Convert the table into a hypertable
SELECT create_hypertable('stock_prices', 'time');
INSERT INTO stock_prices (time, stock_id, price)
VALUES
(NOW() - INTERVAL '5 minutes', 'AAPL', 172.50),
(NOW() - INTERVAL '4 minutes', 'GOOGL', 2850.30),
(NOW() - INTERVAL '3 minutes', 'AAPL', 173.00),
(NOW() - INTERVAL '2 minutes', 'GOOGL', 2848.90),
(NOW() - INTERVAL '1 minute', 'AAPL', 173.25);
SELECT time, stock_id, price
FROM stock_prices
WHERE time > NOW() - INTERVAL '10 minutes'
ORDER BY time DESC;
SELECT stock_id,
time_bucket('1 minute', time) AS bucket,
AVG(price) AS avg_price
FROM stock_prices
WHERE time > NOW() - INTERVAL '1 day'
GROUP BY stock_id, bucket
ORDER BY bucket DESC;
This query aggregates the average stock price in one-minute intervals over the past day, providing valuable insights for financial analysis.
There are several reasons for the growth of time-series databases:
A time-series database helps you worry less about your database infrastructure, so you can spend more time building your application or gaining insight from the data.
You can make it simpler or faster (or both) to ingest and query data by optimizing the database and implementing shortcuts. Here are some examples.
We typically generate time-series data at a high granularity. Going back to the previous finance example, there might be hundreds of data points generated every second or even more (depending on how many symbols you monitor). High volumes of data can be challenging to ingest efficiently. Time-series databases are equipped with internal optimizations like auto-partitioning and indexing, allowing you to scale up your ingestion rate.
When you query time-series data, you are often interested in how the data have been changing in the past five minutes, five days, five weeks, etc. You are also more likely to want to perform time-based calculations like time-based aggregations (time bucketing), moving averages, time-weighted averages, percentiles, and so on. A time-series database has specialized features to simplify and speed up the calculation of typical time-series queries.
time | symbol | price |
2023-02-06 15:05:26 | PFE | 44.19 |
2023-02-06 15:05:25 | WMT | 141.27 |
2023-02-06 15:05:24 | KO | 59.67 |
2023-02-06 15:05:24 | TSLA | 194.08 |
2023-02-06 15:05:24 | SNAP | 10.88 |
The value of time-series data comes from the fact that you can compare the most current state of your system with past situations. A time-series database has the tools and scale needed to store both historical (archives) and real-time data in one data store, making it easy for you to keep an eye on what’s happening right now while also having the ability to learn from the past. This gives you the power to simplify your data infrastructure and seamlessly analyze all your data in one place.
Time-series databases can also automate your time-based data management tasks. For example, you might want to get rid of all data that is older than one year to save disk space and because you don’t need old data anymore. Or you still need to keep old data around, but maybe it’s not as handy as more recent data, you can set up your database to automatically compress old data to save on storage costs.
There might be other valuable automations available. For example, in TimescaleDB you can use continuous aggregates to incrementally add data to a predefined materialized view—improving query performance and developer productivity.
According to DB-Engines, here are some of the top time-series databases:
The following table compares the main features of some of these databases. For a more detailed explanation and comparison of each of these TSDBs, be sure to check our article on the best time-series databases.
Database Model | Scalability | Deployment | Query Language | Pricing Models | |
Relational database | Vertically scalable, with automatic partitioning, columnar compression, optimized queries, and automatically updated materialized views | Self-managed or managed cloud service | SQL | TimescaleDB open-source can be self-hosted and is free Timescale for AWS follows a pay-as-you-go model | |
Custom, non-relational NoSQL, columnar database | Horizontally scalable | Self-managed or managed cloud service | SQL, InfluxQL, Flux | InfluxDB Open Source is free to use, and InfluxDB Cloud Serverless is a managed service with a pay-as-you-go pricing model. InfluxDB Cloud Dedicated and InfluxDB Clustered are for high-volume production workloads with costs depending on storage, CPU, and RAM | |
Pull-based model that scrapes metrics from targets | Scalable vertically or through federation | Deployed as single binary on server or on container platforms such as Kubernetes | PromQL | Open-source: no associated license cost | |
Columnar database with a custom data model | Horizontally scalable with multi-node support and multi-threading | On-premises, in the cloud, or hybrid | Q language | Free 32-bit version for non-commercial purposes For commercial deployments, pricing depends on the deployment model & number of cores | |
Whisper (file-based time series) database format | Horizontally scalable, supports replication and clustering | On-premises or in the cloud | GQL | Open-source: no associated licensing cost | |
Columnar database | Horizontally scalable | On-premises or in the cloud Also available as a managed service | SQL-based declarative query language, mostly identical to ANSI SQL standard | ClickHouse is open-source and doesn’t have an associated license cost ClickHouse Cloud follows a pay-as-you-go pricing model | |
No-SQL database with a JSON-like document model | Horizontally scalable - supports automatic load balancing, data sharding, and replication | Self-managed or managed cloud service | MQL (MongoDB Query Language) | MongoDB Community Edition is open-source and free to use MongoDB Enterprise: pricing depends on the features you choose MongoDB Atlas has a pay-as-you-go pricing model |
Once your applications start storing time-series data, you still have to pick a TSDB that best fits your data model, write/read pattern, and developer skill sets.
When evaluating database options, consider these factors:
Timescale has remained focused on product stability from day one—whether you're dealing with time series or vector data. PostgreSQL is and will remain the rock-solid foundation upon which we are building a solid cloud platform for your evolving data needs.
Here are some of TimescaleDB's superpowers:
If you're running your PostgreSQL database on your own hardware, you can simply add the TimescaleDB extension. If you prefer to try our mature PostgreSQL cloud in AWS, create a free Timescale Cloud account today. It only takes a couple of seconds, no credit card required.
If you are interested in how other developers are using TimescaleDB for time series and analytics, check out our developer stories:
If you want to keep reading about TimescaleDB, these articles will help you learn more: