Time-Series Data Analysis

PostgreSQL Materialized Views and Where to Find Them

Learn more about Timescale’s alternative to a PostgreSQL materialized view—continuous aggregates.

A time-series database (TSDB) is a computer system designed to store and retrieve data records that are part of a “time series,” a set of data points associated with timestamps. 

The timestamps provide critical context for each data point, relating it to the others. Time-series data is often a continuous flow, like sensor measurements or intraday stock prices. Any time-series database lets you store large volumes of timestamped data in a format that supports fast insertion and retrieval, enabling complex data analysis.

Incremental Materialized Views and Continuous Aggregates

Timescale is one of the prominent players in this space, unleashing the power of PostgreSQL alongside a set of features built for time-series management.

One of the most important and used features is continuous aggregates (CAGGs).

Continuous aggregates speed up aggregate queries over large volumes. They can also be used for downsampling data and reducing the granularity of your data once it reaches a certain age. This will free some space while still enabling long-term analytics.

You might think of continuous aggregates as a PostgreSQL materialized view on steroids, with a lot of features:

  • Real-time aggregates

  • Compression

  • Data retention

  • Incremental and automatic refresh
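To make the downsampling and retention idea concrete, here is a minimal sketch combining a continuous aggregate with a retention policy. The `readings` hypertable and its columns are illustrative, not part of the example that follows:

```sql
-- Hypothetical sketch: roll raw readings up into a daily CAGG, then drop
-- raw chunks older than 90 days while keeping the rollup for analytics.
CREATE MATERIALIZED VIEW readings_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket(INTERVAL '1 day', time) AS day,
       device_id,
       AVG(temperature) AS avg_temp
FROM readings
GROUP BY day, device_id;

-- Raw data older than 90 days is dropped; readings_daily keeps the history.
SELECT add_retention_policy('readings', drop_after => INTERVAL '90 days');
```

The retention policy applies only to the hypertable, so the daily rollup survives after the raw chunks are gone.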

Let’s start and see how they work. 

How Incremental Materialized Views Work in Timescale

Suppose we are storing data from sensors measuring temperature and humidity. We have one hypertable (conditions) storing this data and one regular PostgreSQL table (locations) storing sensor metadata.

Conditions:

tsdb=> \d+ conditions
                                            Table "public.conditions"
   Column    |           Type           | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
-------------+--------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 time        | timestamp with time zone |           | not null |         | plain    |             |              |
 device_id   | text                     |           |          |         | extended |             |              |
 temperature | numeric                  |           |          |         | main     |             |              |
 humidity    | numeric                  |           |          |         | main     |             |              |
Indexes:
    "conditions_device_id_time_idx" btree (device_id, "time" DESC)
    "conditions_time_idx" btree ("time" DESC)
Triggers:
    ts_insert_blocker BEFORE INSERT ON conditions FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Child tables: _timescaledb_internal._hyper_1_100_chunk,
              _timescaledb_internal._hyper_1_101_chunk,
              _timescaledb_internal._hyper_1_102_chunk,
              _timescaledb_internal._hyper_1_103_chunk,
              .........................................
              _timescaledb_internal._hyper_1_97_chunk,
              _timescaledb_internal._hyper_1_98_chunk,
              _timescaledb_internal._hyper_1_99_chunk
Access method: heap

And locations:

tsdb=> \d+ locations
                                     Table "public.locations"
   Column    | Type | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
-------------+------+-----------+----------+---------+----------+-------------+--------------+-------------
 device_id   | text |           |          |         | extended |             |              |
 location    | text |           |          |         | extended |             |              |
 environment | text |           |          |         | extended |             |              |
Access method: heap

The conditions hypertable has 160 million records, and the locations table has two thousand.

Let’s start by calculating the average temperature and humidity aggregating by a 15-minute window:

EXPLAIN ANALYZE
SELECT time_bucket(INTERVAL '15 minutes', time) AS bucket,
       device_id,
       avg(temperature),
       avg(humidity)
FROM conditions
WHERE time BETWEEN '2023-02-05' AND '2023-03-05'
GROUP BY bucket, device_id;

QUERY PLAN
----------------------------------------------------------------------
 Finalize HashAggregate  (cost=3552434.85..3856581.26 rows=3996996 width=91) (actual time=29280.792..41746.314 rows=5336000 loops=1)
   Group Key: (time_bucket('00:15:00'::interval, _hyper_7_195_chunk."time")), _hyper_7_195_chunk.device_id
   Planned Partitions: 256  Batches: 1305  Memory Usage: 10937kB  Disk Usage: 1294032kB
   ->  Gather  (cost=1820403.27..3015026.25 rows=7993992 width=91) (actual time=12912.397..23014.709 rows=6641075 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial HashAggregate  (cost=1819403.27..2214627.05 rows=3996996 width=91) (actual time=12905.728..22321.716 rows=2213692 loops=3)
               Group Key: time_bucket('00:15:00'::interval, _hyper_7_195_chunk."time"), _hyper_7_195_chunk.device_id
               Planned Partitions: 256  Batches: 257  Memory Usage: 10641kB  Disk Usage: 1031816kB
               Worker 0:  Batches: 257  Memory Usage: 10641kB  Disk Usage: 1031576kB
               Worker 1:  Batches: 257  Memory Usage: 10641kB  Disk Usage: 1031496kB
               ->  Result  (cost=0.43..961974.82 rows=16654149 width=50) (actual time=0.694..6307.600 rows=13333333 loops=3)
                     ->  Parallel Append  (cost=0.43..753797.95 rows=16654149 width=50) (actual time=0.691..4067.675 rows=13333333 loops=3)
                           ->  Parallel Index Scan using _hyper_7_195_chunk_conditions_time_idx on _hyper_7_195_chunk  (cost=0.43..23164.76 rows=382303 width=50) (actual time=0.043..959.387 rows=918000 loops=1)
                                 Index Cond: (("time" >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND ("time" <= '2023-03-05 00:00:00+00'::timestamp with time zone))
                           ->  Parallel Index Scan using _hyper_7_190_chunk_conditions_time_idx on _hyper_7_190_chunk  (cost=0.43..36388.29 rows=601368 width=49) (actual time=2.014..532.092 rows=1440000 loops=1)
                                 Index Cond: (("time" >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND ("time" <= '2023-03-05 00:00:00+00'::timestamp with time zone))
                           ->  Parallel Seq Scan on _hyper_7_189_chunk  (cost=0.00..163635.00 rows=4200000 width=50) (actual time=0.011..1951.004 rows=10080000 loops=1)
                                 Filter: (("time" >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND ("time" <= '2023-03-05 00:00:00+00'::timestamp with time zone))
                           ->  Parallel Seq Scan on _hyper_7_188_chunk  (cost=0.00..163515.05 rows=4192737 width=50) (actual time=0.015..1933.225 rows=10080000 loops=1)
                                 Filter: (("time" >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND ("time" <= '2023-03-05 00:00:00+00'::timestamp with time zone))
                           ->  Parallel Seq Scan on _hyper_7_187_chunk  (cost=0.00..163508.61 rows=4193574 width=50) (actual time=0.007..633.274 rows=3360000 loops=3)
                                 Filter: (("time" >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND ("time" <= '2023-03-05 00:00:00+00'::timestamp with time zone))
                           ->  Parallel Seq Scan on _hyper_7_186_chunk  (cost=0.00..120315.50 rows=3084167 width=50) (actual time=0.015..1363.652 rows=7402000 loops=1)
                                 Filter: (("time" >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND ("time" <= '2023-03-05 00:00:00+00'::timestamp with time zone))
 Planning Time: 12.377 ms
 Execution Time: 42291.580 ms
(27 rows)

Time: 42332,603 ms (00:42,333)

It takes roughly 42 seconds. As you can see, Timescale is smart enough to skip the chunks (data partitions within a hypertable) that don't contain data needed by this specific query.
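You can inspect which chunks cover a given time range with `show_chunks`. A quick sketch (the chunk names returned depend on your instance):

```sql
-- List the chunks of the conditions hypertable holding data
-- from the last month
SELECT show_chunks('conditions', newer_than => INTERVAL '1 month');
```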

Creating a continuous aggregate over a 15-minute window

Now, let’s create a continuous aggregate, grouping data on 15-minute windows and calculating the temperature and humidity average.

CREATE MATERIALIZED VIEW conditions_summary_15mins
WITH (timescaledb.continuous) AS
SELECT time_bucket(INTERVAL '15 minutes', time) AS bucket,
       device_id,
       AVG(temperature) AS avg_temp,
       MAX(temperature) AS max_temp,
       MIN(temperature) AS min_temp,
       AVG(humidity) AS avg_humidity,
       MAX(humidity) AS max_hunidity,
       MIN(humidity) AS min_humidity
FROM conditions
GROUP BY bucket, device_id
WITH NO DATA;

The continuous aggregate is created empty (WITH NO DATA) and stores the minimum and maximum temperature and humidity besides the averages. Now let's set up the policy for refreshing the CAGG.

Continuous aggregates (like materialized views) need to be refreshed at regular intervals to keep queries fast. Unlike materialized views, continuous aggregates offer an amazing feature called real-time aggregation, which returns results that reflect the last transaction on the hypertable.

How does this work?

Timescale is smart enough to retrieve already-aggregated data from the continuous aggregate and fetch yet-to-be-aggregated data from the hypertable. It then aggregates the latter, merges the results, and returns them to the client. This, of course, requires some extra steps (and time), so keeping the continuous aggregate updated significantly boosts query performance.
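You can check whether real-time aggregation is enabled for each CAGG through the informational views. A small sketch:

```sql
-- materialized_only = false means real-time aggregation is on
SELECT view_name, materialized_only
FROM timescaledb_information.continuous_aggregates;
```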

So, how can you keep the continuous aggregate up to date? We have a handy function for this:

SELECT add_continuous_aggregate_policy('conditions_summary_hourly',
  start_offset => INTERVAL '1 day',
  end_offset => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');

What does it mean? We are adding a refresh policy to the conditions_summary_hourly continuous aggregate. Every hour, it refreshes the interval starting one day back and ending one hour ago. If real-time aggregation is enabled, the last hour of data is fetched from the hypertable and merged in (if the query needs it). Both start_offset and end_offset can be NULL, but we strongly discourage using that value, especially for end_offset on heavily written hypertables.
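Besides the policy, you can also refresh a specific window on demand. `refresh_continuous_aggregate` is a procedure, so it is invoked with CALL:

```sql
-- One-off refresh of February's buckets in the 15-minute CAGG
CALL refresh_continuous_aggregate('conditions_summary_15mins',
                                  '2023-02-01', '2023-03-01');
```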

And run the same query against the continuous aggregate:

EXPLAIN ANALYZE
SELECT bucket, device_id, avg_temp, avg_humidity
FROM conditions_summary_15mins
WHERE bucket BETWEEN '2023-02-05' AND '2023-03-05';

QUERY PLAN
-------------------------------------------------------------------------------
 Append  (cost=0.43..377584.73 rows=5154662 width=50) (actual time=0.016..6323.419 rows=5338000 loops=1)
   ->  Subquery Scan on "*SELECT* 1"  (cost=0.43..303008.23 rows=5147365 width=50) (actual time=0.015..3458.459 rows=5144000 loops=1)
         ->  Result  (cost=0.43..251534.58 rows=5147365 width=178) (actual time=0.015..2779.848 rows=5144000 loops=1)
               ->  Custom Scan (ChunkAppend) on _materialized_hypertable_8  (cost=0.43..200060.93 rows=5147365 width=50) (actual time=0.014..2067.726 rows=5144000 loops=1)
                     Chunks excluded during startup: 0
                     ->  Index Scan using _hyper_8_216_chunk__materialized_hypertable_8_bucket_idx on _hyper_8_216_chunk  (cost=0.43..5083.93 rows=127365 width=50) (actual time=0.013..34.494 rows=124000 loops=1)
                           Index Cond: ((bucket < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(8)), '-infinity'::timestamp with time zone)) AND (bucket >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (bucket <= '2023-03-05 00:00:00+00'::timestamp with time zone))
                     ->  Seq Scan on _hyper_8_217_chunk  (cost=0.00..38431.00 rows=988000 width=50) (actual time=0.012..283.673 rows=988000 loops=1)
                           Filter: ((bucket >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (bucket <= '2023-03-05 00:00:00+00'::timestamp with time zone) AND (bucket < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(8)), '-infinity'::timestamp with time zone)))
                     ->  Seq Scan on _hyper_8_218_chunk  (cost=0.00..52175.00 rows=1344000 width=50) (actual time=0.010..412.740 rows=1344000 loops=1)
                           Filter: ((bucket >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (bucket <= '2023-03-05 00:00:00+00'::timestamp with time zone) AND (bucket < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(8)), '-infinity'::timestamp with time zone)))
                     ->  Seq Scan on _hyper_8_219_chunk  (cost=0.00..52184.00 rows=1344000 width=50) (actual time=0.009..437.345 rows=1344000 loops=1)
                           Filter: ((bucket >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (bucket <= '2023-03-05 00:00:00+00'::timestamp with time zone) AND (bucket < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(8)), '-infinity'::timestamp with time zone)))
                     ->  Seq Scan on _hyper_8_220_chunk  (cost=0.00..52187.00 rows=1344000 width=50) (actual time=0.007..431.213 rows=1344000 loops=1)
                           Filter: ((bucket >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (bucket <= '2023-03-05 00:00:00+00'::timestamp with time zone) AND (bucket < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(8)), '-infinity'::timestamp with time zone)))
   ->  Subquery Scan on "*SELECT* 2"  (cost=47717.46..48803.19 rows=7297 width=91) (actual time=964.021..2342.694 rows=194000 loops=1)
         ->  Finalize GroupAggregate  (cost=47717.46..48730.22 rows=7297 width=219) (actual time=964.021..2314.030 rows=194000 loops=1)
               Group Key: (time_bucket('00:15:00'::interval, conditions."time")), conditions.device_id
               ->  Gather Merge  (cost=47717.46..48511.23 rows=6086 width=91) (actual time=963.992..1666.272 rows=413099 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Partial GroupAggregate  (cost=46717.44..46808.73 rows=3043 width=91) (actual time=929.071..1400.633 rows=137700 loops=3)
                           Group Key: (time_bucket('00:15:00'::interval, conditions."time")), conditions.device_id
                           ->  Sort  (cost=46717.44..46725.05 rows=3043 width=49) (actual time=929.053..1015.758 rows=484667 loops=3)
                                 Sort Key: (time_bucket('00:15:00'::interval, conditions."time")), conditions.device_id
                                 Sort Method: external merge  Disk: 28976kB
                                 Worker 0:  Sort Method: external merge  Disk: 28240kB
                                 Worker 1:  Sort Method: external merge  Disk: 28840kB
                                 ->  Result  (cost=0.44..46541.38 rows=3043 width=49) (actual time=0.027..300.722 rows=484667 loops=3)
                                       ->  Parallel Custom Scan (ChunkAppend) on conditions  (cost=0.44..46503.35 rows=3043 width=49) (actual time=0.026..212.001 rows=484667 loops=3)
                                             Chunks excluded during startup: 5
                                             ->  Parallel Index Scan using _hyper_7_190_chunk_conditions_time_idx on _hyper_7_190_chunk  (cost=0.43..46490.03 rows=3038 width=49) (actual time=0.025..165.745 rows=484667 loops=3)
                                                   Index Cond: (("time" >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(8)), '-infinity'::timestamp with time zone)) AND ("time" >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND ("time" <= '2023-03-05 00:15:00+00'::timestamp with time zone))
                                                   Filter: ((time_bucket('00:15:00'::interval, "time") >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (time_bucket('00:15:00'::interval, "time") <= '2023-03-05 00:00:00+00'::timestamp with time zone))
 Planning Time: 2.943 ms
 Execution Time: 6585.342 ms
(36 rows)

Time: 6618,197 ms (00:06,618)

It’s now taking 6.6 seconds—not bad! As you can see from the execution plan, the query is split into two parts: "*SELECT* 1" and "*SELECT* 2".

The first runs against the CAGG, while the second runs against the hypertable holding the latest data. Rows are fetched, aggregated, and merged into the requested format.

Turning off real-time aggregation

If you are not interested in getting the latest transactions and just need data for analytics, you can disable the feature and retrieve rows only from the CAGG:

ALTER MATERIALIZED VIEW conditions_summary_15mins SET (timescaledb.materialized_only = true);

Running the same query again now produces this plan:

QUERY PLAN
----------------------------------------------
 Append  (cost=0.43..187829.34 rows=5147365 width=50) (actual time=0.015..1760.471 rows=5144000 loops=1)
   ->  Index Scan using _hyper_8_216_chunk__materialized_hypertable_8_bucket_idx on _hyper_8_216_chunk  (cost=0.43..4765.52 rows=127365 width=50) (actual time=0.015..43.875 rows=124000 loops=1)
         Index Cond: ((bucket >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (bucket <= '2023-03-05 00:00:00+00'::timestamp with time zone))
   ->  Seq Scan on _hyper_8_217_chunk  (cost=0.00..31021.00 rows=988000 width=50) (actual time=0.011..239.407 rows=988000 loops=1)
         Filter: ((bucket >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (bucket <= '2023-03-05 00:00:00+00'::timestamp with time zone))
   ->  Seq Scan on _hyper_8_218_chunk  (cost=0.00..42095.00 rows=1344000 width=50) (actual time=0.010..343.470 rows=1344000 loops=1)
         Filter: ((bucket >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (bucket <= '2023-03-05 00:00:00+00'::timestamp with time zone))
   ->  Seq Scan on _hyper_8_219_chunk  (cost=0.00..42104.00 rows=1344000 width=50) (actual time=0.008..348.315 rows=1344000 loops=1)
         Filter: ((bucket >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (bucket <= '2023-03-05 00:00:00+00'::timestamp with time zone))
   ->  Seq Scan on _hyper_8_220_chunk  (cost=0.00..42107.00 rows=1344000 width=50) (actual time=0.010..330.572 rows=1344000 loops=1)
         Filter: ((bucket >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (bucket <= '2023-03-05 00:00:00+00'::timestamp with time zone))
 Planning Time: 0.981 ms
 Execution Time: 1985.474 ms
(13 rows)

Time: 2016,746 ms (00:02,017)

Now we’re down to 2 seconds, even better! As you can see, data is retrieved only from the CAGG.

And switching it on again

Let’s re-enable real-time aggregation:

ALTER MATERIALIZED VIEW conditions_summary_15mins SET (timescaledb.materialized_only = false);

Usually, people aggregate their data over different time windows for different kinds of analysis or dashboards. In the financial field, for instance, you generally look at the data in 15-minute, hourly, and daily views. So, let's create a new CAGG with a different aggregation window:

CREATE MATERIALIZED VIEW conditions_summary_4hrs
WITH (timescaledb.continuous) AS
SELECT time_bucket(INTERVAL '4 hours', time) AS bucket,
       device_id,
       AVG(temperature) AS avg_temp,
       MAX(temperature) AS max_temp,
       MIN(temperature) AS min_temp,
       AVG(humidity) AS avg_humidity,
       MAX(humidity) AS max_hunidity,
       MIN(humidity) AS min_humidity
FROM conditions
GROUP BY bucket, device_id;

NOTICE:  refreshing continuous aggregate "conditions_summary_4hrs"
HINT:  Use WITH NO DATA if you do not want to refresh the continuous aggregate on creation.
CREATE MATERIALIZED VIEW
Time: 544078,223 ms (09:04,078)

It took 9 minutes to get refreshed for the first time with aggregated data.
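If a nine-minute blocking refresh at creation time is a concern, one option (a sketch, using the manual refresh procedure) is to create the CAGG WITH NO DATA and backfill it window by window:

```sql
-- Backfill one month at a time instead of refreshing everything at once
CALL refresh_continuous_aggregate('conditions_summary_4hrs',
                                  '2023-01-01', '2023-02-01');
CALL refresh_continuous_aggregate('conditions_summary_4hrs',
                                  '2023-02-01', '2023-03-01');
```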

Hierarchical continuous aggregates (materialized views on materialized views)

Since 2.9.0, Timescale allows you to create hierarchical continuous aggregates (a.k.a. a continuous aggregate on top of another continuous aggregate).

Let's see how it works. First we drop the 4-hour CAGG we just created, then recreate it on top of the 15-minute one:

DROP MATERIALIZED VIEW conditions_summary_4hrs;

CREATE MATERIALIZED VIEW conditions_summary_4hrs
WITH (timescaledb.continuous) AS
SELECT time_bucket(INTERVAL '4 hours', bucket) AS bucket_4hrs,
       device_id,
       AVG(avg_temp) AS avg_temp,
       MAX(max_temp) AS max_temp,
       MIN(min_temp) AS min_temp,
       AVG(avg_humidity) AS avg_humidity,
       MAX(max_hunidity) AS max_hunidity,
       MIN(min_humidity) AS min_humidity
FROM conditions_summary_15mins
GROUP BY bucket_4hrs, device_id;

The new CAGG has a 4-hour window and is created over the 15-minute CAGG.

NOTICE:  refreshing continuous aggregate "conditions_summary_4hrs"
HINT:  Use WITH NO DATA if you do not want to refresh the continuous aggregate on creation.
CREATE MATERIALIZED VIEW
Time: 70160,808 ms (01:10,161)

The result is 70 seconds to refresh the view. That's amazing! The refresh policy benefits too, with reduced refresh time and resource usage.

And now, let's see what the execution plan looks like against a hierarchical CAGG:

EXPLAIN ANALYZE
SELECT bucket_4hrs, device_id, avg_temp, avg_humidity
FROM conditions_summary_4hrs
WHERE bucket_4hrs BETWEEN '2023-02-05' AND '2023-03-07';

QUERY PLAN
--------------------------------------------------------------------------
 Append  (cost=0.00..25793.82 rows=350092 width=51) (actual time=0.010..319.095 rows=352000 loops=1)
   ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..21655.56 rows=350000 width=51) (actual time=0.009..217.240 rows=350000 loops=1)
         ->  Result  (cost=0.00..18155.56 rows=350000 width=179) (actual time=0.009..175.949 rows=350000 loops=1)
               ->  Custom Scan (ChunkAppend) on _materialized_hypertable_11  (cost=0.00..14655.56 rows=350000 width=51) (actual time=0.007..131.150 rows=350000 loops=1)
                     Chunks excluded during startup: 0
                     ->  Seq Scan on _hyper_11_244_chunk  (cost=0.00..3255.00 rows=84000 width=51) (actual time=0.007..24.047 rows=84000 loops=1)
                           Filter: ((bucket_4hrs >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (bucket_4hrs <= '2023-03-07 00:00:00+00'::timestamp with time zone) AND (bucket_4hrs < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(11)), '-infinity'::timestamp with time zone)))
                     ->  Seq Scan on _hyper_11_246_chunk  (cost=0.00..3255.00 rows=84000 width=51) (actual time=0.010..23.718 rows=84000 loops=1)
                           Filter: ((bucket_4hrs >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (bucket_4hrs <= '2023-03-07 00:00:00+00'::timestamp with time zone) AND (bucket_4hrs < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(11)), '-infinity'::timestamp with time zone)))
                     ->  Seq Scan on _hyper_11_251_chunk  (cost=0.00..3256.00 rows=84000 width=51) (actual time=0.009..23.184 rows=84000 loops=1)
                           Filter: ((bucket_4hrs >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (bucket_4hrs <= '2023-03-07 00:00:00+00'::timestamp with time zone) AND (bucket_4hrs < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(11)), '-infinity'::timestamp with time zone)))
                     ->  Index Scan using _hyper_11_252_chunk__materialized_hypertable_11_bucket_4hrs_idx on _hyper_11_252_chunk  (cost=0.30..1403.56 rows=8000 width=51) (actual time=0.024..4.620 rows=8000 loops=1)
                           Index Cond: ((bucket_4hrs < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(11)), '-infinity'::timestamp with time zone)) AND (bucket_4hrs >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (bucket_4hrs <= '2023-03-07 00:00:00+00'::timestamp with time zone))
                     ->  Seq Scan on _hyper_11_254_chunk  (cost=0.00..2406.00 rows=62000 width=51) (actual time=0.007..17.351 rows=62000 loops=1)
                           Filter: ((bucket_4hrs >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (bucket_4hrs <= '2023-03-07 00:00:00+00'::timestamp with time zone) AND (bucket_4hrs < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(11)), '-infinity'::timestamp with time zone)))
                     ->  Seq Scan on _hyper_11_259_chunk  (cost=0.00..1080.00 rows=28000 width=51) (actual time=0.007..7.673 rows=28000 loops=1)
                           Filter: ((bucket_4hrs >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (bucket_4hrs <= '2023-03-07 00:00:00+00'::timestamp with time zone) AND (bucket_4hrs < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(11)), '-infinity'::timestamp with time zone)))
   ->  Subquery Scan on "*SELECT* 2"  (cost=2384.12..2387.80 rows=92 width=91) (actual time=60.909..71.279 rows=2000 loops=1)
         ->  GroupAggregate  (cost=2384.12..2386.88 rows=92 width=219) (actual time=60.909..71.024 rows=2000 loops=1)
               Group Key: (time_bucket('04:00:00'::interval, "*SELECT* 1_1".bucket)), "*SELECT* 1_1".device_id
               ->  Sort  (cost=2384.12..2384.35 rows=92 width=52) (actual time=60.897..61.946 rows=18000 loops=1)
                     Sort Key: (time_bucket('04:00:00'::interval, "*SELECT* 1_1".bucket)), "*SELECT* 1_1".device_id
                     Sort Method: quicksort  Memory: 2452kB
                     ->  Result  (cost=0.44..2381.12 rows=92 width=52) (actual time=2.047..41.311 rows=18000 loops=1)
                           ->  Append  (cost=0.44..2379.97 rows=92 width=53) (actual time=2.046..38.692 rows=18000 loops=1)
                                 ->  Subquery Scan on "*SELECT* 1_1"  (cost=0.44..863.69 rows=86 width=50) (actual time=2.045..13.640 rows=16000 loops=1)
                                       ->  Result  (cost=0.44..862.83 rows=86 width=178) (actual time=2.045..11.803 rows=16000 loops=1)
                                             ->  Custom Scan (ChunkAppend) on _materialized_hypertable_8  (cost=0.44..861.97 rows=86 width=50) (actual time=2.044..9.681 rows=16000 loops=1)
                                                   Chunks excluded during startup: 5
                                                   ->  Index Scan using _hyper_8_258_chunk__materialized_hypertable_8_bucket_idx on _hyper_8_258_chunk  (cost=0.43..848.61 rows=81 width=50) (actual time=2.043..8.155 rows=16000 loops=1)
                                                         Index Cond: ((bucket < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(8)), '-infinity'::timestamp with time zone)) AND (bucket >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(11)), '-infinity'::timestamp with time zone)) AND (bucket >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (bucket <= '2023-03-07 04:00:00+00'::timestamp with time zone))
                                                         Filter: ((time_bucket('04:00:00'::interval, bucket) >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (time_bucket('04:00:00'::interval, bucket) <= '2023-03-07 00:00:00+00'::timestamp with time zone))
                                 ->  Subquery Scan on "*SELECT* 2_1"  (cost=1515.57..1515.81 rows=6 width=91) (actual time=17.266..23.544 rows=2000 loops=1)
                                       ->  GroupAggregate  (cost=1515.57..1515.75 rows=6 width=219) (actual time=17.265..23.291 rows=2000 loops=1)
                                             Group Key: (time_bucket('00:15:00'::interval, conditions."time")), conditions.device_id
                                             ->  Sort  (cost=1515.57..1515.59 rows=6 width=50) (actual time=17.251..17.728 rows=8000 loops=1)
                                                   Sort Key: (time_bucket('00:15:00'::interval, conditions."time")), conditions.device_id
                                                   Sort Method: quicksort  Memory: 934kB
                                                   ->  Result  (cost=0.44..1515.49 rows=6 width=50) (actual time=2.067..8.169 rows=8000 loops=1)
                                                         ->  Custom Scan (ChunkAppend) on conditions  (cost=0.44..1515.42 rows=6 width=50) (actual time=2.066..6.806 rows=8000 loops=1)
                                                               Chunks excluded during startup: 5
                                                               ->  Index Scan using _hyper_7_190_chunk_conditions_time_idx on _hyper_7_190_chunk  (cost=0.43..1507.46 rows=1 width=49) (actual time=2.065..6.109 rows=8000 loops=1)
                                                                     Index Cond: ("time" >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(8)), '-infinity'::timestamp with time zone))
                                                                     Filter: ((time_bucket('00:15:00'::interval, "time") >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (time_bucket('00:15:00'::interval, "time") <= '2023-03-07 04:00:00+00'::timestamp with time zone) AND (time_bucket('04:00:00'::interval, time_bucket('00:15:00'::interval, "time")) >= '2023-02-05 00:00:00+00'::timestamp with time zone) AND (time_bucket('04:00:00'::interval, time_bucket('00:15:00'::interval, "time")) <= '2023-03-07 00:00:00+00'::timestamp with time zone) AND (time_bucket('00:15:00'::interval, "time") >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(11)), '-infinity'::timestamp with time zone)))
 Planning Time: 29.508 ms
 Execution Time: 355.948 ms

Pretty cool, 355 ms! You can see the query fetching data from three different hypertables and merging it all together:

->  Subquery Scan on "*SELECT* 1"  (cost=0.00..21655.56 rows=350000 width=51) (actual time=0.009..217.240 rows=350000 loops=1)
      ->  Result  (cost=0.00..18155.56 rows=350000 width=179) (actual time=0.009..175.949 rows=350000 loops=1)
            ->  Custom Scan (ChunkAppend) on _materialized_hypertable_11  (cost=0.00..14655.56 rows=350000 width=51) (actual time=0.007..131.150 rows=350000 loops=1)
…………………………………………………………………………………………
->  Subquery Scan on "*SELECT* 1_1"  (cost=0.44..863.69 rows=86 width=50) (actual time=2.045..13.640 rows=16000 loops=1)
      ->  Result  (cost=0.44..862.83 rows=86 width=178) (actual time=2.045..11.803 rows=16000 loops=1)
            ->  Custom Scan (ChunkAppend) on _materialized_hypertable_8  (cost=0.44..861.97 rows=86 width=50) (actual time=2.044..9.681 rows=16000 loops=1)
…………………………………………………………………………………………
->  Result  (cost=0.44..1515.49 rows=6 width=50) (actual time=2.067..8.169 rows=8000 loops=1)
      ->  Custom Scan (ChunkAppend) on conditions  (cost=0.44..1515.42 rows=6 width=50) (actual time=2.066..6.806 rows=8000 loops=1)

Adding JOINs to the mix

And what about JOINs? Starting from 2.10.0, CAGGs support JOINs, meaning you can create them by joining the hypertable with a regular table.

To see how it works, we will join conditions and locations aggregating over a 15-minute window:

CREATE MATERIALIZED VIEW conditions_summary_15mins_join
WITH (timescaledb.continuous) AS
SELECT time_bucket(INTERVAL '15 minutes', c.time) AS bucket,
       c.device_id,
       l.location,
       l.environment,
       AVG(c.temperature) AS avg_temp,
       MAX(c.temperature) AS max_temp,
       MIN(c.temperature) AS min_temp,
       AVG(c.humidity) AS avg_humidity,
       MAX(c.humidity) AS max_hunidity,
       MIN(c.humidity) AS min_humidity
FROM conditions c
JOIN locations l USING (device_id)
GROUP BY bucket, device_id, l.location, l.environment;

The newly created CAGG has joined data, as shown below, and you don’t have to use JOIN queries over the CAGG:

tsdb=> SELECT * from conditions_summary_15mins_join limit 10;
         bucket         |     device_id      |   location    | environment |      avg_temp       |      max_temp      |      min_temp
------------------------+--------------------+---------------+-------------+---------------------+--------------------+--------------------
 2023-02-01 23:45:00+00 | weather-pro-000000 | field-000000  | outside     | 89.2000000000000463 |  89.40000000000003 |  89.00000000000006
 2023-02-01 23:45:00+00 | weather-pro-000001 | office-000000 | inside      | 68.3000000000000050 |  68.50000000000001 |               68.1
 2023-02-01 23:45:00+00 | weather-pro-000002 | field-000001  | outside     | 85.7875000000002375 |  85.90000000000023 |  85.60000000000025
 2023-02-01 23:45:00+00 | weather-pro-000003 | arctic-000000 | outside     | 39.3499999999999605 | 39.499999999999964 |  39.19999999999996
 2023-02-01 23:45:00+00 | weather-pro-000004 | door-00000    | doorway     | 65.5250000000002563 |  65.60000000000025 |  65.40000000000026
 2023-02-01 23:45:00+00 | weather-pro-000005 | office-000001 | inside      | 71.4249999999999450 |  71.69999999999995 |  71.19999999999995
 2023-02-01 23:45:00+00 | weather-pro-000006 | field-000002  | outside     | 86.7750000000001850 |  86.90000000000018 |  86.70000000000019
 2023-02-01 23:45:00+00 | weather-pro-000007 | swamp-000000  | outside     | 87.8750000000001775 |  88.00000000000017 |  87.80000000000018
 2023-02-01 23:45:00+00 | weather-pro-000008 | field-000003  | outside     | 87.1500000000001588 |  87.30000000000015 |  87.00000000000017
 2023-02-01 23:45:00+00 | weather-pro-000009 | door-00001    | doorway     | 62.7750000000003226 | 62.900000000000325 |  62.70000000000032
(10 rows)

Time: 65,128 ms

Bonus: What About Compression?

Now that we finished explaining some amazing features related to continuous aggregates, you might say, “Wait, it comes at the cost of extra disk space, right?”

Yes, CAGGs significantly improve CPU, memory, and disk I/O usage but require extra disk space. To help with that, Timescale brings another thrilling feature to the table: compression.

You can compress data on hypertables and CAGGs; in most cases, it is quite performant. Let’s see an example:

SELECT * FROM hypertable_detailed_size('_timescaledb_internal._materialized_hypertable_8');

 table_bytes | index_bytes | toast_bytes | total_bytes | node_name
-------------+-------------+-------------+-------------+-----------
  2822823936 |  1946337280 |      155648 |  4769316864 |

The CAGG is 4.7 GB—let’s enable automatic refresh and compression:

SELECT add_continuous_aggregate_policy('conditions_summary_15mins',   start_offset => INTERVAL '6 hours',   end_offset => INTERVAL '1 hour',   schedule_interval => INTERVAL '2 hours');

ALTER MATERIALIZED VIEW conditions_summary_15mins set (timescaledb.compress = true);

SELECT add_compression_policy('conditions_summary_15mins', compress_after=>'7 day'::interval);
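These statements register background jobs, which you can inspect through the informational views. A small sketch:

```sql
-- List the background jobs, including the refresh and compression policies
SELECT job_id, application_name, schedule_interval, hypertable_name
FROM timescaledb_information.jobs;
```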

Two new jobs are created, one for refreshing and one for compression. Compression will start automatically if there are chunks to compress. A few minutes later, you have your compressed CAGG. Let’s see its new size:

SELECT * FROM hypertable_detailed_size('_timescaledb_internal._materialized_hypertable_14');

 table_bytes | index_bytes | toast_bytes | total_bytes | node_name
-------------+-------------+-------------+-------------+-----------
    61325312 |    13205504 |   850911232 |   925442048 |

Wow, 900 MB. That's roughly an 80 percent reduction in size, and it looks terrific!
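You can also ask Timescale for per-hypertable compression statistics. A sketch (the numbers returned depend on your data):

```sql
-- Before/after sizes and chunk counts for the compressed CAGG's
-- materialized hypertable
SELECT *
FROM hypertable_compression_stats('_timescaledb_internal._materialized_hypertable_14');
```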

Learn More

Continuous aggregates, Timescale’s version of automatically updated incremental materialized views, allow the creation of a table that reflects the output of a time-based aggregation query. 

This table is automatically and periodically updated as the source data changes. These are said to be real-time because, when you read from a continuous aggregate, the most recent data comes from the original dataset while the older data comes from the continuous aggregate itself.

CAGGs help by running the time-based query up-front, speeding up any future runs. This feature relies on users knowing the common queries or aggregates they want to build on ahead of time.

Learn how continuous aggregates can dramatically simplify real-time analytics using PostgreSQL.