TimescaleDB continuous aggregation wrong time buckets

Sanprof · June 29, 2023, 10:56am

We have a device for temperature collecting, it is configured to collect the temperature of the refrigerator box each second in the buffer (but can collect even fewer periods 10ms, 100ms) and at the end of each minute it sends an average value for the last minute (but it can be required to send average temperature per each second or per every 10 seconds). We have a hyper_table for these data. And we want to create some amount of continuous aggregation (with required intervals but for now it is 5min) for this table as it grows continuously. The issue we faced during the creation of materialized view is that the time_bucket function uses the date_bin function which always uses the start_date as basis. It means that time_bucket will always use the next statement time >= start_date and time < end_date which is wrong for us because we need time > start_date and time <= end_date . You can see it in the picture.

I created a very simple sample to show what I meant

-- create table
drop table if exists temp_measure;
create table if not exists temp_measure(
    time timestamptz,
    "avg_temp" double precision
);

insert into temp_measure select '2023-06-27 05:59:00+00', 6.9;
insert into temp_measure select '2023-06-27 06:00:00+00', 7.2;
insert into temp_measure select '2023-06-27 06:01:00+00', 15.1;
insert into temp_measure select '2023-06-27 06:02:00+00', 10.6;
insert into temp_measure select '2023-06-27 06:03:00+00', 8.5;
insert into temp_measure select '2023-06-27 06:04:00+00', 12;
insert into temp_measure select '2023-06-27 06:05:00+00', 13.2;
insert into temp_measure select '2023-06-27 06:06:00+00', 9.7;
insert into temp_measure select '2023-06-27 06:07:00+00', 8.1;
insert into temp_measure select '2023-06-27 06:08:00+00', 7.9;
insert into temp_measure select '2023-06-27 06:09:00+00', 8.5;
insert into temp_measure select '2023-06-27 06:10:00+00', 7.7;

-- create hypertable
SELECT create_hypertable('temp_measure', 'time', if_not_exists => TRUE, chunk_time_interval => INTERVAL '1 day', migrate_data => true);

-- create materialized view
CREATE MATERIALIZED VIEW temp_measure_5min
WITH (timescaledb.continuous)
AS
SELECT
    time_bucket('5 minutes', time) as bucket,
    ROUND(AVG(avg_temp)::decimal,1)::double precision as avg_temp
FROM
    temp_measure
GROUP BY bucket
WITH NO DATA;

Now lets see what we have at the end

SELECT bucket, avg_temp FROM public.temp_measure_5min order by bucket;

bucket  avg_temp
2023-06-27 07:55:00+02  6.9
2023-06-27 08:00:00+02  10.7
2023-06-27 08:05:00+02  9.5
2023-06-27 08:10:00+02  7.7

This is wrong for us and we have own window function which is correct for us

CREATE OR REPLACE FUNCTION public.time_window(
    _interval interval,
    _time timestamptz,
    _date_from timestamptz)
    RETURNS timestamptz
    LANGUAGE 'plpgsql'
AS $BODY$
begin

    RETURN
        CASE
            WHEN date_bin(_interval, _time, _date_from) = _time THEN _time
            ELSE (date_bin(_interval, _time, _date_from) + _interval)::timestamptz
        END;

end;
$BODY$;

And now if we use our time_window function instead of time_bucket I see the correct result for us

SELECT
    time_window('5 minutes', time, '2001-01-01') as bucket,
    ROUND(AVG(avg_temp)::decimal,1)::double precision as avg_temp
FROM
    temp_measure
GROUP BY bucket
order by bucket;

bucket  avg_temp
2023-06-27 08:00:00+02  7.1
2023-06-27 08:05:00+02  11.9
2023-06-27 08:10:00+02  8.4

System info:

psql (15.3 (Debian 15.3-1.pgdg110+1))
timescaledb - 2.10.3

jonatasdp · July 4, 2023, 4:17pm

Hi @Sanprof , very interesting case. Thanks for sharing the snippets.

Have you checked our official docs explaining how it works? Timescale Documentation | About time buckets
It seems you can use the origin param as described in the documentation.

Sanprof · July 4, 2023, 4:40pm

yes, I read it and it seems origin works for selection from raw data table with the correct aggregated values

SELECT
    time_bucket('5 minutes', time, '2000-01-01 00:01:00'::TIMESTAMPTZ) as bucket,
    ROUND(AVG(avg_temp)::decimal,1)::double precision as avg_temp
FROM
    temp_measure
GROUP BY bucket
order by bucket;

bucket  avg_temp
2023-06-27 08:00:00+02  7.1
2023-06-27 08:05:00+02  11.9
2023-06-27 08:10:00+02  8.4

But when I’m trying to use it for the creation of materialized view I’m getting an error

ERROR: continuous aggregate view must include a valid time bucket function

CREATE MATERIALIZED VIEW temp_measure_5min
WITH (timescaledb.continuous)
AS
SELECT
    time_bucket('5 minutes', time, '2000-01-01 00:01:00'::timestamptz) as bucket,
    ROUND(AVG(avg_temp)::decimal,1)::double precision as avg_temp
FROM
    temp_measure
GROUP BY bucket
WITH NO DATA;

and even using a new function timescaledb_experimental.time_bucket_ng

CREATE MATERIALIZED VIEW temp_measure_5min
WITH (timescaledb.continuous)
AS
SELECT
    timescaledb_experimental.time_bucket_ng('5 minutes', time, origin => '2000-01-01 00:01:00'::timestamptz) as bucket,
    ROUND(AVG(avg_temp)::decimal,1)::double precision as avg_temp
FROM
    temp_measure
GROUP BY bucket
WITH NO DATA;

jonatasdp · July 5, 2023, 4:14pm

@Sanprof, thanks for sharing all the details. It seems a bug. Please report it as a bug on the issues and let’s confirm it.
Just confirming the typeof:

playground=# select pg_typeof(time_bucket('5 minutes', now())) as bucket;
┌──────────────────────────┐
│          bucket          │
├──────────────────────────┤
│ timestamp with time zone │
└──────────────────────────┘
(1 row)

playground=# select pg_typeof(time_bucket('5 minutes', now()::timestamp)) as bucket;
┌─────────────────────────────┐
│           bucket            │
├─────────────────────────────┤
│ timestamp without time zone │
└─────────────────────────────┘
(1 row)

Interesting that it seems we’re with time zone is problematic and we also have it with the origin param:

playground=# select   pg_typeof(time_bucket('5 minutes', now()::timestamp, '2000-01-01 00:01:00'::timestamptz)) as bucket;
┌──────────────────────────┐
│          bucket          │
├──────────────────────────┤
│ timestamp with time zone │
└──────────────────────────┘
(1 row)

Sanprof · July 5, 2023, 4:39pm

An interesting thing that I can use timescaledb_experimental.time_bucket_ng with origin parameter when time column has type timestamp without time zone but the result is still weird, aggregation works correct but time value is something that I don’t expect - take a look at the result

DROP MATERIALIZED VIEW IF EXISTS temp_measure_5min;
-- create table
drop table if exists temp_measure;
create table if not exists temp_measure(
    time timestamp without time zone,
    "avg_temp" double precision
);

insert into temp_measure select '2023-06-27 05:59:00', 6.9;
insert into temp_measure select '2023-06-27 06:00:00', 7.2;
insert into temp_measure select '2023-06-27 06:01:00', 15.1;
insert into temp_measure select '2023-06-27 06:02:00', 10.6;
insert into temp_measure select '2023-06-27 06:03:00', 8.5;
insert into temp_measure select '2023-06-27 06:04:00', 12;
insert into temp_measure select '2023-06-27 06:05:00', 13.2;
insert into temp_measure select '2023-06-27 06:06:00', 9.7;
insert into temp_measure select '2023-06-27 06:07:00', 8.1;
insert into temp_measure select '2023-06-27 06:08:00', 7.9;
insert into temp_measure select '2023-06-27 06:09:00', 8.5;
insert into temp_measure select '2023-06-27 06:10:00', 7.7;

-- create hypertable
SELECT create_hypertable('temp_measure', 'time', if_not_exists => TRUE, chunk_time_interval => INTERVAL '1 day', migrate_data => true);

-- create continuous aggregation
CREATE MATERIALIZED VIEW temp_measure_5min
WITH (timescaledb.continuous)
AS
SELECT
    timescaledb_experimental.time_bucket_ng('5 minutes', time, origin => '2000-01-01 00:01:00') as bucket,
    ROUND(AVG(avg_temp)::decimal,1)::double precision as avg_temp
FROM
    temp_measure
GROUP BY bucket
WITH NO DATA;

-- review data
SELECT * FROM public.temp_measure_5min

bucket	avg_temp
2023-06-27 5:56:00	7.1
2023-06-27 6:01:00	11.9
2023-06-27 6:06:00	8.4

jonatasdp · July 6, 2023, 2:01pm

If you can isolate the POC which you see the weird behavior, please, bring it as a issue for the toolkit project:
Issues · timescale/timescaledb-toolkit · GitHub

That’s why it’s still experimental and we need feedback.

welchums · February 18, 2024, 4:28pm

Have there been any efforts to fix this problem? I’m encountering the same issue using time origins when trying to create a materialized view.