We have a device for temperature collecting, it is configured to collect the temperature of the refrigerator box each second in the buffer (but can collect even fewer periods 10ms, 100ms) and at the end of each minute it sends an average value for the last minute (but it can be required to send average temperature per each second or per every 10 seconds). We have a hyper_table for these data. And we want to create some amount of continuous aggregation (with required intervals but for now it is 5min) for this table as it grows continuously. The issue we faced during the creation of materialized view is that the time_bucket function uses the date_bin function which always uses the start_date as basis. It means that time_bucket will always use the next statement time >= start_date and time < end_date
which is wrong for us because we need time > start_date and time <= end_date
. You can see it in the picture.
I created a very simple sample to show what I meant
-- create table
drop table if exists temp_measure;
create table if not exists temp_measure(
time timestamptz,
"avg_temp" double precision
);
insert into temp_measure select '2023-06-27 05:59:00+00', 6.9;
insert into temp_measure select '2023-06-27 06:00:00+00', 7.2;
insert into temp_measure select '2023-06-27 06:01:00+00', 15.1;
insert into temp_measure select '2023-06-27 06:02:00+00', 10.6;
insert into temp_measure select '2023-06-27 06:03:00+00', 8.5;
insert into temp_measure select '2023-06-27 06:04:00+00', 12;
insert into temp_measure select '2023-06-27 06:05:00+00', 13.2;
insert into temp_measure select '2023-06-27 06:06:00+00', 9.7;
insert into temp_measure select '2023-06-27 06:07:00+00', 8.1;
insert into temp_measure select '2023-06-27 06:08:00+00', 7.9;
insert into temp_measure select '2023-06-27 06:09:00+00', 8.5;
insert into temp_measure select '2023-06-27 06:10:00+00', 7.7;
-- create hypertable
SELECT create_hypertable('temp_measure', 'time', if_not_exists => TRUE, chunk_time_interval => INTERVAL '1 day', migrate_data => true);
-- create materialized view
CREATE MATERIALIZED VIEW temp_measure_5min
WITH (timescaledb.continuous)
AS
SELECT
time_bucket('5 minutes', time) as bucket,
ROUND(AVG(avg_temp)::decimal,1)::double precision as avg_temp
FROM
temp_measure
GROUP BY bucket
WITH NO DATA;
Now lets see what we have at the end
SELECT bucket, avg_temp FROM public.temp_measure_5min order by bucket;
bucket avg_temp
2023-06-27 07:55:00+02 6.9
2023-06-27 08:00:00+02 10.7
2023-06-27 08:05:00+02 9.5
2023-06-27 08:10:00+02 7.7
This is wrong for us and we have own window function which is correct for us
CREATE OR REPLACE FUNCTION public.time_window(
_interval interval,
_time timestamptz,
_date_from timestamptz)
RETURNS timestamptz
LANGUAGE 'plpgsql'
AS $BODY$
begin
RETURN
CASE
WHEN date_bin(_interval, _time, _date_from) = _time THEN _time
ELSE (date_bin(_interval, _time, _date_from) + _interval)::timestamptz
END;
end;
$BODY$;
And now if we use our time_window function instead of time_bucket I see the correct result for us
SELECT
time_window('5 minutes', time, '2001-01-01') as bucket,
ROUND(AVG(avg_temp)::decimal,1)::double precision as avg_temp
FROM
temp_measure
GROUP BY bucket
order by bucket;
bucket avg_temp
2023-06-27 08:00:00+02 7.1
2023-06-27 08:05:00+02 11.9
2023-06-27 08:10:00+02 8.4
System info:
- psql (15.3 (Debian 15.3-1.pgdg110+1))
- timescaledb - 2.10.3