Hi
I have the following setup:
- A raw measurement hypertable with points at irregular timestamps.
- A continuous aggregate that averages values from the raw measurement table into 1-minute time buckets (roughly sketched below).
- TimescaleDB version 2.8.0 (Docker).
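For reference, the setup looks roughly like this (table, column, and type names are simplified for this post; my_table is the continuous aggregate referenced in the queries further down):

```sql
-- Simplified sketch of the raw hypertable
CREATE TABLE measurements (
    time        TIMESTAMPTZ NOT NULL,
    sensor_name TEXT        NOT NULL,
    value       DOUBLE PRECISION
);
SELECT create_hypertable('measurements', 'time');

-- Simplified sketch of the 1-minute continuous aggregate (my_table in the queries below)
CREATE MATERIALIZED VIEW my_table
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 minute', time) AS time_bucket,
       sensor_name,
       avg(value) AS value
FROM measurements
GROUP BY time_bucket('1 minute', time), sensor_name;
```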
Sometimes there are gaps (no data) for various reasons, so I would like to use the time_bucket_gapfill() function to fill the gaps. For example, in the table below I’m missing data between 13:11 and 13:14, and there is a larger gap between the last two points of the table.
Continuous aggregate (1 minute) data:
time_bucket | sensor_name | value |
---|---|---|
2022-09-07 13:03:00+00 | my_sensor | 13.079369544999999 |
2022-09-07 13:04:00+00 | my_sensor | 13.364346265000002 |
2022-09-07 13:05:00+00 | my_sensor | 13.120381764285714 |
2022-09-07 13:06:00+00 | my_sensor | 13.307350793333333 |
2022-09-07 13:07:00+00 | my_sensor | 13.323299936666666 |
2022-09-07 13:08:00+00 | my_sensor | 13.02146966142857 |
2022-09-07 13:09:00+00 | my_sensor | 13.0645930775 |
2022-09-07 13:10:00+00 | my_sensor | 13.269353985 |
2022-09-07 13:11:00+00 | my_sensor | 13.427673814999999 |
2022-09-07 13:14:00+00 | my_sensor | 13.45933795 |
2022-09-07 13:15:00+00 | my_sensor | 12.729422676666665 |
2022-09-07 13:16:00+00 | my_sensor | 11.900959663600002 |
2022-09-07 13:17:00+00 | my_sensor | 10.042585991189192 |
2022-09-07 13:18:00+00 | my_sensor | 8.013347342243247 |
2022-09-07 13:19:00+00 | my_sensor | 6.830994115628569 |
2022-09-07 13:20:00+00 | my_sensor | 4.737388410105263 |
2022-09-07 13:21:00+00 | my_sensor | 1.8275442532812496 |
2022-09-07 13:22:00+00 | my_sensor | -0.1109750237457143 |
2022-09-09 08:55:00+00 | my_sensor | 0.029553126545000002 |
Query to get the data in the table above:
SELECT time_bucket, sensor_name, value FROM my_table WHERE sensor_name = 'my_sensor' AND time_bucket > now() - INTERVAL '3 days';
I am trying to use time_bucket_gapfill() combined with locf() to fill the gaps with the following query:
SELECT time_bucket_gapfill('1 minute', time_bucket), sensor_name, locf(value) FROM my_table WHERE sensor_name = 'my_sensor' AND time_bucket > now() - INTERVAL '3 days';
From this I get:
time_bucket_gapfill | sensor_name | locf |
---|---|---|
2022-09-07 13:03:00+00 | my_sensor | 13.079369544999999 |
2022-09-07 13:04:00+00 | my_sensor | 13.364346265000002 |
2022-09-07 13:05:00+00 | my_sensor | 13.120381764285714 |
2022-09-07 13:06:00+00 | my_sensor | 13.307350793333333 |
2022-09-07 13:07:00+00 | my_sensor | 13.323299936666666 |
2022-09-07 13:08:00+00 | my_sensor | 13.02146966142857 |
2022-09-07 13:09:00+00 | my_sensor | 13.0645930775 |
2022-09-07 13:10:00+00 | my_sensor | 13.269353985 |
2022-09-07 13:11:00+00 | my_sensor | 13.427673814999999 |
2022-09-07 13:14:00+00 | my_sensor | 13.45933795 |
2022-09-07 13:15:00+00 | my_sensor | 12.729422676666665 |
2022-09-07 13:16:00+00 | my_sensor | 11.900959663600002 |
2022-09-07 13:17:00+00 | my_sensor | 10.042585991189192 |
2022-09-07 13:18:00+00 | my_sensor | 8.013347342243247 |
2022-09-07 13:19:00+00 | my_sensor | 6.830994115628569 |
2022-09-07 13:20:00+00 | my_sensor | 4.737388410105263 |
2022-09-07 13:21:00+00 | my_sensor | 1.8275442532812496 |
2022-09-07 13:22:00+00 | my_sensor | -0.1109750237457143 |
2022-09-09 08:55:00+00 | my_sensor | 0.029553126545000002 |
This is exactly the same as the first table: no gaps have been filled.
In the docs it says “It can only be used in an aggregation query with time_bucket_gapfill”; is that what I am doing wrong? I do not have any aggregate function such as avg() in my time_bucket_gapfill query, because the data has already been aggregated by the continuous aggregate.
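For comparison, my understanding is that the documented usage looks something like the query below: an aggregation query with a GROUP BY, run against the raw hypertable (using the simplified names from the sketch above):

```sql
-- My understanding of the documented pattern: an aggregation query with
-- time_bucket_gapfill in the GROUP BY, run on the raw hypertable
SELECT time_bucket_gapfill('1 minute', time) AS bucket,
       sensor_name,
       locf(avg(value)) AS value
FROM measurements
WHERE sensor_name = 'my_sensor'
  AND time > '2022-09-07 13:00:00+00'
  AND time < '2022-09-07 13:30:00+00'  -- gapfill needs both a start and an end
GROUP BY bucket, sensor_name
ORDER BY bucket;
```

Since the values in the continuous aggregate are already 1-minute averages, wrapping them in another aggregate is what I was trying to avoid.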
However, according to [this comment] on GitHub (Support time_bucket_gapfill in continuous aggregates · Issue #1324 · timescale/timescaledb), it should be possible to first create the continuous aggregate and then use the time_bucket_gapfill function on it, which is exactly what I am trying to do.
Clearly there must be something I am doing wrong or not understanding about this functionality, so any help would be greatly appreciated. I hope I have included all the relevant info for this problem; if you need more, let me know.