First, apologies if this is not the correct place to post this.
I’m seeing weirdness with some window queries in TimescaleDB. I am inserting data into the table at one minute intervals, the time stamp is aligned to the minute. The data consists of a sensor id and its value, in this case the temperature of a tank of water in Celsius. I’m computing the power for a change in temperature over 5 minute periods.
For this, I’m using the query:
SELECT
time_bucket('5m',time) AS time,
id,
(AVG(value) - LAG(AVG(value)) OVER w) * 55 * 4186 / (5 * 60)
FROM sensor
WHERE (id = '3c32e38150f2' OR id = '3ce104571ea2') AND
time BETWEEN '2024-10-06T09:00:00Z' AND '2024-10-06T13:10:00Z'
GROUP BY 1, 2
WINDOW w AS (ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
ORDER BY 1, 2;
This was converted from my Influx v1 query:
SELECT
derivative(mean("value"), 1s) * 55 * 4186
FROM "rp_2d"."sensor"
WHERE "id"::tag =~ /3c32e38150f2|3ce104571ea2/ AND
time >= 1728169200000ms AND time <= 1728255599999ms
GROUP BY time(5m), "id"::tag
FILL(null)
ORDER BY time ASC
which I’m plotting with Grafana - and while transitioning, I’ve been plotting the results from Influx and TimescaleDB on the same graph allowing me to confirm my queries are properly converted. This highlighted the following issue as the results were different depending on the time range.
If I request a time period of 12:00 to 13:10, then I get:
time | id | ?column?
------------------------+--------------+---------------------
...
2024-10-06 13:20:00+01 | 3ce104571ea2 | -19.032346666665934
2024-10-06 13:25:00+01 | 3ce104571ea2 | 9.516173333332967
2024-10-06 13:30:00+01 | 3ce104571ea2 | 9.516173333332967
2024-10-06 13:35:00+01 | 3ce104571ea2 | 0
2024-10-06 13:40:00+01 | 3ce104571ea2 | 0
2024-10-06 13:45:00+01 | 3ce104571ea2 | 9.516173333332967
2024-10-06 13:50:00+01 | 3ce104571ea2 | -9.516173333332967
2024-10-06 13:55:00+01 | 3ce104571ea2 | 0
2024-10-06 14:00:00+01 | 3ce104571ea2 | 0
2024-10-06 14:05:00+01 | 3ce104571ea2 | 0
2024-10-06 14:10:00+01 | 3ce104571ea2 | 0
This corresponds to the same data loaded in Influx and queried using derivative(value,1s) * 55 * 4186, and is what I expect. Bringing the start time earlier to 11:00 gives the same values for these times. As does 10:00. So, everything seems fine.
However, if I query from 09:00, then the results from Postgresql/TimescaleDB go haywire:
time | id | ?column?
------------------------+--------------+------------------------
...
2024-10-06 13:20:00+01 | 3ce104571ea2 | 76.89682000000073
2024-10-06 13:25:00+01 | 3ce104571ea2 | 0
2024-10-06 13:30:00+01 | 3ce104571ea2 | 47.5808666666662
2024-10-06 13:35:00+01 | 3ce104571ea2 | 86.25950666666603
2024-10-06 13:40:00+01 | 3ce104571ea2 | 0
2024-10-06 13:45:00+01 | 3ce104571ea2 | 47.73435333333251
2024-10-06 13:50:00+01 | 3ce104571ea2 | 0
2024-10-06 13:55:00+01 | 3ce104571ea2 |
2024-10-06 14:00:00+01 | 3ce104571ea2 | 0
2024-10-06 14:05:00+01 | 3ce104571ea2 | 95.92916666666666
2024-10-06 14:10:00+01 | 3ce104571ea2 | -9.516173333332967
It’s the same query, only the start time has been changed in the FROM clause. Notice that not only are the values for each time completely different, but also there is an apparent NULL value at 13:55.
Trying different start times, it seems the critical point is 09:50 vs 09:55. 09:55 gives the first results, 09:50 gives the second set of results - trying within the 5 minute period, 09:54 produces the same weirdness as 09:50.
Verifying the underlying data, no data is missing from the table - each minute in these ranges is populated. To prove that the number of data points is correct, there should be 251 datapoints, and:
SELECT COUNT(time), COUNT(id) FROM sensor
WHERE (id = '3c32e38150f2' OR id = '3ce104571ea2') AND
time BETWEEN '2024-10-06T09:00:00Z' AND '2024-10-06T13:10:00Z';
count | count
-------+-------
251 | 251
(1 row)
This is using Postgresql 15.8-0+deb12u1 with the pre-built TimescaleDB 2.16.1~debian12.
Can someone help me understand what is going on please? Thanks.