Result weirdness with window query above

rmk · October 7, 2024, 11:46pm

Thanks for the response.

I’ve tried modifying the query in Grafana to:

SELECT
  time_bucket('5m',time) AS time,
  id,
  (AVG(value) - LAG(AVG(value)) OVER w) * 55 * 4186 / (5 * 60) AS "Calorifier power (timescale)"
 FROM sensor
 WHERE $__timeFilter("time") AND (id = '3c32e38150f2' OR id = '3ce104571ea2')
 GROUP BY 1, 2
 WINDOW w AS (ORDER BY sensor.time RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW)
 ORDER BY 1, 2

The $__timeFilter() part is expanded to the appropriate time BETWEEN ... clause.

PostgreSQL complained about the lack of ORDER BY:

RANGE with offset PRECEDING/FOLLOWING requires exactly one ORDER BY column

so I added ORDER BY time and then tried ORDER BY sensor.time as per the above, but PostgreSQL still doesn’t want to know:

column "sensor.time" must appear in the GROUP BY clause or be used in an aggregate function

GROUP BY doesn’t seem to be usable inside the window, and the time-bucketed column is already in the GROUP BY. Either I don’t understand your suggestion properly, or the syntax is wrong.

Moving on to your second potential solution. I’ve tried:

WITH subquery AS (
  SELECT
   time_bucket('5m',time) AS time,
   id,
   AVG(value) AS average
  FROM sensor
  WHERE $__timeFilter("time") AND (id = '3c32e38150f2' OR id = '3ce104571ea2')
  GROUP BY 1, 2
  ORDER BY 1, 2)
SELECT
  time,
  (average - LAG(average) OVER w) * 55 * 4186 / (5 * 60) AS "Power"
 FROM subquery
 GROUP BY 1
 WINDOW w AS (ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
 ORDER BY 1

Both with my original window and your window, both cause PostgreSQL to spit out:

column "subquery.average" must appear in the GROUP BY clause or be used in an aggregate function

As a last ditch attempt, I tried:

WITH subquery AS (
  SELECT
   time_bucket('5m',time) AS time,
   id,
   AVG(value) AS average
  FROM sensor
  WHERE $__timeFilter("time") AND (id = '3c32e38150f2' OR id = '3ce104571ea2')
  GROUP BY 1, 2
  ORDER BY 1, 2)
SELECT
  time,
  (average - LAG(average) OVER (PARTITION BY id ORDER BY time)) * 55 * 4186 / (5 * 60) AS "Power"
 FROM subquery
 GROUP BY 1
 ORDER BY 1

and that gives me the same error. This is very similar to what I’ve seen elsewhere, such as https://www.timescale.com/forum/t/migrate-grafana-dashboards-from-influxdb-datasource-to-timescaledb/2395/3

You suggest using time_bucket_gapfill() - yes, this may be useful in the future, but at the moment there are no gaps in the data as previously confirmed. There are also no duplicates either, and there are the correct number of data points for the time range - I already confirmed that. At one minute intervals, there should be 251 data points between 9:00 and 13:10 inclusive (that’s 4 * 60 + 10 + 1 data points). However, I will probably need to add the gap filling at some point, thanks for the suggestion.

As for time zones, data is inserted using UTC. It is also my understanding from having read the PostgreSQL documentation that timestamptz columns are stored as UTC with microseconds.

For continuous aggregates, I attempted to configure an materialized aggregate view, but I haven’t been able to get it to work - so it’s now been dropped, and dropping it hasn’t made any difference.

I’ve now tried:

SELECT
  time_bucket('5m',time) AS time,
  (AVG(value) - LAG(AVG(value)) OVER w) * 55 * 4186 / (5 * 60) AS "Power"
 FROM sensor
 WHERE $__timeFilter("time") AND (id = '3c32e38150f2' OR id = '3ce104571ea2')
 GROUP BY 1
 WINDOW w AS (ORDER BY time_bucket('5m',time))
 ORDER BY 1

after reading your minimal implementation - and this appears to mostly work. It looks like there’s differences between the original Influx query and this query for the first and last data points, but the rest of the data seems correct, which is a vast improvement.

Thanks!