This is an awesome question! And a more difficult one than it might seem.
Here’s the TLDR:
You should use counter_agg with interpolated_delta to get this result. The final query to get 15-minute deltas will look something like this:
WITH bucketed as (
    SELECT meter_id,
        time_bucket('15 min', ts) as bucket,
        counter_agg(ts, total_kwh)
    FROM meter_readings
    GROUP BY meter_id, bucket
    ORDER BY meter_id, bucket)
SELECT
    bucket,
    meter_id,
    toolkit_experimental.interpolated_delta(
        counter_agg,
        bucket,
        '15 min'::interval,
        lag(counter_agg) OVER ordered_meter,
        lead(counter_agg) OVER ordered_meter) as kwh_used
FROM bucketed
WINDOW ordered_meter AS (PARTITION BY meter_id ORDER BY bucket)
ORDER BY meter_id, bucket;
Here’s the longer answer building all this up piece by piece:
In order to answer it, let's start by generating a little bit of data to set the stage. I'm going to generate data for two meters: one that reads every minute and one that reads every 5 minutes. They're also going to start from different points, but use the same amount of energy in the hour I'm generating data for, 30 kWh. (This is a lot! I guess our mock data is for a big building.) I'm using generate_series to do this; see the table sketch and inserts below.
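The original question didn't include a schema, so here's a minimal meter_readings table these examples assume. The column names come from the queries below, but the exact types are my guess (counter_agg wants a double precision value), and the hypertable step is optional:
CREATE TABLE meter_readings (
    meter_id  bigint           NOT NULL,
    ts        timestamptz      NOT NULL,
    total_kwh double precision NOT NULL
);
-- optional, if you're following along on TimescaleDB:
SELECT create_hypertable('meter_readings', 'ts');
And now the data: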
INSERT INTO meter_readings
SELECT 1,
    generate_series('2023-01-01 00:00:00 +00', '2023-01-01 01:00:00 +00', '1 minute'::interval),
    generate_series(1003, 1003 + 30, 0.5);

INSERT INTO meter_readings
SELECT 2,
    generate_series('2023-01-01 00:00:00 +00', '2023-01-01 01:00:00 +00', '5 minute'::interval),
    generate_series(3024, 3024 + 30, 2.5);
Now that we have a bit of data, let's take a look at the shape of it by getting the min and max values of total_kwh for each meter:
SELECT meter_id, min(total_kwh), max(total_kwh)
FROM meter_readings
GROUP BY meter_id
ORDER BY meter_id;
meter_id | min | max
----------+------+------
1 | 1003 | 1033
2 | 3024 | 3054
So the first meter increases from 1003 to 1033 kWh, and the second from 3024 to 3054: both by the same amount. This also demonstrates the naive way of getting the total kWh used by each - just subtract the min from the max!
SELECT meter_id, max(total_kwh) - min(total_kwh) as kwh_used
FROM meter_readings
GROUP BY meter_id
ORDER BY meter_id;
meter_id | kwh_used
----------+----------
1 | 30
2 | 30
But there are some problems with this approach, as I’m sure the original poster realized, and they have to do with all of the complications they mentioned.
Let’s start by looking at what happens if we want to figure out what’s going on with each 15 minute period, using this approach:
SELECT meter_id, time_bucket('15 min', ts) as bucket, max(total_kwh) - min(total_kwh) as kwh_used
FROM meter_readings
GROUP BY meter_id, bucket
ORDER BY meter_id, bucket;
meter_id | bucket | kwh_used
----------+------------------------+----------
1 | 2023-01-01 00:00:00+00 | 7
1 | 2023-01-01 00:15:00+00 | 7
1 | 2023-01-01 00:30:00+00 | 7
1 | 2023-01-01 00:45:00+00 | 7
1 | 2023-01-01 01:00:00+00 | 0
2 | 2023-01-01 00:00:00+00 | 5
2 | 2023-01-01 00:15:00+00 | 5
2 | 2023-01-01 00:30:00+00 | 5
2 | 2023-01-01 00:45:00+00 | 5
2 | 2023-01-01 01:00:00+00 | 0
Now, we know that both of these meters measured the same amount of energy use overall, but a) they're showing different amounts used, and b) neither of them adds up to 30 kWh. (30/4 is 7.5, so if we get this right, 7.5 per bucket is about what we'd expect.)
Let’s look at the min and max in each period to understand why:
SELECT meter_id, time_bucket('15 min', ts) as bucket, min(total_kwh) as min_kwh, max(total_kwh) as max_kwh
FROM meter_readings
GROUP BY meter_id, bucket
ORDER BY meter_id, bucket;
meter_id | bucket | min_kwh | max_kwh
----------+------------------------+---------+---------
1 | 2023-01-01 00:00:00+00 | 1003 | 1010
1 | 2023-01-01 00:15:00+00 | 1010.5 | 1017.5
1 | 2023-01-01 00:30:00+00 | 1018 | 1025
1 | 2023-01-01 00:45:00+00 | 1025.5 | 1032.5
1 | 2023-01-01 01:00:00+00 | 1033 | 1033
2 | 2023-01-01 00:00:00+00 | 3024 | 3029
2 | 2023-01-01 00:15:00+00 | 3031.5 | 3036.5
2 | 2023-01-01 00:30:00+00 | 3039 | 3044
2 | 2023-01-01 00:45:00+00 | 3046.5 | 3051.5
2 | 2023-01-01 01:00:00+00 | 3054 | 3054
In each period we've sampled some points, but they're not all the way at the edges of the bucket, so the min and max within a bucket don't quite capture all the usage. For meter 2 it looks something like this:
|---- first bucket ----|--- next bucket ---
|-------*-------*------|-----*
| | | |
val 3024 3026.5 3029 3031.5
minute 0 5 10 15
Here we can see that the max inside the first bucket occurs at minute 10, so we lose the last 5 minutes of the bucket, which in the case of meter 2 means essentially 2.5 kWh missing from each bucket.
To handle this, we can use window functions: instead of subtracting the min from the max within each bucket, we subtract the previous bucket's max from the current bucket's max, using the lag window function like so:
-- this is just the previous query in a common table expression, or CTE, so we can use it below
WITH bucketed as (
    SELECT meter_id,
        time_bucket('15 min', ts) as bucket,
        min(total_kwh) as min_kwh,
        max(total_kwh) as max_kwh
    FROM meter_readings
    GROUP BY meter_id, bucket
    ORDER BY meter_id, bucket)
SELECT *, max_kwh - lag(max_kwh) OVER (ordered_meter) as usage -- lag gets the previous value based on the ordering in the window clause
FROM bucketed
WINDOW ordered_meter AS (PARTITION BY meter_id ORDER BY bucket) -- this is the window clause
ORDER BY meter_id, bucket;
meter_id | bucket | min_kwh | max_kwh | usage
----------+------------------------+---------+---------+-------
1 | 2023-01-01 00:00:00+00 | 1003 | 1010 |
1 | 2023-01-01 00:15:00+00 | 1010.5 | 1017.5 | 7.5
1 | 2023-01-01 00:30:00+00 | 1018 | 1025 | 7.5
1 | 2023-01-01 00:45:00+00 | 1025.5 | 1032.5 | 7.5
1 | 2023-01-01 01:00:00+00 | 1033 | 1033 | 0.5
2 | 2023-01-01 00:00:00+00 | 3024 | 3029 |
2 | 2023-01-01 00:15:00+00 | 3031.5 | 3036.5 | 7.5
2 | 2023-01-01 00:30:00+00 | 3039 | 3044 | 7.5
2 | 2023-01-01 00:45:00+00 | 3046.5 | 3051.5 | 7.5
2 | 2023-01-01 01:00:00+00 | 3054 | 3054 | 2.5
This is closer, but we're missing the first value for each meter. In the first bucket we really want to fall back to that bucket's min when there's no previous max, so we can use coalesce to deal with that:
WITH bucketed as (
    SELECT meter_id,
        time_bucket('15 min', ts) as bucket,
        min(total_kwh) as min_kwh,
        max(total_kwh) as max_kwh
    FROM meter_readings
    GROUP BY meter_id, bucket
    ORDER BY meter_id, bucket)
SELECT *, max_kwh - coalesce(lag(max_kwh) OVER (ordered_meter), min_kwh) as usage
FROM bucketed
WINDOW ordered_meter AS (PARTITION BY meter_id ORDER BY bucket)
ORDER BY meter_id, bucket;
meter_id | bucket | min_kwh | max_kwh | usage
----------+------------------------+---------+---------+-------
1 | 2023-01-01 00:00:00+00 | 1003 | 1010 | 7
1 | 2023-01-01 00:15:00+00 | 1010.5 | 1017.5 | 7.5
1 | 2023-01-01 00:30:00+00 | 1018 | 1025 | 7.5
1 | 2023-01-01 00:45:00+00 | 1025.5 | 1032.5 | 7.5
1 | 2023-01-01 01:00:00+00 | 1033 | 1033 | 0.5
2 | 2023-01-01 00:00:00+00 | 3024 | 3029 | 5
2 | 2023-01-01 00:15:00+00 | 3031.5 | 3036.5 | 7.5
2 | 2023-01-01 00:30:00+00 | 3039 | 3044 | 7.5
2 | 2023-01-01 00:45:00+00 | 3046.5 | 3051.5 | 7.5
2 | 2023-01-01 01:00:00+00 | 3054 | 3054 | 2.5
This is pretty close! You'll note that both meters now add up to 30 kWh, so that's a good sign.
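(If you want to double-check that claim, here's a quick sanity check: it's just the query above wrapped in one more CTE so we can sum the per-bucket usage. Both meters should come out to 30.)
WITH bucketed as (
    SELECT meter_id,
        time_bucket('15 min', ts) as bucket,
        min(total_kwh) as min_kwh,
        max(total_kwh) as max_kwh
    FROM meter_readings
    GROUP BY meter_id, bucket),
per_bucket as (
    -- same per-bucket usage as above, so we can aggregate over it
    SELECT meter_id, bucket,
        max_kwh - coalesce(lag(max_kwh) OVER (PARTITION BY meter_id ORDER BY bucket), min_kwh) as usage
    FROM bucketed)
SELECT meter_id, sum(usage) as total_kwh_used
FROM per_bucket
GROUP BY meter_id
ORDER BY meter_id;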
But… you might also notice that the last bit of usage in each bucket is actually attributed to the next bucket. Looking back at our diagram from above for meter 2:
|---- first bucket ----|--- next bucket ---
|-------*-------*------|-------*-------*------|-------*-
| | | | | | |
val 3024 3026.5 3029 3031.5 3034 3036.5 3039
minute 0 5 10 15 20 25 30
|-----naive bucket-----|
So our first bucket gets 5, even though the reading of 3031.5 lands right at the edge of the bucket, so the usage leading up to it really belongs to the first bucket. The next bucket looks correct with 7.5, but it's actually taking the 2.5 we lost from the first bucket and applying it to the next one. Finally, the last bucket gets the extra 2.5 that's been carried forward by this offset. This is a complex problem, and I'm honestly not going to write the naive SQL to solve it because it'd be really complex, but we'll come back and deal with it later.
Before we deal with that smaller problem, I think it's important to look at a bigger one that our original poster asked about, and that's a pretty huge deal if we get it wrong:
How do we deal with resets?
While this situation might be rare, it's also pretty important: we don't want to get a wrong answer. So, how can we model a reset? Well, how about we pretend that when the second meter gets to 3034 kWh it gets reset to zero?
SELECT * FROM meter_readings WHERE meter_id = 2 order by ts;
meter_id | ts | total_kwh
----------+------------------------+-----------
2 | 2023-01-01 00:00:00+00 | 3024
2 | 2023-01-01 00:05:00+00 | 3026.5
2 | 2023-01-01 00:10:00+00 | 3029
2 | 2023-01-01 00:15:00+00 | 3031.5
2 | 2023-01-01 00:20:00+00 | 3034 -- reset to zero here
2 | 2023-01-01 00:25:00+00 | 3036.5
2 | 2023-01-01 00:30:00+00 | 3039
2 | 2023-01-01 00:35:00+00 | 3041.5
2 | 2023-01-01 00:40:00+00 | 3044
2 | 2023-01-01 00:45:00+00 | 3046.5
2 | 2023-01-01 00:50:00+00 | 3049
2 | 2023-01-01 00:55:00+00 | 3051.5
2 | 2023-01-01 01:00:00+00 | 3054
To simulate that, we just update the values in the table, subtracting 3034 from every reading greater than 3034, like so:
UPDATE meter_readings SET total_kwh = total_kwh - 3034 WHERE meter_id = 2 and total_kwh > 3034;
UPDATE 8
SELECT * FROM meter_readings WHERE meter_id = 2 order by ts;
meter_id | ts | total_kwh
----------+------------------------+-----------
2 | 2023-01-01 00:00:00+00 | 3024
2 | 2023-01-01 00:05:00+00 | 3026.5
2 | 2023-01-01 00:10:00+00 | 3029
2 | 2023-01-01 00:15:00+00 | 3031.5
2 | 2023-01-01 00:20:00+00 | 3034 -- reset here
2 | 2023-01-01 00:25:00+00 | 2.5
2 | 2023-01-01 00:30:00+00 | 5
2 | 2023-01-01 00:35:00+00 | 7.5
2 | 2023-01-01 00:40:00+00 | 10
2 | 2023-01-01 00:45:00+00 | 12.5
2 | 2023-01-01 00:50:00+00 | 15
2 | 2023-01-01 00:55:00+00 | 17.5
2 | 2023-01-01 01:00:00+00 | 20
So now, if we look at our naive query that just calculates the total kWh used, we have a big problem! We should still be getting 30 kWh for each meter, but instead:
SELECT meter_id, max(total_kwh) - min(total_kwh) as kwh_used
FROM meter_readings
GROUP BY meter_id
ORDER BY meter_id;
meter_id | kwh_used
----------+----------
1 | 30
2 | 3031.5
Whoever has meter 2 is gonna get way overcharged if we’re not careful here!
And if we were to do the 15 minute periods like we did before, we’d see that we’re getting a negative result for one of them!
WITH bucketed as (
    SELECT meter_id,
        time_bucket('15 min', ts) as bucket,
        min(total_kwh) as min_kwh,
        max(total_kwh) as max_kwh
    FROM meter_readings
    GROUP BY meter_id, bucket
    ORDER BY meter_id, bucket)
SELECT *, max_kwh - coalesce(lag(max_kwh) OVER (ordered_meter), min_kwh) as usage
FROM bucketed
WINDOW ordered_meter AS (PARTITION BY meter_id ORDER BY bucket)
ORDER BY meter_id, bucket;
meter_id | bucket | min_kwh | max_kwh | usage
----------+------------------------+---------+---------+-------
1 | 2023-01-01 00:00:00+00 | 1003 | 1010 | 7
1 | 2023-01-01 00:15:00+00 | 1010.5 | 1017.5 | 7.5
1 | 2023-01-01 00:30:00+00 | 1018 | 1025 | 7.5
1 | 2023-01-01 00:45:00+00 | 1025.5 | 1032.5 | 7.5
1 | 2023-01-01 01:00:00+00 | 1033 | 1033 | 0.5
2 | 2023-01-01 00:00:00+00 | 3024 | 3029 | 5
2 | 2023-01-01 00:15:00+00 | 2.5 | 3034 | 5
2 | 2023-01-01 00:30:00+00 | 5 | 10 | -3024
2 | 2023-01-01 00:45:00+00 | 12.5 | 17.5 | 7.5
2 | 2023-01-01 01:00:00+00 | 20 | 20 | 2.5
Luckily, we have a hyperfunction provided by Timescale that accounts for this and does the right thing if there's a reset of this sort. It's called counter_agg, and it's made to account for resets in these sorts of counters while still giving us an accurate delta. It assumes that whenever the value drops, the counter was reset to zero in between, and it starts counting up again from there.
Let's look at what happens when we use counter_agg to get the overall energy usage:
SELECT meter_id,
max(total_kwh) - min(total_kwh) as kwh_used_naive,
counter_agg(ts, total_kwh) -> delta() as kwh_used_counter_agg
FROM meter_readings
GROUP BY meter_id
ORDER BY meter_id;
As with most hyperfunctions, counter_agg uses two-step aggregation: the counter_agg aggregate builds a summary for each group, and the delta accessor pulls out the change in the (reset-adjusted) counter value over the period.
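(A small aside: if your toolkit version doesn't support the -> operator, the accessor can, as far as I know, also be written as a plain function call wrapping the aggregate; the two forms should be equivalent:)
-- function-call form of the delta accessor
SELECT meter_id,
    delta(counter_agg(ts, total_kwh)) as kwh_used_counter_agg
FROM meter_readings
GROUP BY meter_id
ORDER BY meter_id;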
If I wanted to do this with window functions and the like, I'd need a lot of layers of CTEs, because we'd first need to look for where the value drops and then correct all the later values, and it gets very difficult very fast. SQL is Turing complete, so I can at least be sure it's possible; there's a rough sketch below to give a flavor of it.
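Just for that flavor, here's a rough, untested sketch of the reset-correction idea for the overall delta, assuming resets always go back to zero. (The bucketed, edge-interpolated version layered on top of this is where it gets really hairy, which is exactly what counter_agg saves us from.)
WITH diffs as (
    SELECT meter_id, ts, total_kwh,
        total_kwh - lag(total_kwh) OVER (PARTITION BY meter_id ORDER BY ts) as diff
    FROM meter_readings),
adjusted as (
    SELECT meter_id, ts,
        -- on a drop (a reset), the increase since the reset is the raw reading itself
        sum(CASE WHEN diff < 0 THEN total_kwh ELSE coalesce(diff, 0) END)
            OVER (PARTITION BY meter_id ORDER BY ts) as adj_kwh
    FROM diffs)
SELECT meter_id, max(adj_kwh) as kwh_used
FROM adjusted
GROUP BY meter_id
ORDER BY meter_id;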
Now, if I want to get the fifteen-minute values like we did before, we can see that we've fixed the reset issue: we're no longer showing several thousand kWh used in one of the periods:
SELECT meter_id,
time_bucket('15 min', ts) as bucket,
max(total_kwh) - min(total_kwh) as kwh_used_naive,
counter_agg(ts, total_kwh) -> delta() as kwh_used_counter_agg
FROM meter_readings
GROUP BY meter_id, bucket
ORDER BY meter_id, bucket;
meter_id | bucket | kwh_used_naive | kwh_used_counter_agg
----------+------------------------+----------------+----------------------
1 | 2023-01-01 00:00:00+00 | 7 | 7
1 | 2023-01-01 00:15:00+00 | 7 | 7
1 | 2023-01-01 00:30:00+00 | 7 | 7
1 | 2023-01-01 00:45:00+00 | 7 | 7
1 | 2023-01-01 01:00:00+00 | 0 | 0
2 | 2023-01-01 00:00:00+00 | 5 | 5
2 | 2023-01-01 00:15:00+00 | 3031.5 | 5
2 | 2023-01-01 00:30:00+00 | 5 | 5
2 | 2023-01-01 00:45:00+00 | 5 | 5
2 | 2023-01-01 01:00:00+00 | 0 | 0
But we're still running into the same problem as before: we're not counting all the way to the edge of the bucket. To fix that, we're still going to need a window function, structured the same sort of way:
WITH bucketed as (
    SELECT meter_id,
        time_bucket('15 min', ts) as bucket,
        min(total_kwh) as min_kwh,
        max(total_kwh) as max_kwh,
        counter_agg(ts, total_kwh) as counter_agg
    FROM meter_readings
    GROUP BY meter_id, bucket
    ORDER BY meter_id, bucket)
SELECT meter_id, bucket,
    max_kwh - coalesce(lag(max_kwh) OVER (ordered_meter), min_kwh) as usage_naive,
    toolkit_experimental.interpolated_delta(
        counter_agg,
        bucket, '15 min'::interval,
        lag(counter_agg) OVER ordered_meter,
        lead(counter_agg) OVER ordered_meter)
FROM bucketed
WINDOW ordered_meter AS (PARTITION BY meter_id ORDER BY bucket)
ORDER BY meter_id, bucket;
meter_id | bucket | usage_naive | interpolated_delta
----------+------------------------+-------------+--------------------
1 | 2023-01-01 00:00:00+00 | 7 | 7.5
1 | 2023-01-01 00:15:00+00 | 7.5 | 7.5
1 | 2023-01-01 00:30:00+00 | 7.5 | 7.5
1 | 2023-01-01 00:45:00+00 | 7.5 | 7.5
1 | 2023-01-01 01:00:00+00 | 0.5 | 0
2 | 2023-01-01 00:00:00+00 | 5 | 7.5
2 | 2023-01-01 00:15:00+00 | 5 | 7.5
2 | 2023-01-01 00:30:00+00 | -3024 | 7.5
2 | 2023-01-01 00:45:00+00 | 7.5 | 7.5
2 | 2023-01-01 01:00:00+00 | 2.5 | 0
No weird negative values, and you'll also note that we've solved the bucketing problem we identified earlier as well! That's because interpolated_delta does linear interpolation to the edge of each bucket. For buckets with one point right at the edge, the usage all ends up counted in the previous bucket. (And if your points aren't quite aligned with the bucket edges, the right amounts will get attributed to the correct buckets based on the linear interpolation.)
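(And one last sanity check: wrapping the final query from the TLDR in a CTE and summing per meter, both meters should come back to 30 kWh total with the data above:)
WITH bucketed as (
    SELECT meter_id,
        time_bucket('15 min', ts) as bucket,
        counter_agg(ts, total_kwh)
    FROM meter_readings
    GROUP BY meter_id, bucket),
deltas as (
    -- per-bucket interpolated deltas, same as the final query above
    SELECT meter_id, bucket,
        toolkit_experimental.interpolated_delta(
            counter_agg, bucket, '15 min'::interval,
            lag(counter_agg) OVER ordered_meter,
            lead(counter_agg) OVER ordered_meter) as kwh_used
    FROM bucketed
    WINDOW ordered_meter AS (PARTITION BY meter_id ORDER BY bucket))
SELECT meter_id, sum(kwh_used) as total_kwh_used
FROM deltas
GROUP BY meter_id
ORDER BY meter_id;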