This is a great question, and it’s the sort of thing that comes up a lot with energy usage measurements like you’re describing, but also around things like solar panel production and battery usage, where what you have is instantaneous power and you need to calculate energy from it.
TLDR: use time_weight -> average
The TLDR is that you can use the time_weight family of functions to do this calculation; you don’t want to use a normal avg with potentially irregular data like this. This will get you a lot of accuracy with very little change in the complexity of the query, though you do have to do the right multiplication to account for time and power units to get kWh, like so:
SELECT
    time_bucket('15 min', ts) as bucket,
    customer_id,
    circuit_id,
    average(
        time_weight('Linear', ts, power_w)
    ) * 15/60/1000 as kWh_used -- time_weighted_average(power in W) * 15 mins / 60 mins per hour / 1000 W per kW
FROM power_usage
WHERE ts >= '2023-01-15'::timestamptz and ts < '2023-01-20'::timestamptz
GROUP BY 1,2,3;
Slightly Less TLDR: Most Accurate Query using time_weight -> interpolated_integral
This will be quite accurate, and much more accurate than just using a basic average, but the most accurate approach is to use time_weight -> interpolated_integral. The query for that would look something like this:
WITH fifteen_min as ( -- start with a fifteen minute window, this could also be turned into a cagg, but we'll save that for later
    SELECT
        time_bucket('15 min', ts) as bucket,
        customer_id,
        circuit_id,
        time_weight('Linear', ts, power_w) -- do the time weighting in the first step
    FROM power_usage
    -- widen the scanned range by one bucket on each side so the edge buckets have neighbors to interpolate against
    WHERE ts >= '2023-01-15'::timestamptz - '15 min'::interval and ts < '2023-01-20'::timestamptz + '15 min'::interval
    GROUP BY 1,2,3),
with_neighbors as ( -- grab the neighboring summaries *before* filtering to the target range, otherwise the first and last buckets would see NULL neighbors
    SELECT
        bucket,
        customer_id,
        circuit_id,
        time_weight,
        LAG(time_weight) OVER ordered as prev,
        LEAD(time_weight) OVER ordered as next
    FROM fifteen_min
    WINDOW ordered as (PARTITION BY customer_id, circuit_id ORDER BY bucket) -- window definition is important, must partition by the correct ids
)
SELECT
    bucket,
    customer_id,
    circuit_id,
    interpolated_integral( -- we'll get the integral in the second step, this also allows us to interpolate across bucket boundaries
        time_weight,
        bucket,
        '15 min',
        prev,
        next,
        'hour' -- note we're using hours to go directly to watt-hours
    ) / 1000 as kWh_used -- divide by 1000 to go from Wh to kWh
FROM with_neighbors
WHERE bucket >= '2023-01-15'::timestamptz and bucket < '2023-01-20'::timestamptz;
Longer explanation
But let’s build this up step by step and see how this works, and why it can get you better results than the “naive” average way of doing things.
Let’s recall that the goal here is to go from kW or some form of power to kWh or some form of energy. The usual way to calculate something like this would be to do something like:
- take the average power over some period
- convert to kWh by multiplying the average power over that period by the amount of time in hours it was valid (and dividing by 1000 cause we’re converting from watts to kilowatts)
The query in this case would be something like:
SELECT
    time_bucket('15 min', ts) as bucket,
    customer_id,
    circuit_id,
    avg(power_w) * 15/60/1000 as kWh_used -- avg(power in W) * 15 mins / 60 mins per hour / 1000 W per kW
FROM power_usage
WHERE ts >= '2023-01-15'::timestamptz and ts < '2023-01-20'::timestamptz
GROUP BY 1,2,3; -- have to group by the bucket and both ids
Note that we’re grouping by our 15 minute time_bucket, and both the customer and circuit ids so that we get the power for that circuit.
And this will totally work if your data is regular, meaning each point covers the same amount of time. But it won’t work as well if your data is somewhat irregular. It’s also a bit confusing in the SQL unless you take care to comment, like I have, about what sort of conversion you’re doing and why. These sorts of conversions can make for really, really difficult to parse SQL queries that litter your codebase with random formulas that are hard to understand or audit.
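If it helps to sanity check the bare formula, here’s the arithmetic spelled out once with a made-up number (the 1200 W figure is purely illustrative, not from the data set):
-- assuming a (made-up) 1200 W average over one 15 minute bucket:
-- 1200 W * (15 min / 60 min per hour) = 300 Wh, and 300 Wh / 1000 = 0.3 kWh
SELECT 1200.0 * 15/60/1000 AS kwh_for_one_bucket; -- 0.3 kWh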
Remember what the question said about this:
We take measurements at minimum every 5 mins, but also send more data when the power changes quickly, so we get more measurements when it’s a bit choppy and some of the devices have pretty quick duty cycles so that’s important for us to get. What that does mean is that I can’t just sum the values and get a reasonable answer, because the values can “count” for different periods of time (not to mention we do have missed messages sometimes, networks being what they are).
So we need to deal with this data irregularity in order to get a proper answer. This is where the time_weight family of functions is really important. There are two sources of irregularity we’re accounting for here: 1) client side downsampling, where we avoid sending too many messages when the measurements are changing slowly, but send more measurements when they’re changing more quickly, and 2) network problems leading to missed messages.
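For concreteness, here’s roughly the shape of table I’ve been assuming throughout. The column names are taken from the queries in this post, but the hypertable setup and the sample rows are just an illustrative sketch, not the actual schema or data:
-- assumed schema, matching the columns used in the queries above (chunk sizing etc. omitted)
CREATE TABLE power_usage (
    ts          timestamptz      NOT NULL,
    customer_id bigint           NOT NULL,
    circuit_id  bigint           NOT NULL,
    power_w     double precision NOT NULL
);
SELECT create_hypertable('power_usage', 'ts');

-- made-up rows showing the irregular spacing: 5 minute gaps while things are quiet,
-- then readings only seconds apart when the power is changing quickly
INSERT INTO power_usage VALUES
    ('2023-01-15 00:00:00+00', 1, 1, 1180),
    ('2023-01-15 00:05:00+00', 1, 1, 1195),
    ('2023-01-15 00:10:00+00', 1, 1, 1190),
    ('2023-01-15 00:10:45+00', 1, 1, 2450),
    ('2023-01-15 00:11:05+00', 1, 1, 3125),
    ('2023-01-15 00:11:30+00', 1, 1, 1210);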
Luckily, both of those sources of irregularity are well accounted for with time_weight. We’re going to use the ‘Linear’ weighting method because we haven’t been told that this is a system that reads only when the value changes; rather, we’re sampling a continuous system at lower or higher rates depending on how quickly the value is changing.
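For contrast, if these were devices that only reported when the value changed (an event-style feed rather than sampling), ‘LOCF’ (last observation carried forward) would be the weighting method to reach for. This is just a sketch of that variant; everything except the weighting method argument is unchanged from the query above:
-- hypothetical change-only feed: same query, but with last-observation-carried-forward weighting
SELECT
    time_bucket('15 min', ts) as bucket,
    customer_id,
    circuit_id,
    average(
        time_weight('LOCF', ts, power_w)
    ) * 15/60/1000 as kWh_used
FROM power_usage
WHERE ts >= '2023-01-15'::timestamptz and ts < '2023-01-20'::timestamptz
GROUP BY 1,2,3;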
Note: if you want to read more about time_weight, how exactly it works, and the derivation for this type of problem, I wrote a blog about it a while back.
The simplest way then is to use the same approach, but instead of using a normal average, we’ll use the time weighted average and then perform the same multiplication to account for time and units like so:
SELECT
    time_bucket('15 min', ts) as bucket,
    customer_id,
    circuit_id,
    average(
        time_weight('Linear', ts, power_w)
    ) * 15/60/1000 as kWh_used -- time weighted average(power in W) * 15 mins / 60 mins per hour / 1000 W per kW
FROM power_usage
WHERE ts >= '2023-01-15'::timestamptz and ts < '2023-01-20'::timestamptz
GROUP BY 1,2,3;
The basic graphical intuition for why the normal average doesn’t work is this:
                                     transient
                                     more data     quiescent
  few data points    data stable     large change  few data points
|----------------------------------|-------------|--------------------|
                                          *
                                        *   *
                                      *       *
     *           *           *       *
           *           *                              *        *
                                                                   *
The average just sums all the values and divides by the total number of points. So the average will massively over-weight the transient periods, where there were more data points because the value changed faster, versus the quiescent periods, where there were fewer data points but the value changed slower. Storing more data points over the whole time period would be inefficient in terms of space, and storing fewer over the whole period would mean missing the interesting bits.
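To make that over-weighting concrete with a tiny made-up example (none of these numbers come from the real meter data): imagine a circuit that sits at 1000 W for 14 quiet minutes and sends 3 readings, then spikes to 5000 W for the final minute and sends 12 readings. The plain average lands at 4200 W even though the circuit spent almost the whole window at 1000 W, while the time weighted average comes out around 1250 W, which is what the energy calculation actually needs:
-- a toy comparison on made-up numbers: three readings at 1000 W spread over 14 quiet minutes,
-- then twelve readings at 5000 W packed into one choppy minute
WITH toy(ts, power_w) AS (
    SELECT '2023-01-15 00:00'::timestamptz + make_interval(mins => m), 1000::double precision
    FROM generate_series(0, 14, 7) m   -- readings at 00:00, 00:07, 00:14
    UNION ALL
    SELECT '2023-01-15 00:14'::timestamptz + make_interval(secs => s), 5000::double precision
    FROM generate_series(5, 60, 5) s   -- twelve readings from 00:14:05 to 00:15:00
)
SELECT
    avg(power_w) AS naive_avg_w,                                    -- 4200 W, dominated by the burst
    average(time_weight('Linear', ts, power_w)) AS time_weighted_w  -- ~1250 W, close to what was actually drawn
FROM toy;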
I’ve been testing this on a data set where we have an energy meter as well as a power meter hooked up, so the energy meter is able to give a “truly accurate” result because it doesn’t have the limitations of the sampling rate. With this approach we’re able to get very close: < 1% error for most 15 minute periods, and less than 20 W off for even the highest-error periods. The naive average approach has 8x the peak error, and missed at worst by more than 250 W for a fifteen minute period. The overall error for the 5 day period I was analyzing was almost 5000 W for the naive average vs less than 600 W for the time weighted approach.
Okay. I’m gonna end this post here, cause we’ve gotten a long way already, but there’s still more to think about and a more accurate way to do this calculation with interpolated integrals that we’ll cover next.