Dear Eon - Calculating energy use from instantaneous power usage measurements (transforming kW to kWh)

Even higher accuracy with interpolated integrals

Now, the approach of taking the average and assuming it applies to the whole period works pretty well in many cases, but a) it still requires us to do the conversion accounting for the time period (15 min / 60 min per hour) ourselves, and b) it’s technically missing some data at the edges. We’ll get into that in a sec. Let’s first look at the integral function - essentially, when we do the time weighted average we first calculate the numerical integral over the period and then divide it by the length of the period. With the integral function we skip the division and get that value directly, in whatever unit we want (in our case hours):

SELECT 
	time_bucket('15 min', ts) as bucket, 
	customer_id, 
	circuit_id, 
	integral(
		time_weight('Linear', ts, power_w), 
		'hour') / 1000 as kWh_used -- integral in hours gives watt-hours, divide by 1000 W per kW to get kWh
FROM power_usage 
WHERE ts >= '2023-01-15'::timestamptz and ts < '2023-01-20'::timestamptz 
GROUP BY 1,2,3;
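
As a quick side-by-side sanity check (this isn’t from the original post, just a sketch assuming the same power_usage table), you can compute the average-based estimate from the last section and the integral in the same query and compare them; the differences come from how each treats data at the edges of the bucket, which is what we’re about to dig into:

SELECT 
	time_bucket('15 min', ts) as bucket, 
	customer_id, 
	circuit_id, 
	average(time_weight('Linear', ts, power_w)) * 0.25 / 1000 as kWh_from_average, -- average W * 0.25 h / 1000 = kWh
	integral(time_weight('Linear', ts, power_w), 'hour') / 1000 as kWh_from_integral -- integral already in Wh, / 1000 = kWh
FROM power_usage 
WHERE ts >= '2023-01-15'::timestamptz and ts < '2023-01-20'::timestamptz 
GROUP BY 1,2,3;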

But the integral here runs into a problem similar to the one we went over in the energy meter question from a while back: we’re not able to correctly account for data at the edges of buckets.

The thing is that when you do a GROUP BY in SQL, the aggregate can only “see” and analyze data within the group it’s working on. In our case, that means all the data has the same customer_id and circuit_id, which is great, since we don’t want other customers or circuits throwing off our calculations, but it also means everything is within the same 15 minute time_bucket. We don’t know anything about the data in the bucket before or after, nor do we really even know where the edge of the bucket is.

|---------- bucket 1 ----------|---------- bucket 2 ----------|---------- bucket 3 ----------|
    * - - - - *                         * - - *                
   /           \                       /       \                    * - - *
  /             * - - *              *          \                           \
*                      \                         \                           \
                        *                          *                          * - - - *
                          \                          \       
                            \                          \   
                             *                           *

Here, the * are data points and the lines represent the linear interpolation we’re doing; the numerical integral we’re calculating can be thought of as the area under each of these curves.

In bucket 1, we don’t really have this problem because the first and last values in the bucket occur right at the edges of the bucket (excuse my diagram if that’s not exactly true, ASCII art is hard). But as we get into buckets 2 and 3, we can see that we don’t have as accurate a picture of what’s going on, because we can’t interpolate between the data from the previous bucket and the current one. What we’d really like is a picture that looks more like:

|---------- bucket 1 ----------|---------- bucket 2 ----------|---------- bucket 3 ----------|
    * - - - - *                          * - - *                
   /           \                       /       \                    * - - *
  /             * - - *              *          \                 /         \            
*                      \           /             \              /            \         
                        *        /                 *          /               * - - - * 
                          \     /                    \      /
                            \ /                        \  /
                             *                           *

Here we actually do our interpolation between points that cross bucket boundaries, and assign the correct amount to each bucket based on where the bucket boundary falls between the points. So that’s the picture of what we want to achieve. You’ll notice that we need a bit more information than we had in the original integral function: we need 1) information about the points before and after our bucket and 2) information about where the bucket boundary actually falls.
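
To make that concrete, here’s a small worked example with made-up numbers (not from the original data). Suppose we have a reading of 200 W at 10:14 and the next reading is 300 W at 10:19, with a bucket boundary at 10:15. Linear interpolation puts the value at the boundary at 200 + (300 - 200) * 1/5 = 220 W, so the first bucket gets the trapezoid from 10:14 to 10:15 and the second gets the trapezoid from 10:15 to 10:19:

-- hand-computed trapezoid areas for the hypothetical readings above
SELECT 
	(200 + 220) / 2.0 * (1 / 60.0) as wh_assigned_to_bucket_1, -- ~3.5 Wh for the 1 minute before the boundary
	(220 + 300) / 2.0 * (4 / 60.0) as wh_assigned_to_bucket_2; -- ~17.3 Wh for the 4 minutes after the boundary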

And that’s exactly what our time_weight -> interpolated_integral function takes. We’re going to use PostgreSQL window functions, specifically lag() and lead(), to get the values from the previous and next buckets. We also do everything in a CTE (common table expression, aka WITH clause) to make it a bit more readable. The final SQL query will look something like this:

WITH fifteen_min as ( -- start with a fifteen minute window, this CTE could also be turned into a cagg, but we'll save that for later
	SELECT 
		time_bucket('15 min', ts) as bucket, 
		customer_id, 
		circuit_id, 
		time_weight('Linear', ts, power_w) -- do the time weighting in the first step and we'll do the integral later
	FROM power_usage 
    -- extend our where clause by one bucket on each side so we have the extra data for interpolation
	WHERE ts >= '2023-01-15'::timestamptz - '15 min'::interval and ts < '2023-01-20'::timestamptz + '15 min'::interval 
	GROUP BY 1,2,3) 

SELECT 
	bucket, 
	customer_id, 
	circuit_id, 
	interpolated_integral( -- we'll get the integral in the second step of the CTE
		time_weight,
		bucket, -- these two parameters are where we provide
		'15 min', -- the bucket boundary information (bucket start and width)
		LAG(time_weight) OVER ordered, -- the time_weight from the previous bucket
		LEAD(time_weight) OVER ordered, -- the time_weight from the next bucket
		'hour' -- note we're using hours to go directly to watt-hours (Wh)
	) / 1000 as kWh_used -- divide by 1000 to go from Wh to kWh
FROM 
	fifteen_min
-- now cut the data down to exactly the time period we want 
WHERE bucket >= '2023-01-15'::timestamptz and bucket < '2023-01-20'::timestamptz 
WINDOW ordered as (PARTITION BY customer_id, circuit_id ORDER BY bucket); -- the window definition is important: it must partition by the correct ids

So that gets us our interpolated integral, which had very similar (though slightly better) error than the time weighted average approach in the case I was testing, because there tended to be a lot of data in each of the 15 minute buckets. I think you’ll see somewhat more error in the other approach when there’s less data in the bucket and the data at the edges has more impact. (We already account for some of this error with the time weighted average approach, because we assume the average applies to the whole bucket, so we’re “missing” less of the data than with the non-interpolated integral; but if data is more sparse, the fact that we’re not taking the edge data into account as much as we should will have more of an impact.)
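
And just to show one way you might use the result (again, a sketch rather than anything from the original post: it assumes you’ve saved the query above as a view with the hypothetical name fifteen_min_kwh, with columns bucket, customer_id, circuit_id, and kWh_used), rolling the 15 minute values up to daily energy use is then just a sum:

SELECT 
	time_bucket('1 day', bucket) as day, 
	customer_id, 
	circuit_id, 
	sum(kWh_used) as kWh_used -- 15 minute kWh values sum directly to daily kWh
FROM fifteen_min_kwh 
GROUP BY 1,2,3 
ORDER BY 1,2,3;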

Anyway, I hope all that was useful! Feel free to ask questions etc if I’ve been less than clear (which I probably have).
