I have hyper-table which has a list of events.
Each event represents an AWS instance changing price.
For a given window of time I want to calculate total cost of all the running instances within that window.
Let me give you an example. Say we have a table called instance_events
:
| created_at | cost_per_minute |
|------------|-----------------|
| 08:04:00 | 0 |
| 08:14:00 | 10 |
| 08:17:00 | 15 |
| 08:19:00 | 0 |
| 08:22:00 | 10 |
If we calculate the window from 08:15:00 -> 08:20:00
:
- At
08:15:00 -> 08:17:00
the instance costs is $10, so the cost here is$10 * 2 mins = $20
- At
08:17:00 -> 08:19:00
the instance cost is now $15, so the cost here is$15 * 2 mins = $30
- At
08:19:00 -> 08:20:00
the instance cost is $0 so the cost here is$0 * 1 min = $0
We can then sum these values $20 + $30 + $0 = $50.
So from 08:15:00 -> 08:20:00
the total money spent is $50.
I can write a long ugly query in postgresql that does the job
Pure Postgres Query
WITH
first_event AS (SELECT 15 AS t, COALESCE((SELECT cost_per_hour FROM wat_test WHERE t <= 15 ORDER BY t desc LIMIT 1), 0) AS cost_per_hour),
last_event AS (SELECT 20 as t, COALESCE((SELECT cost_per_hour FROM wat_test WHERE t = 20), 0) AS cost_per_hour),
within_window AS (SELECT * FROM wat_test WHERE t >= 15 AND t < 20),
all_events AS (SELECT * FROM first_event UNION SELECT * FROM within_window UNION SELECT * FROM last_event),
ordered_events AS (SELECT * FROM all_events ORDER BY t ASC),
with_diff AS (SELECT *, LEAD(t) OVER (ORDER BY t) as t_2, LEAD(t) OVER (ORDER BY t) - t as diff FROM ordered_events)
SELECT *, (cost_per_hour/60) * diff as abs_cost FROM with_diff;
I was wondering if there is anyway of doing this using timescale db.
I know you can bucket things by time, But in order to calculate the first cost accurately, you need to know the cost of the previous hour (in the example we needed to know what the cost was at 08:14:00
was $10 even though its outside our window of 08:15:00 -> 08:20:00
)
I’ve tried using time_bucket_gapfill and locf() in order to set up the window and get the last item out of the previous window. But I think locf() is only applied after the aggregation has happened. Also I don’t know how to apply the logic to calculate the cost per window as an aggregate, which I have to do since I’m using time_bucket_gapfill().
I have a vague feeling I need transform my table from a sparse to a dense format. Currently I log any time the cost_per_minute changes, I need to somehow transform that into a table where every minute there is a row which contains the current cost, then I could simply sum the cost over the window.
However I’ve really got no clue where to start with that.
I’m new to timescaleDB, and I’ve really gone and thrown myself in the depend so any and all suggestions are greatly appreciated.
Thanks!