Looking at States and Dwell Time With Hyperfunctions
Five years ago, while working at a consulting company in Australia, I posted a question to the Timescale Slack.
The question went unanswered (I think I actually talked to a Timescaler about this in a DM); I implemented a solution in my application, and time quickly moved on.
Fast forward five years. I’d completely forgotten about my question—but it turns out the Internet doesn’t forget! Two months ago, a new Timescale user had the same problem, found the thread, and replied to see if anybody had worked out a solution!
Taking a quick step back, the problem here is looking at a series of state transitions for sensors or devices and working out the amount of time spent in each state: when a new state is started, the old one is closed off.
This is often used for billing or metering (think power meters, devices connected to a network, or even a complex state machine) and can be referred to as dwell time or connection time. In this case, the desired output is ( device_id, state, start_time, dwell_time)
.
KC (who I later found out was working for The Data Loft, a Timescale customer) gave a link to a SQL Server question on Stack Overflow where the same problem was discussed, and generic SQL solutions were put forward. He had tested some and said they were slow. When I tested some, I found that they weren’t quite hitting the correct results either. There had to be an easier and better way!
The Stack Overflow article was even nice enough to provide some demo data for devices moving between WiFi access points. In this case, clientmac
is the device, and apmac
represents the state that is changing (I’ve stripped out the redundant rowID
). If you want to play with this, you can find a full SQL fiddle with the solution here.
CREATE TABLE testdata
(
clientMAC INT NOT NULL,
apMAC VARCHAR(1) NOT NULL,
timeSeen TIMESTAMPTZ NOT NULL
);
-- Create some test data
INSERT INTO testdata (clientmac, apmac, timeseen) VALUES
(1, 'a', '2019-Nov-01 12:01:00'),
(1, 'a', '2019-Nov-01 12:02:00'),
(1, 'a', '2019-Nov-01 12:05:00'),
(1, 'b', '2019-Nov-01 12:10:00'),
(1, 'b', '2019-Nov-01 12:20:00'),
(2, 'a', '2019-Nov-01 12:20:00'),
(2, 'a', '2019-Nov-01 12:22:00'),
(1, 'a', '2019-Nov-01 13:00:00'),
(1, 'a', '2019-Nov-01 13:02:00'),
(1, 'a', '2019-Nov-01 13:06:00'),
(1, 'a', '2019-Nov-01 13:12:00'),
(1, 'a', '2019-Nov-01 14:00:00'),
(1, 'a', '2019-Nov-01 14:12:00'),
(1, 'a', '2019-Nov-01 14:14:00'),
(1, 'a', '2019-Nov-01 14:30:00'),
(1, 'a', '2019-Nov-01 14:35:00');
Before we get to the solution, a quick interlude. In the time between five years ago and now, two important things happened which are worth mentioning.
- I became a Timescaler! I decided that I really loved TimescaleDB and the Timescale story, joined the company, and now run Developer Advocacy.
- We added over 100 hyperfunctions to TimescaleDB (through the timescale-toolkit extension), which extend it with high-performance Rust functions that simplify and supercharge time-series analysis.
Armed with that information, I reached into the timescale-toolkit
for the hyperfunctions dedicated to state tracking and created the following solution to the question (as posed in the Stack Overflow thread):
WITH states AS (
SELECT
(state_timeline(state_agg(timeseen, apMac))).*,
clientmac
FROM testdata
GROUP BY clientmac
)
SELECT
clientmac,
state as apmac,
start_time,
end_time - start_time dwell_time
FROM states
ORDER BY clientmac, start_time;
Which gives the following results:
clientmac | apmac | start_time | dwell_time |
---|---|---|---|
1 | a | 2019-11-01 12:01:00+00 | 00:09:00 |
1 | b | 2019-11-01 12:10:00+00 | 00:15:00 |
1 | a | 2019-11-01 12:25:00+00 | 02:10:00 |
2 | a | 2019-11-01 12:20:00+00 | 00:02:00 |
I wish this had existed five years ago! Once I had this information, working out the average dwell time would have been trivial.
The magic lies in the state_agg
aggregate, which tracks transitions between states within our grouping set (in this case clientmac
), and also in the state_timeline
function, which takes the output of state_agg
and creates a timeline of all state changes with the time they were entered and exited.
With that information in hand, it’s easy to work out the dwell time using end_time - start_time
.
KC was excited, and I loved the chance to provide a long-overdue solution to my past self!
If you want to dive in with hyperfunctions, the fastest way is to create an account and test them out for free with Timescale.