Data Outside Refresh Window Crashes Refresh

I’m executing this on a completely idle, nearly empty database with no compression policy, one hypertable, and two continuous aggregates. I set up refresh policies with:

start_offset => '1 month'
end_offset => '10 seconds'
schedule_interval => '10 seconds' (i.e. the next run starts 10 seconds after the previous run finishes)
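For reference, the full policy calls look roughly like this (the continuous aggregate names are placeholders, not the actual ones):

```sql
-- Hypothetical cagg names; offsets and interval as described above.
SELECT add_continuous_aggregate_policy('readings_1h',
  start_offset      => INTERVAL '1 month',
  end_offset        => INTERVAL '10 seconds',
  schedule_interval => INTERVAL '10 seconds');

SELECT add_continuous_aggregate_policy('readings_1m',
  start_offset      => INTERVAL '1 month',
  end_offset        => INTERVAL '10 seconds',
  schedule_interval => INTERVAL '10 seconds');
```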

Data is basically four GUID identifiers, a UOM string, a float value, a boolean flag, and a bitmask. Continuous aggregates are set to give min, max, average, and bitwise or for the bitmask. The first aggregate is for hour buckets and the second for minute buckets.
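A minimal sketch of the schema and the hour-bucket aggregate, assuming hypothetical table and column names (the minute-bucket aggregate is identical apart from the bucket width):

```sql
-- Hypothetical names; shape matches the description above:
-- four GUIDs, a UOM string, a float, a boolean, and a bitmask.
CREATE TABLE readings (
  ts       TIMESTAMPTZ NOT NULL,
  site_id  UUID NOT NULL,
  asset_id UUID NOT NULL,
  point_id UUID NOT NULL,
  tag_id   UUID NOT NULL,
  uom      TEXT,
  value    DOUBLE PRECISION,
  good     BOOLEAN,
  flags    INTEGER
);
SELECT create_hypertable('readings', 'ts');

CREATE MATERIALIZED VIEW readings_1h
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', ts) AS bucket,
       point_id,
       min(value)    AS min_value,
       max(value)    AS max_value,
       avg(value)    AS avg_value,
       bit_or(flags) AS flags_or   -- bitwise OR of the bitmask
FROM readings
GROUP BY bucket, point_id;
```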

I’m inserting 100,000 records of data timestamped at 1 Hz, but I’m inserting the records every 10 milliseconds. So this is nowhere near fast or large enough to strain the resources of the database.

If I insert these records starting at 3 weeks old, the refresh completes before I can even see it run.

If I insert these records starting at 7 months old (i.e. completely outside the refresh window), the next refresh will run for 2 hours. The data does not show up in the materialized views, but it’s obviously making the refresh go sideways.
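One thing I can do to narrow this down (a sketch, not a confirmed workaround) is to materialize the backfilled range explicitly instead of waiting for the policy job to chew through the invalidations; `refresh_continuous_aggregate` takes an explicit window, and the cagg name below is a placeholder:

```sql
-- Hypothetical: refresh only the backfilled ~7-months-old range directly.
CALL refresh_continuous_aggregate('readings_1h',
     now() - INTERVAL '8 months',
     now() - INTERVAL '6 months');
```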

If I repeat this on a database with a realistic dataset (~2 billion records), the refresh runs for 6.5 hours and eventually crashes because it runs out of space for temporary files.

Version is 2.8.1. Is this a known bug? Has it been fixed in a later version?

Hi @Anthony_Gunter , can you share the exact crash message?

Also, could you share a minimal POC so I can try it here on the latest version to double-check whether this is a versioning issue?

Yes, it will generate a lot of extra work, because the backfill widens the data window that needs to be refreshed. Every update also uses the offsets as guidance, so if you backfill something from 6 months ago, the invalidation log will cover the offset plus the window range that was updated. Check here to learn more about invalidation logs.
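To see what the policy job is actually being asked to re-materialize, you can peek at the internal invalidation catalog. This is a sketch against internal tables whose layout can change between versions; the modified-value columns hold TimescaleDB's internal timestamp representation, not wall-clock timestamps:

```sql
-- Hypothetical inspection query; _timescaledb_catalog is internal and
-- not a stable API. The *_modified_value columns are internal
-- microsecond-resolution timestamps.
SELECT hypertable_id,
       lowest_modified_value,
       greatest_modified_value
FROM _timescaledb_catalog.continuous_aggs_hypertable_invalidation_log;
```

If the backfill shows up here as one very wide range, that would explain the refresh suddenly covering months of data.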