Outlier removal and continuous aggregation

Hi,
I’m using TimescaleDB to analyze marketplace data (sales, listings, buy orders) and need to calculate aggregate metrics (particularly averages) that aren’t skewed by outliers. I’m trying to determine the best architectural approach for handling outliers in relation to continuous aggregates.

My main question is about where in the workflow outlier filtering should occur:

  1. Should I create separate continuous aggregates - one for raw data and another for cleaned data (with outliers removed)?

  2. Or should I preprocess the data to remove outliers before feeding it into the continuous aggregates?

  3. Is outlier filtering something that can/should be handled within the continuous aggregate itself?

Thanks

After fiddling with this all day, I’ve failed to find a pattern to filter outliers within the continuous aggregate itself. The percentile hyperfunctions let me get the IQR stuff conveniently but I can’t find a way to then use those percentiles to filter the original data before calculating the average.

The core issue seems to be that continuous aggregates don’t support:

  1. Subqueries to filter the original data

  2. Joins back to the source table

  3. Complex filtering based on aggregate statistics

This makes it impossible to calculate a true outlier-free average while keeping everything within the continuous aggregate framework. I’m curious if there’s a pattern I’m missing or if this is a fundamental limitation of the current architecture.

I then tried using simple standard deviation based outlier detection via the stats_agg + average hyperfunction but there is no with_bounds() mutator like with some of the other ones so that was a no go as well.

I’d appreciate any guidance here.

Hi @StefanIgnjatovic12 , I don’t think we have anything in place for outliers.

We’re also tracking it here: Analytical functions for Anomaly Detection · Issue #45 · timescale/timescaledb-toolkit · GitHub

In most cases, we recommend filtering out after aggregating, creating a view on top of the continuous aggregates that you can make all references not allowed in the materialization level.