Start supercharging your PostgreSQL today.
Written by Team Timescale
In PostgreSQL and TimescaleDB, the FILTER
clause can be used to extend aggregate functions like sum(), avg(), and count() by adding a WHERE
clause. This is especially useful when you want to perform multiple aggregations in your query.
When FILTER
is used with an aggregate function, only the input rows that its WHERE
clause evaluates to be true will be used in the aggregate clause. The data being processed by the aggregate function is “filtered” by the WHERE
clause’s condition.
Syntax:
<aggregate_function>(<expression>) FILTER(WHERE <condition>)
If you use an aggregate function with a window function call, here is the syntax:
<aggregate_function>(<expression>) FILTER(WHERE <condition>) OVER(<window_definition)
A FILTER
clause can only be used with window functions that are aggregates.
Let’s look at some examples to better understand FILTER
clauses. For the first set of examples, we’ll use the data in this table called car_sales
.
sales_year | sales_month | make | model | kind | quantity | revenue |
2021 | 1 | Ford | F100 | PickUp | 40 | 2500000 |
2021 | 1 | Ford | Mustang | Car | 9 | 1010000 |
2021 | 1 | Renault | Fuego | Car | 20 | 9000000 |
2021 | 2 | Renault | Fuego | Car | 50 | 23000000 |
2021 | 2 | Ford | F100 | PickUp | 20 | 1200000 |
2021 | 2 | Ford | Mustang | Car | 10 | 1050000 |
2021 | 3 | Renault | Megane | Car | 50 | 20000000 |
2021 | 3 | Renault | Koleos | Car | 15 | 1004000 |
2021 | 3 | Ford | Mustang | Car | 20 | 2080000 |
2021 | 4 | Renault | Megane | Car | 50 | 20000000 |
2021 | 4 | Renault | Koleos | Car | 15 | 1004000 |
2021 | 4 | Ford | Mustang | Car | 25 | 2520000 |
PostgreSQL didn’t have FILTER
until version 9.4. Before this version, database developers often used a CASE
statement and WHEN
clauses to get the results you can now get with FILTER
. This method was much less straightforward.
Let’s start with a simple query to get the minimum and maximum revenue per car maker. Here is the query to get the results using the FILTER
clause:
SELECT
min(revenue) FILTER (WHERE make = ‘Ford’) min_ford,
max(revenue) FILTER (WHERE make = ‘Ford’) max_ford,
min(revenue) FILTER (WHERE make = ‘Renault’) min_renault,
max(revenue) FILTER (WHERE make = ‘Renault’) max_renault
FROM
car_sales;
We created an alias for each of these sums with the AS
keyword so that we can differentiate the results, or else each column in the result set would be labeled simply min
or max
. Here are the results:
min_ford | max_ford | min_renault | max_renault |
1010000 | 2520000 | 1004000 | 23000000 |
To get the same results with a CASE statement, you would have to use this query:
SELECT
min(CASE WHEN make = ‘Ford’ THEN revenue ELSE null END) min_ford,
max(CASE WHEN make = ‘Ford’ THEN revenue ELSE null END) max_ford,
min(CASE WHEN make = ‘Renault’ THEN revenue ELSE null END) min_renault,
max(CASE WHEN make = ‘Renault’ THEN revenue ELSE null END) max_renault
FROM
car_sales;
For cases like this, FILTER
is easier to understand.
We can use the FILTER
clause to pivot rows into tables, which you often need to do to generate reports. Let’s say we want to know the total quantity of cars sold in each month. We could use GROUP BY
to do that with a query like this:
SELECT sales_month, sum(quantity) FROM car_sales GROUP BY sales_month;
These are the results:
sales_month | sum |
4 | 90 |
3 | 85 |
1 | 69 |
2 | 80 |
Or we could use FILTER
to pivot these results with a query like this:
SELECT
sum(quantity) FILTER (WHERE sales_month = 1) jan_quantity,
sum(quantity) FILTER (WHERE sales_month = 2) feb_quantity,
sum(quantity) FILTER (WHERE sales_month = 3) mar_quantity,
sum(quantity) FILTER (WHERE sales_month = 4) apr_quantity
FROM
car_sales;
Results:
jan_quantity | feb_quantity | mar_quantity | apr_quantity |
69 | 80 | 85 | 90 |
For the next example, we are going to use a different data set that contains temperature and precipitation data from two cities.
day | city | temperature | precipitation |
2021-09-04 | Miami | 68.36 | 0.00 |
2021-09-05 | Miami | 72.50 | 0.00 |
2021-09-01 | Miami | 65.30 | 0.28 |
2021-09-02 | Miami | 64.40 | 0.79 |
2021-09-03 | Miami | 71.60 | 0.47 |
2021-09-04 | Atlanta | 67.28 | 0.00 |
2021-09-05 | Atlanta | 70.80 | 0.00 |
2021-09-01 | Atlanta | 63.14 | 0.20 |
2021-09-02 | Atlanta | 62.60 | 0.59 |
2021-09-03 | Atlanta | 62.60 | 0.39 |
In the next query, we are going to get the three-day moving average of the temperature in each city. To do this, we will define a window using the OVER
clause and partition it by the city. To show how FILTER
works with the window clause, we will only return the maximum temperature when the temperature for any day in a window, including the current row plus the two rows before is over 70 degrees. In other words, it will give us the highest temperature of the last three days, inclusive of any time the temperature for the day is over 70.
Here is the query:
SELECT city, day, temperature,
MAX(temperature)
FILTER (WHERE temperature > 70)
OVER (
PARTITION BY city
ORDER BY day ASC
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) max_3day
FROM city_data
ORDER BY city, day;
And here are the results:
city | day | temperature | max_3day |
Atlanta | 2021-09-01 | 63.14 | |
Atlanta | 2021-09-02 | 62.60 | |
Atlanta | 2021-09-03 | 62.60 | |
Atlanta | 2021-09-04 | 67.28 | |
Atlanta | 2021-09-05 | 70.80 | 70.80 |
Miami | 2021-09-01 | 65.30 | |
Miami | 2021-09-02 | 64.40 | |
Miami | 2021-09-03 | 71.60 | 71.60 |
Miami | 2021-09-04 | 68.36 | 71.60 |
Miami | 2021-09-05 | 72.50 | 72.50 |
To learn more about the FILTER
clause and how to use it in PostgreSQL, you can check out PostgreSQL’s documentation on aggregate expressions and window function calls.
If you want to experiment with FILTER
in a PostgreSQL (but faster) database while enjoying the benefits of automatic data partitioning and columnar compression to speed up performance and reduce storage, create a free Timescale account today.