You can use window functions in PostgreSQL or TimescaleDB to perform complex calculations across sets of rows (called a “window”) that are related to the current row.
A window, or analytic, function uses the values from one or multiple rows in a database table to perform a calculation and return the value.
Window functions are different from aggregate functions because the rows aren’t grouped into a single output. In a window function, each row can remain separate, but the function has access to more than just the data in the current row.
Window functions always use an OVER clause directly after the function call. This clause is what distinguishes a window function from a normal function. The OVER clause defines the window frame, which is the set of rows that each calculation can see. When you use a window function, the row's value is computed from the rows in the same partition as the current row, narrowed to the window frame if one is defined.
You can use window functions with PARTITION BY and ORDER BY. PARTITION BY divides the rows into partitions; only rows in the same partition as the current row can be part of its window frame. ORDER BY determines the order of the rows within each partition.
OVER, PARTITION BY, and ORDER BY syntax:
OVER ([PARTITION BY <columns>] [ORDER BY <columns>])
ROWS BETWEEN is used to specify a window frame in relation to the current row.
ROWS BETWEEN syntax:
OVER ([PARTITION BY <columns>] [ORDER BY <columns>] [ROWS BETWEEN <lower_bound> AND <upper_bound>])
The bounds in ROWS BETWEEN can be any one of these five options:
UNBOUNDED PRECEDING: All rows before the current row.
n PRECEDING: n rows before the current row.
CURRENT ROW: Just the current row.
n FOLLOWING: n rows after the current row.
UNBOUNDED FOLLOWING: All rows after the current row.
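To make the bounds concrete, here is a minimal sketch that sums each row with its immediate neighbors, and separately with every row after it. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made to keep the example self-contained; it needs SQLite 3.25 or later), and the readings table and its values are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; "readings" is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40)],
)

query = """
SELECT id,
       -- the previous row, the current row, and the next row
       SUM(value) OVER (ORDER BY id
                        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS neighbors,
       -- the current row and every row after it
       SUM(value) OVER (ORDER BY id
                        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS remaining
FROM readings
ORDER BY id
"""
bounds_rows = conn.execute(query).fetchall()
for row in bounds_rows:
    print(row)
```

At the edges of the data the frame simply shrinks: the first row has no preceding row, so its neighbors sum covers only two rows.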
Use WINDOW to create a named window definition that is kept separate from the SELECT list. Window functions then reference it by name in OVER.
WINDOW syntax:
SELECT <window_function> OVER w FROM <table> WINDOW w AS ([PARTITION BY <columns>] [ORDER BY <columns>] [ROWS BETWEEN <lower_bound> AND <upper_bound>])
Examples
Using a window function over all the rows of a result set
Ordering the records in a window frame
Partitioning the records in a window frame
Ordering and partitioning the records in a window frame
Using a window clause
Using ROWS BETWEEN in a window clause
These examples use sales data in a database table called sales_data, like this:
id | sale_time | branch | item | quantity | total |
1 | 2021-08-11 | New York | Watch | 1 | 100 |
2 | 2021-08-11 | Chicago | Watch | 2 | 200 |
3 | 2021-08-12 | Chicago | Necklace | 3 | 600 |
4 | 2021-08-13 | Phoenix | Ring | 1 | 250 |
5 | 2021-08-13 | New York | Ring | 1 | 250 |
6 | 2021-08-14 | Miami | Watch | 2 | 200 |
If you use OVER without a PARTITION BY, ORDER BY, or ROWS clause, the calculation is performed on a window containing all the rows in the record set. Here is an example query to get a summary of sales:
SELECT branch, SUM(total) OVER() AS sum FROM sales_data;
Results:
branch | sum |
New York | 1600 |
Chicago | 1600 |
Chicago | 1600 |
Phoenix | 1600 |
New York | 1600 |
Miami | 1600 |
The amount in the sum column is a sum of all the values in the table.
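Because every row keeps its own values and also sees the grand total, an empty OVER() is handy for computing each row's share of the whole. The sketch below assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25 or later) and loads a trimmed copy of sales_data with only the columns it needs; the pct_of_total alias is illustrative.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; only some sales_data columns are loaded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (id INTEGER, branch TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?)",
    [
        (1, "New York", 100),
        (2, "Chicago", 200),
        (3, "Chicago", 600),
        (4, "Phoenix", 250),
        (5, "New York", 250),
        (6, "Miami", 200),
    ],
)

# SUM(total) OVER () is the same grand total on every row, so each row
# can be divided by it without a separate subquery.
query = """
SELECT id, branch, total,
       100.0 * total / SUM(total) OVER () AS pct_of_total
FROM sales_data
ORDER BY id
"""
share_rows = conn.execute(query).fetchall()
for row in share_rows:
    print(row)
```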
If you combine an ORDER BY clause with OVER, aggregation is performed against the current row and all previous rows in the result set. This is because, when the window has an ORDER BY, the default frame starts at UNBOUNDED PRECEDING and ends at the current row.
This example query also gets a summary of sales, but it orders the rows in the window by the id column:
SELECT branch, SUM(total) OVER(ORDER BY id) AS sum FROM sales_data;
Results:
branch | sum |
New York | 100 |
Chicago | 300 |
Chicago | 900 |
Phoenix | 1150 |
New York | 1400 |
Miami | 1600 |
The amount in the sum column is a running total of sales.
If you order the results by a column that contains duplicate values, the results turn out differently. For example:
SELECT branch, SUM(total) OVER(ORDER BY sale_time) AS sum FROM sales_data;
Results:
branch | sum |
New York | 300 |
Chicago | 300 |
Chicago | 900 |
Phoenix | 1400 |
New York | 1400 |
Miami | 1600 |
The aggregate sum is still a running total, but it is not the same as in the previous example. That is because the default frame includes all preceding rows and also the current row's peers, which are the rows with the same sale_time.
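If you want a strict row-by-row running total even when the ordering column has duplicates, one option is to spell out a ROWS frame and add a tiebreaker to ORDER BY. The sketch below compares the two behaviors side by side. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25 or later), loads a trimmed copy of sales_data, and uses id as the tiebreaker; the range_sum and rows_sum aliases are illustrative.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; only some sales_data columns are loaded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (id INTEGER, sale_time TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?)",
    [
        (1, "2021-08-11", 100),
        (2, "2021-08-11", 200),
        (3, "2021-08-12", 600),
        (4, "2021-08-13", 250),
        (5, "2021-08-13", 250),
        (6, "2021-08-14", 200),
    ],
)

query = """
SELECT id,
       -- default frame: rows with the same sale_time are summed together
       SUM(total) OVER (ORDER BY sale_time) AS range_sum,
       -- explicit ROWS frame: stops exactly at the current row
       SUM(total) OVER (ORDER BY sale_time, id
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum
FROM sales_data
ORDER BY id
"""
frame_rows = conn.execute(query).fetchall()
for row in frame_rows:
    print(row)
```

The tiebreaker matters: with a ROWS frame and ORDER BY sale_time alone, the order of rows that share a date is not guaranteed, so the intermediate totals could vary between runs.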
PARTITION BY works like GROUP BY in a window frame. It groups the rows by the columns you set, but without collapsing them into one row per group. This example uses GROUP BY to get a sum of sales for each branch in the data:
SELECT branch, SUM(total) AS sum FROM sales_data sd GROUP BY branch;
Results:
branch | sum |
Chicago | 800 |
New York | 350 |
Miami | 200 |
Phoenix | 250 |
This example uses PARTITION BY on the window frame:
SELECT id, branch, SUM(total) OVER(PARTITION BY branch) AS sum FROM sales_data;
Results:
id | branch | sum |
2 | Chicago | 800 |
3 | Chicago | 800 |
6 | Miami | 200 |
1 | New York | 350 |
5 | New York | 350 |
4 | Phoenix | 250 |
The sums are the same in both examples, but the second example keeps every row instead of collapsing each branch into a single row.
When you use both ORDER BY and PARTITION BY in OVER, you can specify the order of the rows in each partition to which you apply the window function. This example retrieves a running total of sales by branch in the data set:
SELECT sale_time, branch, SUM(total) OVER(PARTITION BY branch ORDER BY sale_time) AS sum FROM sales_data;
Results:
sale_time | branch | sum |
2021-08-11 | Chicago | 200 |
2021-08-12 | Chicago | 800 |
2021-08-14 | Miami | 200 |
2021-08-11 | New York | 100 |
2021-08-13 | New York | 350 |
2021-08-13 | Phoenix | 250 |
If you don’t want to define the window inline, you can move it to a window clause. This is useful if you want to use multiple window functions in your query. Here is the previous example query rewritten with a window clause; it returns the same results:
SELECT sale_time, branch, SUM(total) OVER w AS sum
FROM sales_data WINDOW w AS (PARTITION BY branch ORDER BY sale_time);
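The benefit of a named window is clearer when more than one function uses it. In the sketch below, a running total and a per-branch sale counter share the same window w. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25 or later) and a trimmed copy of sales_data; the running_total and sale_number aliases are illustrative.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; only some sales_data columns are loaded.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data (id INTEGER, sale_time TEXT, branch TEXT, total INTEGER)"
)
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?, ?)",
    [
        (1, "2021-08-11", "New York", 100),
        (2, "2021-08-11", "Chicago", 200),
        (3, "2021-08-12", "Chicago", 600),
        (4, "2021-08-13", "Phoenix", 250),
        (5, "2021-08-13", "New York", 250),
        (6, "2021-08-14", "Miami", 200),
    ],
)

# Both window functions reference w, so the partitioning and ordering
# are written once in the WINDOW clause.
query = """
SELECT id, branch,
       SUM(total)   OVER w AS running_total,
       ROW_NUMBER() OVER w AS sale_number
FROM sales_data
WINDOW w AS (PARTITION BY branch ORDER BY sale_time)
ORDER BY id
"""
window_rows = conn.execute(query).fetchall()
for row in window_rows:
    print(row)
```

Defining the window once also means a later change, such as partitioning by a different column, only has to be made in one place.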
These examples use a dataset containing the precipitation and temperature data from two cities over five days. This data is in a table called city_data:
date | city | temperature | precipitation |
2021-09-01 | Miami | 65.30 | 0.28 |
2021-09-01 | Atlanta | 63.14 | 0.20 |
2021-09-02 | Miami | 64.40 | 0.79 |
2021-09-02 | Atlanta | 62.60 | 0.59 |
2021-09-03 | Miami | 68.18 | 0.47 |
2021-09-03 | Atlanta | 66.20 | 0.39 |
2021-09-04 | Miami | 68.36 | 0.00 |
2021-09-04 | Atlanta | 67.28 | 0.00 |
2021-09-05 | Miami | 72.50 | 0.00 |
2021-09-05 | Atlanta | 68.72 | 0.00 |
When you use ROWS BETWEEN in a window clause, the ORDER BY clause determines which rows count as preceding and which as following the current row.
When you use ORDER BY in your window definition, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. However, if you don’t use ORDER BY, all rows in the partition are peers of each other, so the default frame is equivalent to ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
It’s important to think about how you want to use the ORDER BY clause in your window definition, especially when you are also using a ROWS clause.
For example, if you want to calculate a three-day moving average of the temperatures in each city, you can use this query:
SELECT city, date, temperature,
AVG(temperature) OVER (
PARTITION BY city
ORDER BY date DESC
ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) avg_3day
FROM city_data
ORDER BY city, date;
This query partitions the window by city, then orders the rows in each city partition by date so that a three-day set of rows can be selected relative to the current row. Because the dates are in descending order, the current row and the next two rows are the current day and the two days before it. For the first two dates in each city, fewer than three rows are available, so the average covers only the rows that exist.
Results:
city | date | temperature | avg_3day |
Atlanta | 2021-09-01 | 63.14 | 63.14 |
Atlanta | 2021-09-02 | 62.60 | 62.87 |
Atlanta | 2021-09-03 | 66.20 | 63.98 |
Atlanta | 2021-09-04 | 67.28 | 65.36 |
Atlanta | 2021-09-05 | 68.72 | 67.4 |
Miami | 2021-09-01 | 65.30 | 65.3 |
Miami | 2021-09-02 | 64.40 | 64.85 |
Miami | 2021-09-03 | 68.18 | 65.96 |
Miami | 2021-09-04 | 68.36 | 66.98 |
Miami | 2021-09-05 | 72.50 | 69.68 |
Because the ROWS clause depends on the ORDER BY clause in the window frame, you can get the same results by ordering the dates ascending in the window frame and using the current row plus the two preceding rows to calculate the average, like this:
SELECT city, date, temperature,
AVG(temperature) OVER (
PARTITION BY city
ORDER BY date ASC
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) avg_3day
FROM city_data
ORDER BY city, date;
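The equivalence of the two queries can be checked directly. The sketch below runs both frame definitions against the city_data values shown earlier and compares the results. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25 or later), omits the precipitation column because the queries don't use it, and rounds the averages to two decimals for comparison; the moving_average helper is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; precipitation is left out.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city_data (date TEXT, city TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO city_data VALUES (?, ?, ?)",
    [
        ("2021-09-01", "Miami", 65.30), ("2021-09-01", "Atlanta", 63.14),
        ("2021-09-02", "Miami", 64.40), ("2021-09-02", "Atlanta", 62.60),
        ("2021-09-03", "Miami", 68.18), ("2021-09-03", "Atlanta", 66.20),
        ("2021-09-04", "Miami", 68.36), ("2021-09-04", "Atlanta", 67.28),
        ("2021-09-05", "Miami", 72.50), ("2021-09-05", "Atlanta", 68.72),
    ],
)

TEMPLATE = """
SELECT city, date,
       AVG(temperature) OVER (
           PARTITION BY city
           ORDER BY date {direction}
           ROWS BETWEEN {frame}) AS avg_3day
FROM city_data
ORDER BY city, date
"""


def moving_average(direction, frame):
    """Hypothetical helper: run the query with a given sort direction and frame."""
    sql = TEMPLATE.format(direction=direction, frame=frame)
    return [(city, date, round(avg, 2)) for city, date, avg in conn.execute(sql)]


desc_rows = moving_average("DESC", "CURRENT ROW AND 2 FOLLOWING")
asc_rows = moving_average("ASC", "2 PRECEDING AND CURRENT ROW")
print("same results:", desc_rows == asc_rows)
```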
CUME_DIST() calculates the cumulative distribution of a value in a set of values: the fraction of rows in the partition that precede or are peers of the current row. This function can be particularly useful in statistical analysis.
SELECT salesperson_id, COUNT(*), CUME_DIST() OVER (ORDER BY COUNT(*) DESC)
FROM sales
GROUP BY salesperson_id;
DENSE_RANK() assigns a rank to each row within a window partition without gaps in ranking values.
SELECT salesperson_id, COUNT(*), DENSE_RANK() OVER (ORDER BY COUNT(*) DESC)
FROM sales
GROUP BY salesperson_id;
To learn more about how to use RANK() and DENSE_RANK(), check out Understanding RANK() and DENSE_RANK() in PostgreSQL.
FIRST_VALUE() returns the value from the first row of the window frame. With the default frame, that is the first row of the ordered partition.
SELECT product_name, sales, FIRST_VALUE(product_name) OVER (ORDER BY sales DESC)
FROM product_sales;
LAG() fetches the value from a previous row in the same partition, or NULL if there is no such row.
SELECT product_name, sales, LAG(sales) OVER (ORDER BY sales)
FROM product_sales;
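LAST_VALUE() returns the value from the last row of the window frame. Because the default frame with ORDER BY ends at the current row, specify a frame that extends to UNBOUNDED FOLLOWING to get the last value of the whole ordered set.
SELECT product_name, sales, LAST_VALUE(product_name) OVER (ORDER BY sales DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM product_sales;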
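A common surprise with LAST_VALUE() is that, when the window has an ORDER BY and no explicit frame, the default frame ends at the current row, so the function returns the current row's own value rather than the last value of the ordered set. The sketch below shows both behaviors. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25 or later), and the product_sales rows are hypothetical sample data.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_sales (product_name TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO product_sales VALUES (?, ?)",
    [("Watch", 300), ("Ring", 200), ("Necklace", 100)],
)

query = """
SELECT product_name,
       -- default frame ends at the current row
       LAST_VALUE(product_name) OVER (ORDER BY sales DESC) AS default_frame,
       -- explicit frame covers the whole ordered set
       LAST_VALUE(product_name) OVER (
           ORDER BY sales DESC
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS full_frame
FROM product_sales
ORDER BY sales DESC
"""
last_value_rows = conn.execute(query).fetchall()
for row in last_value_rows:
    print(row)
```

Only the full_frame column returns the lowest-selling product on every row; the default_frame column just echoes each row's own product name.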
LEAD() fetches the value from a subsequent row in the same partition, or NULL if there is no such row.
SELECT product_name, sales, LEAD(sales) OVER (ORDER BY sales)
FROM product_sales;
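LAG() and LEAD() are most often used to compare a row with its neighbor, for example to compute the change since the previous period. The sketch below assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25 or later); the daily_sales table, its columns, and its values are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; "daily_sales" is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [("2021-08-11", 300), ("2021-08-12", 600), ("2021-08-13", 500)],
)

query = """
SELECT day, amount,
       LAG(amount)  OVER (ORDER BY day) AS previous_amount,
       LEAD(amount) OVER (ORDER BY day) AS next_amount,
       -- NULL on the first day, because there is no previous row
       amount - LAG(amount) OVER (ORDER BY day) AS change
FROM daily_sales
ORDER BY day
"""
lag_lead_rows = conn.execute(query).fetchall()
for row in lag_lead_rows:
    print(row)
```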
NTILE(n) divides an ordered result set into n groups that are as close to equal in size as possible.
SELECT product_name, sales, NTILE(4) OVER (ORDER BY sales)
FROM product_sales;
NTH_VALUE(value, n) returns the value from the nth row of the window frame, counting from 1, or NULL if the frame has fewer than n rows.
SELECT product_name, sales, NTH_VALUE(product_name, 2) OVER (ORDER BY sales DESC)
FROM product_sales;
PERCENT_RANK() calculates the relative rank of a row within its partition as (rank - 1) / (total rows - 1), which gives a value from 0 to 1.
SELECT salesperson_id, COUNT(*), PERCENT_RANK() OVER (ORDER BY COUNT(*) DESC)
FROM sales
GROUP BY salesperson_id;
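PERCENT_RANK() and CUME_DIST() are easy to confuse, so it can help to see them on the same rows. The sketch below assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25 or later) and a hypothetical sales_counts table that already holds the number of sales per salesperson, with two salespeople tied for first place.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; "sales_counts" is a hypothetical
# table of precomputed counts, with salespeople 1 and 2 tied.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_counts (salesperson_id INTEGER, n INTEGER)")
conn.executemany(
    "INSERT INTO sales_counts VALUES (?, ?)",
    [(1, 3), (2, 3), (3, 1), (4, 2)],
)

query = """
SELECT salesperson_id, n,
       -- (rank - 1) / (total rows - 1)
       PERCENT_RANK() OVER (ORDER BY n DESC) AS pct_rank,
       -- (rows preceding or peer with the current row) / (total rows)
       CUME_DIST()    OVER (ORDER BY n DESC) AS cume_dist
FROM sales_counts
ORDER BY n DESC, salesperson_id
"""
distribution_rows = conn.execute(query).fetchall()
for row in distribution_rows:
    print(row)
```

The tied rows share both values: PERCENT_RANK() starts at 0 for the top rank, while CUME_DIST() is never 0 because it always counts the current row and its peers.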
RANK() assigns a rank to each row within a window partition. Rows with equal ORDER BY values share the same rank, and the ranks that follow skip ahead, leaving gaps.
SELECT salesperson_id, COUNT(*), RANK() OVER (ORDER BY COUNT(*) DESC)
FROM sales
GROUP BY salesperson_id;
ROW_NUMBER() assigns a unique row number to each row within a window partition.
SELECT salesperson_id, COUNT(*), ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC)
FROM sales
GROUP BY salesperson_id;
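The difference between ROW_NUMBER(), RANK(), and DENSE_RANK() only shows up when there are ties. The sketch below runs all three over the same counts. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25 or later); the sales rows are hypothetical, the counts are computed in a subquery, and salesperson_id is added as a tiebreaker for ROW_NUMBER() so the output is deterministic.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the "sales" rows are hypothetical.
# Salespeople 1 and 2 each have three sales, 4 has two, and 3 has one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (salesperson_id INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?)",
    [(1,), (1,), (1,), (2,), (2,), (2,), (3,), (4,), (4,)],
)

query = """
SELECT salesperson_id, n,
       ROW_NUMBER() OVER (ORDER BY n DESC, salesperson_id) AS row_num,
       RANK()       OVER (ORDER BY n DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY n DESC) AS dense_rnk
FROM (SELECT salesperson_id, COUNT(*) AS n
      FROM sales
      GROUP BY salesperson_id) AS counts
ORDER BY n DESC, salesperson_id
"""
ranking_rows = conn.execute(query).fetchall()
for row in ranking_rows:
    print(row)
```

The two tied salespeople get different row numbers but the same rank; after the tie, RANK() jumps to 3 while DENSE_RANK() continues with 2.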
For more information about window functions and how you can use them in PostgreSQL, see the PostgreSQL documentation. To find out more about how window functions are processed in PostgreSQL, see this section of the PostgreSQL documentation. And for more details on the syntax of window functions, see this section. For more examples of how to use window functions in your queries, check out these Timescale documentation sections: