Data Processing With PostgreSQL Window Functions


You can use window functions in PostgreSQL or TimescaleDB to perform complex calculations across a set of rows (called a “window”) that is related to the current row.

A window function (also called an analytic function) uses the values from one or more rows in a table to perform a calculation, and returns a value for each row.

Window functions are different from ordinary aggregate functions because the rows aren’t grouped into a single output row. With a window function, each row remains separate, but the function has access to more than just the data in the current row.

A window function call is always followed by an OVER clause, placed directly after the function. The OVER clause is what distinguishes a window function from a normal or aggregate function: it defines which rows the function can see when it computes the value for each row. Those rows form the current row’s window frame, which is taken from the rows in the same partition as the current row. Without ORDER BY or a frame clause, the frame is the whole partition.

You can use PARTITION BY and ORDER BY inside OVER. PARTITION BY splits the rows into partitions of rows that share the same values in the listed columns, and the function is computed separately for each partition. ORDER BY determines the order of the rows within each partition.

OVER, PARTITION BY, and ORDER BY syntax:

OVER ([PARTITION BY <columns>] [ORDER BY <columns>])

ROWS BETWEEN is used to specify a window frame in relation to the current row.

ROWS BETWEEN syntax:

OVER ([PARTITION BY <columns>] [ORDER BY <columns>] [ROWS BETWEEN <lower_bound> AND <upper_bound>])

The bounds in ROWS BETWEEN can be any one of these five options:

  • UNBOUNDED PRECEDING: The first row of the partition, so the frame includes all rows before the current row.

  • n PRECEDING: n rows before the current row.

  • CURRENT ROW: Just the current row.

  • n FOLLOWING: n rows after the current row.

  • UNBOUNDED FOLLOWING: The last row of the partition, so the frame includes all rows after the current row.
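To make the five bound types concrete, here is a small Python model that turns a pair of ROWS bounds into the row positions they cover within one already-sorted partition. It’s an illustrative sketch, not how PostgreSQL evaluates frames; the function names and the (kind, n) encoding of bounds are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding of a bound as (kind, n). kind is one of
# "UNBOUNDED PRECEDING", "PRECEDING", "CURRENT ROW", "FOLLOWING",
# "UNBOUNDED FOLLOWING"; n is only used by "PRECEDING" and "FOLLOWING".
Bound = Tuple[str, Optional[int]]


def bound_position(bound: Bound, current: int, size: int) -> int:
    """Translate one bound into a row position inside a partition of `size` rows."""
    kind, n = bound
    if kind == "UNBOUNDED PRECEDING":
        return 0
    if kind == "PRECEDING":
        return current - n
    if kind == "CURRENT ROW":
        return current
    if kind == "FOLLOWING":
        return current + n
    if kind == "UNBOUNDED FOLLOWING":
        return size - 1
    raise ValueError(f"unknown bound: {kind}")


def rows_frame(current: int, size: int, lower: Bound, upper: Bound) -> List[int]:
    """Positions covered by ROWS BETWEEN lower AND upper, clipped to the partition."""
    start = max(bound_position(lower, current, size), 0)
    end = min(bound_position(upper, current, size), size - 1)
    return list(range(start, end + 1))


# Frames for the 4th row (position 3) of a 6-row partition.
print(rows_frame(3, 6, ("PRECEDING", 2), ("CURRENT ROW", None)))               # [1, 2, 3]
print(rows_frame(3, 6, ("CURRENT ROW", None), ("UNBOUNDED FOLLOWING", None)))  # [3, 4, 5]
```

Clipping at the partition edges is why a frame such as 2 PRECEDING AND CURRENT ROW holds fewer than three rows for the first rows of a partition.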

Use a WINDOW clause to define a window once, separately from the SELECT list, and then refer to it by name in OVER.

WINDOW syntax:

SELECT <window_function> OVER w FROM <table> WINDOW w AS ([PARTITION BY <columns>] [ORDER BY <columns>] [ROWS BETWEEN <lower_bound> AND <upper_bound>])

Examples

  • Using a window function over all the rows of a result set

  • Ordering the records in a window frame

  • Partitioning the records in a window frame

  • Ordering and partitioning the records in a window frame

  • Using a window clause

  • Using ROWS BETWEEN in a window clause

These examples use sales data in a database table called sales_data, like this:

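id | sale_time  | branch   | item     | quantity | total
---+------------+----------+----------+----------+------
1  | 2021-08-11 | New York | Watch    | 1        | 100
2  | 2021-08-11 | Chicago  | Watch    | 2        | 200
3  | 2021-08-12 | Chicago  | Necklace | 3        | 600
4  | 2021-08-13 | Phoenix  | Ring     | 1        | 250
5  | 2021-08-13 | New York | Ring     | 1        | 250
6  | 2021-08-14 | Miami    | Watch    | 2        | 200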

Using a window function over all the rows of a result set

If you use OVER without a PARTITION BY, ORDER BY, or ROWS clause, the calculation is performed on a window containing all the rows in the result set. This example query shows the total of all sales next to every row:

SELECT branch, SUM(total) OVER() AS sum FROM sales_data;

Results:

branch   | sum
---------+-----
New York | 1600
Chicago  | 1600
Chicago  | 1600
Phoenix  | 1600
New York | 1600
Miami    | 1600

The value in the sum column is the sum of all the values in the total column, repeated on every row.

Ordering the records in a window frame

If you combine an ORDER BY clause with OVER, the aggregation is performed on the current row and all rows before it in that order. This is because, when ORDER BY is present, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

This example query also gets a summary of sales, but it orders the window by the id column:

SELECT branch, SUM(total) OVER(ORDER BY id) AS sum FROM sales_data;

Results:

branch   | sum
---------+-----
New York | 100
Chicago  | 300
Chicago  | 900
Phoenix  | 1150
New York | 1400
Miami    | 1600

The amount in the sum is a running total of sales.
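Because id values are unique, each row’s frame here is simply every row from the first one up to and including itself. As a rough illustration, this Python sketch reproduces the running total from the sales_data rows above; it models the result only and says nothing about how PostgreSQL computes it.

```python
from itertools import accumulate

# (id, branch, total) from the sales_data table above, sorted by id.
sales = [
    (1, "New York", 100),
    (2, "Chicago", 200),
    (3, "Chicago", 600),
    (4, "Phoenix", 250),
    (5, "New York", 250),
    (6, "Miami", 200),
]

# SUM(total) OVER (ORDER BY id): each row sees all rows up to and including itself.
running = list(accumulate(total for _, _, total in sales))

for (_, branch, _), value in zip(sales, running):
    print(branch, value)  # New York 100, Chicago 300, ..., Miami 1600
```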

If you order the results by a column that contains duplicate values, the results turn out differently. For example:

SELECT branch, SUM(total) OVER(ORDER BY sale_time) AS sum FROM sales_data;

Results:

branch   | sum
---------+-----
New York | 300
Chicago  | 300
Chicago  | 900
Phoenix  | 1400
New York | 1400
Miami    | 1600

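The sum is still a running total, but it isn’t the same as in the previous example. With ORDER BY, the default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) includes all preceding rows plus any peers of the current row, that is, rows with the same sale_time. Rows that share a sale time therefore get the same running total.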
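The peer behavior can be modeled in a few lines of Python. In this sketch (a hypothetical helper, not PostgreSQL code), the frame for each row extends to every row whose sale_time is less than or equal to the current row’s, which mirrors the default RANGE frame with ORDER BY sale_time. It reproduces the totals in the results above.

```python
# (sale_time, branch, total) from sales_data, sorted by sale_time.
sales = [
    ("2021-08-11", "New York", 100),
    ("2021-08-11", "Chicago", 200),
    ("2021-08-12", "Chicago", 600),
    ("2021-08-13", "Phoenix", 250),
    ("2021-08-13", "New York", 250),
    ("2021-08-14", "Miami", 200),
]


def range_running_sum(rows):
    """Model of SUM(total) OVER (ORDER BY sale_time) with the default RANGE frame:
    the frame ends at the last peer of the current row, so peers share one value."""
    result = []
    for sale_time, _, _ in rows:
        # ISO dates compare correctly as strings; <= pulls in all peers too.
        result.append(sum(total for t, _, total in rows if t <= sale_time))
    return result


print(range_running_sum(sales))  # [300, 300, 900, 1400, 1400, 1600]
```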

Partitioning the records in a window frame

PARTITION BY groups rows much like GROUP BY does, but without collapsing them: the window function is computed separately for each group, and every row is kept. For comparison, this example uses GROUP BY to get a sum of sales for each branch in the data:

SELECT branch, SUM(total) AS sum FROM sales_data GROUP BY branch;

Results:

branch   | sum
---------+----
Chicago  | 800
New York | 350
Miami    | 200
Phoenix  | 250

This example uses PARTITION BY on the window frame:

SELECT id, branch, SUM(total) OVER(PARTITION BY branch) AS sum FROM sales_data;

Results:

id | branch   | sum
---+----------+----
2  | Chicago  | 800
3  | Chicago  | 800
6  | Miami    | 200
1  | New York | 350
5  | New York | 350
4  | Phoenix  | 250

The sums are the same in both examples, but the second example keeps every individual sale as its own row instead of collapsing each branch into a single row.
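The contrast between the two queries can be sketched in Python: GROUP BY produces one value per branch, while PARTITION BY attaches each branch’s total to every original row. This is an illustrative model using the sales_data rows above, not a description of PostgreSQL internals.

```python
from collections import defaultdict

# (id, branch, total) from sales_data.
sales = [
    (1, "New York", 100),
    (2, "Chicago", 200),
    (3, "Chicago", 600),
    (4, "Phoenix", 250),
    (5, "New York", 250),
    (6, "Miami", 200),
]

# GROUP BY branch: one output value per branch.
grouped = defaultdict(int)
for _, branch, total in sales:
    grouped[branch] += total

# SUM(total) OVER (PARTITION BY branch): every input row is kept and
# carries the total of its own partition.
partitioned = [(row_id, branch, grouped[branch]) for row_id, branch, _ in sales]

print(dict(grouped))  # one entry per branch
print(partitioned)    # one entry per sale
```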

Ordering and partitioning the records in a window frame

When you use both PARTITION BY and ORDER BY in OVER, ORDER BY sets the order of the rows within each partition that the window function is applied to. This example retrieves a running total of sales for each branch in the data set:

SELECT sale_time, branch, SUM(total) OVER(PARTITION BY branch ORDER BY sale_time) AS sum FROM sales_data;

Results:

sale_time  | branch   | sum
-----------+----------+----
2021-08-11 | Chicago  | 200
2021-08-12 | Chicago  | 800
2021-08-14 | Miami    | 200
2021-08-11 | New York | 100
2021-08-13 | New York | 350
2021-08-13 | Phoenix  | 250
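Conceptually, PARTITION BY branch ORDER BY sale_time splits the rows by branch, sorts each group by sale_time, and keeps a running total within each group. This Python sketch models those steps on the sales_data rows and reproduces the results above. It relies on the fact that no branch has two sales on the same day; with ties, the default RANGE frame would give those peers the same total.

```python
from collections import defaultdict

# (sale_time, branch, total) from sales_data.
sales = [
    ("2021-08-11", "New York", 100),
    ("2021-08-11", "Chicago", 200),
    ("2021-08-12", "Chicago", 600),
    ("2021-08-13", "Phoenix", 250),
    ("2021-08-13", "New York", 250),
    ("2021-08-14", "Miami", 200),
]

# PARTITION BY branch: split the rows into one list per branch.
partitions = defaultdict(list)
for sale_time, branch, total in sales:
    partitions[branch].append((sale_time, total))

# ORDER BY sale_time inside each partition, then keep a running sum.
result = []
for branch in sorted(partitions):
    running = 0
    for sale_time, total in sorted(partitions[branch]):
        running += total
        result.append((sale_time, branch, running))

for row in result:
    print(row)
```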

Using a window clause

Instead of defining the window inline in OVER, you can define it once in a window clause and refer to it by name. Here is the previous example query rewritten with a window clause; it returns the same results. This is especially useful when several window functions in the same query share one window definition:

SELECT sale_time, branch, SUM(total) OVER w AS sum FROM sales_data WINDOW w AS (PARTITION BY branch ORDER BY sale_time);

Using ROWS BETWEEN in a window clause

These examples use a dataset containing temperature and precipitation data from two cities over five days. This data is in a table called city_data:

date       | city    | temperature | precipitation
-----------+---------+-------------+--------------
2021-09-01 | Miami   | 65.30       | 0.28
2021-09-01 | Atlanta | 63.14       | 0.20
2021-09-02 | Miami   | 64.40       | 0.79
2021-09-02 | Atlanta | 62.60       | 0.59
2021-09-03 | Miami   | 68.18       | 0.47
2021-09-03 | Atlanta | 66.20       | 0.39
2021-09-04 | Miami   | 68.36       | 0.00
2021-09-04 | Atlanta | 67.28       | 0.00
2021-09-05 | Miami   | 72.50       | 0.00
2021-09-05 | Atlanta | 68.72       | 0.00

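When you use ROWS BETWEEN, the ORDER BY clause inside the window matters: the frame is counted in rows before and after the current row, in the order that ORDER BY defines.

When you don’t specify a frame, PostgreSQL uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, that frame runs from the start of the partition through the current row and its peers. Without ORDER BY, every row in the partition is a peer of the current row, so the frame covers the whole partition, which is the same as ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.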

It’s important to think about how you want to use the ORDER BY clause in your window frame, especially when you also are using a ROWS clause.

For example, if you want to calculate a three-day moving average of the temperatures in each city, you can use this query:

SELECT city, date, temperature,
    AVG(temperature) OVER (
      PARTITION BY city
      ORDER BY date DESC
      ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) avg_3day
FROM city_data
ORDER BY city, date;

To get a three-day moving average of the temperature for each city, first partition the rows by city. Then order the rows in each city partition by date, so that you can select a three-day set of rows based on the position of the current row. Because the dates are sorted in descending order, the two rows that follow the current row are the two previous days, so the average covers the current day and the two days before it. For the first two days in each city, fewer than three rows are available, so the average is taken over one or two rows.

Results:

city    | date       | temperature | avg_3day
--------+------------+-------------+---------
Atlanta | 2021-09-01 | 63.14       | 63.14
Atlanta | 2021-09-02 | 62.60       | 62.87
Atlanta | 2021-09-03 | 66.20       | 63.98
Atlanta | 2021-09-04 | 67.28       | 65.36
Atlanta | 2021-09-05 | 68.72       | 67.4
Miami   | 2021-09-01 | 65.30       | 65.3
Miami   | 2021-09-02 | 64.40       | 64.85
Miami   | 2021-09-03 | 68.18       | 65.96
Miami   | 2021-09-04 | 68.36       | 66.98
Miami   | 2021-09-05 | 72.50       | 69.68

Because the ROWS clause depends on the ORDER BY clause in the window frame, you can get the same results by ordering the dates ascending in the window frame and using the current row plus the two preceding rows to calculate the average, like this:

SELECT city, date, temperature,
    AVG(temperature) OVER (
      PARTITION BY city
      ORDER BY date ASC
      ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) avg_3day
FROM city_data
ORDER BY city, date;
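To check that the two frame definitions agree, this Python sketch models both of them on the city_data rows. The helper names are hypothetical, the averages are plain Python floats, and the printout rounds to two decimals for comparison with the results table above.

```python
from collections import defaultdict

# (date, city, temperature) from city_data.
readings = [
    ("2021-09-01", "Miami", 65.30), ("2021-09-01", "Atlanta", 63.14),
    ("2021-09-02", "Miami", 64.40), ("2021-09-02", "Atlanta", 62.60),
    ("2021-09-03", "Miami", 68.18), ("2021-09-03", "Atlanta", 66.20),
    ("2021-09-04", "Miami", 68.36), ("2021-09-04", "Atlanta", 67.28),
    ("2021-09-05", "Miami", 72.50), ("2021-09-05", "Atlanta", 68.72),
]

# PARTITION BY city.
by_city = defaultdict(list)
for date, city, temp in readings:
    by_city[city].append((date, temp))


def avg_desc_following(rows):
    """ORDER BY date DESC ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING."""
    ordered = sorted(rows, reverse=True)        # newest first
    out = {}
    for i, (date, _) in enumerate(ordered):
        frame = ordered[i:i + 3]                # current row + next 2 (clipped)
        out[date] = sum(t for _, t in frame) / len(frame)
    return out


def avg_asc_preceding(rows):
    """ORDER BY date ASC ROWS BETWEEN 2 PRECEDING AND CURRENT ROW."""
    ordered = sorted(rows)                      # oldest first
    out = {}
    for i, (date, _) in enumerate(ordered):
        frame = ordered[max(i - 2, 0):i + 1]    # previous 2 (clipped) + current row
        out[date] = sum(t for _, t in frame) / len(frame)
    return out


for city in sorted(by_city):
    desc = avg_desc_following(by_city[city])
    asc = avg_asc_preceding(by_city[city])
    for date in sorted(desc):
        print(city, date, round(desc[date], 2), round(asc[date], 2))
```

Both helpers pick the same three rows for every date, only in the opposite direction, which is why the two queries return the same averages.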

More PostgreSQL Window Functions

CUME_DIST

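CUME_DIST() calculates the cumulative distribution of the current row within its partition: the number of rows that come before the current row or are its peers, divided by the total number of rows in the partition. This function can be particularly useful in statistical analysis.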

SELECT salesperson_id, COUNT(*), CUME_DIST() OVER (ORDER BY COUNT(*) DESC) FROM sales GROUP BY salesperson_id;
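As a rough model of what CUME_DIST() returns, this Python sketch applies that definition to a single partition ordered in descending order, as in the query above. The sale counts are hypothetical, since the contents of the sales table aren’t shown, and the function name is illustrative.

```python
from fractions import Fraction


def cume_dist(values, descending=False):
    """Model of CUME_DIST() OVER (ORDER BY value [DESC]) for one partition:
    (rows ordered before the current row + its peers) / (rows in partition)."""
    n = len(values)
    result = []
    for v in values:
        if descending:
            at_or_before = sum(1 for w in values if w >= v)
        else:
            at_or_before = sum(1 for w in values if w <= v)
        result.append(Fraction(at_or_before, n))
    return result


# Hypothetical COUNT(*) values for four salespeople.
counts = [5, 3, 3, 1]
print([float(x) for x in cume_dist(counts, descending=True)])  # [0.25, 0.75, 0.75, 1.0]
```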

DENSE_RANK

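DENSE_RANK() assigns a rank to each row within a window partition without gaps in the ranking values: rows with equal values share a rank, and the next distinct value gets the next consecutive rank.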

SELECT salesperson_id, COUNT(*), DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) FROM sales GROUP BY salesperson_id;
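Here is a minimal Python model of DENSE_RANK() for one partition, using hypothetical sale counts in place of the COUNT(*) values from the query. Tied values share a rank, and the next distinct value gets the next consecutive rank.

```python
def dense_rank(values, descending=False):
    """Model of DENSE_RANK() OVER (ORDER BY value [DESC]) for one partition."""
    # Rank the distinct values 1, 2, 3, ... in sort order.
    distinct = sorted(set(values), reverse=descending)
    rank_of = {v: i + 1 for i, v in enumerate(distinct)}
    return [rank_of[v] for v in values]


# Hypothetical COUNT(*) values for four salespeople.
counts = [5, 3, 3, 1]
print(dense_rank(counts, descending=True))  # [1, 2, 2, 3]
```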

To learn more about how to use RANK() and DENSE_RANK(), check out Understanding RANK() and DENSE_RANK() in PostgreSQL.

FIRST_VALUE

FIRST_VALUE() returns the value evaluated at the first row of the window frame, which, with the default frame, is the first row of the ordered partition.

SELECT product_name, sales, FIRST_VALUE(product_name) OVER (ORDER BY sales DESC) FROM product_sales;

LAG

LAG() returns the value from the row that comes a given number of rows (one by default) before the current row in the same partition, or NULL if there is no such row.

SELECT product_name, sales, LAG(sales) OVER (ORDER BY sales) FROM product_sales;
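A minimal Python model of LAG() on rows that are already in window order is shown below. PostgreSQL’s LAG() also accepts an optional offset and default value, mirrored here by the offset and default parameters; the helper itself and the sales figures are hypothetical.

```python
def lag(values, offset=1, default=None):
    """Model of LAG(value, offset, default) on rows already sorted in window order:
    each row gets the value `offset` rows earlier, or `default` (None stands in
    for NULL) when that row doesn't exist."""
    return [
        values[i - offset] if 0 <= i - offset < len(values) else default
        for i in range(len(values))
    ]


# Hypothetical sales figures, already sorted as ORDER BY sales would sort them.
sales = [120, 150, 310, 400]
print(lag(sales))        # [None, 120, 150, 310]
print(lag(sales, 2, 0))  # [0, 0, 120, 150]
```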

LAST_VALUE

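LAST_VALUE() returns the value evaluated at the last row of the window frame. With the default frame, the frame ends at the current row (or its last peer), so to get the last value of the whole ordered set, extend the frame to the end of the partition:

SELECT product_name, sales, LAST_VALUE(product_name) OVER (ORDER BY sales DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FROM product_sales;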
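The effect of the frame on LAST_VALUE() is easiest to see side by side. This Python sketch uses hypothetical product names already sorted by sales in descending order with no ties, and models both the default frame (ending at the current row) and a frame that runs to UNBOUNDED FOLLOWING.

```python
def last_value(names, frame_to_partition_end):
    """Model of LAST_VALUE(name) OVER (ORDER BY sales DESC ...) on rows already
    sorted by sales DESC, assuming no ties.

    frame_to_partition_end=False: default frame, which ends at the current row.
    frame_to_partition_end=True: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
    """
    result = []
    for i in range(len(names)):
        frame = names if frame_to_partition_end else names[: i + 1]
        result.append(frame[-1])
    return result


# Hypothetical product names, already sorted by sales DESC.
products = ["Watch", "Ring", "Necklace"]
print(last_value(products, frame_to_partition_end=False))  # ['Watch', 'Ring', 'Necklace']
print(last_value(products, frame_to_partition_end=True))   # ['Necklace', 'Necklace', 'Necklace']
```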

LEAD

LEAD() returns the value from the row that comes a given number of rows (one by default) after the current row in the same partition, or NULL if there is no such row.

SELECT product_name, sales, LEAD(sales) OVER (ORDER BY sales) FROM product_sales;

NTILE

NTILE(n) divides the ordered rows of each partition into n buckets that are as equal in size as possible, and returns the bucket number (1 to n) for each row.

SELECT product_name, sales, NTILE(4) OVER (ORDER BY sales) FROM product_sales;
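This Python sketch models how NTILE(n) assigns bucket numbers to an ordered set of rows. It follows the standard rule that, when the rows don’t divide evenly, the leading buckets get one extra row each; the helper name is hypothetical.

```python
def ntile(num_rows, buckets):
    """Model of NTILE(buckets) over `num_rows` already-ordered rows: bucket sizes
    differ by at most one, with the larger buckets first."""
    base, remainder = divmod(num_rows, buckets)
    result = []
    for bucket in range(1, buckets + 1):
        # The first `remainder` buckets each take one extra row.
        size = base + (1 if bucket <= remainder else 0)
        result.extend([bucket] * size)
    return result


print(ntile(10, 4))  # [1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
```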

NTH_VALUE

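NTH_VALUE(value, n) returns the value evaluated at the nth row of the window frame, counting from 1, or NULL if the frame has fewer than n rows. With the default frame, the first row’s frame contains only that row, so it would get NULL; extending the frame to the whole partition returns the second-highest seller on every row:

SELECT product_name, sales, NTH_VALUE(product_name, 2) OVER (ORDER BY sales DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FROM product_sales;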
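The role of the frame in NTH_VALUE() can be modeled in a few lines of Python. This sketch uses hypothetical product names already sorted by sales in descending order with no ties, and compares the default frame with a frame covering the whole partition; None stands in for NULL.

```python
def nth_value(names, n, whole_partition):
    """Model of NTH_VALUE(name, n) OVER (ORDER BY sales DESC ...) on rows already
    sorted by sales DESC, assuming no ties. Returns None when the frame has
    fewer than n rows."""
    result = []
    for i in range(len(names)):
        # Default frame ends at the current row; the full frame is the whole partition.
        frame = names if whole_partition else names[: i + 1]
        result.append(frame[n - 1] if len(frame) >= n else None)
    return result


# Hypothetical product names, already sorted by sales DESC.
products = ["Watch", "Ring", "Necklace"]
print(nth_value(products, 2, whole_partition=False))  # [None, 'Ring', 'Ring']
print(nth_value(products, 2, whole_partition=True))   # ['Ring', 'Ring', 'Ring']
```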

PERCENT_RANK

PERCENT_RANK() calculates the relative rank of the current row within its partition, as (rank - 1) / (total partition rows - 1). The result ranges from 0 to 1.

SELECT salesperson_id, COUNT(*), PERCENT_RANK() OVER (ORDER BY COUNT(*) DESC) FROM sales GROUP BY salesperson_id;
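PERCENT_RANK() is defined as (rank - 1) / (rows in partition - 1), where rank is the RANK() value, and it’s 0 for a single-row partition. This Python sketch applies that formula to hypothetical sale counts ordered in descending order; the helper name is illustrative.

```python
def percent_rank(values, descending=False):
    """Model of PERCENT_RANK() OVER (ORDER BY value [DESC]) for one partition."""
    n = len(values)
    result = []
    for v in values:
        # RANK() = 1 + number of rows that sort strictly ahead of v.
        ahead = sum(1 for w in values if (w > v if descending else w < v))
        rank = ahead + 1
        result.append((rank - 1) / (n - 1) if n > 1 else 0.0)
    return result


# Hypothetical COUNT(*) values for four salespeople.
counts = [5, 3, 3, 1]
print(percent_rank(counts, descending=True))  # [0.0, 0.3333333333333333, 0.3333333333333333, 1.0]
```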

RANK

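RANK() assigns a rank to each row within a window partition. Rows with equal values (peers) get the same rank, and the next rank skips ahead by the number of tied rows, which leaves gaps.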

SELECT salesperson_id, COUNT(*), RANK() OVER (ORDER BY COUNT(*) DESC) FROM sales GROUP BY salesperson_id;
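A compact way to model RANK() is 1 plus the number of rows that sort strictly ahead of the current row. This Python sketch applies that to hypothetical sale counts in descending order; note the gap after the tie, where DENSE_RANK() would return 1, 2, 2, 3 instead.

```python
def rank(values, descending=False):
    """Model of RANK() OVER (ORDER BY value [DESC]) for one partition:
    tied rows share a rank, and the following rank skips ahead."""
    return [
        1 + sum(1 for w in values if (w > v if descending else w < v))
        for v in values
    ]


# Hypothetical COUNT(*) values for four salespeople.
counts = [5, 3, 3, 1]
print(rank(counts, descending=True))  # [1, 2, 2, 4]
```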

ROW_NUMBER

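ROW_NUMBER() assigns a unique, consecutive row number to each row within a window partition, starting at 1. Rows that tie in the ORDER BY still get distinct numbers, in no guaranteed order.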

SELECT salesperson_id, COUNT(*), ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) FROM sales GROUP BY salesperson_id;

Further Reading

For more information about window functions and how you can use them in PostgreSQL, see the PostgreSQL documentation. To find out more about how window functions are processed in PostgreSQL, see this section of the PostgreSQL documentation. And for more details on the syntax of window functions, see this section. For more examples of how to use window functions in your queries, check out these Timescale documentation sections: