Understanding PostgreSQL WITHIN GROUP

Abstract shapes over a dark background.

The WITHIN GROUP clause is useful in PostgreSQL and TimescaleDB when performing aggregations on ordered subsets of data. It essentially allows you to pivot multiple rows of data onto a single row. This clause simplifies many operations that were once only possible through the use of window functions in combination with a common table expression or CTE.

The expressions preceding WITHIN GROUP are called direct arguments, which are only evaluated once per aggregate call and not for each row in the input. Direct arguments are commonly used for percentile fractions and other aggregate calculations that make sense when you use a single value per calculation. The records in a WITHIN GROUP clause require a group of rows organized by the ORDER BY clause.

Here's the WITHIN GROUP syntax:

SELECT <aggregate_function> WITHIN GROUP (ORDER BY <sort_expression>) FROM <table>;

Examples

Let’s say you have this data in a database table called city_data:

id

day

city

temperature

precipitation

17

2021-09-04

Miami

68.36

0.00

19

2021-09-05

Miami

72.50

0.00

11

2021-09-01

Miami

65.30

0.28

13

2021-09-02

Miami

64.40

0.79

15

2021-09-03

Miami

68.18

0.47

18

2021-09-04

Atlanta

67.28

0.00

20

2021-09-05

Atlanta

68.72

0.00

12

2021-09-01

Atlanta

63.14

0.20

14

2021-09-02

Atlanta

62.60

0.59

16

2021-09-03

Atlanta

62.60

0.39

Here are some examples of using WITHIN GROUP in SQL queries.

Using WITHIN GROUP with percentile functions

PostgreSQL and TimescaleDB have two statistical window functions that are often used with WITHIN GROUP. Both percentile_cont and percentile_disc calculate a percentile, but they do so in different ways.

Here is an example of using percentile_disc to aggregate:

SELECT percentile_disc(0.5) WITHIN GROUP ( ORDER BY temperature) FROM city_data;

The result is:

percentile_disc

65.30

Notice that this result is a value that exists in the data. The WITHIN GROUP clause specifies the numeric values that are used by the percentile function. In this case, percentile_disc returns a discrete value from the data set. On the other hand, percentile_cont interpolates the values. Here is the query using percentile_cont:

SELECT percentile_cont(0.5) WITHIN GROUP ( ORDER BY temperature) FROM city_data;

The result is:

percentile_cont

66.28999999999999

Finding the most frequent value in a series of data

The mode() function returns the most commonly occurring value specific in the data set specified by WITHIN GROUP. Here is an example query using mode() on the city_data data set.

SELECT mode() WITHIN GROUP ( ORDER BY temperature) FROM city_data;

And here is the result, because 62.60 is the only value that occurs twice in the temperature column:

mode

62.60

Finding the rank of a value in a series of data

If you want to know how a specific temperature in the city_data table ranks within the group that you have designated in the WITHIN GROUP clause, you can use the rank function. Here is an example:

SELECT rank(72.5) WITHIN GROUP ( ORDER BY temperature) FROM city_data;

And here is the result:

rank

10

The temperature 72.5 is the highest value in the table because it is sorted by default in ascending order; that value ends up at the end of the list of values. The following query gives a different result:

SELECT rank(72.5) WITHIN GROUP ( ORDER BY temperature DESC) FROM city_data;

rank

1

Using WITHIN GROUP with GROUP BY

Using WITHIN GROUP across a complete table is not very useful. In this query, the cities in the data set are grouped, so that the results have the 50-percentile of each city in a separate row:

SELECT city, percentile_cont(0.5) WITHIN GROUP ( ORDER BY temperature) FROM city_data GROUP BY city;

Results:

city

percentile_cont

Atlanta

63.14

Miami

68.18

Next Steps

To learn more about WITHIN GROUP and how to use it in PostgreSQL, see the aggregate expressions, ordered set aggregate functions, and hypothetical set aggregate functions in PostgreSQL documentation.

For more examples of how to use WITHIN GROUP in your own queries, see these Timescale resources: