Understanding PostgreSQL WITHIN GROUP

The WITHIN GROUP clause is useful in PostgreSQL and TimescaleDB when performing aggregations on ordered subsets of data. It essentially allows you to pivot multiple rows of data onto a single row. This clause simplifies many operations that were once only possible through the use of window functions in combination with a common table expression or CTE.

The expressions preceding WITHIN GROUP are called direct arguments, which are only evaluated once per aggregate call and not for each row in the input. Direct arguments are commonly used for percentile fractions and other aggregate calculations that make sense when you use a single value per calculation. The records in a WITHIN GROUP clause require a group of rows organized by the ORDER BY clause.

Here's the WITHIN GROUP syntax:

SELECT <aggregate_function> WITHIN GROUP (ORDER BY <sort_expression>) FROM <table>;

Examples

Using WITHIN GROUP with percentile functions
Finding the most frequent value in a series of data
Finding the rank of a value in a series of data
Using WITHIN GROUP with GROUP BY

Let’s say you have this data in a database table called city_data:

id	day	city	temperature	precipitation
17	2021-09-04	Miami	68.36	0.00
19	2021-09-05	Miami	72.50	0.00
11	2021-09-01	Miami	65.30	0.28
13	2021-09-02	Miami	64.40	0.79
15	2021-09-03	Miami	68.18	0.47
18	2021-09-04	Atlanta	67.28	0.00
20	2021-09-05	Atlanta	68.72	0.00
12	2021-09-01	Atlanta	63.14	0.20
14	2021-09-02	Atlanta	62.60	0.59
16	2021-09-03	Atlanta	62.60	0.39

Here are some examples of using WITHIN GROUP in SQL queries.

Using WITHIN GROUP with percentile functions

PostgreSQL and TimescaleDB have two statistical window functions that are often used with WITHIN GROUP. Both percentile_cont and percentile_disc calculate a percentile, but they do so in different ways.

Here is an example of using percentile_disc to aggregate:

SELECT percentile_disc(0.5) WITHIN GROUP ( ORDER BY temperature) FROM city_data;

The result is:

percentile_disc

65.30

Notice that this result is a value that exists in the data. The WITHIN GROUP clause specifies the numeric values that are used by the percentile function. In this case, percentile_disc returns a discrete value from the data set. On the other hand, percentile_cont interpolates the values. Here is the query using percentile_cont:

SELECT percentile_cont(0.5) WITHIN GROUP ( ORDER BY temperature) FROM city_data;

The result is:

percentile_cont

66.28999999999999

Finding the most frequent value in a series of data

The mode() function returns the most commonly occurring value specific in the data set specified by WITHIN GROUP. Here is an example query using mode() on the city_data data set.

SELECT mode() WITHIN GROUP ( ORDER BY temperature) FROM city_data;

And here is the result, because 62.60 is the only value that occurs twice in the temperature column:

mode

62.60

Finding the rank of a value in a series of data

If you want to know how a specific temperature in the city_data table ranks within the group that you have designated in the WITHIN GROUP clause, you can use the rank function. Here is an example:

SELECT rank(72.5) WITHIN GROUP ( ORDER BY temperature) FROM city_data;

And here is the result:

rank

The temperature 72.5 is the highest value in the table because it is sorted by default in ascending order; that value ends up at the end of the list of values. The following query gives a different result:

SELECT rank(72.5) WITHIN GROUP ( ORDER BY temperature DESC) FROM city_data;

rank

Using WITHIN GROUP with GROUP BY

Using WITHIN GROUP across a complete table is not very useful. In this query, the cities in the data set are grouped, so that the results have the 50-percentile of each city in a separate row:

SELECT city, percentile_cont(0.5) WITHIN GROUP ( ORDER BY temperature) FROM city_data GROUP BY city;

Results:

city	percentile_cont
Atlanta	63.14
Miami	68.18

Next Steps

To learn more about WITHIN GROUP and how to use it in PostgreSQL, see the aggregate expressions, ordered set aggregate functions, and hypothetical set aggregate functions in PostgreSQL documentation.

For more examples of how to use WITHIN GROUP in your own queries, see these Timescale resources: