In PostgreSQL, rank()
and dense_rank()
are general-purpose window functions that are used to order values and assign them rank in the form of numbers on where they fall in relation to each other.
Both of these functions are used with an OVER
clause along with either PARTITION BY
, ORDER BY
, or ROWS BETWEEN
. When used with rank()
and dense_rank()
, they affect the windows in these ways:
ORDER BY
: specifies the column whose values you wish to rank
PARTITION BY
: groups the rankings
The difference between rank()
and dense_rank()
is in how they handle identical values. Let’s say that you are assigning rank based on a grade value, and you end up with two results that are both equal to 85. If you use rank()
, both of these values will be given the same rank, and the next rank will be skipped. So if they both tied for 3, then they will both be given a 3, 4 will be skipped, and the next highest rank will be 5. If you use dense_rank()
on the same example, 4 won’t be skipped. It will be the next rank value.
rank() syntax:
rank () → bigint
dense_rank() syntax:
dense_rank () → bigint
rank() syntax with window functions:
rank() OVER ([PARTITION BY <columns>] [ORDER BY <columns>])
dense_rank() syntax with window functions:
dense_rank() OVER ([PARTITION BY <columns>] [ORDER BY <columns>])
The data set that we will be using for examples contains the grades of three students in four subjects.
student | subject | grade |
Jim | Science | 84 |
Jim | Math | 93 |
Jim | History | 79 |
Jim | English | 75 |
Mary | Science | 81 |
Mary | Math | 81 |
Mary | History | 80 |
Mary | English | 88 |
Sam | Science | 84 |
Sam | Math | 90 |
Sam | History | 79 |
Sam | English | 92 |
If we wanted to know which students needed the most help in specific subjects, we could run a query like this:
SELECT
student,
subject,
grade,
rank() OVER(ORDER BY grade DESC)
FROM grades;
We want the rank of 1 to go to the highest grade. In order to do that, we order the results in the window frame in descending order so the highest grades are at the top.
Here are the results:
student | subject | grade | rank |
Jim | Math | 93 | 1 |
Sam | English | 92 | 2 |
Sam | Math | 90 | 3 |
Mary | English | 88 | 4 |
Sam | Science | 84 | 5 |
Jim | Science | 84 | 5 |
Mary | Math | 81 | 7 |
Mary | Science | 81 | 7 |
Mary | History | 80 | 9 |
Jim | History | 79 | 10 |
Sam | History | 79 | 10 |
Jim | English | 75 | 12 |
From this, we can tell that Jim needs help in both History and English even though he is at the top of the class in Math. Sam also needs help with History. Now, you will notice that both 7 and 10 are repeated, and there is no 8 or 11. This is because we are using rank()
. When we use dense_rank()
instead, there will be no gaps.
Here is that result:
student | subject | grade | dense_rank |
Jim | Math | 93 | 1 |
Sam | English | 92 | 2 |
Sam | Math | 90 | 3 |
Mary | English | 88 | 4 |
Sam | Science | 84 | 5 |
Jim | Science | 84 | 5 |
Mary | Math | 81 | 6 |
Mary | Science | 81 | 6 |
Mary | History | 80 | 7 |
Jim | History | 79 | 8 |
Sam | History | 79 | 8 |
Jim | English | 75 | 9 |
If we want to see how each student ranks per subject, we can add a PARTITION BY
clause, which can group the records in the window. Here is the query:
SELECT
student,
subject,
RANK() OVER(PARTITION BY subject ORDER BY grade DESC)
FROM
grades
ORDER BY student, rank;
And here are the results:
student | subject | rank |
Jim | Math | 1 |
Jim | Science | 1 |
Jim | History | 2 |
Jim | English | 3 |
Mary | History | 1 |
Mary | English | 2 |
Mary | Science | 3 |
Mary | Math | 3 |
Sam | English | 1 |
Sam | Science | 1 |
Sam | History | 2 |
Sam | Math | 2 |
To see how each student’s average grade across all the subjects rank, we can use a subquery to do that. Here is that query:
SELECT
student,
average,
RANK() OVER(ORDER BY average DESC)
FROM
(
SELECT
student,
avg(grade) average
FROM
grades
GROUP BY
student) AS subquery;
And here are the results:
student | average | rank |
Sam | 86.25 | 1 |
Jim | 82.75 | 2 |
Mary | 82.50 | 3 |
For this example, let’s use another data set. This one contains the temperature and precipitation data for a couple of cities.
id | day | city | temperature | precipitation |
17 | 2021-09-04 | Miami | 68.36 | 0.00 |
19 | 2021-09-05 | Miami | 72.50 | 0.00 |
11 | 2021-09-01 | Miami | 65.30 | 0.28 |
13 | 2021-09-02 | Miami | 64.40 | 0.79 |
18 | 2021-09-04 | Atlanta | 67.28 | 0.00 |
12 | 2021-09-01 | Atlanta | 63.14 | 0.20 |
14 | 2021-09-02 | Atlanta | 62.60 | 0.59 |
16 | 2021-09-03 | Atlanta | 62.60 | 0.39 |
15 | 2021-09-03 | Miami | 71.60 | 0.47 |
20 | 2021-09-05 | Atlanta | 70.80 | 0.00 |
Let’s say we have a temperature, and we want to see where it ranks in relation to the temperatures in our table. For example, we want to see how 68 degrees ranks in both Miami and Atlanta for this range of dates.
Here is the query:
SELECT
city,
rank(68) WITHIN GROUP (
ORDER BY temperature DESC)
FROM city_data
GROUP BY city;
We grouped the results by the city so we get the rank of 68 degrees in each. Then we use the WITHIN GROUP
clause, which specifies how to sort the rows that are grouped by the aggregate function. Here, we order the rows by the descending temperature because we want the highest temperature to get a rank of 1.
Here are the results:
city | rank |
Atlanta | 2 |
Miami | 4 |
This tells us that 68 degrees would be the second-highest temperature for Atlanta and the fourth-highest temperature in Miami for the date range in the table.
To learn more about rank()
and dense_rank()
and how to use them in PostgreSQL, check out PostgreSQL’s documentation on window functions. For more examples on how to use them in your own TimescaleDB SQL queries, see these Timescale documentation sections: