Understanding DISTINCT in PostgreSQL (With Examples)

Try for free

Start supercharging your PostgreSQL today.

Abstract light blue triangle over a black background

Written by Team Timescale

In PostgreSQL or TimescaleDB, you may often need to eliminate duplicate rows from your query results to get a unique set of records. The DISTINCT clause in PostgreSQL is designed to do just that, ensuring your results contain only unique rows.

PostgreSQL DISTINCT Syntax

The basic syntax of the DISTINCT clause is as follows:

SELECT DISTINCT column1, column2, ... FROM table;

You can also use DISTINCT ON to return the first row of each group of duplicates based on a specified column or set of columns:

SELECT DISTINCT ON (column1) column1, column2, ... FROM table ORDER BY column1, column2, ...;

Example table

Let's consider a table weather_data with the following structure and data:

CREATE TABLE weather_data (     id SERIAL PRIMARY KEY,     device_id VARCHAR(50),     temperature NUMERIC,     humidity NUMERIC,     wind_speed NUMERIC,     recorded_at TIMESTAMP );

INSERT INTO weather_data (device_id, temperature, humidity, wind_speed, recorded_at) VALUES ('device_1', 22.5, 55.0, 12.5, '2023-01-15 08:00:00'), ('device_2', 21.0, 60.0, 10.0, '2023-01-15 08:05:00'), ('device_1', 23.0, 57.0, 11.0, '2023-01-15 08:10:00'), ('device_2', 19.5, 62.0, 8.0, '2023-01-15 08:15:00'), ('device_1', 20.0, 59.0, 9.5, '2023-01-15 08:20:00'), ('device_3', 24.0, 54.0, 13.0, '2023-01-15 08:25:00'), ('device_2', 18.5, 63.0, 7.5, '2023-01-15 08:30:00'), ('device_3', 22.0, 55.5, 12.0, '2023-01-15 08:35:00'), ('device_1', 21.5, 58.0, 10.5, '2023-01-15 08:40:00'), ('device_3', 23.5, 53.0, 13.5, '2023-01-15 08:45:00');

Learn how the Timescale Team made DISTINCT queries up to 8,000x faster on PostgreSQL.

Practical examples using DISTINCT

Let's dive into some examples to see the DISTINCT clause in action.

Example 1: Selecting distinct values from a single column

Suppose we want to retrieve all unique device_id values from the weather_data table. We can use the DISTINCT clause to achieve this:

SELECT DISTINCT device_id FROM weather_data;

Result:

 device_id  -----------  device_1    device_2    device_3  

This query returns a unique set of device_id values from the weather_data table.

Example 2: Selecting distinct combinations of multiple columns

Now, let's say we want to retrieve unique combinations of device_id and humidity. We can use the DISTINCT clause on multiple columns:

SELECT DISTINCT device_id, humidity FROM weather_data;

Result:  device_id | humidity  -----------+----------  device_1  |     55.0   device_2  |     60.0   device_1  |     57.0   device_2  |     62.0   device_1  |     59.0   device_3  |     54.0   device_2  |     63.0   device_3  |     55.5   device_1  |     58.0   device_3  |     53.0 

This query returns unique combinations of device_id and humidity from the weather_data table.

Example 3: Using DISTINCT ON to get the first row of each group

Suppose we want to retrieve the first recorded temperature for each device. We can use DISTINCT ON to achieve this:

SELECT DISTINCT ON (device_id) device_id, temperature, recorded_at FROM weather_data ORDER BY device_id, recorded_at;

Result:

 device_id | temperature |     recorded_at       -----------+-------------+---------------------  device_1  |        22.5 | 2023-01-15 08:00:00   device_2  |        21.0 | 2023-01-15 08:05:00   device_3  |        24.0 | 2023-01-15 08:25:00 

This query returns the first recorded temperature for each device_id based on the recorded_at timestamp.

Conclusion

The DISTINCT clause is a powerful tool in PostgreSQL for removing duplicate rows from your query results. To learn more about this PostgreSQL clause, check the official documentation. And if to get the behind-the-scenes story of how the Timescale Team made DISTINCT queries up to 8,000x faster, read our blog post.