Start supercharging your PostgreSQL today.
Written by Team Timescale
In PostgreSQL or TimescaleDB, you may often need to eliminate duplicate rows from your query results to get a unique set of records. The DISTINCT
clause in PostgreSQL is designed to do just that, ensuring your results contain only unique rows.
The basic syntax of the DISTINCT
clause is as follows:
SELECT DISTINCT column1, column2, ...
FROM table;
You can also use DISTINCT ON
to return the first row of each group of duplicates based on a specified column or set of columns:
SELECT DISTINCT ON (column1) column1, column2, ...
FROM table
ORDER BY column1, column2, ...;
Let's consider a table weather_data
with the following structure and data:
CREATE TABLE weather_data (
id SERIAL PRIMARY KEY,
device_id VARCHAR(50),
temperature NUMERIC,
humidity NUMERIC,
wind_speed NUMERIC,
recorded_at TIMESTAMP
);
INSERT INTO weather_data (device_id, temperature, humidity, wind_speed, recorded_at) VALUES
('device_1', 22.5, 55.0, 12.5, '2023-01-15 08:00:00'),
('device_2', 21.0, 60.0, 10.0, '2023-01-15 08:05:00'),
('device_1', 23.0, 57.0, 11.0, '2023-01-15 08:10:00'),
('device_2', 19.5, 62.0, 8.0, '2023-01-15 08:15:00'),
('device_1', 20.0, 59.0, 9.5, '2023-01-15 08:20:00'),
('device_3', 24.0, 54.0, 13.0, '2023-01-15 08:25:00'),
('device_2', 18.5, 63.0, 7.5, '2023-01-15 08:30:00'),
('device_3', 22.0, 55.5, 12.0, '2023-01-15 08:35:00'),
('device_1', 21.5, 58.0, 10.5, '2023-01-15 08:40:00'),
('device_3', 23.5, 53.0, 13.5, '2023-01-15 08:45:00');
Learn how the Timescale Team made DISTINCT queries up to 8,000x faster on PostgreSQL.
Let's dive into some examples to see the DISTINCT
clause in action.
Example 1: Selecting distinct values from a single column
Suppose we want to retrieve all unique device_id
values from the weather_data
table. We can use the DISTINCT
clause to achieve this:
SELECT DISTINCT device_id
FROM weather_data;
Result:
device_id
-----------
device_1
device_2
device_3
This query returns a unique set of device_id
values from the weather_data
table.
Example 2: Selecting distinct combinations of multiple columns
Now, let's say we want to retrieve unique combinations of device_id
and humidity
. We can use the DISTINCT
clause on multiple columns:
SELECT DISTINCT device_id, humidity
FROM weather_data;
Result:
device_id | humidity
-----------+----------
device_1 | 55.0
device_2 | 60.0
device_1 | 57.0
device_2 | 62.0
device_1 | 59.0
device_3 | 54.0
device_2 | 63.0
device_3 | 55.5
device_1 | 58.0
device_3 | 53.0
This query returns unique combinations of device_id
and humidity
from the weather_data
table.
Example 3: Using DISTINCT ON to get the first row of each group
Suppose we want to retrieve the first recorded temperature for each device. We can use DISTINCT ON
to achieve this:
SELECT DISTINCT ON (device_id) device_id, temperature, recorded_at
FROM weather_data
ORDER BY device_id, recorded_at;
Result:
device_id | temperature | recorded_at
-----------+-------------+---------------------
device_1 | 22.5 | 2023-01-15 08:00:00
device_2 | 21.0 | 2023-01-15 08:05:00
device_3 | 24.0 | 2023-01-15 08:25:00
This query returns the first recorded temperature for each device_id
based on the recorded_at
timestamp.
The DISTINCT
clause is a powerful tool in PostgreSQL for removing duplicate rows from your query results. To learn more about this PostgreSQL clause, check the official documentation. And if to get the behind-the-scenes story of how the Timescale Team made DISTINCT queries up to 8,000x faster, read our blog post.