Understanding Foreign Keys in PostgreSQL

A colorful key over a dark blue background

Foreign keys in PostgreSQL are a fundamental way of defining relationships between tables. Splitting data across multiple tables is foundational to relational databases and data normalization. However, when spreading related data across various tables, we must ensure that values in one table are valid and exist in another. Enter foreign keys.

In this article, we'll see examples of how to use foreign keys in PostgreSQL, understand what they're used for, and apply different variations of foreign key operations to maintain consistency.

What Is a Foreign Key?

A foreign key "links" two tables together. It does this through a column in a child table, which stores the primary key of a row in a parent table. A foreign key is a constraint we add to the child table column that enforces the primary key to exist in the parent table. This is called referential integrity. By maintaining referential integrity, we keep our databases maintainable. PostgreSQL will throw an error if someone inserts a row into a child table with an id that doesn't exist in the parent table.

Using PostgreSQL Foreign Keys

To really get a grasp of foreign keys, let's look at an example through the lens of an online store. We have the following tables to track orders customers have made:

CREATE TABLE customers ( id SERIAL PRIMARY KEY, name TEXT, address TEXT );

CREATE TABLE orders ( id SERIAL PRIMARY KEY, price DECIMAL );

Right now, there are no foreign keys or links between the customers and orders tables. To ensure every order belongs to a customer, we can add a new column and foreign key to an existing table using ALTER TABLE.

ALTER TABLE orders ADD COLUMN customer_id INT NOT NULL CONSTRAINT fk_customers REFERENCES customers(id); -- the foreign key

In this statement, we added a new column to the orders table called customer_id, which will hold the primary key (ID) from the customers table. Next, we tell PostgreSQL to create a constraint named fk_customers on the customer_id column to enforce any data in that column is found in the id columns of the customers table. This constraint is the foreign key. By convention, we name foreign keys with the prefix fk_[name] to differentiate foreign keys from other constraints and indexes.

Now add customer and order data:

INSERT INTO customers (name, address) VALUES ('Dylan', '1234 E Street'); INSERT INTO customers (name, address) VALUES ('Karisa', '345 Oak Lane'); INSERT INTO customers (name, address) VALUES ('Nora', '1111 Red Blvd');

INSERT INTO orders (price, customer_id) VALUES (3.50, 1); INSERT INTO orders (price, customer_id) VALUES (8.75, 3); INSERT INTO orders (price, customer_id) VALUES (1.20, 1);

We can observe the power of foreign keys by inserting a new row into the order table using a customer_id that does not exist in the customers table.

INSERT INTO orders (price, customer_id) VALUES (1.53, 40);

Our foreign key works! Orders must have a valid customer associated with them. If an invalid customer_id is given, then PostgreSQL will prevent the insert from running.

As we've seen, foreign keys allow us to split tables apart to normalize data and maintain relationships. They help keep referential integrity by guaranteeing that the primary key stored in the child table is, in fact, an actual entry in the parent table. But this isn't everything. Additionally, we can set up foreign key operations to modify all related rows simultaneously.

Defining Foreign Keys With CREATE TABLE

In the orders to customers example, we first created both tables and then added a foreign key from orders to customers using customer_id. This is an example of adding a foreign key *after* a table is created, but what if we know the parent and child tables while creating a table? We can add a constraint during table creation:

CREATE TABLE customers ( id SERIAL PRIMARY KEY, name TEXT, address TEXT );

CREATE TABLE orders ( id SERIAL PRIMARY KEY, price DECIMAL, customer_id int CONSTRAINT fk_customers REFERENCES customers(id) );

When defining a foreign key, we have several options at our disposal. For instance, we don't necessarily need to assign a name to every foreign key; PostgreSQL can generate a name for us. Let's consider the process of removing the CONSTRAINT fk_customers:

Additionally, PostgreSQL will assume that foreign keys will reference the primary key of a parent table, so we can omit (id) when creating the foreign key.

PostgreSQL Foreign Key Operations

What happens when a parent table deletes a row that a child table relies on through a foreign key? Using the example above, what happens to customers' orders if a customer is deleted? Let's try it.

DELETE FROM customers WHERE id=1;

By default, PostgreSQL employs the NO ACTION operation on any foreign keys. This means that if a child table's row still references a parent table, PostgreSQL will not allow the deletion of the parent row, throwing an error instead. However, we can work around this by first deleting any order associated with customer.id=1, and then we will be allowed to delete the customer.

NO ACTION prevents deleting a parent row until all associated child rows are deleted. Additionally, NO ACTION prevents someone from updating the primary key of the parent row while it has relations.

Foreign key operations allow us to define what happens to child rows if a parent is updated or deleted. This is done by adding ON UPDATE or ON DELETE to the constraint.

For example, If we wanted to define the NO ACTION operation on delete explicitly, we create our table like so:

CREATE TABLE orders ( id SERIAL PRIMARY KEY, price DECIMAL, customer_id int REFERENCES customers ON DELETE NO ACTION );

Of course, PostgreSQL gives us different options for how associated rows behave when a parent row is removed through foreign key operations.

Restriction

RESTRICT works exactly the same as NO ACTION, except the constraint check gets deferred until the end of the query transaction. This can be useful for complex scripts, but generally, NO ACTION will be used instead.

CREATE TABLE orders ( id SERIAL PRIMARY KEY, price DECIMAL, customer_id int REFERENCES customers ON DELETE RESTRICT );

Set null

SET NULL will replace the value of the child column with NULL when the parent is deleted. This can be useful when we want to track child references even after deleting a parent.

CREATE TABLE orders ( id SERIAL PRIMARY KEY, price DECIMAL, customer_id int REFERENCES customers ON DELETE SET NULL );

When using SET NULL with ON UPDATE, all orders associated with them will be set to NULL whenever the customer's primary key changes.

CREATE TABLE orders ( id SERIAL PRIMARY KEY, price DECIMAL, customer_id int REFERENCES customers ON UPDATE SET NULL );

Cascade

CASCADE will remove all associated rows when a parent table row is deleted. From our example, if a customer is deleted, all their orders will also be deleted. Using CASCADE is a great way to keep your database clean.

CREATE TABLE orders ( id SERIAL PRIMARY KEY, price DECIMAL, customer_id int REFERENCES customers ON DELETE CASCADE );

Paired with ON UPDATE, CASCADE will update all child rows with the new primary key when the parent row is updated.

CREATE TABLE orders ( id SERIAL PRIMARY KEY, price DECIMAL, customer_id int REFERENCES customers ON UPDATE CASCADE );

Set default

SET DEFAULT, like SET NULL, will update the child row when the parent row is removed, except it sets the value to the column's default value.

Say we have a system customer to whom all orders go when a customer is deleted. By first setting customer_id to a default value and using SET DEFAULT, we can ensure that all orders are automatically assigned to the system customer.

INSERT INTO customers (id, name, address) VALUES (0, 'system', '127.0.0.1');

CREATE TABLE orders ( id SERIAL PRIMARY KEY, price DECIMAL, customer_id int DEFAULT 0 REFERENCES customers ON DELETE SET DEFAULT -- Assign to customer with id 0 );

Similarly, when using the ON UPDATE operation, the child rows will be set to the default value when the parent row is updated.

CREATE TABLE orders ( id SERIAL PRIMARY KEY, price DECIMAL, customer_id int DEFAULT 0 REFERENCES customers ON UPDATE SET DEFAULT );

Conclusion

Foreign keys are a powerful feature in PostgreSQL that enables us to maintain data integrity and establish meaningful relationships between tables. By enforcing referential integrity, foreign keys ensure that our database remains consistent and reliable even as data evolves.

If you have a large PostgreSQL database with a massive number of tables linked via foreign keys and want to make the move to Timescale, as of TimescaleDB 2.16.0, we support foreign keys between regular PostgreSQL tables and hypertables. Hypertables work like regular PostgreSQL tables but automatically partition your data, speeding up your queries and ingests. This performance boost will sustain as your tables' volume keeps growing, making hypertables extremely scalable.

Hypertables are at the core of TimescaleDB, along with other native features, such as a powerful hybrid row-columnar storage engine that allows you to reduce your storage footprint by 90-95 % and always up-to-date materialized views to rev up use cases like real-time analytics.

If this piqued your interest, remember: you can now convert any PostgreSQL table with a foreign key into a hypertable without having to worry about your schema. Easy as that.

Create a free Timescale account and try it for yourself today.