Written by Abhinav D.
If you have worked with production-scale databases in PostgreSQL, you know that effective backups and a well-defined recovery plan are crucial for managing them. Backups protect your data from loss or corruption and enable you to recover your database in case of failures, outages, or human errors. In this guide, we'll explore pg_restore, a PostgreSQL utility designed to work with logical backups that plays a vital role in database recovery.
pg_restore allows you to restore a database from a logical backup created by pg_dump. Logical backups contain SQL commands that recreate the database objects and data, offering flexibility and portability across different PostgreSQL versions and other database systems.
However, it's important to note that pg_restore and logical backups are not suitable for every situation. They have limitations, especially when dealing with large-scale databases or complex recovery scenarios. In this article, we'll dive deeper into:
Understanding what pg_restore is and how it works
Identifying when pg_restore is the right tool for your recovery needs
Exploring practical examples of using pg_restore effectively
By the end of this guide, you'll have a solid understanding of pg_restore and how it fits into your overall backup and recovery strategy for PostgreSQL databases.
Logical backups differ from physical backups because they don't contain a direct copy of the database files. Instead, a logical backup is a file that includes a series of SQL commands that, when executed, rebuild the database to its state at the time of the backup.
This approach offers several advantages:
Logical backups are portable across different PostgreSQL versions and can even be migrated to other database systems that support SQL.
They allow for selective restoration of specific database objects, such as tables or schemas.
Logical backups are human-readable and can be inspected or modified if needed.
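To illustrate that last point, the start of a plain-format dump looks roughly like the excerpt below. This is an illustrative sketch, not exact pg_dump output, and the customers table and its columns are hypothetical:

SET statement_timeout = 0;
SET client_encoding = 'UTF8';

CREATE TABLE public.customers (
    id integer NOT NULL,
    name text
);

COPY public.customers (id, name) FROM stdin;
1	Alice
\.

Because it is just SQL and COPY data, you can open the file in a text editor, inspect it, or edit it before restoring.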
When creating logical backups of your PostgreSQL database, pg_dump is the go-to tool. To create a logical backup using pg_dump, you can use the following basic command:
pg_dump [database_name] -f backup_file.sql
-f or --file: Specifies the file path of the output file.
This command connects to the specified database, retrieves the SQL commands necessary to recreate the database objects and data, and saves them to a file named backup_file.sql.
For example, to create a logical backup of our e-commerce database, you would run:
pg_dump ecommerce -f /tmp/ecommerce_backup.sql
pg_dump provides many options to customize the backup process. Some commonly used options include:
-U or --username: Specifies the username to connect to the database.
-W or --password: Prompts for the password to authenticate the connection.
-j or --jobs: Enables parallel backup by specifying the number of concurrent jobs.
-F or --format: Specifies the output format of the backup file. The available formats are plain (default), custom, directory, and tar.
For example, to create a backup of the ecommerce database in the tar format with username and password authentication, you would run:
pg_dump -U postgres -W -F tar ecommerce -f /tmp/ecommerce_backup.tar
In this example:
-F tar specifies the tar format, which creates an uncompressed tar archive.
The output file extension is .tar to reflect the tar format.
It's important to note that when using pg_dump for migration purposes, you should use the custom or directory format, as they provide additional features like compression and parallel restoration. You can find the complete list of available options in the PostgreSQL documentation.
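For instance, a directory-format backup that takes advantage of parallelism might look like this sketch (the job count and output path are illustrative):

pg_dump -U postgres -F directory -j 4 -f /tmp/ecommerce_backup_dir ecommerce

The directory format writes one file per table, compresses data by default, and supports both parallel dumps with -j and parallel restores with pg_restore -j.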
In addition to pg_dump, PostgreSQL also provides pg_dumpall, which creates a logical backup of an entire PostgreSQL cluster, including all databases, roles, and other cluster-wide objects.
The key difference between pg_dump and pg_dumpall is that pg_dumpall includes cluster-level objects like user roles and permissions, while pg_dump focuses on a single database.
To create a logical backup of an entire cluster using pg_dumpall, you can use the following command:
pg_dumpall -U postgres -W -f /tmp/cluster_backup.sql
This command will generate an SQL script that includes commands to recreate all databases, roles, and other cluster-wide objects.
Note the multiple password prompts: pg_dumpall connects to each database in turn, so you are prompted once per connection. This is the default behavior when password authentication is used. To avoid repeated prompts across multiple databases, you can store credentials in a password file (~/.pgpass, also configurable via the passfile connection parameter).
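A minimal ~/.pgpass entry might look like the following sketch (the hostname, port, and password are placeholders, and the file must have 0600 permissions for libpq to use it):

localhost:5432:*:postgres:your_password_here

Each line follows the format host:port:database:username:password, and * acts as a wildcard matching any value for that field.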
Using pg_dump and pg_dumpall, you can create comprehensive logical backups of your PostgreSQL databases and clusters, ensuring you have the necessary data and configuration to restore your database environment.
Once you have created a logical backup using pg_dump, you can use pg_restore to rebuild the database from that backup file. pg_restore is a powerful tool that allows you to restore an entire database or specific parts of it, giving you flexibility in the restoration process.
Let's connect the examples with the previous section, where we created a logical backup of the ecommerce database using pg_dump. We'll use that backup file to demonstrate how to use pg_restore.
Simple example
To restore the entire ecommerce database from the backup file, you can use the following command:
pg_restore -d ecommerce /tmp/ecommerce_backup.tar
This command assumes that the target database ecommerce already exists and will restore the data into it. If the database doesn't exist, you'll need to create it first. Note that pg_restore only reads archives in the custom, directory, or tar format; a plain SQL dump such as backup_file.sql is restored by running it through psql instead.
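If you need to create the database first, the createdb utility that ships with PostgreSQL is a quick way to do so (the username here is illustrative):

createdb -U postgres ecommerce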
Examples with options
pg_restore provides several options to customize the restoration process. Here are a few important ones:
-c or --clean: Drops database objects before recreating them. This ensures a clean restoration.
-C or --create: Creates the target database before restoring the data.
-j or --jobs: Specifies the number of concurrent jobs for parallel restoration.
Here's an example that restores the ecommerce database from the backup file created in the above section:
pg_restore -U postgres -W -C -c --if-exists -d postgres /tmp/ecommerce_backup.tar
In this example:
-C: This option tells pg_restore to create the database before restoring into it. When combined with -c, the target database is dropped and recreated before the restore begins.
-c: This option specifies the "clean" mode for the restore. It tells pg_restore to drop database objects (tables, functions, etc.) before recreating them. This ensures that the restored database is in a clean state and matches the structure of the backup file.
--if-exists: This option is used together with -c. It makes pg_restore issue DROP ... IF EXISTS commands when cleaning, so the drops don't fail with errors when an object (or the database itself) doesn't already exist.
-d postgres: This option specifies the name of the database to connect to initially. In this case, it's the postgres database, the default maintenance database that typically exists in PostgreSQL installations. pg_restore needs to connect to an existing database in order to create the new database specified in the backup file.
/tmp/ecommerce_backup.tar: This is the path to the backup file that contains the database dump. It should be a valid backup file created by pg_dump in the "tar" format.
You can find the complete list of available options in the PostgreSQL documentation.
When you run pg_restore, it follows these steps to rebuild the database:
Reading the backup file: pg_restore reads the specified backup file, which contains the SQL commands generated by pg_dump.
Creating the database (optional): If the -C option is used, pg_restore creates the target database before proceeding with the restoration.
Dropping existing objects (optional): If the -c option is used, pg_restore drops any existing database objects before recreating them.
Executing SQL commands: pg_restore executes the SQL commands from the backup file to recreate the database objects, such as tables, indexes, constraints, and data.
Parallel processing (optional): If the -j option is used, pg_restore utilizes multiple jobs to execute the SQL commands in parallel, speeding up the restoration process. Parallel processing is available when using the custom or directory archive formats in pg_restore (see the example after this list).
Throughout the restoration process, pg_restore provides flexibility in rebuilding specific parts of the database. You can use options like -t or --table to restore only specific tables, -n or --schema to restore specific schemas, and more. This allows you to restore the desired parts of the database selectively.
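For example, restoring a single table from the tar archive created earlier might look like this (assuming the backup contains a products table):

pg_restore -U postgres -d ecommerce -t products /tmp/ecommerce_backup.tar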
Furthermore, if you have created a logical backup of an entire PostgreSQL cluster using pg_dumpall, keep in mind that its output is a plain SQL script: you rebuild the cluster, including all databases and cluster-wide objects, by running that script with psql rather than pg_restore.
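A sketch of that cluster restore, using the backup file from earlier (connection details are illustrative):

psql -U postgres -f /tmp/cluster_backup.sql postgres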
By understanding how pg_restore works, and considering its limitations, you can effectively rebuild your databases from logical backups, ensuring data recovery and migration capabilities.
Let's explore and discuss the specific use cases where pg_restore shines.
One of the primary use cases for pg_restore is system migration. When you need to move a PostgreSQL database to a different server, upgrade to a newer version, or switch to a different database system, logical backups created by pg_dump and restored with pg_restore can be a viable solution.
The SQL-based structure of logical backups allows for easy transfer across different PostgreSQL versions and even to other SQL-compliant databases. pg_restore can handle the recreation of database objects and data insertion on the target system, making the migration process more straightforward.
However, it's important to note that migrating large databases using logical backups can be time-consuming and resource-intensive. In such cases, techniques like logical replication or parallel restoration might be more suitable.
Another valuable use case for pg_restore is partial restoration. Sometimes, you may only need to restore specific parts of a database rather than the entire one. pg_restore provides filters and options to selectively restore individual tables, schemas, or other database objects.
For example, you can use the -t or --table option to restore only specific tables or the -n or --schema option to restore objects within a particular schema. This granular control over the restoration process can be beneficial when recovering specific data or troubleshooting issues related to particular database objects.
Logical backup and restoration tools like pg_restore come with both benefits and drawbacks.
Flexible formatting: Logical backups created by pg_dump can be formatted in various ways, such as plain SQL, custom archive, or directory format. This flexibility allows for easier manipulation and customization of the backup files.
Fully restores the database to a consistent state: Logical backups capture the complete state of the database at the time the backup was taken. When restored using pg_restore, the database is rebuilt to its exact state at the time of the backup, including all data and schema objects. In contrast, physical restores can only bring the database back to a fixed point in time.
Suitable for migration and updates: Logical backups are particularly useful when migrating databases to newer versions of PostgreSQL or moving data between different database systems. The SQL-based nature of logical backups makes them compatible across different PostgreSQL versions and even with other SQL-compliant databases.
Supports partial restores: pg_restore allows for selective restoration of specific database objects, such as tables, schemas, or functions. This feature is handy when you only need to restore a subset of the database rather than the entire database.
Slower restoration process: Restoring a database from a logical backup using pg_restore can be slower than restoring from a physical backup. The restoration process involves executing SQL commands to recreate the database objects and insert the data, which can be time-consuming for large databases.
Requires compute resources: pg_restore needs to execute the SQL commands from the backup file, which requires CPU and memory resources on the target database server. This can impact the performance of the server during the restoration process.
Challenges with large databases: Logical backups and restoration with pg_restore can struggle with large databases. Generating the backup file and executing the SQL commands during restoration can take significant time and resources, making it less practical for databases with terabytes of data.
Let's look at an example of using pg_restore with filters to rebuild specific tables from a database backup. Suppose we have a logical backup file named ecommerce_backup.tar that contains a backup of our ecommerce database.
To view the table of contents of the backup file, you can use the -l or --list option:
pg_restore -l ecommerce_backup.tar
This command will display a list of all the objects in the backup file.
To restore only specific tables, you can create a list file containing the entries for the tables you want to restore. For example, suppose we want to restore only the customers and order_items tables.
First, let's output the contents of the backup file to a restore_list.txt file:
pg_restore -l ecommerce_backup.tar > restore_list.txt
Then, edit the file to keep only the objects you want to restore. For this example, we will keep only these two table entries:
215; 1259 17198 TABLE public customers postgres
217; 1259 17202 TABLE public order_items postgres
Then, use the -L or --use-list option to specify the list file:
pg_restore -U postgres -W -d ecommerce -L restore_list.txt ecommerce_backup.tar
This command will restore only the customers and order_items tables from the backup file into the ecommerce database.
You can use the -n or --schema option to restore objects within a specific schema. For example, to restore only the objects in the public schema:
pg_restore -U postgres -W -d ecommerce -n public ecommerce_backup.tar
Use the -N or --exclude-schema option to exclude objects within a specific schema. For example, to exclude the temp schema from the restoration:
pg_restore -U postgres -W -d ecommerce -N temp ecommerce_backup.tar
This command will restore all objects from the backup file except those belonging to the temp schema.
These examples demonstrate how pg_restore provides flexibility in selectively restoring specific parts of a database based on your requirements.
Let's explore an example of migrating a PostgreSQL database to Timescale using pg_restore. Timescale is a time-series database that extends PostgreSQL with additional functionality for handling time-series data.
To migrate a PostgreSQL database to Timescale using pg_restore, follow the steps outlined in the Timescale documentation.
Here are a few key considerations to keep in mind:
Role management: Before dumping the data, it's important to handle the roles and permissions separately. You can use pg_dumpall with the --roles-only option to dump the roles from the source PostgreSQL database (see the sketch after this list).
Schema and data dump: Use pg_dump to create a logical backup of the source database schema and data. However, you must specify certain flags to ensure compatibility with Timescale.
--no-tablespaces: Timescale has limitations on tablespace support, so this flag is necessary.
--no-owner and --no-privileges: These flags are required because Timescale's default user, tsdbadmin, is not a superuser and has restricted privileges compared to PostgreSQL's default superuser.
Restoring with concurrency: When using the directory format for pg_dump and pg_restore, you can speed up the restoration process by leveraging concurrency. However, concurrently loading the _timescaledb_catalog schema can cause errors due to insufficient privileges. To work around this, serially load the _timescaledb_catalog schema and then load the rest of the database concurrently.
Post-migration tasks: After the data is loaded, it's recommended that you update the table statistics by running ANALYZE on all the data. This helps optimize query performance in Timescale.
Verification and application setup: Before bringing your application online with the migrated database, thoroughly verify the data integrity and ensure the migration was successful.
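To make these considerations concrete, the commands involved might look roughly like the following sketch. The paths and the TARGET_CONNECTION_STRING environment variable are placeholders, and this is not Timescale's exact documented procedure:

pg_dumpall -U postgres --roles-only -f /tmp/roles.sql
pg_dump -U postgres -F directory -j 4 --no-tablespaces --no-owner --no-privileges -f /tmp/ecommerce_dump ecommerce
pg_restore -d "$TARGET_CONNECTION_STRING" --no-owner --no-privileges -j 4 /tmp/ecommerce_dump
psql -d "$TARGET_CONNECTION_STRING" -c "ANALYZE;"

Consult the Timescale documentation for the authoritative sequence, including how to load the _timescaledb_catalog schema serially before loading the rest of the data concurrently.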
It's important to note that migrating large databases using pg_dump and pg_restore can be time-consuming and may require downtime for your application. For databases larger than 100 GB, Timescale recommends using their live migration strategy for a low-downtime migration solution instead.
Also, remember that migrating to Timescale may require additional steps to enable Timescale-specific features like hypertables, data compression, and retention policies after the migration is complete.
This guide explored the versatility of pg_restore, a logical backup and recovery tool for PostgreSQL. We've learned how pg_restore works hand in hand with pg_dump to create and restore logical backups, providing flexibility and granular control over the restoration process.
A few key advantages of pg_restore are its ability to facilitate system migrations and partial database restorations. Whether you need to migrate a PostgreSQL database to a newer version, move data between different systems, or selectively restore specific objects, pg_restore offers the tools and options to accomplish these tasks easily.
However, it's important to note that logical replication is often the recommended approach for system migrations for most real-world workloads, especially those involving larger databases or requiring minimal downtime.
If you're working with time-series data and considering migrating to a specialized time-series database like Timescale, pg_restore can be a valuable tool. It enables you to use Timescale's powerful features, such as hypertables, data compression, and retention policies, to optimize your time-series workloads and achieve maximum efficiency.
To experience the benefits of Timescale firsthand, try it for free today. Create your account and explore the possibilities of a purpose-built time-series database.