Apr 08, 2024
Posted by Muhammad Ali Iqbal
Data migration involves careful planning and follow-up tasks: defining a strategy, backing up data, running quality checks, and verifying that the migrated data is correct. When moving data to Timescale, you'll follow a few steps, from selecting a compatible data format and migration tool to monitoring the transfer, then testing and validating the result.
In this article, we’ll show you three ways to migrate your data into our PostgreSQL-but-faster database.
To check your Timescale version, run the following command:
SELECT extversion FROM pg_extension WHERE extname='timescaledb';
CSV is a common format for gathering and storing data in a tabular structure. Each line in a CSV file usually represents a single record or table row. Migrating data using CSV files is a straightforward process across various data systems, and PostgreSQL provides functionality to both import and export them.
In PostgreSQL and Timescale, the COPY command moves data in and out of CSV files. It provides options such as custom separators/delimiters, error handling, and null value handling. COPY is faster than individual INSERTs because it avoids per-statement overhead such as connection setup, network round trips, and transaction management. This makes it a really good candidate for bulk data ingestion.
Create a CSV file from a table using COPY TO:
COPY schema.table TO '/DIR/schema_table.csv' DELIMITER ',' CSV HEADER;
This command exports the data from schema.table into a CSV file with a header row.
For a specific result set:
COPY (SELECT * FROM schema.table WHERE ...) TO '/DIR/schema_table.csv' DELIMITER ',' CSV HEADER;
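For example, to export only the last seven days of data (a sketch, using a hypothetical public.conditions table with a time column):
COPY (SELECT * FROM public.conditions WHERE time > now() - INTERVAL '7 days') TO '/DIR/conditions_recent.csv' DELIMITER ',' CSV HEADER;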
Load the CSV file into a table using COPY FROM:
Make sure that the table is created in the target database inside the same schema as the source.
COPY schema.table FROM '/DIR/schema_table.csv' DELIMITER ',' CSV HEADER;
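If your file represents missing values with a placeholder, you can use the null value option mentioned earlier; a sketch, assuming missing values are written as NA in the file:
COPY schema.table FROM '/DIR/schema_table.csv' WITH (FORMAT csv, HEADER, DELIMITER ',', NULL 'NA');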
Load the CSV file into Timescale using timescaledb-parallel-copy:
The Timescale Parallel Copy tool, developed in Go, is used to efficiently import large CSV data files into a Timescale hypertable.
Before using this tool, check that the time column inside the table uses the TIMESTAMPTZ datatype and ensure that the target table has been converted into a hypertable.
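If you haven't prepared the target table yet, a minimal sketch of those two steps, assuming a TIMESTAMP column named time that can be cast directly:

-- Convert the time column to TIMESTAMPTZ if it isn't already
ALTER TABLE schema.table ALTER COLUMN time TYPE TIMESTAMPTZ;
-- Turn the plain table into a hypertable partitioned on the time column
SELECT create_hypertable('schema.table', 'time');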
To load a CSV file, use this command:
timescaledb-parallel-copy --connection "host=<CLOUD_HOST> user=tsdbadmin password=<CLOUD_PASSWORD> port=<CLOUD_PORT> sslmode=require" --db-name tsdb --table <schema.table> --file <schema_table>.csv --workers 4 --copy-options "CSV" --reporting-period 30s --batch-size 10000
The TimescaleDB Parallel Copy tool is great for importing data faster because it uses many workers at once, unlike the usual COPY commands that use just one thread. It's also fine-tuned for working with hypertables, which are designed for time-series data.
Check it out here: Timescale Parallel Copy Tool.
Migrating data using pg_dump and pg_restore is a common and efficient method, especially suited for database sizes less than 100 GB.
The most common tool for creating logical backups (a sequence of SQL statements) in PostgreSQL is the pg_dump/pg_dumpall utility. pg_dump runs SELECT statements in the background and takes ACCESS SHARE locks on database objects, which means it doesn't block reads or writes on the table.
Take a backup of roles inside the database first (only necessary if you're using roles other than the default Postgres role in your database):
pg_dumpall -d "<Source_Connection_String>" --quote-all-identifiers --roles-only -f roles.sql
where the connection strings take the following form:
Source_Connection_String:
postgres://<sourcedbuser>:<sourcepassword>@<sourcedbhost>:<sourcedbport>/<sourcedbname>
Target_Connection_String:
postgres://tsdbadmin:<targetpassword>@<targetdbhost>:<targetdbport>/tsdb
Run the following command to take a backup using pg_dump in plain format:
pg_dump -d <Source_Connection_String> --quote-all-identifiers --no-tablespaces --no-owner --no-privileges --format=plain -f backup.sql
For backups using parallel workers, use the directory format:
pg_dump -d <Source_Connection_String> --quote-all-identifiers --no-tablespaces --no-owner --no-privileges --format=directory --jobs=8 -f backup.dir
Since the backup is in plain format, psql can be used to import the dump into the target PostgreSQL database.
psql -h <Target_Host> -p <Target_Port> -U tsdbadmin -d tsdb -f backup.sql
If background jobs are running on the Timescale (target) database, the better way to restore is the following:
psql -h <Target_Host> -p <Target_Port> -U tsdbadmin -d tsdb -v ON_ERROR_STOP=1 --echo-errors -f roles.sql -c "SELECT public.timescaledb_pre_restore();" -f backup.sql -c "SELECT public.timescaledb_post_restore();"
where timescaledb_pre_restore() is called before restoration and stops background workers, and timescaledb_post_restore() is called after restoration and restarts them.
The _timescaledb_catalog schema doesn't support concurrent restoration using pg_restore. To avoid errors, first restore the _timescaledb_catalog schema on its own, and then load the rest of the database concurrently.
pg_restore -d "<Target_Connection_String>" --format=directory --schema='_timescaledb_catalog' --exit-on-error --no-tablespaces --no-owner --no-privileges backup.dir
Next, restore everything except the _timescaledb_catalog schema using the --exclude-schema and --jobs flags:
pg_restore -d "<Target_Connection_String>" --format=directory --jobs=8 --exclude-schema='_timescaledb_catalog' --exit-on-error --disable-triggers --no-tablespaces --no-owner --no-privileges backup.dir
Note: before running pg_dump on the source database, disable TimescaleDB background jobs (such as compression, retention, or continuous aggregate policies) so they don't interfere with the dump:
SELECT public.alter_job(id::integer, scheduled=>false)
FROM _timescaledb_config.bgw_job
WHERE id >= 1000;
Once pg_dump has completed, re-enable the jobs with the following query:
SELECT public.alter_job(id::integer, scheduled=>true)
FROM _timescaledb_config.bgw_job
WHERE id >= 1000;
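To see which jobs exist and whether they are currently scheduled, you can query the jobs view (a quick check; timescaledb_information.jobs is available in TimescaleDB 2.x):
SELECT job_id, proc_name, scheduled FROM timescaledb_information.jobs ORDER BY job_id;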
Check out our docs for more troubleshooting on migrating your database.
When you need to move a large amount of data with minimal downtime, live migration is a great strategy provided by Timescale. It uses a tool called pgcopydb in the background, which works in two main steps: it first copies the existing data from the source database to the target, and then streams and replays the changes that occur during and after the copy, keeping the two databases in sync until you're ready to switch over.
For more information, check the pgcopydb documentation.
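For reference, the underlying pgcopydb invocation looks roughly like this (a sketch of what the live-migration image orchestrates for you, using the $SOURCE and $TARGET connection strings defined below):
pgcopydb clone --follow --source "$SOURCE" --target "$TARGET"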
Before starting, set the following parameters on the source database (changing wal_level requires a restart):
ALTER SYSTEM SET wal_level = logical;
ALTER SYSTEM SET old_snapshot_threshold = -1;
If you’re using a managed service (AWS RDS, Azure Flexible Server, etc.), adjust these settings through your cloud provider’s interface.
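Once the settings are applied, you can confirm they took effect from a psql session on the source:
SHOW wal_level;              -- should return 'logical'
SHOW old_snapshot_threshold; -- should return -1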
Each table being migrated needs a replica identity so that UPDATE and DELETE operations can be replicated. If a table lacks a primary key, you can point its replica identity at a unique index:
ALTER TABLE {table_name} REPLICA IDENTITY USING INDEX {index_name};
For tables without a primary key or unique index, use REPLICA IDENTITY FULL, though this can slow down replication for tables with frequent updates/deletes. The issue with REPLICA IDENTITY FULL is that for each UPDATE or DELETE statement, PostgreSQL will log the old column values inside write-ahead-log (WAL) files, which can result in replication overhead and can incur I/O load on the source database.
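A sketch of that fallback, assuming a hypothetical public.events table with no suitable unique index:
ALTER TABLE public.events REPLICA IDENTITY FULL;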
Next, set the source and target connection strings and start the live-migration Docker image:
export SOURCE=postgres://{user}:{password}@{source host}:{source port}/{dbname}
export TARGET=postgres://tsdbadmin:{password}@{target host}:{target port}/tsdb
docker run --rm -dit --name live-migration -e PGCOPYDB_SOURCE_PGURI=$SOURCE -e PGCOPYDB_TARGET_PGURI=$TARGET -v ~/live-migration:/opt/timescale/ts_cdc timescale/live-migration:v0.0.1
This command will migrate your schema and existing data, then stream live changes from the source to the target. Keep an eye on the data transfer progress and replication lag in the [WATCH] output:
[WATCH] Source DB - Target DB => 128MB
Once the lag drops to 0 MB, the tool will prompt you to press the "c" key (and ENTER) to proceed. If you're migrating from self-hosted TimescaleDB, it won't show this prompt.
[WATCH] Source DB - Target DB => 0MB. Press "c" (and ENTER) to proceed
After you confirm, the tool syncs the last LSN from the source to the target:
Syncing last LSN in Source DB to Target DB ...
After this, the live migration tool will handle migrating sequences and cleaning the temporary files created during migration. If everything goes as planned, you'll get a final message, Migration successfully completed, which indicates that your data has been successfully moved. You can now connect your application to a Timescale database.
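As a final sanity check, you can compare row counts between the source and target for a few key tables; a sketch, reusing the connection strings exported earlier and a hypothetical public.conditions table:
psql "$SOURCE" -c "SELECT count(*) FROM public.conditions;"
psql "$TARGET" -c "SELECT count(*) FROM public.conditions;"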
If you're reading this blog post to assess how easy it would be to migrate to Timescale, we hope this step-by-step guide answers your questions and helps you choose the best way to move your data. Remember, you can always check our migration docs for more details.
For added peace of mind, know that when migrating to our mature cloud platform, you will have access to our expert Support team to help you along the way from day one—even as a trialer. Sign up today; it's free for 30 days, no credit card required.
👉 Learn how to build Python apps using PostgreSQL/Timescale.