This topic describes how to use pg_dump and pg_restore to migrate a database from a self-managed PostgreSQL instance to an ApsaraDB RDS for PostgreSQL instance. This applies if you are familiar with pg_dump and pg_restore and do not want to use the migration services and features provided by Alibaba Cloud, such as Data Transmission Service (DTS) and the cloud migration feature of ApsaraDB RDS for PostgreSQL.
Prerequisites
An RDS instance that runs the same PostgreSQL version as the self-managed PostgreSQL instance is created. For more information, see Create an ApsaraDB RDS for PostgreSQL instance.
NoteYou can use pg_dump and pg_restore to migrate a database across PostgreSQL versions. We recommend that you use the same PostgreSQL version for the self-managed PostgreSQL instance and the RDS instance to avoid compatibility issues.
You can run the
psql --version
command on the server on which the self-managed PostgreSQL instance is deployed to query the PostgreSQL version of the instance.You can upgrade the major engine version of your RDS instance after the migration is complete. For more information, see Upgrade the major engine version.
If the self-managed PostgreSQL instance is deployed on an Elastic Compute Service (ECS) instance, we recommend that you create the RDS instance in the same region and virtual private cloud (VPC) as the ECS instance. This way, data is migrated over an internal network, which is faster and more secure.
A whitelist is configured for the RDS instance to allow access from the server on which the self-managed PostgreSQL instance is deployed. For more information, see Configure an IP address whitelist.
A privileged account is created for the RDS instance. For more information, see Create an account.
The endpoint and port number of the RDS instance are obtained. For more information, see View and change the endpoints and port numbers of an ApsaraDB RDS for PostgreSQL instance.
NoteIf the self-managed PostgreSQL instance is deployed on an ECS instance that resides in the same region and VPC as the RDS instance, you must obtain the internal endpoint and port number of the RDS instance. In other cases, you must obtain the public endpoint and port number of the RDS instance.
An administrator account that assumes the superuser role is created for the self-managed PostgreSQL instance.
pg_dump and pg_restore are installed on the server on which the self-managed PostgreSQL instance is deployed.
NoteWhen you install the self-managed PostgreSQL instance on your on-premises machine, pg_dump and pg_restore are automatically installed. You can run the
pg_dump --version
andpg_restore --version
commands to query the details.
Procedure
The migration solution described in this topic is suitable for migrating a single database. If you want to migrate multiple databases or the entire self-managed PostgreSQL instance to an RDS instance, refer to the following solutions:
To migrate multiple databases on the self-managed PostgreSQL instance at a time, we recommend that you use DTS. For more information, see Use DTS to migrate data from a self-managed PostgreSQL database to an ApsaraDB RDS for PostgreSQL instance.
To migrate the entire self-managed PostgreSQL instance, we recommend that you use the cloud migration feature of ApsaraDB RDS for PostgreSQL. For more information, see Use the cloud migration feature for an ApsaraDB RDS for PostgreSQL instance.
Step 1: Check the information about the database that you want to migrate
Connect to the self-managed PostgreSQL instance. In the psql command line, run the \l
command to query the information about databases and check the values of Owner, Encoding, Collate, and Ctype of the database that you want to migrate.
In this example, the database named testdb
is migrated. The following information is returned for the database: Owner is testuser
, Encoding is UTF8
, Collate is en_US.UTF-8
, and Ctype is en_US.UTF-8
.
List of databases
Name | Owner | Encoding | Collate | Ctype | ICU Locale | Locale Provider | Access privileges
-----------+----------+----------+-------------+-------------+------------+-----------------+-----------------------
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc | =Tc/postgres +
| | | | | | | postgres=CTc/postgres
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc | =c/postgres +
| | | | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc | =c/postgres +
| | | | | | | postgres=CTc/postgres
testdb | testuser | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc |
(4 rows)
Step 2: Migrate users
Connect to the server on which the self-managed PostgreSQL instance is deployed and run the following command to export the owner named
testuser
of the database that you want to migrate:pg_dumpall -g -h 127.0.0.1 -p 5432 -U testuser
Sample output:
NoteIn the Roles section of the command output, you can obtain the statement that is related to the permissions of the
testuser
user.-- -- PostgreSQL database cluster dump -- SET default_transaction_read_only = off; SET client_encoding = 'UTF8'; SET standard_conforming_strings = on; -- -- Roles -- CREATE ROLE postgres; ALTER ROLE postgres WITH SUPERUSER INHERIT CREATEROLE CREATEDB LOGIN REPLICATION BYPASSRLS; CREATE ROLE testuser; ALTER ROLE testuser WITH SUPERUSER INHERIT CREATEROLE CREATEDB LOGIN REPLICATION BYPASSRLS PASSWORD 'SCRAM-SHA-256$4096:0IEvNqWupQsGcaoDS3SX****3U='; -- -- User Configurations -- -- -- PostgreSQL database cluster dump complete --
Use the privileged account to connect to the RDS instance. For more information, see Connect to an ApsaraDB RDS for PostgreSQL instance.
Change
SUPERUSER
in the obtained SQL statement that is related to the permissions of thetestuser
user tords_SUPERUSER
. On the RDS instance, create an account that has the same permissions as the testuser user of the self-managed PostgreSQL instance.CREATE ROLE testuser; ALTER ROLE testuser WITH rds_SUPERUSER INHERIT CREATEROLE CREATEDB LOGIN REPLICATION BYPASSRLS PASSWORD 'SCRAM-SHA-256$4096:0IEvNqWupQsGcaoDS3SX****3U=';
Step 3: Create a destination database on the RDS instance
Use the privileged account to connect to the RDS instance.
Create a destination database. Make sure that the destination database has the same character set and the same Collate, and Ctype settings as the source database on the self-managed PostgreSQL instance. For example, the destination database must have the same settings as the source database that is obtained in Step 1: Encoding is
UTF8
, Collate isen_US.UTF-8
, and Ctype isen_US.UTF-8
.CREATE DATABASE <Name of the destination database> WITH OWNER <Database owner> TEMPLATE template0 ENCODING 'UTF8' LC_CTYPE 'en_US.UTF-8' LC_COLLATE 'en_US.UTF-8';
NoteThe name of the destination database can be different from the name of the source database on the self-managed PostgreSQL instance.
The owner of the destination database must be the same as the owner of the source database on the self-managed PostgreSQL instance.
Step 4: Migrate data
You can use one of the following methods to migrate data:
If the server on which the self-managed PostgreSQL instance is deployed can connect to the RDS instance, we recommend that you use online migration.
If the server on which the self-managed PostgreSQL instance is deployed cannot connect to the RDS instance, we recommend that you use offline migration. Specifically, use pg_dump to export the data to a file, copy the file to the host to which the RDS instance connects, and then use pg_restore to import the file to the RDS instance.
Online migration
Configure a password file. For more information, see The Password File.
Run the following command on the server on which the self-managed PostgreSQL instance is deployed to create and edit the .pgpass file to store the password:
vi ~/.pgpass
Press
i
to enter the insert mode and enter the password-related content.NoteThe content of the password file is in the following format:
host:port:dbname:username:password
.By default, the metadata of a PostgreSQL database is stored in lower case. Therefore, the values of the dbname and username parameters must be in lower case. If the database name and username that you specified use uppercase letters and are enclosed in a pair of double quotation marks ("), the preceding condition does not apply.
Endpoint of the RDS instance:5432:Name of the destination database:Username of the owner of the destination database:Password 127.0.0.1:5432:Name of the source database on the self-managed PostgreSQL instance:Username of the superuser:Password
Press
Esc
and enter:wq
to exit the insert mode.Modify permissions on the
.pgpass
file.chmod 700 ~/.pgpass
Run the following command on the server on which the self-managed PostgreSQL instance is deployed to migrate data:
nohup pg_dump -F p -h 127.0.0.1 -p 5432 -U <Superuser of the self-managed instance> -d <Name of the database that you want to migrate> --no-tablespaces | time psql -h <Endpoint of the RDS instance> -p 5432 -U <Username of the owner of the destination database> --single-transaction <Name of the destination database> > ./pg.dump.log 2>&1 &
NoteYou can query the pg.dump.log file to check whether error logs exist. If the file is empty, the migration is successful. In this case, you can log on to the RDS instance to view the migrated data.
The
--single-transaction
option is specified in the preceding command. During data import, all operations are encapsulated in one transaction to ensure that all operations are either all successful or all fail. If all operations fail, you can resolve the issue and directly import the data again.
Offline migration
Run the following command on the server on which the self-managed PostgreSQL instance is deployed to export the pg.dump file:
nohup pg_dump -F c -h 127.0.0.1 -p 5432 -U <Superuser of the self-managed instance> -d <Name of the database that you want to migrate> --no-tablespaces -f ./pg.dump > ./pg.dump.log 2>&1 &
NoteWait for the export to complete. Then, verify that no error logs exist in the pg.dump.log file.
Copy the
pg.dump
file to a host to which the RDS instance can connect. Then, import the file.NoteIf you use this method, you must configure a whitelist for the RDS instance to allow the host to access the instance. For more information, see Configure an IP address whitelist.
Single-thread import
pg_restore -h <Endpoint of the RDS instance> -p 5432 -U <Username of the owner of the destination database> -d <Name of the destination database> --no-tablespaces --single-transaction pg.dump >./pg.restore.log
NoteWait for the import to complete. You can query the
pg.restore.log
file to check whether error logs exist. If the file is empty, the migration is successful. In this case, you can log on to the RDS instance to view the migrated data.The
--single-transaction
option is specified in the preceding command. During data import, all operations are encapsulated in one transaction to ensure that all operations are either all successful or all fail. If all operations fail, you can resolve the issue and directly import the data again.
Multi-thread parallel import
pg_restore -h <Endpoint of the RDS instance> -p 5432 -U <Username of the owner of the destination database> -d <Name of the destination database> --no-tablespaces -j <Number of parallel threads> pg.dump >./pg.restore.log 2>&1 &
NoteYou cannot use the
--single-transaction
option and the-j
option that is used to enable parallelism at the same time.The value of the
-j
option is a positive integer. We recommend that the maximum value of the option is twice or four times the number of CPU cores of the system. This ensures that system resources are fully utilized and prevents resource contention and performance degradation that are caused by a large number of parallel tasks.This method does not use the
--single-transaction
option to encapsulate transactions. If an error is reported during data import, you must resolve the issue, clear all tables in the destination database on the RDS instance, and then import data again. This helps ensure data consistency.
References
You can configure more parameters for the commands in this topic based on your business requirements. For more information, see the following documents:
The migration solution described in this topic is suitable for migrating a single database. If you want to migrate multiple databases or the entire self-managed PostgreSQL instance to an RDS instance, refer to the following solutions:
To migrate multiple databases on the self-managed PostgreSQL instance at a time, we recommend that you use DTS. For more information, see Use DTS to migrate data from a self-managed PostgreSQL database to an ApsaraDB RDS for PostgreSQL instance.
To migrate the entire self-managed PostgreSQL instance, we recommend that you use the cloud migration feature of ApsaraDB RDS for PostgreSQL. For more information, see Use the cloud migration feature for an ApsaraDB RDS for PostgreSQL instance.