With the rapid development of information technology, cloud computing has become an essential platform for enterprises and developers to improve efficiency, expand their business, and achieve business innovation. ApsaraDB RDS for PostgreSQL leverages the robust features of PostgreSQL and the flexibility and convenience of the Alibaba Cloud infrastructure to provide high-performance, highly elastic, easy-to-use, stable, and secure database services. This topic describes how to use Data Transmission Service (DTS) to migrate data of a self-managed PostgreSQL 9.x database to an ApsaraDB RDS for PostgreSQL instance to upgrade the database engine version.
Before you start
Before you start, we recommend that you get familiar with DTS. For more information, see What is DTS?
Introduction to data migration technologies
Logical replication is supported for PostgreSQL 9.4 and later. Logical replication of PostgreSQL is developed based on the publish and subscribe model. Logical replication allows data to be replicated to different instances that run different PostgreSQL versions or to different databases on the same instance. Compared with streaming replication, logical replication provides a higher level of flexibility and supports finer-grained data replication. For example, logical replication can be used to replicate specific tables or specific columns of a table. Logical replication has been widely used in scenarios such as cross-version database upgrades, heterogeneous database migration, and heterogeneous system data integration because logical replication ensures compatibility across database engine versions, supports the replication of specific data, and improves flexibility and ease of use. You can use Alibaba Cloud DTS, which combines logical replication of PostgreSQL with a logical decoding extension, to migrate data of a self-managed PostgreSQL 9.x database to an RDS instance.
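The following statements provide a minimal illustration of the logical decoding mechanism that this solution builds on. This sketch assumes that the test_decoding contrib module is installed and uses a hypothetical slot name; DTS itself uses the ali_decoding extension that is installed later in this topic.
-- Create a logical decoding slot that uses the built-in test_decoding plugin.
SELECT * FROM pg_create_logical_replication_slot('decoding_example_slot', 'test_decoding');
-- Read the changes that the slot has decoded since it was created.
SELECT * FROM pg_logical_slot_get_changes('decoding_example_slot', NULL, NULL);
-- Drop the slot after the test so that WAL files are not retained.
SELECT pg_drop_replication_slot('decoding_example_slot');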

Data migration consists of the following phases: schema migration, full data migration, and incremental data migration. After each phase is complete, the data verification step that corresponds to each phase is performed. The incremental data migration phase and the data verification step for the phase are continuously performed until the data migration task ends.
Schema migration: Before you migrate data, DTS reads and parses the metadata of the source database and generates DDL statements that comply with the syntax of the destination database based on the metadata. Then, the DDL statements are executed in the destination database to create corresponding schema objects.
Full data migration: During full data migration, DTS splits all data in the source database and concurrently writes the split data to the destination database. When full data migration starts, the incremental data reading module is activated. The ali_decoding logical decoding extension is used to parse and reformat the incremental data that is generated during this phase, and the formatted data is stored on the DTS server.
Incremental data migration: After full data migration is complete, DTS retrieves the incremental data that is stored on the DTS server, reformats the data, and migrates the data to the destination database. The ali_decoding extension continues to parse and reformat the incremental data that is generated in real time, and DTS writes the data to the destination database in real time. This process continues until all ongoing data changes in the source database are written to the destination database. This helps ensure that the source database is completely synchronized with the destination database.
Data verification: After each phase is complete, DTS verifies the corresponding data that is written to the destination database. This helps ensure that the data in the source and destination databases is consistent. During incremental data verification, DTS verifies the incremental data written to the destination database in real time to ensure that the data written to the destination database is consistent with the data in the source database.
Usage notes
You must complete the required operations before you perform the data migration. For example, you must check the version of the self-managed PostgreSQL database, check the database engine version of the RDS instance, and check the limits on the self-managed PostgreSQL database.
The version of the self-managed PostgreSQL database must be 9.4.8 to 10.0.
You can run the select version(); command to query the version of the self-managed PostgreSQL database.
The RDS instance must run PostgreSQL 11.0 or later.
Limits on the self-managed PostgreSQL database:
The tables that you want to migrate must have a PRIMARY KEY or UNIQUE constraint, and all fields must be unique. Otherwise, the destination database may contain duplicate data records. A sample query that helps check this requirement is provided at the end of this section.
The name of the source database cannot contain hyphens (-). For example, dts-testdata is invalid.
DTS cannot migrate specific items in the source database, such as temporary tables, internal triggers, stored procedures and functions that are written in the C programming language, and internal stored procedures and functions. DTS can migrate custom data types of the COMPOSITE, ENUM, and RANGE categories, and can migrate tables that have PRIMARY KEY, FOREIGN KEY, UNIQUE, or CHECK constraints.
When data is migrated from a self-managed PostgreSQL database, you cannot perform a primary/secondary switchover. Otherwise, the data migration fails.
During schema migration and full data migration, do not perform DDL operations to change the schemas of databases or tables. Otherwise, the data migration fails.
For more information, see Usage notes.
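To check in advance which tables do not meet the PRIMARY KEY or UNIQUE constraint requirement, you can run a query similar to the following on the source database. This is a sketch that only checks whether a PRIMARY KEY or UNIQUE constraint exists on each user table; it does not verify that all field values are unique.
SELECT n.nspname AS schema_name, c.relname AS table_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND NOT EXISTS (
    SELECT 1 FROM pg_constraint con
    WHERE con.conrelid = c.oid AND con.contype IN ('p', 'u')
  );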
Migration process
The following figure shows how to migrate data of a self-managed PostgreSQL 9.x database to an RDS instance.

Preparations
Create an account
Create a superuser account for the self-managed PostgreSQL database and use the account to perform the data migration. In this example, the migrate_test account is used.
CREATE USER migrate_test WITH superuser password 'your_password';
Check whether the account is created.
\du+
List of roles
Role name | Attributes | Member of | Description
migrate_test | Superuser | {} |
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {} |
Modify parameters
Check whether the parameter settings meet the data migration requirements. If the parameter settings do not meet the data migration requirements, modify the parameter settings by referring to the following section.
Check whether the wal_level parameter is set to logical.
SHOW wal_level;
wal_level
minimal
(1 row)
If the wal_level parameter is not set to logical, execute the following statement to change the value of the wal_level parameter to logical:
ALTER SYSTEM SET wal_level='logical';
Check whether the max_wal_senders parameter is set to 0.
SHOW max_wal_senders;
max_wal_senders
0
(1 row)
If the max_wal_senders parameter is set to 0, you must change the parameter to a value that is greater than 0. We recommend that you set the value to 64 or greater.
ALTER SYSTEM SET max_wal_senders=64;
Check whether the max_replication_slots parameter is set to 0.
SHOW max_replication_slots;
max_replication_slots
0
(1 row)
If the max_replication_slots parameter is set to 0, you must change the parameter to a value that is greater than 0. We recommend that you set the value to 64 or greater.
ALTER SYSTEM SET max_replication_slots=64;
Restart the PostgreSQL service for the configurations to take effect.
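For example, if the instance is managed with pg_ctl, you can restart it with a command similar to the following. The data directory path is an assumption; replace it with the actual data directory of your instance.
pg_ctl restart -D /path/to/data_directory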
Allow access from DTS
Add the following rules to the pg_hba.conf file to allow DTS to access the self-managed PostgreSQL database:
host all all 100.104.0.0/16 md5
host replication all 100.104.0.0/16 md5
After the configuration is complete, log on to the database and execute the following statement to make the configuration take effect.
SELECT pg_reload_conf();
pg_reload_conf
t
(1 row)
Compile and install the extension that is required for the data migration.
Download the ali_decoding extension provided by DTS, and compile and install the extension on the server that hosts the self-managed PostgreSQL database.
Query the version of the self-managed PostgreSQL database.
In this example, PostgreSQL 9.4.14 is used. You must replace PostgreSQL 9.4.14 with the actual database engine version.
Download the source code package for the version of the self-managed PostgreSQL database from the PostgreSQL official website, and upload the package to the server that hosts the self-managed PostgreSQL database.
For example, if the server runs CentOS 7.9, you can download the source code package postgresql-9.4.14.tar.gz for PostgreSQL 9.4.14.
wget https://ftp.postgresql.org/pub/source/v9.4.14/postgresql-9.4.14.tar.gz
Decompress the source code package and compile and install the source code.
Create a directory and move the source code package to the directory. In this example, a directory named postgres is created.
mkdir postgres
mv ~/postgresql-9.4.14.tar.gz postgres
Go to the postgres directory and decompress the source code package.
cd postgres/
tar -zxvf postgresql-9.4.14.tar.gz
Go to the directory in which the source code is stored and create an installation directory.
cd postgresql-9.4.14/
mkdir -p /usr/local/pgsql94
Configure, compile, and install the source code.
Configure the source code.
sudo ./configure --prefix=/usr/local/pgsql94
Run the echo $? command to check the configuration result. If a value of 0 is returned, the configuration is successful, and you can proceed to the next step. If a non-zero value is returned, the configuration failed. You must troubleshoot the issue and try again.
Important
The value of the prefix parameter must be different from the installation path of the self-managed PostgreSQL database.
When you compile the source code to install PostgreSQL, the GCC version of the operating system must be compatible with the PostgreSQL version.
If an error occurs when you run the sudo ./configure command, you can modify the command based on the error message. For example, if the error message is 'readline library not found. Use --without-readline to disable readline support.', you can modify the command to sudo ./configure --without-readline.
Compile the source code.
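For a PostgreSQL source tree that has been configured as described in the previous step, the compilation is typically performed by running the following command in the postgresql-9.4.14 directory. This is a sketch; adjust it to your build environment.
sudo make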
Run the echo $? command to check the compilation result. If a value of 0 is returned, the compilation is successful, and you can proceed to the next step. If a non-zero value is returned, the compilation failed. You must troubleshoot the issue and try again.
Install the source code.
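A typical installation command for this step is the following. This is a sketch; the files are installed to the /usr/local/pgsql94 directory that is specified by the --prefix parameter.
sudo make install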
Run the echo $? command to check the installation result. If a value of 0 is returned, the installation is successful, and you can proceed to the next step. If a non-zero value is returned, the installation failed. You must troubleshoot the issue and try again.
Download the ali_decoding extension and upload it to the server that hosts the self-managed PostgreSQL database.
In this example, Linux is used. You can run the following commands to download and decompress the extension package.
Important
The branch version of the ali_decoding extension must be ali_decoding_94.
sudo wget https://github.com/YeFuner/rds_dbsync/archive/refs/heads/ali_decoding_94.zip
unzip ali_decoding_94.zip
Go to the rds_dbsync-ali_decoding_94 directory that is generated after decompression, copy the ali_decoding directory to the contrib directory of the PostgreSQL source code that you compiled and installed, and then go to the copied ali_decoding directory.
cd rds_dbsync-ali_decoding_94/
cp -r ali_decoding/ /home/postgres/postgresql-9.4.14/contrib/
cd /home/postgres/postgresql-9.4.14/contrib/ali_decoding/
Note
You must replace the /home/postgres/postgresql-9.4.14/contrib/ directory in the preceding command with the directory in which the source code is stored.
Access the ali_decoding directory and replace the content of the Makefile file with the following information:
# contrib/ali_decoding/Makefile
MODULE_big = ali_decoding
MODULES = ali_decoding
OBJS = ali_decoding.o
DATA = ali_decoding--0.0.1.sql ali_decoding--unpackaged--0.0.1.sql
EXTENSION = ali_decoding
NAME = ali_decoding
#subdir = contrib/ali_decoding
#top_builddir = ../..
#include $(top_builddir)/src/Makefile.global
#include $(top_srcdir)/contrib/contrib-global.mk
#PG_CONFIG = /usr/pgsql-9.6/bin/pg_config
#pgsql_lib_dir := $(shell $(PG_CONFIG) --libdir)
#PGXS := $(shell $(PG_CONFIG) --pgxs)
#include $(PGXS)
# Run the following commands to install the source code:
ifdef USE_PGXS
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
else
subdir = contrib/ali_decoding
top_builddir = ../..
include $(top_builddir)/src/Makefile.global
include $(top_srcdir)/contrib/contrib-global.mk
endif
Compile and install the ali_decoding extension to generate the files that are required to create the extension.
Compile the extension.
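Because the ali_decoding directory is located in the contrib directory of the configured source tree, a typical compile command is the following. This is a sketch; run it in the ali_decoding directory.
sudo make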
Run the echo $? command to check the compilation result. If a value of 0 is returned, the compilation is successful, and you can proceed to the next step. If a non-zero value is returned, the compilation failed. You must troubleshoot the issue and try again.
Install the extension.
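A typical installation command for this step is the following. This is a sketch; it installs the generated extension files into the PostgreSQL installation that was built in the previous steps, such as /usr/local/pgsql94.
sudo make install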
Run the echo $? command to check the installation result. If a value of 0 is returned, the installation is successful, and you can proceed to the next step. If a non-zero value is returned, the installation failed. You must troubleshoot the issue and try again.
Copy the generated ali_decoding file to the specified directory for the self-managed PostgreSQL database.
Use the pg_config tool to find the lib directory that is used by the kernel of the current self-managed PostgreSQL database, and install ali_decoding.so that is obtained after compilation to the lib directory.
/usr/local/pgsql94/bin/pg_config --pkglibdir
The following information is returned:
/usr/local/pgsql94/lib
/usr/bin/install -c -m 755 ali_decoding.so '/usr/local/pgsql94/lib/ali_decoding.so'
/usr/bin/install -c -m 755 ali_decoding.so '/usr/local/pgsql94/lib/'
Use the pg_config tool to find the share directory that is used by the kernel of the current self-managed PostgreSQL database, and install ali_decoding.control, ali_decoding--0.0.1.sql, and ali_decoding--unpackaged--0.0.1.sql that are obtained after compilation to the extension directory under the share directory.
/usr/local/pgsql94/bin/pg_config --sharedir
The following information is returned:
/usr/local/pgsql94/share
/usr/bin/install -c -m 644 ali_decoding.control '/usr/local/pgsql94/share/extension/'
/usr/bin/install -c -m 644 ali_decoding--0.0.1.sql ali_decoding--unpackaged--0.0.1.sql '/usr/local/pgsql94/share/extension/'
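To verify that the self-managed database can load the extension, you can create and then drop a logical decoding slot that uses ali_decoding. The slot name replication_slot_test is only an example.
SELECT * FROM pg_create_logical_replication_slot('replication_slot_test', 'ali_decoding');
SELECT pg_drop_replication_slot('replication_slot_test');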
Create an RDS instance.
Create an RDS instance and create a privileged account on the RDS instance. The privileged account is used to perform the data migration. Create a database on the RDS instance based on the information of the database whose data you want to migrate.
The name of the database on the RDS instance must be the same as the name of the self-managed PostgreSQL database.
The owner of the database on the RDS instance must be the privileged account that is used for the data migration.
Procedure
Create an RDS instance. For more information, see Create an ApsaraDB RDS for PostgreSQL instance.
Create a privileged account. For more information, see Create an account.
Create a database on the RDS instance. For more information, see Create a database.
Create the required extension on the RDS instance
Run the \dx command to query the extensions that are used for the self-managed PostgreSQL database, and create the same extensions for the destination database on the RDS instance.
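For example, if the source database uses the uuid-ossp extension (a hypothetical example), connect to the destination database with the privileged account and execute the following statement:
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";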
If the source database uses the PostGIS extension and the PostGIS versions of the source and destination databases are different, execute the following SQL statement on the destination database. In PostGIS 3.x, raster support is provided by the separate postgis_raster extension, so this statement helps the destination database provide the same features as the source database.
CREATE EXTENSION IF NOT EXISTS postgis_raster;
Use DTS to migrate data
For more information about how to configure a DTS migration task, see Procedure.
Complete cutover
After the DTS migration task is complete, perform a full backup on the destination RDS instance and clone the backup set to generate another RDS instance that has the same database engine version and data as the destination RDS instance. The cloned RDS instance is used for comprehensive verification in a test environment. After all compatibility issues are resolved, you can start to switch over your workloads. For more information about how to clone an RDS instance, see Restore data of an ApsaraDB RDS for PostgreSQL instance.
When the DTS task enters the Incremental Data Migration phase and the data synchronization latency falls below the cutover threshold (less than 1 second), stop the application and stop writing data to the source database.
Configure the pg_hba.conf file and place the following rules at the top of the file so that they take precedence over the other rules. This prohibits connections to the source database from the service account. In this example, the service account is test_acc1.
host all test_acc1 0.0.0.0/0 reject
host replication test_acc1 0.0.0.0/0 reject
After the configuration is complete, log on to the source database and execute the SELECT pg_reload_conf(); statement to make the configuration take effect.
Execute the following SQL statement to terminate all existing connections except the connections of the migration account:
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE pid <> pg_backend_pid() AND usename NOT IN ('migrate_test');
Use the service account to connect to the source database. If the following error is returned, the configuration takes effect.
FATAL: pg_hba.conf rejects connection for host "127.0.0.1", user "test_acc1", database "postgres"
After you switch your workloads over to the destination database, the sequence values that are newly generated in the destination database do not continue from the maximum sequence values in the source database. To resolve this issue, perform the following operations after the system stops writing data to the self-managed PostgreSQL database: query the maximum value of each sequence in the source database, and set the corresponding sequence in the destination database to start from that maximum value.
Execute the following SQL statement on the self-managed PostgreSQL database to query the sequence values and generate the corresponding setval statements. Then, execute the returned SQL statements on the destination RDS instance.
DO LANGUAGE plpgsql $$
DECLARE
  nsp name;
  rel name;
  val int8;
BEGIN
  FOR nsp,rel IN SELECT nspname,relname FROM pg_class t2, pg_namespace t3 WHERE t2.relnamespace=t3.oid AND t2.relkind='S'
  LOOP
    EXECUTE format($_$SELECT last_value FROM %I.%I$_$, nsp, rel) INTO val;
    RAISE NOTICE '%',
      format($_$SELECT setval('%I.%I'::regclass, %s);$_$, nsp, rel, val+1);
  END LOOP;
END;
$$;
The following information provides an example of the returned SQL statement:
NOTICE: select setval('public.my_sequence'::regclass, 2);
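Assuming the NOTICE output above, you would then execute the returned statement on the destination RDS instance. The sequence name public.my_sequence is only an example taken from the output.
SELECT setval('public.my_sequence'::regclass, 2);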
In the DTS console, configure a data backflow task or data synchronization task to send incremental data from the RDS instance back to the self-managed PostgreSQL database. For more information, see Procedure.
Important
This operation must be performed before the application is started.
The source database of the data synchronization task is a database on the RDS instance, and the destination database is the self-managed PostgreSQL database.
If the type of the data synchronization task is incremental data migration only, you do not need to select the schema synchronization and full data synchronization options.
Change the connection addresses of all applications to the endpoint of the destination RDS instance and start the applications. For more information about how to obtain the endpoint of an RDS instance, see View and change the endpoints and port numbers.
Monitor the applications and ensure that they run as expected.