Open source PostgreSQL allows you to use replication slots to enable the Change Data Capture (CDC) feature. This topic describes how to enable the CDC feature for an ApsaraDB RDS for PostgreSQL instance.
Prerequisites
An RDS instance is created. For more information, see Create an ApsaraDB RDS for PostgreSQL instance.
A whitelist is configured to allow the requests from your client to the RDS instance. For more information, see Configure an IP address whitelist for an ApsaraDB RDS for PostgreSQL instance.
psql is installed on your client. For more information, see PostgreSQL official documentation.
Usage notes
You can enable the CDC feature and consume the captured data only on the primary RDS instance. Read-only RDS instances are not supported.
ApsaraDB RDS for PostgreSQL supports the Logical Replication Slot Failover feature. The primary/secondary switchover does not affect the CDC feature. For more information, see Logical Replication Slot Failover.
Before you enable the CDC feature for an RDS instance, you must modify specified parameters of the RDS instance. The parameter modification may trigger an instance restart. To prevent impacts on your workloads, we recommend that you modify the parameters during off-peak hours.
After you enable the CDC feature for an RDS instance, you must take note of the following points:
The RDS instance requires more storage for WAL logs. If the consumption of captured data is abnormal or stopped, WAL logs are not automatically cleared. The logs accumulate and exhaust the disk space. In this case, the RDS instance may be locked. If the RDS instance is locked, the RDS instance can process only read requests.
If a downstream client consumes the data rows of the RDS instance, the client must be configured to report the consumption in real time. Otherwise, the RDS instance retains the data rows of multiple versions. If the data rows of multiple versions are retained, transaction ID (XID) wraparound occurs, and your RDS instance can process only read requests. Sample error log:
"HINT: Close open transactions soon to avoid wraparound problems. You might also need to commit or roll back old prepared transactions, or drop stale replication slots." "WARNING: oldest xmin is far in the past."
You can manually delete replication slots to allow AliPG to automatically clear WAL logs and data rows of earlier versions. For more information, see Disable the CDC feature.
NoteWe recommend that you monitor the size of WAL logs and the storage usage on the RDS instance on a regular basis and configure alert rules. For more information, see View the Enhanced Monitoring metrics and Manage the alert rules of an ApsaraDB RDS for PostgreSQL instance.
Enable the CDC feature
Step 1: Create a test database
- Go to the Instances page. In the top navigation bar, select the region in which the RDS instance resides. Then, find the RDS instance and click the ID of the instance.
In the left-side navigation pane of the page that appear, click Databases.
On the page that appears, click Create Database.
NoteIn this example, a database named testdb is created. For more information, see Create a database.
Step 2: Create test accounts and configure permissions
In the left-side navigation pane of the page that appears, click Accounts.
On the page that appears, click Create Account to create a privileged account named db_admin and a standard account named cdc_user. The privileged account has administrative rights, and the standard account is used to capture change data.
NoteThe usernames of the accounts in this example are for reference only. You can specify usernames based on your business requirements. For more information about how to create an account, see Create an account.
Connect your client to the RDS instance by using the db_admin account.
psql -h <Endpoint of the RDS instance> -p 5432 -U db_admin -d testdb
NoteFor more information about how to obtain the endpoint of an RDS instance, see View and change the endpoints and port numbers of an ApsaraDB RDS for PostgreSQL instance.
Execute the following statement to assign the replication role to the cdc_user account:
ALTER USER cdc_user WITH REPLICATION;
You can execute the following statement to query the result:
SELECT rolreplication FROM pg_roles WHERE rolname='cdc_user';
Sample output:
rolreplication ---------------- t (1 row)
Execute the following statement to grant permissions to the cdc_user account:
GRANT SELECT ON ALL TABLES IN SCHEMA PUBLIC to cdc_user;
Step 3: Modify the parameters of the RDS instance
Execute the following statement to query the parameter settings of the RDS instance:
SELECT name, setting, short_desc, source FROM pg_settings WHERE name ='wal_level';
Sample output:
name | setting | short_desc | source -----------------------+---------+-------------------------------------------------------------------------+-------------------- wal_level | replica | Sets the level of information written to the WAL. | configuration file (1 rows)
The wal_level parameter specifies the amount of data that can be written to WAL logs. The default value is replica. You can configure this parameter only when the server is started. Valid values:
minimal: records only the data that is required to restore the instance from a breakdown or immediate shutdown. At the minimal level, databases cannot be restored by using base backups or WAL logs.
replica: writes a sufficient amount of data to support WAL archiving and replication, including running read-only queries on a standby server.
logical: adds information that is required to support logical decoding in addition to the data that is written at the replica level.
- Go to the Instances page. In the top navigation bar, select the region in which the RDS instance resides. Then, find the RDS instance and click the ID of the instance.
In the left-side navigation pane of the page that appears, click Parameters.
Change the value of the wal_level parameter to logical.
NoteFor more information about how to modify instance parameters, see Modify the parameters of an ApsaraDB RDS for PostgreSQL instance.
After you modify the parameters and submit the modification, the RDS instance restarts. To prevent impacts on your workloads, we recommend that you modify the parameters during off-peak hours.
Step 4: Create a logical replication slot
After you modify the parameters and submit the modification in Step 3, the RDS instance restarts for the modification to take effect. You can perform the following operations when the RDS instance is in the Running state:
Connect your client to the RDS instance by using the db_admin account.
psql -h <Endpoint of the RDS instance> -p 5432 -U db_admin -d testdb
Execute the following statement to use
test_decoding
to create a replication slot namedcdc_replication_slot
:SELECT pg_create_logical_replication_slot('cdc_replication_slot', 'test_decoding');
Notecdc_replication_slot
is used for reference only. You can specify a name based on your business requirements.test_decoding
is an output extension that is provided by open source PostgreSQL. You can directly use test_decoding in the statement.
Sample output:
pg_create_logical_replication_slot ------------------------------------ (cdc_replication_slot,1/14003428) (1 row)
You can execute the following statement to query the result:
SELECT * FROM pg_replication_slots;
Sample output:
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase ----------------------+---------------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+----------------+----------- cdc_replication_slot | test_decoding | logical | 18822 | testdb | f | f | | | 22356 | 1/140033F0 | 1/14003428 | reserved | | f (1 row)
Step 5: Create test data
Execute the following statements to create test data and simulate the production environment:
CREATE TABLE public.tb_test(
id int NOT NULL PRIMARY KEY
);
ALTER TABLE public.tb_test ADD name varchar(1) NULL;
INSERT INTO public.tb_test SELECT 1, 'A';
Step 6: Read data by using your client
Connect your client to the RDS instance by using the cdc_user account.
psql -h <Endpoint of the RDS instance> -p 5432 -U cdc_user -d testdb
Execute the following statement to read data from the replication slot:
SELECT * FROM pg_logical_slot_peek_changes('cdc_replication_slot', null, null);
Sample output:
lsn | xid | data ------------+-------+------------------------------------------------------------------------- 1/14003D90 | 22376 | BEGIN 22376 1/1401DDE8 | 22376 | COMMIT 22376 1/1401DDE8 | 22377 | BEGIN 22377 1/1401E100 | 22377 | COMMIT 22377 1/1401E2A8 | 22382 | BEGIN 22382 1/1401E2A8 | 22382 | table public.tb_test: INSERT: id[integer]:1 name[character varying]:'A' 1/1401E3C0 | 22382 | COMMIT 22382 (7 rows)
Step 7: Consume the captured data by using your client
Run the
\q
command to close the database connection.Run the following command to consume the captured data:
NoteYou must run the
pg_recvlogical
command by using the postgres user. You can run the su - postgres command to switch to the postgres user. If the-bash: pg_recvlogical: command not found
error message is displayed, resolve the issue based on the descriptions in FAQ.pg_recvlogical -h <Endpoint of the RDS instance> -U <Privileged account> -d <Test database> --create-slot --if-not-exists --slot=cdc_replication_slot --plugin=test_decoding --start -f -
Example:
pg_recvlogical -h pgm-*****.pgsql.singapore.rds.aliyuncs.com -U db_admin -d testdb --create-slot --if-not-exists --slot=cdc_replication_slot --plugin=test_decoding --start -f -
Sample output:
BEGIN 22376 COMMIT 22376 BEGIN 22377 COMMIT 22377 BEGIN 22382 table public.tb_test: INSERT: id[integer]:1 name[character varying]:'A' COMMIT 22382
Disable the CDC feature
If you enable the CDC feature for an RDS instance, the RDS instance requires more storage for WAL logs. If the consumption of captured data is abnormal or stopped, WAL logs are not automatically deleted. The logs accumulate and exhaust the disk space. In this case, the RDS instance may be locked. If the RDS instance is locked, the instance can process only read requests. You can manually delete inactive replication slots to allow AliPG to automatically delete WAL logs.
You can delete inactive replication slots in the ApsaraDB RDS console, by calling API operations, or by executing SQL statements.
ApsaraDB RDS console: Use the WAL log management feature for an ApsaraDB RDS for PostgreSQL instance.
API operations: DeleteSlot.
SQL statements:
Connect to the RDS instance by using the db_admin account.
psql -h <Endpoint of the RDS instance> -p 5432 -U db_admin -d testdb
Execute the following statement to view the names and related information about inactive replication slots:
SELECT slot_name, slot_type, database, active, safe_wal_size FROM pg_replication_slots WHERE active = 'f';
Sample output:
slot_name | slot_type | database | active | safe_wal_size ----------------------+-----------+----------+--------+--------------- cdc_replication_slot | logical | testdb | f | (1 row)
Execute the following statement to delete the logical replication slot:
SELECT pg_drop_replication_slot('cdc_replication_slot');
FAQ
What do I do if the
-bash: pg_recvlogical: command not found
error message is displayed when a client consumes the captured data?pg_recvlogical is a native logic decoding tool for PostgreSQL. This tool uses the default test_decoding extension. The test_decoding extension is stored in the contrib/test_decoding directory of the PostgreSQL source code package. We recommend that you compile the PostgreSQL source code and install PostgreSQL. Then, you can find the pg_recvlogical tool in the /bin directory of the installation path of PostgreSQL. For more information about how to install PostgreSQL from source code, see Installation from Source Code.
What do I do if I delete replication slots but the WAL files are not automatically cleared and still occupy disk space?
You can set the
wal_keep_segments
parameter to the default value128
to reduce the number of saved files. For more information about how to modify instance parameters, see Modify the parameters of an ApsaraDB RDS for PostgreSQL instance.