All Products
Search
Document Center

Realtime Compute for Apache Flink:Configure a PostgreSQL database

Last Updated:Apr 22, 2024

Before you use the PostgreSQL Change Data Capture (CDC) connector, make sure that the upstream PostgreSQL storage meets the configuration requirements. This topic describes the configuration requirements and related operations for ApsaraDB RDS for PostgreSQL, Amazon RDS for PostgreSQL, and self-managed PostgreSQL databases.

Background information

Before you use the PostgreSQL CDC connector, you must make sure that the upstream PostgreSQL storage meets the configuration requirements. This topic describes how to configure ApsaraDB RDS for PostgreSQL, Amazon RDS for PostgreSQL, and self-managed PostgreSQL databases. The PostgreSQL CDC connector synchronizes data based on the logical subscription mechanism of PostgreSQL. For more information, see Logical Decoding Concepts and Principles and Best Practices of Logical Subscription.

ApsaraDB RDS for PostgreSQL database

  • Network connection between ApsaraDB RDS for PostgreSQL and Ververica Platform (VVP)

  • ApsaraDB RDS for PostgreSQL server configurations

    • Supported versions

      Only data of PostgreSQL 10, PostgreSQL 11, PostgreSQL 12, PostgreSQL 13, PostgreSQL 14, PostgreSQL 15, and PostgreSQL 16 can be synchronized. You can run the select version() command to query the version of PostgreSQL.

    • Logical decoding

      • Configure wal_level = logical to enable logical decoding. For more information, see Modify the parameters of an ApsaraDB RDS for PostgreSQL instance. If the configuration is successful, you can test and use this feature based on the example that is described in Logical decoding.

      • Make sure that the setting of REPLICA IDENTITY of the table whose data needs to be synchronized is FULL.

        You can execute the following SQL statements to view and modify the setting of REPLICA IDENTITY.

        -- View the setting of REPLICA IDENTITY. 
        SELECT CASE relreplident
          WHEN'd'THEN'default'
        WHEN'n'THEN'nothing'
        WHEN'f'THEN'full'
        WHEN'i'THEN'index'
        ENDASreplica_identity
        FROMpg_class
        WHEREoid='mytablename'::regclass;
        
        -- Modify the setting of REPLICA IDENTITY. 
        ALTERTABLEmytablenameREPLICAIDENTITYFULL;
  • Data synchronization plug-in

    By default, the wal2json and pgoutput plug-ins are installed in ApsaraDB RDS for PostgreSQL 10 and later. The wal2json plug-in is supported only when the major engine version of the ApsaraDB RDS for PostgreSQL instance is PostgreSQL 11, PostgreSQL 12, or PostgreSQL 13. The pgoutput plug-in is supported only when the major engine version of the ApsaraDB RDS for PostgreSQL instance is PostgreSQL 10 or later.

    Note

    If you use the pgoutput plug-in to synchronize data, make sure that the account that you use is a privileged account. For more information, see Create an account.

Amazon RDS for PostgreSQL

  • Network connection between Amazon RDS for PostgreSQL and VVP

    To establish a network connection between Amazon RDS for PostgreSQL and VVP, you must allow VVP to access Amazon RDS for PostgreSQL over the Internet. For more information, see How does Realtime Compute for Apache Flink access the Internet? and Controlling access with security groups.

  • Amazon RDS for PostgreSQL server configurations

    • Supported versions

      Only data of PostgreSQL 10, PostgreSQL 11, PostgreSQL 12, PostgreSQL 13, PostgreSQL 14, PostgreSQL 15, and PostgreSQL 16 can be synchronized. You can run the select version() command to query the version of PostgreSQL.

    • Logical decoding

      • Set the rds.logical_replication parameter to 1.

      • Make sure that the value of the wal_level parameter is logical. By default, if the rds.logical_replication parameter is set to 1, the wal_level parameter is automatically set to logical.

      • Make sure that the setting of REPLICA IDENTITY of the table whose data needs to be synchronized is FULL.

        You can execute the following SQL statements to view and modify the setting of REPLICA IDENTITY.

        -- View the setting of REPLICA IDENTITY. 
        SELECTCASErelreplident
         WHEN'd'THEN'default'
         WHEN'n'THEN'nothing'
         WHEN'f'THEN'full'
         WHEN'i'THEN'index'
         ENDASreplica_identity
        FROMpg_class
        WHEREoid='mytablename'::regclass;
        
        -- Modify the setting of REPLICA IDENTITY. 
        ALTERTABLEmytablenameREPLICAIDENTITYFULL;
      • Account role requirements

        By default, only a management account of Amazon Web Services (AWS) is assigned the rds_replication role on Amazon RDS for PostgreSQL. A user who is assigned this role can manage logical slots and read streaming data by using logical slots. If you want to enable logical replication by using an account other than a management account, you must assign the rds_replication role to the account. For example, you can assign the rds_replication role to the user that is specified by <my_user>. You can assign the rds_replication role to users only if you have the superuser permissions. If you want to create an initial snapshot by using an account other than a management account, you must grant the account the SELECT permission on the table that you want to capture.

    • Data synchronization plug-in

      Use the pgoutput plug-in for data synchronization.

Self-managed PostgreSQL database

  • Network connection between your self-managed PostgreSQL database and VVP

    • If your self-managed PostgreSQL database is deployed on Alibaba Cloud and resides in the same VPC as VVP, a network connection is established between your self-managed PostgreSQL database and VVP.

    • If your self-managed PostgreSQL database is not deployed on Alibaba Cloud, you must allow VVP to access the public IP address of your self-managed PostgreSQL database. In this case, you must connect your self-managed PostgreSQL database and VVP by using VPN Gateway. For more information, see How does Realtime Compute for Apache Flink access the Internet? You also need to modify the pg_hba.conf file to allow VVP to access your self-managed PostgreSQL database. For more information, see Update the pg_hba.conf file.

  • Self-managed PostgreSQL server configurations

    • Supported versions

      Only data of PostgreSQL 9.6, PostgreSQL 10, PostgreSQL 11, PostgreSQL 12, PostgreSQL 13, PostgreSQL 14, PostgreSQL 15, and PostgreSQL 16 can be synchronized. You can run the select version() command to query the version of PostgreSQL.

    • Logical decoding

      • Make sure that the value of the wal_level parameter is logical. Modify the postgresql.conf file and restart the database.

      • Make sure that the setting of REPLICA IDENTITY of the table whose data needs to be synchronized is FULL.

        You can execute the following SQL statements to view and modify the setting of REPLICA IDENTITY.

        -- View the setting of REPLICA IDENTITY. 
        SELECT CASE relreplident
           WHEN 'd' THEN 'default'
           WHEN 'n' THEN 'nothing'
           WHEN 'f' THEN 'full'
           WHEN 'i' THEN 'index'
           END AS replica_identity
        FROM pg_class
        WHERE oid = 'mytablename'::regclass;
        
        -- Modify the setting of REPLICA IDENTITY. 
        ALTER TABLE mytablename REPLICA IDENTITY FULL;
      • Create an account and grant the account the permissions to synchronize data.

        The data synchronization account must have the REPLICATION and LOGIN permissions, and the SELECT permission to synchronize table data.

        CREATE ROLE <replication_user> REPLICATION LOGIN;
        
        GRANT SELECT ON <table_name> TO <name>;
    • Data synchronization plug-in

      By default, the pgoutput plug-in is installed in PostgreSQL 10 and later. If you use other PostgreSQL versions or install other plug-ins for data synchronization, follow the instructions that are described in Logical Decoding Output Plug-in Installation for PostgreSQL.

      If you use the pgoutput plug-in as the synchronization tool, make sure that the data synchronization account is the owner of the table whose data needs to be synchronized. To ensure that the data synchronization account is the owner of the table whose data needs to be synchronized, perform the following steps:

      -- 1. Create a synchronization group. 
      CREATE ROLE <replication_group>;
      
      -- 2. Add the original owner of the table to the synchronization group. 
      GRANT REPLICATION_GROUP TO <original_owner>;
      
      -- 3. Add the data synchronization account to the synchronization group. 
      GRANT REPLICATION_GROUP TO <replication_user>;
      
      -- 4. Transfer the relevant permissions to the synchronization group. 
      ALTER TABLE <table_name> OWNER TO REPLICATION_GROUP;
      Note

      In this case, the data synchronization account is only the owner of a single table. Therefore, set the debezium.publication.autocreate.mode parameter in the WITH clause to filtered. If the operation that is performed to configure relevant permissions for each table is complex, you can directly grant the pg_monitor permission to the data synchronization account.

      GRANT pg_monitor TO <replication_user>;