All Products
Search
Document Center

:One-click data warehousing

Last Updated:Jan 04, 2024

AnalyticDB for PostgreSQL provides the one-click data warehousing feature to synchronize data from ApsaraDB RDS for PostgreSQL to AnalyticDB for PostgreSQL. To synchronize data in real time, you can create a publication on ApsaraDB RDS for PostgreSQL, and then create a one-click data warehousing job as the subscription to the publication on AnalyticDB for PostgreSQL.

Usage notes

  • The one-click data warehousing feature is supported only for AnalyticDB for PostgreSQL instances of V6.3.8.0 or later in elastic storage mode.

  • The one-click data warehousing feature is supported only for ApsaraDB RDS for PostgreSQL instances that run PostgreSQL 10.0 or later.

  • The ApsaraDB RDS for PostgreSQL and AnalyticDB for PostgreSQL instances must reside within the same virtual private cloud (VPC).

  • The synchronization objects must be tables.

  • Only the INSERT, UPDATE, DELETE, and TRUNCATE operations can be synchronized. Schema changes cannot be synchronized.

  • The destination table must have the same schema as the source table.

  • The source table on ApsaraDB RDS for PostgreSQL must contain a primary key.

  • Only the following data types are supported for synchronization:

    Numeric, character, BOOLEAN, time and date, enumeration, GEO, INTERNET, RANGE, JSON, BYTEA, BIT, and an array of the preceding types.

    Note

    If you synchronize data of other types, the one-click data warehousing job may be interrupted.

  • If data of the source table on ApsaraDB RDS for PostgreSQL is updated by using a statement such as UPDATE/DELETE WHERE CTID = ******, the update is not synchronized to the destination table on AnalyticDB for PostgreSQL.

Prerequisites

  • AnalyticDB for PostgreSQL

  • ApsaraDB RDS for PostgreSQL

    • An ApsaraDB RDS for PostgreSQL instance is created. For more information, see Create an ApsaraDB RDS for PostgreSQL instance.

    • A privileged account is created. For more information, see Create an account.

    • The wal_level parameter is set to logical. For information about how to modify parameters, see Modify the parameters of an ApsaraDB RDS for PostgreSQL instance.

      Note

      The modification to the parameter takes effect only after you restart the instance. We recommend that you restart the instance during off-peak hours.

    • The internal endpoint of the AnalyticDB for PostgreSQL instance is added to a whitelist of the ApsaraDB RDS for PostgreSQL instance.

SQL syntax

This section describes the SQL syntax that is used to create, modify, and delete a one-click data warehousing job.

Create a one-click data warehousing job

Syntax

CREATE SUBSCRIPTION <subscription_name>
    CONNECTION 'conninfo'
    PUBLICATION <publication_name> [, ...]
    [ WITH ( subscription_parameter [= value] [, ... ] ) ]

Parameters

Parameter

Description

subscription_name

The name of the one-click data warehousing job.

conninfo

The connection information of the ApsaraDB RDS for PostgreSQL instance, including the endpoint, port number, database name, account, and password.

  • Endpoint and port number: the endpoint and port number of the ApsaraDB RDS for PostgreSQL instance. For information about how to obtain the endpoint and port number, see View and change the endpoints and port numbers of an ApsaraDB RDS for PostgreSQL instance.

  • Database name: the name of the database that contains the source table.

  • Account and password: the account that has read and write permissions on the preceding database and the corresponding password.

Example: host=pgm-bp*******************.pg.rds.aliyuncs.com port=5432 dbname=*** user=*** password=***

publication_name

The name of the publication that is created on the ApsaraDB RDS for PostgreSQL instance.

subscription_parameter

The optional parameter of the one-click data warehousing job. Only copy_data (boolean) is supported.

copy_data (boolean): specifies whether to synchronize existing data from the source table to the destination table when the one-click data warehousing job is created. Valid values:

  • true

  • false

Modify a one-click data warehousing job

  • Refresh the synchronization objects of a one-click data warehousing job

    When a new source table is added to the publication on the ApsaraDB RDS for PostgreSQL instance, you must perform a REFRESH operation to allow the destination table to subscribe to data of the new source table.

    Syntax

    ALTER SUBSCRIPTION <subscription_name> REFRESH PUBLICATION [ WITH ( refresh_option [= value] [, ... ] ) ]

    Parameters

    Parameter

    Description

    subscription_name

    The name of the one-click data warehousing job.

    refresh_option

    The optional parameter of the one-click data warehousing job. Only copy_data (boolean) is supported.

    copy_data (boolean): specifies whether to synchronize existing data from the source table to the destination table when the one-click data warehousing job is created. Valid values:

    • true

    • false

  • Start a one-click data warehousing job

    Syntax

    ALTER SUBSCRIPTION <subscription_name> ENABLE

    Parameters

    Parameter

    Description

    subscription_name

    The name of the one-click data warehousing job.

  • Stop an ongoing one-click data warehousing job

    Syntax

    ALTER SUBSCRIPTION <subscription_name> DISABLE

    Parameters

    Parameter

    Description

    subscription_name

    The name of the one-click data warehousing job.

  • Modify the owner of a one-click data warehousing job

    Syntax

    ALTER SUBSCRIPTION <subscription_name> OWNER TO { <new_owner> | CURRENT_USER | SESSION_USER }

    Parameters

    Parameter

    Description

    subscription_name

    The name of the one-click data warehousing job.

    new_owner

    The new owner of the one-click data warehousing job.

  • Modify the name of a one-click data warehousing job

    Syntax

    ALTER SUBSCRIPTION <subscription_name> RENAME TO <new_name>

    Parameters

    Parameter

    Description

    subscription_name

    The name of the one-click data warehousing job.

    new_name

    The new name of the one-click data warehousing job.

Delete a one-click data warehousing job

Syntax

DROP SUBSCRIPTION [ IF EXISTS ] <subscription_name>

Parameters

Parameter

Description

subscription_name

The name of the one-click data warehousing job that you want to delete.

For more information about the syntax, see the PostgreSQL documentation.

Start data synchronization

The following example shows how to synchronize a source table named logical_tb1 from the ApsaraDB RDS for PostgreSQL instance to the AnalyticDB for PostgreSQL instance:

  1. Create a source table named logical_tb1 on the ApsaraDB RDS for PostgreSQL instance.

    CREATE TABLE logical_tb1(a int primary key, b varchar(20));
  2. Create a publication named pub1 on the ApsaraDB RDS for PostgreSQL instance and add the logical_tb1 table to the pub1 publication.

    CREATE PUBLICATION pub1 FOR TABLE logical_tb1;
  3. Create a destination table that has the same schema as the source table on the AnalyticDB for PostgreSQL instance.

    CREATE TABLE logical_tb1(a int primary key, b varchar(20));
  4. Create a one-click data warehousing job on the AnalyticDB for PostgreSQL instance.

    • If you want to synchronize existing data and new data from the source table, execute the following statement:

      CREATE SUBSCRIPTION sub1 CONNECTION 'host=****** port=5432 user=*** dbname=*** password=***' PUBLICATION pub1;
    • If you want to synchronize only new data from the source table, execute the following statement:

      CREATE SUBSCRIPTION sub1 CONNECTION 'host=****** port=5432 user=*** dbname=*** password=***' PUBLICATION pub1 WITH(copy_data = false);

Modify the tables that you want to synchronize

Assume that a one-click data warehousing job named sub1 is created on the AnalyticDB for PostgreSQL instance to synchronize the logical_tb1 table that is contained in the pub1 publication on the ApsaraDB RDS for PostgreSQL instance. The following example shows how to synchronize another table named logical_tb2 to the AnalyticDB for PostgreSQL instance:

  1. Create a source table named logical_tb2 on the ApsaraDB RDS for PostgreSQL instance.

    CREATE TABLE logical_tb2(id int primary key, name varchar(20), age int);
  2. Add the logical_tb2 table to the pub1 publication on the ApsaraDB RDS for PostgreSQL instance.

    ALTER PUBLICATION pub1 ADD TABLE logical_tb2;
  3. Create a destination table that has the same schema as the source table on the AnalyticDB for PostgreSQL instance.

    CREATE TABLE logical_tb2(id int primary key, name varchar(20), age int);
  4. Refresh the subscribed publication information on the AnalyticDB for PostgreSQL instance. The logical_tb2 table is synchronized to the AnalyticDB for PostgreSQL instance.

    ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION;

Stop data synchronization

Assume that a publication named pub1 is created on the ApsaraDB RDS for PostgreSQL instance and the pub1 publication contains the logical_tb1 and logical_tb2 tables. The following example shows how to continue synchronizing the logical_tb1 table and stop synchronizing the logical_tb2 table:

  1. Remove the logical_tb2 table from the pub1 publication on the ApsaraDB RDS for PostgreSQL instance.

    ALTER PUBLICATION pub1 DROP TABLE logical_tb2;
  2. Refresh the subscribed publication information on the AnalyticDB for PostgreSQL instance. The logical_tb1 table is synchronized to the AnalyticDB for PostgreSQL instance but the logical_tb1 table is not.

    ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION;

Query the state of a one-click data warehousing job

Query whether a one-click data warehousing job is complete

Execute the following statement on the ApsaraDB RDS for PostgreSQL instance to query whether a one-click data warehousing job is complete:

SELECT confirmed_flush_lsn = pg_current_wal_lsn() FROM pg_replication_slots WHERE slot_name = '<subscription_name>';

subscription_name specifies the name of the one-click data warehousing job that you want to query.

Return values:

  • t: The one-click data warehousing job is complete.

  • f: The one-click data warehousing job is in progress.

  • null: The one-click data warehousing job does not exist or is abnormal.

Query the state of a one-click data warehousing job

Execute the following statement on the ApsaraDB RDS for PostgreSQL instance to query the state of a one-click data warehousing job:

SELECT active FROM pg_replication_slots WHERE slot_name='<subscription_name>';

subscription_name specifies the name of the one-click data warehousing job that you want to query.

Return values:

  • t: The one-click data warehousing job is normal.

  • f: The one-click data warehousing job is abnormal.

  • null: The one-click data warehousing job does not exist.

Query the number of one-click data warehousing jobs on an AnalyticDB for PostgreSQL instance

Execute the following statement on the AnalyticDB for PostgreSQL instance to query the number of one-click data warehousing jobs on the instance:

SELECT count(*) FROM pg_subscription WHERE subenabled = true;

References