All Products
Search
Document Center

ApsaraDB RDS:Use the zero-ETL feature

Last Updated:Oct 22, 2024

ApsaraDB RDS provides the zero-ETL feature to synchronize data from an ApsaraDB RDS for MySQL instance to a data warehouse. You do not need to create or maintain data synchronization tasks, and you are not charged for data synchronization tasks. This reduces data transmission costs and O&M costs. The data warehouse can be an AnalyticDB for MySQL cluster, an AnalyticDB for PostgreSQL cluster, or an ApsaraDB for ClickHouse cluster.

Supported regions

China (Beijing), China (Hangzhou), China (Shanghai), and China (Shenzhen)

Note

The feature is rolled out for RDS instances in regions in phases. The information in the ApsaraDB RDS console shall prevail.

Overview

In the era of big data, enterprises must use extract-transform-load (ETL) tools to efficiently manage and use large amounts of business data distributed across different systems and platforms.

An ETL tool extracts data from an upper-level business system, transforms the data, and then loads the data to data warehouses. This process incorporates distributed data into data warehouses for further computing, analysis, and business decision-making.

The following section describes the challenges associated with traditional ETL processes:

  • Increased system complexity: The maintenance of ETL tools increases O&M difficulty and prevents you from focusing on business application development.

  • Increased resource costs: Different data sources may require different ETL tools, and you are charged additional fees for creating ETL tasks.

  • Reduced data timeliness: Specific ETL processes involve periodic batch updates. In near-real-time scenarios, analysis results cannot be quickly generated.

To resolve the issues, Alibaba Cloud ApsaraDB provides the zero-ETL feature that allows you to quickly create a data synchronization task between a business system that uses online transaction processing (OLTP) databases and a data warehouse that uses online analytical processing (OLAP) databases. The feature automatically extracts, transforms, cleanses, and loads the data of the business system to the data warehouse. This way, data synchronization and management can be completed in a one-stop manner to integrate transaction processing and data analysis. This allows you to focus on data analysis.

Benefits

  • Ease of use: You do not need to create or maintain complex data pipelines to perform ETL operations. You need to only select the source instance and destination cluster to create real-time data synchronization tasks. This reduces the challenges of building and managing data pipelines and allows you to focus on application development.

  • Zero cost: You can create zero-ETL tasks to synchronize data to a data warehouse free of charge for data analysis.

  • Multi-source aggregation: You can create a zero-ETL task to synchronize data from multiple RDS instances to an AnalyticDB for MySQL cluster, an AnalyticDB for PostgreSQL instance, or an ApsaraDB for ClickHouse cluster in real time to build a global analysis perspective.

    Note

    If you synchronize data from multiple RDS instances to an ApsaraDB for ClickHouse cluster, make sure that the objects to be synchronized do not overlap within different tasks.

Synchronization links

  • ApsaraDB RDS for MySQL to ApsaraDB for ClickHouse

  • ApsaraDB RDS for MySQL to AnalyticDB for MySQL 3.0

  • ApsaraDB RDS for MySQL to AnalyticDB for PostgreSQL

Billing rules

You can use the zero-ETL feature to synchronize data free of charge.

Prerequisites

Preparations

Create a service-linked role and grant the management permissions to a RAM user.

ApsaraDB RDS for MySQL to ApsaraDB for ClickHouse

  1. Create the AliyunServiceRoleForClickHouseZeroETL service-linked role.

    Note

    When you create a task and click the instance ID drop-down list, a message indicating that you must create the AliyunServiceRoleForClickHouseZeroETL role is displayed. The system automatically creates the role. No manual operations are required.

  2. Grant a RAM user management permissions.

    To allow a RAM user to create zero-ETL tasks, you must grant the RAM user the following permissions. For more information, see Create custom policies.

    • Permissions on the RDS instance: You must attach the AliyunRDSFullAccess policy to the RAM user.

    • Permissions on the ApsaraDB for ClickHouse cluster: You must attach the custom policy to the RAM user. The following script describes the custom policy:

      {
          "Version": "1",
          "Statement": [
              {
                  "Action": "clickhouse:*",
                  "Resource": "*",
                  "Effect": "Allow"
              },
              {
                  "Action": "ram:CreateServiceLinkedRole",
                  "Resource": "*",
                  "Effect": "Allow",
                  "Condition": {
                      "StringEquals": {
                          "ram:ServiceName": "clickhouse.aliyuncs.com"
                      }
                  }
              }
          ]
      }
    • Permissions on Data Transmission Service (DTS): You must attach the custom policy to the RAM user. The following script describes the custom policy:

      {
          "Version": "1",
          "Statement": [
              {
                  "Action": "dts:*",
                  "Resource": "*",
                  "Effect": "Allow"
              },
              {
                  "Action": "ram:PassRole",
                  "Resource": "*",
                  "Effect": "Allow",
                  "Condition": {
                      "StringEquals": {
                          "acs:Service": "dts.aliyuncs.com"
                      }
                  }
              }
          ]
      }

ApsaraDB RDS for MySQL to AnalyticDB for MySQL 3.0

  1. Create the AliyunServiceRoleForAnalyticDBForMySQL service-linked role.

    1. Log on to the RAM console. In the left-side navigation pane, click Roles. Check whether a service-linked role named AliyunServiceRoleForAnalyticDBForMySQL exists in the role list. If the service-linked role does not exist, create the role.

    2. In the upper-left corner of the page that appears, click Create Role.

    3. In the Select Role Type step, select Alibaba Cloud Service and click Next.

    4. Set the Role Type parameter to Service Linked Role and the Select Service parameter to AnalyticDB for MySQL.

    5. Click OK. Return to the Roles page and check whether the service-linked role is created.

  2. Grant a RAM user management permissions.

    You can grant a RAM user permissions on all or specific ApsaraDB RDS for MySQL instances and AnalyticDB for MySQL clusters. For more information, see Create custom policies.

    The following section describes the scripts of custom policies:

    Grant permissions on all RDS instances and the AnalyticDB for MySQL cluster

    {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "dts:*",
                "Resource": [
                    "acs:adb:*:*:*",
                    "acs:rds:*:*:*"
                ]
            },
            {
                "Effect": "Allow",
                "Action": [
                    "dts:DescribeRegions",
                    "dts:DescribeConfigRelations",
                    "dts:DescribeSrcLinkConfig",
                    "dts:DescribeDestLinkConfig",
                    "dts:DescribeLinkConfig",
                    "dts:DescribeConciseJobStatics",
                    "dts:ListUserAuthorizationLogs",
                    "dts:CreateUserAuthorization"
                ],
                "Resource": [
                    "acs:dts:*:*:*"
                ]
            }
        ]
    }

    Grant permissions on specific RDS instances and the AnalyticDB for MySQL cluster

    {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "dts:*",
                "Resource": [
                    "acs:adb:*:*:dbcluster/am-2zeod8ax4b9a****", Use the ID of the AnalyticDB for MySQL cluster to be synchronized.
                    "acs:rds:*:*:dbinstance/rm-2ze6fs8ouh43****", Use the ID of the RDS instance to be synchronized.
                ]
            },
            {
                "Effect": "Allow",
                "Action": [
                    "dts:DescribeRegions",
                    "dts:DescribeConfigRelations",
                    "dts:DescribeSrcLinkConfig",
                    "dts:DescribeDestLinkConfig",
                    "dts:DescribeLinkConfig",
                    "dts:DescribeConciseJobStatics",
                    "dts:ListUserAuthorizationLogs",
                    "dts:CreateUserAuthorization"
                ],
                "Resource": "acs:dts:*:*:*"
            }
        ]
    }

ApsaraDB RDS for MySQL to AnalyticDB for PostgreSQL

  1. Create a service-linked role named AliyunServiceRoleForADBPG.

    1. Log on to the RAM console. In the left-side navigation pane, click Roles. On the page that appears, check whether the AliyunServiceRoleForADBPG service-linked role exists. If the service-linked role does not exist, create the role.

    2. In the upper-left corner of the page that appears, click Create Role.

    3. In the Select Role Type step, select Alibaba Cloud Service and click Next.

    4. Set the Role Type parameter to Service Linked Role and the Select Service parameter to AnalyticDB for PostgreSQL.

    5. Click OK. Return to the Roles page and check whether the service-linked role is created.

  2. Grant a RAM user management permissions.

    You can grant a RAM user permissions on all or specific ApsaraDB RDS for MySQL instances and AnalyticDB for PostgreSQL clusters. For more information, see Create custom policies.

    The following section describes the scripts of custom policies:

    Grant permissions on all RDS instances and the AnalyticDB for PostgreSQL cluster

    {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "dts:*",
                "Resource": [
                    "acs:gpdb:*:*:*",
                    "acs:rds:*:*:*"
                ]
            },
            {
                "Effect": "Allow",
                "Action": [
                    "dts:DescribeRegions",
                    "dts:DescribeConfigRelations",
                    "dts:DescribeSrcLinkConfig",
                    "dts:DescribeDestLinkConfig",
                    "dts:DescribeLinkConfig"
                ],
                "Resource": [
                    "acs:gpdb:*:*:*"
                ]
            }
        ]
    }

    Grant permissions on specific RDS instances and the AnalyticDB for PostgreSQL cluster

    {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "dts:*",
                "Resource": [
                    "acs:gpdb:*:*:dbinstanc/gp-bp1a740l3zx4****", Use the ID of the AnalyticDB for PostgreSQL cluster to be synchronized.
                    "acs:rds:*:*:dbinstance/rm-2ze6fs8ouh43****", Use the ID of the RDS instance to be synchronized.
                ]
            },
            {
                "Effect": "Allow",
                "Action": [
                    "dts:DescribeRegions",
                    "dts:DescribeConfigRelations",
                    "dts:DescribeSrcLinkConfig",
                    "dts:DescribeDestLinkConfig",
                    "dts:DescribeLinkConfig"
                ],
                "Resource": "acs:dts:*:*:*"
            }
        ]
    }

Procedure

  1. Log on to the ApsaraDB RDS console. In the left-side navigation pane, click Data Integration.

  2. On the Zero-ETL tab, click Create Zero-ETL Task.

  3. On the Create Zero-ETL Task page, configure the source and destination databases.

    • The following table describes the parameters of the source database.

      Parameter

      Description

      Task Name

      The name of the zero-ETL task.

      Database Type

      The database type. Select RDS for MySQL.

      Access Method

      The access method. Set the value to Alibaba Cloud Instance.

      Instance Region

      The region of the source instance.

      Instance ID

      The ID of the ApsaraDB RDS for MySQL instance.

      Database Name

      The name of the database in the ApsaraDB RDS for MySQL instance.

      Database account

      The username of the database account for the ApsaraDB RDS for MySQL instance.

      Database Password

      The password of the database account for the ApsaraDB RDS for MySQL instance.

      Encryption

      The method that is used to connect to the ApsaraDB RDS for MySQL instance. Valid values: Non-encrypted and SSL-encrypted.

      Note

      If you select SSL-encrypted, you must enable SSL encryption for the ApsaraDB RDS for MySQL instance. For more information, see Configure the SSL encryption feature.

    • The following table describes the parameters of the destination database.

      Parameter

      Description

      Database Type

      The database type. Valid values: AnalyticDB for MySQL 3.0, AnalyticDB PostgreSQL, and ClickHouse.

      Access Method

      The access method. Set the value to Alibaba Cloud Instance.

      Instance Region

      The ID of the cluster in which the destination database is created.

      Instance ID

      The ID of the cluster in which the destination database is created.

      Database account

      The username of the database account for the cluster in which the destination database is created.

      Database Password

      The password of the database account for the cluster in which the destination database is created.

  4. After you configure the preceding parameters, click Test Connectivity and Proceed. In the Configure Zero-ETL step, configure the parameters. The following table describes the parameters.

    Parameter/Section

    Description

    Synchronization Type

    The type of the data synchronization between the source and destination databases.

    Valid values: Schema Synchronization, Full Data Synchronization, and Incremental Data Synchronization.

    Note

    For synchronization tasks, Incremental Data Synchronization is automatically selected and cannot be cleared. Incremental data of the source database is synchronized to the destination database in real time.

    If you do not want to synchronize incremental data, use the data migration method.

    DDL and DML Operations to Be Synchronized

    The DML operations and DDL operations that you want to synchronize. The DML operations include INSERT, UPDATE, and DELETE. The DDL operations include CREATE, ALTER, DROP, RENAME, and TRUNCATE. By default, all operations are selected.

    Source Objects and Selected Objects

    The objects in the source database and the objects that you want to synchronize.

    Advanced Settings (Optional)

    The retry time for failed connections between the source and destination databases and the retry time for other issues that occur on the source and destination databases.

  5. After you configure the preceding parameters, click Next: Configure Database and Table Fields. In the Configure Database and Table Fields step, configure the required parameters. The following table describes the parameters.

    Parameter

    Description

    Database Name

    The name of the selected database.

    Table Name

    The name of the selected table.

    Primary Key Column

    The primary key column of the selected table.

    Distribution Key

    The distribution key column of the selected table.

    Types

    The type of the table, which can be a partitioned table or a replicated table.

    Definition Status

    The status of the selected table. After you configure the table fields, the status of the table changes from Undefined to Defined.

  6. After you configure the preceding parameters, click Next: Save Task Settings and Precheck.

  7. If the precheck is successful, click Start to start the zero-ETL task.

Note

On the Zero-ETL tab, you can click the Source tab to view the Name, Source/Destination, and Status parameters of the zero-ETL task.