ApsaraDB RDS provides the zero-ETL feature to synchronize data from an ApsaraDB RDS for MySQL instance to a data warehouse. You do not need to create or maintain data synchronization tasks, and you are not charged for data synchronization tasks. This reduces data transmission costs and O&M costs. The data warehouse can be an AnalyticDB for MySQL cluster, an AnalyticDB for PostgreSQL cluster, or an ApsaraDB for ClickHouse cluster.
Supported regions
China (Beijing), China (Hangzhou), China (Shanghai), and China (Shenzhen)
The feature is rolled out for RDS instances in regions in phases. The information in the ApsaraDB RDS console shall prevail.
Overview
In the era of big data, enterprises must use extract-transform-load (ETL) tools to efficiently manage and use large amounts of business data distributed across different systems and platforms.
An ETL tool extracts data from an upper-level business system, transforms the data, and then loads the data to data warehouses. This process incorporates distributed data into data warehouses for further computing, analysis, and business decision-making.
The following section describes the challenges associated with traditional ETL processes:
Increased system complexity: The maintenance of ETL tools increases O&M difficulty and prevents you from focusing on business application development.
Increased resource costs: Different data sources may require different ETL tools, and you are charged additional fees for creating ETL tasks.
Reduced data timeliness: Specific ETL processes involve periodic batch updates. In near-real-time scenarios, analysis results cannot be quickly generated.
To resolve the issues, Alibaba Cloud ApsaraDB provides the zero-ETL feature that allows you to quickly create a data synchronization task between a business system that uses online transaction processing (OLTP) databases and a data warehouse that uses online analytical processing (OLAP) databases. The feature automatically extracts, transforms, cleanses, and loads the data of the business system to the data warehouse. This way, data synchronization and management can be completed in a one-stop manner to integrate transaction processing and data analysis. This allows you to focus on data analysis.
Benefits
Ease of use: You do not need to create or maintain complex data pipelines to perform ETL operations. You need to only select the source instance and destination cluster to create real-time data synchronization tasks. This reduces the challenges of building and managing data pipelines and allows you to focus on application development.
Zero cost: You can create zero-ETL tasks to synchronize data to a data warehouse free of charge for data analysis.
Multi-source aggregation: You can create a zero-ETL task to synchronize data from multiple RDS instances to an AnalyticDB for MySQL cluster, an AnalyticDB for PostgreSQL instance, or an ApsaraDB for ClickHouse cluster in real time to build a global analysis perspective.
NoteIf you synchronize data from multiple RDS instances to an ApsaraDB for ClickHouse cluster, make sure that the objects to be synchronized do not overlap within different tasks.
Synchronization links
ApsaraDB RDS for MySQL to ApsaraDB for ClickHouse
ApsaraDB RDS for MySQL to AnalyticDB for MySQL 3.0
ApsaraDB RDS for MySQL to AnalyticDB for PostgreSQL
Billing rules
You can use the zero-ETL feature to synchronize data free of charge.
Prerequisites
An RDS instance is created. For more information, see Create an ApsaraDB RDS for MySQL instance.
An ApsaraDB for ClickHouse cluster, an AnalyticDB for PostgreSQL instance, or an AnalyticDB for MySQL Data Warehouse Edition cluster is created. For more information, see Create a database, Create an instance, and Create a Data Warehouse Edition cluster.
The source instance and the destination cluster or instance reside in the same region.
Database accounts are created for the RDS instance and the destination cluster or instance. For more information, see Create an account for an ApsaraDB RDS for MySQL instance, Create an account for an ApsaraDB for ClickHouse cluster, Create a database account for an AnalyticDB for MySQL cluster, and Create a database account for an AnalyticDB for PostgreSQL cluster.
Preparations
Create a service-linked role and grant the management permissions to a RAM user.
ApsaraDB RDS for MySQL to ApsaraDB for ClickHouse
Create the AliyunServiceRoleForClickHouseZeroETL service-linked role.
NoteWhen you create a task and click the instance ID drop-down list, a message indicating that you must create the AliyunServiceRoleForClickHouseZeroETL role is displayed. The system automatically creates the role. No manual operations are required.
Grant a RAM user management permissions.
To allow a RAM user to create zero-ETL tasks, you must grant the RAM user the following permissions. For more information, see Create custom policies.
Permissions on the RDS instance: You must attach the AliyunRDSFullAccess policy to the RAM user.
Permissions on the ApsaraDB for ClickHouse cluster: You must attach the custom policy to the RAM user. The following script describes the custom policy:
{ "Version": "1", "Statement": [ { "Action": "clickhouse:*", "Resource": "*", "Effect": "Allow" }, { "Action": "ram:CreateServiceLinkedRole", "Resource": "*", "Effect": "Allow", "Condition": { "StringEquals": { "ram:ServiceName": "clickhouse.aliyuncs.com" } } } ] }
Permissions on Data Transmission Service (DTS): You must attach the custom policy to the RAM user. The following script describes the custom policy:
{ "Version": "1", "Statement": [ { "Action": "dts:*", "Resource": "*", "Effect": "Allow" }, { "Action": "ram:PassRole", "Resource": "*", "Effect": "Allow", "Condition": { "StringEquals": { "acs:Service": "dts.aliyuncs.com" } } } ] }
ApsaraDB RDS for MySQL to AnalyticDB for MySQL 3.0
Create the AliyunServiceRoleForAnalyticDBForMySQL service-linked role.
Log on to the RAM console. In the left-side navigation pane, click Roles. Check whether a service-linked role named AliyunServiceRoleForAnalyticDBForMySQL exists in the role list. If the service-linked role does not exist, create the role.
In the upper-left corner of the page that appears, click Create Role.
In the Select Role Type step, select Alibaba Cloud Service and click Next.
Set the Role Type parameter to Service Linked Role and the Select Service parameter to AnalyticDB for MySQL.
Click OK. Return to the Roles page and check whether the service-linked role is created.
Grant a RAM user management permissions.
You can grant a RAM user permissions on all or specific ApsaraDB RDS for MySQL instances and AnalyticDB for MySQL clusters. For more information, see Create custom policies.
The following section describes the scripts of custom policies:
Grant permissions on all RDS instances and the AnalyticDB for MySQL cluster
{ "Version": "1", "Statement": [ { "Effect": "Allow", "Action": "dts:*", "Resource": [ "acs:adb:*:*:*", "acs:rds:*:*:*" ] }, { "Effect": "Allow", "Action": [ "dts:DescribeRegions", "dts:DescribeConfigRelations", "dts:DescribeSrcLinkConfig", "dts:DescribeDestLinkConfig", "dts:DescribeLinkConfig", "dts:DescribeConciseJobStatics", "dts:ListUserAuthorizationLogs", "dts:CreateUserAuthorization" ], "Resource": [ "acs:dts:*:*:*" ] } ] }
Grant permissions on specific RDS instances and the AnalyticDB for MySQL cluster
{ "Version": "1", "Statement": [ { "Effect": "Allow", "Action": "dts:*", "Resource": [ "acs:adb:*:*:dbcluster/am-2zeod8ax4b9a****", Use the ID of the AnalyticDB for MySQL cluster to be synchronized. "acs:rds:*:*:dbinstance/rm-2ze6fs8ouh43****", Use the ID of the RDS instance to be synchronized. ] }, { "Effect": "Allow", "Action": [ "dts:DescribeRegions", "dts:DescribeConfigRelations", "dts:DescribeSrcLinkConfig", "dts:DescribeDestLinkConfig", "dts:DescribeLinkConfig", "dts:DescribeConciseJobStatics", "dts:ListUserAuthorizationLogs", "dts:CreateUserAuthorization" ], "Resource": "acs:dts:*:*:*" } ] }
ApsaraDB RDS for MySQL to AnalyticDB for PostgreSQL
Create a service-linked role named AliyunServiceRoleForADBPG.
Log on to the RAM console. In the left-side navigation pane, click Roles. On the page that appears, check whether the AliyunServiceRoleForADBPG service-linked role exists. If the service-linked role does not exist, create the role.
In the upper-left corner of the page that appears, click Create Role.
In the Select Role Type step, select Alibaba Cloud Service and click Next.
Set the Role Type parameter to Service Linked Role and the Select Service parameter to AnalyticDB for PostgreSQL.
Click OK. Return to the Roles page and check whether the service-linked role is created.
Grant a RAM user management permissions.
You can grant a RAM user permissions on all or specific ApsaraDB RDS for MySQL instances and AnalyticDB for PostgreSQL clusters. For more information, see Create custom policies.
The following section describes the scripts of custom policies:
Grant permissions on all RDS instances and the AnalyticDB for PostgreSQL cluster
{ "Version": "1", "Statement": [ { "Effect": "Allow", "Action": "dts:*", "Resource": [ "acs:gpdb:*:*:*", "acs:rds:*:*:*" ] }, { "Effect": "Allow", "Action": [ "dts:DescribeRegions", "dts:DescribeConfigRelations", "dts:DescribeSrcLinkConfig", "dts:DescribeDestLinkConfig", "dts:DescribeLinkConfig" ], "Resource": [ "acs:gpdb:*:*:*" ] } ] }
Grant permissions on specific RDS instances and the AnalyticDB for PostgreSQL cluster
{ "Version": "1", "Statement": [ { "Effect": "Allow", "Action": "dts:*", "Resource": [ "acs:gpdb:*:*:dbinstanc/gp-bp1a740l3zx4****", Use the ID of the AnalyticDB for PostgreSQL cluster to be synchronized. "acs:rds:*:*:dbinstance/rm-2ze6fs8ouh43****", Use the ID of the RDS instance to be synchronized. ] }, { "Effect": "Allow", "Action": [ "dts:DescribeRegions", "dts:DescribeConfigRelations", "dts:DescribeSrcLinkConfig", "dts:DescribeDestLinkConfig", "dts:DescribeLinkConfig" ], "Resource": "acs:dts:*:*:*" } ] }
Procedure
Log on to the ApsaraDB RDS console. In the left-side navigation pane, click Data Integration.
On the Zero-ETL tab, click Create Zero-ETL Task.
On the Create Zero-ETL Task page, configure the source and destination databases.
The following table describes the parameters of the source database.
Parameter
Description
Task Name
The name of the zero-ETL task.
Database Type
The database type. Select RDS for MySQL.
Access Method
The access method. Set the value to Alibaba Cloud Instance.
Instance Region
The region of the source instance.
Instance ID
The ID of the ApsaraDB RDS for MySQL instance.
Database Name
The name of the database in the ApsaraDB RDS for MySQL instance.
Database account
The username of the database account for the ApsaraDB RDS for MySQL instance.
Database Password
The password of the database account for the ApsaraDB RDS for MySQL instance.
Encryption
The method that is used to connect to the ApsaraDB RDS for MySQL instance. Valid values: Non-encrypted and SSL-encrypted.
NoteIf you select SSL-encrypted, you must enable SSL encryption for the ApsaraDB RDS for MySQL instance. For more information, see Configure the SSL encryption feature.
The following table describes the parameters of the destination database.
Parameter
Description
Database Type
The database type. Valid values: AnalyticDB for MySQL 3.0, AnalyticDB PostgreSQL, and ClickHouse.
Access Method
The access method. Set the value to Alibaba Cloud Instance.
Instance Region
The ID of the cluster in which the destination database is created.
Instance ID
The ID of the cluster in which the destination database is created.
Database account
The username of the database account for the cluster in which the destination database is created.
Database Password
The password of the database account for the cluster in which the destination database is created.
After you configure the preceding parameters, click Test Connectivity and Proceed. In the Configure Zero-ETL step, configure the parameters. The following table describes the parameters.
Parameter/Section
Description
Synchronization Type
The type of the data synchronization between the source and destination databases.
Valid values: Schema Synchronization, Full Data Synchronization, and Incremental Data Synchronization.
NoteFor synchronization tasks, Incremental Data Synchronization is automatically selected and cannot be cleared. Incremental data of the source database is synchronized to the destination database in real time.
If you do not want to synchronize incremental data, use the data migration method.
DDL and DML Operations to Be Synchronized
The DML operations and DDL operations that you want to synchronize. The DML operations include INSERT, UPDATE, and DELETE. The DDL operations include CREATE, ALTER, DROP, RENAME, and TRUNCATE. By default, all operations are selected.
Source Objects and Selected Objects
The objects in the source database and the objects that you want to synchronize.
Advanced Settings (Optional)
The retry time for failed connections between the source and destination databases and the retry time for other issues that occur on the source and destination databases.
After you configure the preceding parameters, click Next: Configure Database and Table Fields. In the Configure Database and Table Fields step, configure the required parameters. The following table describes the parameters.
Parameter
Description
Database Name
The name of the selected database.
Table Name
The name of the selected table.
Primary Key Column
The primary key column of the selected table.
Distribution Key
The distribution key column of the selected table.
Types
The type of the table, which can be a partitioned table or a replicated table.
Definition Status
The status of the selected table. After you configure the table fields, the status of the table changes from Undefined to Defined.
After you configure the preceding parameters, click Next: Save Task Settings and Precheck.
If the precheck is successful, click Start to start the zero-ETL task.
On the Zero-ETL tab, you can click the Source tab to view the Name, Source/Destination, and Status parameters of the zero-ETL task.