AnalyticDB for MySQL provides a seamless integration (Zero-ETL) feature. This feature lets you create data synchronization pipelines from RDS for MySQL to AnalyticDB for MySQL. It helps you manage data synchronization in a single location and integrates transaction processing with data analytics.
Overview
In the era of big data, businesses have data scattered across different systems and platforms. To manage and use this data effectively, companies often rely on extract, transform, and load (ETL) tools for centralized data management.
ETL is the process of extracting, transforming, and loading data from upstream business systems into a data warehouse. The goal is to consolidate scattered upstream data into the destination data warehouse. Further computation and analysis of this data in the data warehouse helps businesses make effective decisions.
Traditional ETL processes often face the following challenges:
Increased resource costs: Different data sources may require different ETL tools. Building ETL pipelines increases resource costs.
Increased system complexity: You must maintain the ETL tools yourself. This increases operations and maintenance (O&M) complexity and prevents you from focusing on business application development.
High data latency: Some ETL processes involve periodic batch updates. In near-real-time application scenarios, this prevents the rapid generation of analysis results.
To solve these problems, Alibaba Cloud ApsaraDB provides the seamless integration (Zero-ETL) feature. This feature lets you quickly build data synchronization pipelines between online transactional processing (OLTP) systems and online analytical processing (OLAP) data warehouses. Data from OLTP systems is automatically extracted, transformed, and loaded into OLAP data warehouses. This provides a one-stop solution for data synchronization and management. It integrates transaction processing with data analytics and allows customers to focus on their data analytics business.
Benefits
Easy to use: You do not need to create and maintain complex data pipelines for ETL operations. You can simply select the source data and the destination instance to automatically create a real-time data synchronization pipeline. This reduces the challenges of building and managing data pipelines, allowing you to focus on developing upstream applications.
Zero cost: Zero-ETL pipelines are free of charge. You can analyze upstream data in AnalyticDB for MySQL at no extra cost.
Multi-source aggregation: You can use Zero-ETL pipelines to synchronize data from multiple instances to a single AnalyticDB for MySQL cluster in real time. This provides a global perspective for analytics.
Supported synchronization pipelines
Pipelines from RDS for MySQL to AnalyticDB for MySQL.
Pipelines from PolarDB for MySQL to AnalyticDB for MySQL. For more information, see Synchronize data using seamless integration (Zero-ETL).
Prerequisites
The AnalyticDB for MySQL cluster and the RDS for MySQL instance are in the same region.
A database account is created for the AnalyticDB for MySQL cluster and the RDS for MySQL instance.
Billing
Data synchronization pipelines are free of charge.
Precautions
The seamless integration (Zero-ETL) feature is available only in the following regions: China (Beijing), China (Hangzhou), China (Shanghai), China (Shenzhen), China (Zhangjiakou), China (Qingdao), China (Guangzhou), Hong Kong (China), Singapore, US (Silicon Valley), and US (Virginia).
If the number of Zero-ETL pipelines for an AnalyticDB for MySQL cluster exceeds the limit, you cannot create new pipelines. You can use DTS to create a new synchronization pipeline, or delete unused Zero-ETL pipelines to free up capacity. The limits on the number of Zero-ETL pipelines are as follows:
If the total number of reserved AnalyticDB Compute Units (ACUs) for a cluster is less than 24 ACU, you can create one Zero-ETL pipeline.
If the total number of reserved ACUs for a cluster is 24 ACU or greater, you can create
3 + 3 * [(Total ACUs - 24) / 50]Zero-ETL pipelines.The result of
[(Total ACUs - 24) / 50]is rounded down to the nearest integer. For example, if a cluster has a total of 48 ACUs, the result of the calculation is 0.48. This value is rounded down to 0, which means you can create three Zero-ETL pipelines.
NoteFor a Data Lakehouse Edition cluster, the total number of reserved ACUs is the sum of reserved computing resources and reserved storage resources in ACUs.
For a Data Warehouse Edition cluster, one core is equivalent to one ACU. In Elastic mode, the total number of reserved ACUs is the sum of the cores of computing resources and elastic I/O units. In Reserved mode, the total number of reserved ACUs is the number of cores of computing resources.
Preparations
Create a service-linked role for AnalyticDB for MySQL
Navigate to the Roles list in the Resource Access Management (RAM) console.
Check whether a service-linked role named AliyunServiceRoleForAnalyticDBForMySQL exists in the role list. If not, create the role.
In the navigation pane on the left, click Create Role.
In the Create Role dialog box, select Alibaba Cloud Service, and click Next.
Set Role Type to Service-Linked Role, and select AnalyticDB for MySQL.
Click OK. Return to the role list and verify that the service-linked role is created.
Grant management permissions to a RAM user
A Resource Access Management (RAM) user needs the following two permissions to create and manage Zero-ETL pipelines.
Grant a RAM user management permissions for AnalyticDB for MySQL
A RAM user must have the AliyunADBFullAccess management permission for AnalyticDB for MySQL to create and manage Zero-ETL pipelines for AnalyticDB for MySQL. For more information, see Manage RAM user permissions.
Grant a RAM user management permissions for Zero-ETL
To create and manage Zero-ETL pipelines, a RAM user needs permissions for both the data source instance and the destination instance (AnalyticDB for MySQL). You can create a custom policy to allow the creation of pipelines for all or specified instances of the data source and AnalyticDB for MySQL. Then, attach the custom policy to the RAM user.
The script for the custom policy is as follows:
Grant permissions for all RDS for MySQL instances and AnalyticDB for MySQL clusters
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": "dts:*",
"Resource": [
"acs:adb:*:*:*",
"acs:rds:*:*:*",
]
},
{
"Effect": "Allow",
"Action": [
"dts:DescribeRegions",
"dts:DescribeConfigRelations",
"dts:DescribeSrcLinkConfig",
"dts:DescribeDestLinkConfig",
"dts:DescribeLinkConfig"
],
"Resource": [
"acs:dts:*:*:*"
]
}
]
}Grant permissions for specified RDS for MySQL instances and AnalyticDB for MySQL clusters
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": "dts:*",
"Resource": [
"acs:adb:*:*:dbcluster/am-2zeod8ax4b9a****",
"acs:rds:*:*:dbinstance/rm-2ze6fs8ouh43****",
]
},
{
"Effect": "Allow",
"Action": [
"dts:DescribeRegions",
"dts:DescribeConfigRelations",
"dts:DescribeSrcLinkConfig",
"dts:DescribeDestLinkConfig",
"dts:DescribeLinkConfig"
],
"Resource": "acs:dts:*:*:*"
}
]
}Procedure
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. Find the cluster that you want to manage and click the cluster ID.
Navigate to the seamless integration (Zero-ETL) page. The navigation path varies depending on the product edition.
For Enterprise Edition, Basic Edition, or Data Lakehouse Edition clusters: In the navigation pane on the left, choose .
For Data Warehouse Edition clusters: In the navigation pane on the left, choose .
Click Create Zero-ETL Task. On the Create Zero-ETL Task page, configure the source and destination database information.
Configure the source database information:
Source Database Information
Description
Task Name
The name of the Zero-ETL task.
Database Type
Select RDS for MySQL.
Access Method
Only Alibaba Cloud Instance is supported.
Instance Region
The region where the RDS for MySQL instance resides.
RDS Instance ID
The ID of the RDS for MySQL instance.
Database Account
The database account of the RDS for MySQL instance.
Database Password
The password for the database account of the RDS for MySQL instance.
Encryption
The method to connect to the RDS for MySQL instance. Options include Unencrypted Connection and SSL Secure Connection.
NoteIf you set this parameter to SSL-encrypted, you must enable SSL encryption for the RDS for MySQL instance in advance. For more information, see Use a cloud certificate to quickly enable SSL encryption.
Configure the destination database information:
Destination Database Information
Description
Database Type
Currently, only AnalyticDB for MySQL 3.0 is supported as the destination database.
Access Method
Only Alibaba Cloud Instance is supported.
Instance Region
The region where the AnalyticDB for MySQL cluster resides.
Instance ID
The ID of the AnalyticDB for MySQL cluster.
Database Account
The database account of the AnalyticDB for MySQL cluster.
Database Password
The password for the database account of the AnalyticDB for MySQL cluster.
After you configure the preceding parameters, click Test Connectivity and Proceed. On the Zero-ETL configuration page, configure the following parameters:
Configuration Item
Description
DDL and DML Operations to Be Synchronized
Select the DML and DDL operations to synchronize. By default, all operations are selected.
Source Objects and Selected Objects
The source objects and the objects to be synchronized in the database.
Advanced Settings (Optional)
Set the retry time for when the source or destination database cannot be connected and for when other issues occur with the source or destination database.
After you configure the preceding parameters, click Next: Configure Database and Table Fields, and configure the following information:
Database, Table, and Field Configuration
Description
Database Name
Select an existing database.
Table Name
Select an existing data table.
Primary Key Column
The field that serves as the primary key in the selected data table.
Distribution Key
The field that serves as the distribution key in the selected data table.
Type
The type of the data table. Options include Partitioned Table and Replicated Table.
Definition Status
After the database, table, and field configurations are defined, the status changes from Undefined to Defined.
After you configure all the preceding parameters, click Next: Save Task Settings and Precheck.
After the precheck passes, click Start to start the Zero-ETL task.
On the Seamless Data Integration page, you can view information about the Zero-ETL task, such as its ID/Name, Source/Destination, and Status.
Monitor and configure alerts for Zero-ETL tasks
After you create and start a Zero-ETL task, you can set alert rules for the task in the Cloud Monitor console to monitor its running status in real time. The procedure is as follows.
Log on to the Cloud Monitor console.
View monitoring information.
In the navigation pane on the left, choose .
Hover over the AnalyticDB for MySQL card and click AnalyticDB for mysql 3.0 - ZeroETL Latency.
Click a cluster ID to view the monitoring information for the Zero-ETL tasks in that cluster.
Create an alert rule.
In the navigation pane on the left, click .
Click Create Alert Rule. In the Create Alert Rule panel, configure an alert rule for the Zero-ETL task. For more information, see Create an alert rule.
NoteSet Product to AnalyticDB for MySQL 3.0 - ZeroETL Latency.
Create a subscription policy.
In the navigation pane on the left, click .
Click Create Subscription Policy. On the Create Subscription Policy page, configure a subscription policy for the Zero-ETL task. For more information, see Create an event subscription.
NoteProduct: Select AnalyticDB for MySQL 3.0.
Event Type: Select Abnormal or Restore.
Event Name: Select ZeroETL Task Abnormal or ZeroETL Task Recovered.