Synchronize data from a Simple Log Service Logstore to an AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster in real time - AnalyticDB

AnalyticDB for MySQL Data Lakehouse Edition (V3.0) provides the data synchronization feature. You can use the feature to synchronize data from a Simple Log Service Logstore to an AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster in real time based on a specific offset. This feature helps meet your requirements for real-time analysis of log data.

Prerequisites

An AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster, a Simple Log Service project, and a Logstore that belongs to the project are created in the same region. For more information, see Create a cluster and Getting Started.
A job resource group is created in the AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster. For more information, see Create a resource group.
A database account is created for the AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster.
- If you use an Alibaba Cloud account, you must create a privileged account. For more information, see the "Create a privileged account" section of the Create a database account topic.
- If you use a Resource Access Management (RAM) user, you must create both a privileged account and a standard account and associate the standard account with the RAM user. For more information, see Create a database account and Associate or disassociate a database account with or from a RAM user.
A destination database and a destination table are created in the AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster. For more information, see CREATE TABLE.

Usage notes

You can synchronize data to one destination table in an AnalyticDB for MySQL cluster from only one Simple Log Service Logstore. To synchronize data from multiple Logstores, create multiple destination tables in the cluster.

Billing rules

The data synchronization feature is in public preview until May 30, 2024. You are not charged for using this feature to synchronize Simple Log Service data during public preview.

Procedure

Step 1: (Optional) Configure RAM authorization.
Step 2: Create a data source.
Step 3: Create a data synchronization job.
Step 4: Start the data synchronization job.
Step 5: Manage the data source.

(Optional) Configure RAM authorization

Note

If you want to synchronize Simple Log Service data within the same Alibaba Cloud account, you can skip this step and create a data source. For more information, see the "Create a data source" section of this topic.

If you want to synchronize Simple Log Service data to an AnalyticDB for MySQL cluster across Alibaba Cloud accounts, you must create a Resource Access Management (RAM) role on the data source, grant permissions to the RAM role by attaching a policy, and then edit the trust policy of the RAM role.

Create a RAM role. For more information, see Create a RAM role for a trusted Alibaba Cloud account.
Note
Select Other Alibaba Cloud Account for the Select Trusted Alibaba Cloud Account parameter and enter the ID of the Alibaba Cloud account to which the AnalyticDB for MySQL cluster belongs. You can log on to the Account Management console and go to the Overview page to view the account ID.
Attach the AliyunAnalyticDBAccessingLogRolePolicy policy to the RAM role. For more information, see the "Method 2: Grant permissions to a RAM role by clicking Precise Permission on the Roles page" section of the Grant permissions to a RAM role topic.
Edit the trust policy of the RAM role to allow the AnalyticDB for MySQL cluster in the specified Alibaba Cloud account to assume the RAM role. For more information, see Edit the trust policy of a RAM role.
```
{
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Effect": "Allow",
      "Principal": {
        "RAM": [
            "acs:ram::<Alibaba Cloud account ID>:root"
        ],
        "Service": [
            "<Alibaba Cloud account ID>@ads.aliyuncs.com"
        ]
      }
    }
  ],
  "Version": "1"
}
```
Note
In the preceding code, the Alibaba Cloud account ID parameter specifies the ID of the Alibaba Cloud account to which the AnalyticDB for MySQL cluster belongs entered in Step 1. When you specify this parameter, delete the angle brackets (<>).

Create a data source

Note

If you have created a Simple Log Service data source, skip this step and create a data synchronization job. For more information, see the "Create a data synchronization job" section of this topic.

Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition (V3.0) tab, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, choose Data Ingestion > Data Sources.
In the upper-right corner of the page, click Create Data Source.

On the Create Data Source page, configure the parameters that are described in the following table.

Parameter	Description
Data Source Type	The type of the data source. Select SLS.
Data Source Name	The name of the data source. By default, the system generates a name based on the data source type and the current time. You can modify the name based on your business requirements.
Data Source Description	The description of the data source. For example, you can enter the use scenario and business limits.
Deployment Mode	The deployment mode of the data source. Only Alibaba Cloud Instance is supported.
Region of Simple Log Service Project	The region in which the source Simple Log Service project resides. The region in which the source Simple Log Service project resides. Note Only the region in which the AnalyticDB for MySQL cluster resides is supported.
Across Alibaba Cloud Accounts	Specifies whether to synchronize Simple Log Service data across Alibaba Cloud accounts. You can synchronize data from Simple Log Service to AnalyticDB for MySQL clusters within the same Alibaba Cloud account or across Alibaba Cloud accounts. Valid values: No. Yes. If you select Yes, you must configure the Alibaba Cloud Account and RAM Role parameters. For information about how to configure Resource Access Management (RAM) authorization, see the "(Optional) Configure RAM authorization" section of this topic. Note Alibaba Cloud Account: the ID of the Alibaba Cloud account to which the Simple Log Service project belongs. RAM Role: the name of the RAM role in the Alibaba Cloud account to which the Simple Log Service project belongs. The RAM role is the role that is created in Step 1 of the "(Optional) Configure RAM Authorization" section of this topic.
Simple Log Service Project	The name of the source Simple Log Service project.
Simple Log Service Logstore	The name of the source Simple Log Service Logstore.

Click Create.

Create a data synchronization job

In the left-side navigation pane, click Data Synchronization.
In the upper-right corner of the page, click Create Synchronization Job.

On the Simple Log Service Data Source tab of the Create Synchronization Job page, configure the parameters in the Source and Destination Settings, Destination Database and Table Settings, and Synchronization Settings sections.

The following table describes the parameters in the Source and Destination Settings section.

Parameter	Description
Job Name	The name of the data synchronization job. By default, the system generates a name based on the data source type and the current time. You can modify the name based on your business requirements.
Data Source	The data source. You can select an existing Simple Log Service data source or create a data source. You can select an existing Simple Log Service data source or create a data source.
Destination Type	The data storage type. Select Data Warehouse - AnalyticDB for MySQL Storage.
AnalyticDB for MySQL Account	The database account of the AnalyticDB for MySQL cluster.
AnalyticDB for MySQL Password	The password of the database account of the AnalyticDB for MySQL cluster.

The following table describes the parameters in the Destination Database and Table Settings section.

Parameter	Description
Database Name	The name of the database in the AnalyticDB for MySQL cluster.
Table Name	The name of the table in the AnalyticDB for MySQL cluster.
Source Data Preview	Click View Latest 10 Logstore Data Entries to view the latest 10 data entries of the source Simple Log Service Logstore.
Schema Field Mapping	The mapping settings of the source and destination fields. The system automatically populates the source fields in the Source Field column and the destination fields in the Destination Table Field column.AnalyticDB for MySQL If the mappings between the fields in the Destination Table Field and Source Field columns are incorrect, modify the fields. For example, if the name of a field in the table of the AnalyticDB for MySQL cluster is name and the name of the corresponding field in the source Simple Log Service Logstore is user_name, the system automatically populates the name field in the Source Field and Destination Table Field columns. In this case, you must change the name field in the Source Field column to user_name.

The following table describes the parameters in the Synchronization Settings section.

Parameter	Description
Start Offset	The point in time from which the system consumes Simple Log Service data when the data synchronization job starts. For example, if you set the Start Offset parameter to 2024-04-09 13:10, the system consumes Simple Log Service data from the first entry that is generated as of 13:10 on April 9, 2024.
Dirty Data Processing Mode	The action to take on dirty data. During data synchronization, if the data types of the fields in the destination table do not match the data types of the fields in the source Simple Log Service Logstore that you want to synchronize, the synchronization fails. For example, if the value of a field in the source Simple Log Service Logstore is `abc` and the data type of the corresponding field in the destination table is `int`, the value cannot be converted into the required data type. In this case, a synchronization exception occurs. Valid values: Stop Synchronization (default): stops the data synchronization. You must modify the data types of the fields in the destination table or set the Dirty Data Processing Mode parameter to the other option, and then restart the synchronization job. Treat as Null: writes dirty data to the destination table as NULL values and discards the original dirty data. For example, if a source Simple Log Service Logstore has a total of 10 rows of data in which two rows contain dirty data, the values in the two rows are converted into NULL values and the other eight rows are written as expected.
Convert UNIX Timestamp into Datetime	Specifies whether to enable the conversion of a UNIX timestamp into a value of the DATETIME data type. If the value of a source field is a UNIX timestamp (such as 1710604800) and the data type of the corresponding destination field is DATETIME or TIMESTAMP, you must enable this feature to convert the timestamp. After you enable this feature, you can select a timestamp precision based on the precision of the source timestamps. Valid values: Timestamp Accurate to Seconds, Timestamp Accurate to Milliseconds, and Timestamp Accurate to Microseconds.

Click Submit.

Start the data synchronization job

On the Data Synchronization page, select the created data synchronization job and click Start in the Actions column.
In the upper-right corner of the page, click Search. If the status of the data synchronization job changes to Running, the job is started.

Manage the data source

On the Data Synchronization page, you can perform the operations described in the following table in the Actions column.

Operation	Description
Start	Starts the data synchronization job.
View Details	Views the details of the data synchronization job, including source and destination settings, run logs, and monitoring results.
Edit	Modifies the parameters of the job, such as Start Offset and Schema Field Mapping.
Pause	Pauses the data synchronization job. After you pause a data synchronization job, you can click Start to restart the job. Data synchronization automatically resumes from the point in time when the job was paused.
Delete	Deletes the data synchronization job. After you delete a data synchronization job, the job cannot be recovered. Proceed with caution.