The MySQL data source provides bidirectional channels for reading and writing MySQL data. This topic describes the MySQL data synchronization capabilities supported by DataWorks.
Supported MySQL versions
Offline read and write:
MySQL 5.5.x, MySQL 5.6.x, MySQL 5.7.x, and MySQL 8.0.x are supported, compatible with Amazon RDS for MySQL, Azure MySQL, and Amazon Aurora MySQL.
Offline synchronization supports reading from views.
Real-time reading:
Real-time reading of MySQL data in Data Integration is implemented based on real-time subscription to MySQL. It currently supports only MySQL 5.5.x, 5.6.x, 5.7.x, and 8.0.x (excluding new features introduced in 8.0, such as functional indexes; only existing features are compatible), and is compatible with Amazon RDS for MySQL, Azure MySQL, and Amazon Aurora MySQL.
If you need to synchronize DRDS MySQL, do not configure it as a MySQL data source. Instead, see the Configure DRDS data source topic and configure it directly as a DRDS data source.
Limits
Real-time reading
Synchronization of data from MySQL read-only instances is not supported.
Synchronization of tables containing functional indexes is not supported.
XA ROLLBACK is not supported.
For transaction data that has entered XA PREPARE, real-time synchronization writes it to the destination. If XA ROLLBACK then occurs, real-time synchronization does not roll back the writes for the XA PREPARE data. To handle XA ROLLBACK scenarios, manually remove the affected table from the real-time synchronization task, then add the table back and resynchronize it.
Only MySQL servers whose binlog format is set to ROW are supported.
Real-time synchronization does not synchronize records in associated tables that are deleted by cascade operations.
For Amazon Aurora MySQL databases, you must connect to the primary (writer) instance, because AWS does not allow enabling the binlog feature on Aurora MySQL read replicas, and real-time synchronization tasks require the binlog to capture incremental updates.
Real-time synchronization of online DDL changes supports only adding columns (ADD COLUMN) to MySQL tables through Data Management (DMS).
Reading stored procedures from MySQL is not supported.
Offline reading
When the MySQL Reader plugin synchronizes multiple tables, such as in sharding scenarios, a single table is split only if the number of concurrent tasks is greater than the number of tables. Otherwise, the number of split tasks equals the number of tables.
Reading stored procedures from MySQL is not supported.
Supported field types
For the complete list of field types for various MySQL versions, see the MySQL official documentation. The following table lists the current support status of major fields using MySQL 8.0.x as an example.
Field type | Offline reading (MySQL Reader) | Offline writing (MySQL Writer) | Real-time reading | Real-time writing |
TINYINT | ||||
SMALLINT | ||||
INTEGER | ||||
BIGINT | ||||
FLOAT | ||||
DOUBLE | ||||
DECIMAL/NUMERIC | ||||
REAL | ||||
VARCHAR | ||||
JSON | ||||
TEXT | ||||
MEDIUMTEXT | ||||
LONGTEXT | ||||
VARBINARY | ||||
BINARY | ||||
TINYBLOB | ||||
MEDIUMBLOB | ||||
LONGBLOB | ||||
ENUM | ||||
SET | ||||
BOOLEAN | ||||
BIT | ||||
DATE | ||||
DATETIME | ||||
TIMESTAMP | ||||
TIME | ||||
YEAR | ||||
LINESTRING | ||||
POLYGON | ||||
MULTIPOINT | ||||
MULTILINESTRING | ||||
MULTIPOLYGON | ||||
GEOMETRYCOLLECTION |
Preparation for data synchronization: MySQL environment preparation
Before performing data synchronization in DataWorks, prepare the MySQL environment as described in this section so that MySQL data synchronization tasks can be configured and run normally in DataWorks. The following describes the environment preparation required for MySQL synchronization.
Confirm MySQL version
Data Integration has requirements on the MySQL version. See the Supported MySQL versions section above to check whether the MySQL instance to be synchronized meets the version requirements. You can query the current MySQL version by running the following statement in the MySQL database.
SELECT version();
Configure account permissions
It is recommended that you plan and create a dedicated MySQL account for DataWorks to access the data source in advance. Proceed as follows.
Optional: Create an account.
For operation details, see Create MySQL account.
Configure permissions.
Offline
In offline synchronization scenarios:
When reading MySQL data offline, the account needs the read (SELECT) permission on the tables to be synchronized.
When writing MySQL data offline, the account needs the write (INSERT, DELETE, UPDATE) permissions on the tables to be synchronized.
Real-time
In real-time synchronization scenarios, the account needs the SELECT, REPLICATION SLAVE, and REPLICATION CLIENT permissions on the database.
You can refer to the following commands to grant these permissions to the account, or directly grant the account the SUPER permission. When you run the following statements, replace 'sync account' with the account created above.
-- Create the sync account and set a password, if the account does not exist yet. % allows login from any host.
-- CREATE USER 'sync account'@'%' IDENTIFIED BY 'password';
-- Grant SELECT, REPLICATION SLAVE, and REPLICATION CLIENT permissions to the sync account.
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'sync account'@'%';
*.* means the permissions are granted to the sync account on all tables in all databases. You can also grant the permissions on specific tables in a target database. For example, to grant them on the user table in the test database, use the statement GRANT SELECT, REPLICATION CLIENT ON test.user TO 'sync account'@'%';. Note that REPLICATION SLAVE is a global permission and cannot be granted to the sync account on specific tables in a target database.
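After granting permissions, you can verify what the sync account actually holds. A quick check, assuming the account name used above:

```sql
-- List the privileges granted to the sync account.
SHOW GRANTS FOR 'sync account'@'%';
-- For real-time synchronization, the output should include a line similar to:
-- GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO `sync account`@`%`
```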
(Required only for real-time synchronization) Enable MySQL binlog
Data Integration achieves real-time incremental data synchronization through real-time subscription to MySQL Binlog. You need to enable the MySQL Binlog service before configuring synchronization on DataWorks. The operation is as follows:
While the binlog is being consumed, the database cannot purge it. If a real-time synchronization task lags, the source binlog may be held for a long time. Configure the task's latency alert appropriately and monitor the database's disk space.
Retain the binlog for at least 72 hours. Otherwise, if a task fails and the binlog has already been purged, the task cannot be reset to a position before the failure when it restarts, and data is lost. In that case, only a full offline synchronization can backfill the data.
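On a self-managed MySQL 8.0 server, you can check and, if needed, extend the binlog retention period as sketched below (on RDS or PolarDB, configure retention in the respective console instead; MySQL 5.7 uses the expire_logs_days variable):

```sql
-- Check the current binlog retention in seconds (MySQL 8.0).
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';
-- Retain binlogs for at least 72 hours (259200 seconds).
SET GLOBAL binlog_expire_logs_seconds = 259200;
```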
Check if Binlog is enabled.
Use the following statement to check if Binlog is enabled.
SHOW variables LIKE "log_bin";
When the return result is ON, it indicates that Binlog is enabled.
If you synchronize data from a standby database, you can also check whether Binlog is enabled using the following statement.
SHOW variables LIKE "log_slave_updates";
When the return result is ON, it indicates that Binlog is enabled on the standby database.
If the returned result does not match the above results:
For open-source MySQL, refer to the MySQL official documentation to enable Binlog.
For Alibaba Cloud RDS MySQL, refer to RDS MySQL log backup to enable Binlog.
For Alibaba Cloud PolarDB MySQL, refer to Enable Binlog to enable Binlog.
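For a self-managed (open-source) MySQL server, the binlog settings required in this topic can be placed in the server configuration file. A minimal sketch (option names per the MySQL reference manual; the server must be restarted after editing):

```ini
[mysqld]
# Enable the binlog; the value is the base name of the binlog files.
log_bin                    = mysql-bin
server_id                  = 1        # Must be unique among servers in the replication topology.
binlog_format              = ROW      # DataWorks real-time synchronization requires ROW.
binlog_row_image           = FULL     # Full before/after row images are required.
binlog_expire_logs_seconds = 259200   # Retain binlogs for at least 72 hours (MySQL 8.0).
```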
Query the Binlog format.
Use the following statement to query the Binlog format.
SHOW variables LIKE "binlog_format";
Explanation of return results:
Returns ROW, indicating that the enabled Binlog format is ROW.
Returns STATEMENT, indicating that the enabled Binlog format is STATEMENT.
Returns MIXED, indicating that the enabled Binlog format is MIXED.
DataWorks real-time synchronization supports only MySQL servers whose binlog format is set to ROW. If the returned value is not ROW, modify the binlog format.
Check if the complete Binlog log is enabled.
Use the following statement to check if the complete Binlog log is enabled.
SHOW variables LIKE "binlog_row_image";
Explanation of return results:
Returns FULL, indicating that the complete log is enabled for Binlog.
Returns MINIMAL, indicating that minimal logging is enabled for Binlog and full logging is not enabled.
DataWorks real-time synchronization supports only MySQL servers with full binlog logging enabled. If the query returns a value other than FULL, modify the binlog_row_image configuration.
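The three checks above can be combined into a single query. A sketch:

```sql
-- Check the binlog switch, format, and row image in one statement.
SHOW VARIABLES WHERE Variable_name IN
  ('log_bin', 'binlog_format', 'binlog_row_image');
-- Real-time synchronization requires:
-- log_bin = ON, binlog_format = ROW, binlog_row_image = FULL.
```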
OSS binlog read authorization configuration
When adding a MySQL data source, if Configuration Mode is set to Alibaba Cloud Instance Mode and the RDS MySQL instance is in the same region as the DataWorks workspace, you can enable Support OSS Binlog Reading. After you enable it, when the RDS binlog cannot be accessed, the task attempts to obtain the binlog from OSS to avoid interrupting real-time synchronization.
If the selected OSS Binlog Access Identity is Alibaba Cloud RAM User or Alibaba Cloud RAM Role, you need to configure account authorization as follows.
Alibaba Cloud RAM user.
Log in to the RAM Access Control-Users console and find the user that needs authorization. Specific operation:
In the Actions column, click Add Permissions.
Configure the following key parameters, then click Confirm.
Resource Scope: Account level.
Policy: System policy.
Policy Name: AliyunDataWorksAccessingRdsOSSBinlogPolicy.
Alibaba Cloud RAM role.
Log in to the RAM Access Control-Roles console and create a RAM role. For specific operations, see Create a RAM role with a trusted entity of an Alibaba Cloud account.
Key parameters:
Trusted Entity Type: Alibaba Cloud account.
Trusted Entity Name: Other Alibaba Cloud Account; enter the Alibaba Cloud account that owns the DataWorks workspace.
Role Name: Custom.
Grant precise permissions to the created RAM role. For specific operations, see Grant permissions to a RAM role.
Key parameters:
Policy: System policy.
Policy Name: AliyunDataWorksAccessingRdsOSSBinlogPolicy.
Modify the trust policy for the created RAM role. For specific operations, see Modify the trust policy of a RAM role.
{
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "@cdp.aliyuncs.com",
          "@dataworks.aliyuncs.com"
        ]
      }
    }
  ],
  "Version": "1"
}
Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Add and manage data sources. You can view the infotips of parameters in the DataWorks console to understand the meanings of the parameters when you add a data source.
Data synchronization task development: MySQL synchronization process guide
For information about the entry point for and the procedure of configuring a synchronization task, see the following configuration guides.
Single table offline synchronization task configuration guide
For the operation process, see Configure offline synchronization tasks using the codeless UI or Configure offline synchronization tasks using the code editor.
For the full parameters and script demo in code editor mode, see Appendix: MySQL script demo and parameter description below.
Single table real-time synchronization task configuration guide
For the operation process, see DataStudio real-time synchronization task configuration.
Configuration guide for database-level synchronization such as offline database, real-time full and incremental database, and real-time sharding database
For the operation process, see Data Integration synchronization task configuration.
FAQ
For more common Data Integration issues, see Common issues in Data Integration.
Appendix: MySQL script demo and parameter description
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a batch synchronization task by using the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
Reader script demo
This document provides configuration examples for single database single table and sharding:
The comments in the JSON examples in this topic are only intended to explain important parameters. Remove the comments before use.
Configure single database single table
{
  "type": "job",
  "version": "2.0",//Version number.
  "steps": [
    {
      "stepType": "mysql",//Plugin name.
      "parameter": {
        "column": [//Column names.
          "id"
        ],
        "connection": [
          {
            "querySql": [
              "select a,b from join1 c join join2 d on c.id = d.id;"
            ],
            "datasource": ""//Data source name.
          }
        ],
        "where": "",//Filter condition.
        "splitPk": "",//Split key.
        "encoding": "UTF-8"//Encoding format.
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"//Number of error records.
    },
    "speed": {
      "throttle": true,//When throttle is false, the mbps parameter is ineffective (no rate limiting); when throttle is true, the rate is limited.
      "concurrent": 1,//Number of concurrent jobs.
      "mbps": "12"//Rate limit; 1 mbps = 1 MB/s.
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
Configure sharding
Sharding means that multiple MySQL tables with identical schemas are selected on the MySQL Reader side and written to the same destination table. To configure database-level sharding, create a task on the Data Integration page and select the database sharding capability.
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "mysql",
      "parameter": {
        "indexes": [
          {
            "type": "unique",
            "column": [
              "id"
            ]
          }
        ],
        "envType": 0,
        "useSpecialSecret": false,
        "column": [
          "id",
          "buyer_name",
          "seller_name",
          "item_id",
          "city",
          "zone"
        ],
        "tableComment": "Test order table",
        "connection": [
          {
            "datasource": "rds_dataservice",
            "table": [
              "rds_table"
            ]
          },
          {
            "datasource": "rds_workshop_log",
            "table": [
              "rds_table"
            ]
          }
        ],
        "where": "",
        "splitPk": "id",
        "encoding": "UTF-8"
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "odps",
      "parameter": {},
      "name": "Writer",
      "category": "writer"
    },
    {
      "name": "Processor",
      "stepType": null,
      "category": "processor",
      "copies": 1,
      "parameter": {
        "nodes": [],
        "edges": [],
        "groups": [],
        "version": "2.0"
      }
    }
  ],
  "setting": {
    "executeMode": null,
    "errorLimit": {
      "record": ""
    },
    "speed": {
      "concurrent": 2,
      "throttle": false
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
Reader script parameters
Script parameter name | Description | Required | Default value |
datasource | Data source name. The code editor mode supports adding data sources, and the content filled in this configuration item must be consistent with the added data source name. | Yes | None |
table | The name of the table to be synchronized. A data integration task can only read data from one table. Advanced usage examples of table for configuring ranges:
The task will read all matched tables and specifically read the columns specified in the column configuration item from these tables. If the table does not exist, or the columns being read do not exist, the task will fail. | Yes | None |
column | The set of column names in the configured table that need to be synchronized, described using a JSON array. The default is to use all columns configuration, such as [*].
| Yes | None |
splitPk | When MySQL Reader extracts data, if splitPk is specified, it means you want to use the field represented by splitPk for data sharding. Data synchronization will then start concurrent tasks for data synchronization, improving the efficiency of data synchronization.
| No | None |
where | Filter condition. In actual business scenarios, you often want to synchronize data for the current day, so you can specify the where condition as
| No | None |
querySql (Advanced mode, this parameter configuration is not supported in codeless UI mode) | In some business scenarios, the where configuration item is not sufficient to describe the filtering conditions. You can use this configuration item to customize filtering SQL. After configuring this item, the data synchronization system will ignore the tables, columns, and splitPk configuration items, and directly use the content configured in this item to filter data. For example, if you need to join multiple tables and then synchronize the data, you can use querySql is case-sensitive. For example, writing it as querysql will not work. | No | None |
useSpecialSecret | For multiple source data sources, whether to use each data source's password. Values include the following:
If you have configured multiple source data sources and each data source uses different usernames and passwords, you can set this parameter to true to use each data source's password. | No | false |
Writer script demo
{
"type": "job",
"version": "2.0",//Version number.
"steps": [
{
"stepType": "stream",
"parameter": {},
"name": "Reader",
"category": "reader"
},
{
"stepType": "mysql",//Plugin name.
"parameter": {
"postSql": [],//Statements to execute after import.
"datasource": "",//Data source.
"column": [//Column names.
"id",
"value"
],
"writeMode": "insert",//Write mode, you can set it to insert, replace, or update.
"batchSize": 1024,//Number of records in a single batch commit.
"table": "",//Table name.
"nullMode": "skipNull",//NULL value handling strategy.
"skipNullColumn": [//Columns to skip NULL values.
"id",
"value"
],
"preSql": [
"delete from XXX;"//Statements to execute before import.
]
},
"name": "Writer",
"category": "writer"
}
],
"setting": {
"errorLimit": {//Number of error records.
"record": "0"
},
"speed": {
"throttle": true,//When throttle is false, the mbps parameter is ineffective, indicating no rate limiting; when throttle is true, it indicates rate limiting.
"concurrent": 1,//Number of concurrent jobs.
"mbps": "12"//Rate limit, controls the maximum rate of synchronization to prevent excessive read/write pressure on upstream/downstream databases, here 1mbps = 1MB/s.
}
},
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
}
}
Writer script parameters
Script parameter name | Description | Required | Default value |
datasource | Data source name. The code editor mode supports adding data sources, and the content filled in this configuration item must be consistent with the added data source name. | Yes | None |
table | The name of the table to be synchronized. | Yes | None |
writeMode | Select the import mode, which can support insert into, on duplicate key update, and replace into methods:
| No | insert |
nullMode | NULL value handling strategy, value range:
When configured as skipNull, the task will dynamically concatenate SQL statements for writing data to support target default values, which will increase the number of FLUSH operations and reduce synchronization speed. In the worst case, it will FLUSH once per data record. | No | writeNull |
skipNullColumn | When nullMode is configured as skipNull, the columns configured in this parameter will not be forced to write as Configuration format: | No | Default is all columns configured for this task. |
column | Fields in the target table that need to be written with data, separated by commas, for example | Yes | None |
preSql | SQL statements to execute before the data synchronization task. Currently, the codeless UI mode only allows executing one SQL statement, while the code editor mode can support multiple SQL statements. For example, clearing old data from the table before execution (truncate table tablename). When there are multiple SQL statements, transactions are not supported. | No | None |
postSql | SQL statements to execute after the data synchronization task. Currently, the codeless UI mode only allows executing one SQL statement, while the code editor mode can support multiple SQL statements. For example, adding a timestamp When there are multiple SQL statements, transactions are not supported. | No | None |
batchSize | The number of records in a single batch commit. This value can greatly reduce the number of network interactions between the data synchronization system and MySQL, and improve overall throughput. If this value is set too high, it may cause OOM exceptions in the data synchronization process. | No | 256 |
updateColumn | When writeMode is configured as update, this specifies the fields to update when a primary key/unique index conflict occurs. Fields are separated by commas, for example | No | None |
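The three writeMode values map to different SQL statements on the MySQL side. A sketch of the generated statements, assuming a hypothetical table t(id PRIMARY KEY, value):

```sql
-- writeMode = insert: on primary-key/unique-index conflict, the row is
-- treated as dirty data and counted against errorLimit.
INSERT INTO t (id, value) VALUES (1, 'a');

-- writeMode = replace: on conflict, the old row is deleted and the new row
-- is inserted; columns not written revert to their defaults.
REPLACE INTO t (id, value) VALUES (1, 'b');

-- writeMode = update: on conflict, only the columns listed in updateColumn
-- are updated; other columns keep their existing values.
INSERT INTO t (id, value) VALUES (1, 'c')
  ON DUPLICATE KEY UPDATE value = VALUES(value);
```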