You can use MySQL Reader to read data from tables in your MySQL database in real time by subscribing to binary logs. This topic describes how to configure MySQL Reader. This topic also describes the network environment and permissions that you must prepare before you configure MySQL Reader.
Prerequisites
Before you configure MySQL Reader, make sure that the following operations are performed:
An account that is used to access a MySQL database is created, and required permissions are granted to the account. For more information, see Prepare an account that has the required permissions.
A real-time synchronization node for a MySQL data source accesses the specified MySQL database by using the account that is configured when you add the MySQL data source to DataWorks. You must make sure that the account is granted the following permissions on the ApsaraDB RDS for MySQL database:
SELECT
,REPLICATION SLAVE
, andREPLICATION CLIENT
.The binary logging feature is enabled for the ApsaraDB RDS for MySQL instance that you want to add to DataWorks as a MySQL data source. For more information, see Enable the binary logging feature for an ApsaraDB RDS for MySQL instance.
Real-time synchronization of incremental data from MySQL is performed based on real-time subscription to MySQL binary logs. Before you configure a real-time synchronization node to synchronize incremental data from an ApsaraDB RDS for MySQL instance, you must enable the binary logging feature for the instance.
An exclusive resource group for Data Integration that meets your business requirements is purchased. For more information, see Create and use an exclusive resource group for Data Integration.
A network connection between the exclusive resource group for Data Integration and the MySQL data source is established. For more information, see Network connectivity solutions.
Limits
Data Integration does not support real-time synchronization of data from a read-only ApsaraDB RDS for MySQL instance by using MySQL Reader.
Real-time synchronization of incremental data from MySQL is performed based on real-time subscription to MySQL binary logs. Real-time data synchronization from MySQL supports only ApsaraDB RDS for MySQL data sources that run MySQL
5.X
or8.X
. If you want to run a real-time synchronization node to synchronize data from a PolarDB-X 1.0 data source that runs MySQL, you can refer to operations described in Add a DRDS data source to add a PolarDB-X 1.0 data source to DataWorks and use the PolarDB-X 1.0 data source for the node.Real-time data synchronization does not support functional indexes.
Configure MySQL Reader
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
In the Scheduled Workflow pane of the DataStudio page, move the pointer over the
icon and choose .
Alternatively, find the desired workflow in the Scheduled Workflow pane, right-click the workflow name, and then choose
.In the Create Node dialog box, set the Sync Method parameter to End-to-end ETL and configure the Name and Path parameters.
ImportantThe node name cannot exceed 128 characters in length and can contain only letters, digits, underscores (_), and periods (.).
Click Confirm.
On the configuration tab of the real-time synchronization node, drag MySQL in the Input section to the canvas on the right.
Click the MySQL node. In the panel that appears, configure the parameters.
Parameter
Description
Data source
The name of the MySQL data source that you added to DataWorks. You can select only a MySQL data source.
If no data source is available, click New data source to go to the Data Sources page in Management Center to add a MySQL data source.
Table
The name of the table from which you want to read data. You can click Data preview on the right to preview the selected table.
You can run a real-time synchronization node to synchronize data from tables in sharded MySQL databases. In this case, MySQL Reader can read data from the databases and tables at the same time in real time.
ImportantTo prevent errors, make sure that the tables use the same schema.
Output Fields
The fields that you want to synchronize. The fields include management fields and data fields.
Management fields: the additional fields that are automatically added when the fields in the source tables are synchronized to the destination. These fields facilitate data management, sorting, and deduplication.
Data fields: the fields in the source tables that you want to synchronize.
For more information, see Fields used for real-time synchronization.
MySQL Reader supports the sharding feature. To use this feature, click Add Sharded Databases and Tables and select multiple data sources and tables from the drop-down lists to synchronize data from the databases and tables at the same time.
ImportantTo prevent errors, make sure that the selected tables use the same schema.
In the top toolbar of the configuration tab of the real-time synchronization node, click the
icon to save the node.