DataWorks provides DRDS (PolarDB-X) Reader and DRDS (PolarDB-X) Writer for you to read data from and write data to DRDS (PolarDB-X) data sources. This topic describes the capabilities of synchronizing data from or to DRDS (PolarDB-X) data sources.
Supported DRDS (PolarDB-X) versions
Real-time data read
Non-read-only instances of DRDS (PolarDB-X 1.0) and PolarDB-X 2.0 are supported. For more information about how to create a DRDS (PolarDB-X) instance, see Create a PolarDB-X 1.0 instance or Create a PolarDB-X 2.0 instance.
Limits
Batch data read and write
DRDS (PolarDB-X) Reader supports only MySQL engines. DRDS (PolarDB-X) is a distributed MySQL database service. Most communication protocols that are used by DRDS (PolarDB-X) are the same as the communication protocols that are used by MySQL.
Only exclusive resource groups for Data Integration can be used to read data from DRDS (PolarDB-X) instances that run MySQL 8.0.
DRDS (PolarDB-X) Writer connects to the proxy of a remote DRDS (PolarDB-X) database by using Java Database Connectivity (JDBC) and executes the
REPLACE INTO
statement to write data to the DRDS (PolarDB-X) database.To execute the
REPLACE INTO
statement, you must make sure that your table has the primary key or a unique index to prevent duplicate data.DRDS (PolarDB-X) Writer obtains data from a reader and executes the
REPLACE INTO
statement to write the data to the destination database. If no primary key conflict or unique index conflict occurs, data is processed in the same way as that when you execute theINSERT INTO
statement. If a conflict occurs, data in conflicting rows in the destination table is replaced by new data. DRDS (PolarDB-X) Writer sends data to the DRDS (PolarDB-X) proxy when the amount of buffered data reaches a specific threshold. The proxy determines whether to write the data to one or more tables and how to route the data when the data is written to multiple tables.NoteA synchronization task that uses DRDS (PolarDB-X) Writer must have at least the permissions to execute the
REPLACE INTO
statement. Whether other permissions are required depends on the SQL statements that you specify in the preSql and postSql parameters when you configure the task.Data of views can be read.
Real-time data read
Only the data sources that are added in DataWorks in Alibaba Cloud instance mode are supported. If you add a data source in connection string mode and use the data source in a data synchronization task, the task fails.
The storage type must be PolarDB for MySQL or ApsaraDB RDS (excluding ApsaraDB RDS for MySQL). ApsaraDB RDS can be used only for existing DRDS (PolarDB-X) instances and cannot be used for newly purchased DRDS (PolarDB-X) instances.
Real-time synchronization of data on which XA ROLLBACK statements are executed is not supported.
For transaction data on which XA PREPARE statements are executed, you can use the real-time synchronization feature to synchronize the data to a destination. If XA ROLLBACK statements are executed later on the data, the rollback changes to the data cannot be synchronized to the destination. If the tables that you want to synchronize include tables on which XA ROLLBACK statements are executed, you must remove the tables on which XA ROLLBACK statements are executed from the real-time synchronization task and add the removed tables back to synchronize data.
Data type mappings
Most DRDS (PolarDB-X) data types are supported. Make sure that the data types of your database are supported.
The following table lists the data type mappings based on which DRDS (PolarDB-X) Reader or DRDS (PolarDB-X) Writer converts data types.
Category | DRDS (PolarDB-X) data type |
Integer | INT, TINYINT, SMALLINT, MEDIUMINT, and BIGINT |
Floating point | FLOAT, DOUBLE, and DECIMAL |
String | VARCHAR, CHAR, TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT |
Date and time | DATE, DATETIME, TIMESTAMP, TIME, and YEAR |
Boolean | BIT and BOOLEAN |
Binary | TINYBLOB, MEDIUMBLOB, BLOB, LONGBLOB, and VARBINARY |
Preparations before data synchronization
You need to create an account that is used to log on to the DRDS (PolarDB-X) databases for subsequent operations and grant the required permissions to the account. For more information, see Create an account.
Develop a data synchronization task
For information about the entry point for and the procedure of configuring a data synchronization task, see the following sections. For information about the parameter settings, view the infotip of each parameter on the configuration tab of the task.
Add a data source
Before you configure a data synchronization task to synchronize data from or to a specific data source, you must add the data source to DataWorks. For more information, see Add and manage data sources.
Configure a batch synchronization task to synchronize data of a single table
For more information about the configuration procedure, see Configure a batch synchronization task by using the codeless UI and Configure a batch synchronization task by using the code editor.
For information about all parameters that are configured and the code that is run when you use the code editor to configure a batch synchronization task, see Appendix: Code and parameters.
Configure synchronization settings to implement batch synchronization of all data in a database or real-time synchronization of full and incremental data in a single table or a database
For more information about the configuration procedure, see Configure a synchronization task in Data Integration.
Additional information
Consistent views
DRDS (PolarDB-X) cannot provide a consistent view for multiple tables in multiple databases because it is a distributed database service. DRDS (PolarDB-X) Reader obtains different snapshots from different table shards, but cannot obtain the snapshot of table shards at the same time slice. As a result, DRDS (PolarDB-X) Reader cannot ensure strong consistency for data queries.
Character encoding
DRDS (PolarDB-X) supports flexible encoding configurations. You can specify the encoding format for an entire instance and specific fields, tables, and databases. The configurations at the field, table, database, and instance levels are prioritized in descending order. We recommend that you use UTF-8 for a database.
DRDS (PolarDB-X) Reader uses JDBC to read data. This enables DRDS (PolarDB-X) Reader to automatically convert the encoding formats of characters. Therefore, you do not need to specify the encoding format.
If you specify an encoding format for a DRDS (PolarDB-X) database but data is written to the DRDS (PolarDB-X) database in a different encoding format, DRDS (PolarDB-X) Reader cannot recognize this inconsistency and may export garbled characters.
Incremental data synchronization
DRDS (PolarDB-X) Reader uses JDBC to connect to a database and uses a SELECT statement with a
WHERE
clause to read incremental data.For batch data, incremental add, update, and delete operations (including logically delete operations) are distinguished by timestamps. Specify the WHERE clause based on a specific timestamp. The time indicated by the timestamp must be later than the time indicated by the latest timestamp in the previous synchronization.
For streaming data, specify the WHERE clause based on the ID of a specific record. The ID must be greater than the maximum ID involved in the previous synchronization.
If the data that is added or modified cannot be distinguished, DRDS (PolarDB-X) Reader can read only full data.
You cannot configure filter conditions for physical tables in the WHERE clause.
Appendix: Code and parameters
Appendix: Configure a batch synchronization task by using the code editor
If you use the code editor to configure a batch synchronization task, you must configure parameters for the reader and writer of the related data source based on the format requirements in the code editor. For more information about the format requirements, see Configure a batch synchronization task by using the code editor. The following information describes the configuration details of parameters for the reader and writer in the code editor.
Code for DRDS (PolarDB-X) Reader
{
"type":"job",
"version":"2.0",// The version number.
"steps":[
{
"stepType":"drds",// The plug-in name.
"parameter":{
"datasource":"",// The name of the data source.
"column":[// The names of the columns.
"id",
"name"
],
"where":"",// The WHERE clause.
"table":"",// The name of the table.
"splitPk": ""// The shard key.
},
"name":"Reader",
"category":"reader"
},
{
"stepType":"stream",// The plug-in name.
"parameter":{},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"// The maximum number of dirty data records allowed.
},
"speed":{
"throttle":true,// Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
"concurrent":1,// The maximum number of parallel threads.
"mbps":"12"// The maximum transmission rate. Unit: MB/s.
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}
Parameters in code for DRDS (PolarDB-X) Reader
Parameter | Description | Required | Default value |
datasource | The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor. | Yes | No default value |
table | The name of the table from which you want to read data. | Yes | No default value |
column | The names of the columns from which you want to read data. Specify the names in a JSON array. The default value is [*], which indicates all the columns in the source table.
| Yes | No default value |
where | The WHERE clause. DRDS (PolarDB-X) Reader generates an SQL statement based on the settings of the column, table, and where parameters, and uses the generated statement to read data.
For example, you can set this parameter to | No | No default value |
Code for DRDS (PolarDB-X) Writer
{
"type":"job",
"version":"2.0",// The version number.
"steps":[
{
"stepType":"stream",
"parameter":{},
"name":"Reader",
"category":"reader"
},
{
"stepType":"drds",// The plug-in name.
"parameter":{
"postSql":[],// The SQL statement that you want to execute after the synchronization task is run.
"datasource":"",// The name of the data source.
"column":[// The names of the columns.
"id"
],
"writeMode":"insert ignore",
"batchSize":"1024",// The number of data records to write at a time.
"table":"test",// The name of the table.
"preSql":[]// The SQL statement that you want to execute before the synchronization task is run.
},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"// The maximum number of dirty data records allowed.
},
"speed":{
"throttle":true,// Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
"concurrent":1, // The maximum number of parallel threads.
"mbps":"12"// The maximum transmission rate. Unit: MB/s.
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}
Parameters in code for DRDS (PolarDB-X) Writer
Parameter | Description | Required | Default value |
datasource | The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor. | Yes | No default value |
table | The name of the table to which you want to write data. | Yes | No default value |
writeMode | The write mode. Valid values:
| No | insert ignore |
column | The names of the columns to which you want to write data. Separate the names with commas (,), such as "column": ["id","name","age"]. If you want to write data to all the columns in the destination table, set this parameter to an asterisk (*), such as "column": ["*"]. | Yes | No default value |
preSql | The SQL statement that you want to execute before the synchronization task is run. You can execute only one SQL statement on the codeless UI and multiple SQL statements in the code editor. For example, you can set this parameter to | No | No default value |
postSql | The SQL statement that you want to execute after the synchronization task is run. You can execute only one SQL statement on the codeless UI and multiple SQL statements in the code editor. For example, you can set this parameter to | No | No default value |
batchSize | The number of data records to write at a time. Set this parameter to an appropriate value based on your business requirements. This greatly reduces the interactions between Data Integration and DRDS (PolarDB-X) and increases throughput. If you set this parameter to an excessively large value, an out of memory (OOM) error may occur during data synchronization. | No | 1,024 |