DataWorks provides DRDS (PolarDB-X) Reader and DRDS (PolarDB-X) Writer for you to read data from and write data to DRDS (PolarDB-X) data sources. This topic describes the capabilities of synchronizing data from or to DRDS (PolarDB-X) data sources.
Limits
Batch data read and write
DRDS (PolarDB-X) Reader supports only MySQL engines. DRDS (PolarDB-X) is a distributed MySQL database service. Most communication protocols that are used by DRDS (PolarDB-X) are the same as the communication protocols that are used by MySQL.
Only exclusive resource groups for Data Integration can be used to read data from DRDS (PolarDB-X) instances that run MySQL 8.0.
DRDS (PolarDB-X) Writer connects to the proxy of a remote DRDS (PolarDB-X) database by using Java Database Connectivity (JDBC) and executes the REPLACE INTO
statement to write data to the DRDS (PolarDB-X) database.
To execute the REPLACE INTO
statement, you must make sure that your table has the primary key or a unique index to prevent duplicate data.
DRDS (PolarDB-X) Writer obtains data from a reader and executes the REPLACE INTO
statement to write the data to the destination database. If no primary key conflict or unique index conflict occurs, data is processed in the same way as that when you execute the INSERT INTO
statement. If a conflict occurs, data in conflicting rows in the destination table is replaced by new data. DRDS (PolarDB-X) Writer sends data to the DRDS (PolarDB-X) proxy when the amount of buffered data reaches a specific threshold. The proxy determines whether to write the data to one or more tables and how to route the data when the data is written to multiple tables.
Note
A synchronization task that uses DRDS (PolarDB-X) Writer must have at least the permissions to execute the REPLACE INTO
statement. Whether other permissions are required depends on the SQL statements that you specify in the preSql and postSql parameters when you configure the task.
Data of views can be read.
Real-time data read
Only the data sources that are added in DataWorks in Alibaba Cloud instance mode are supported. If you add a data source in connection string mode and use the data source in a data synchronization task, the task fails.
The storage type must be PolarDB for MySQL or ApsaraDB RDS (excluding ApsaraDB RDS for MySQL). ApsaraDB RDS can be used only for existing DRDS (PolarDB-X) instances and cannot be used for newly purchased DRDS (PolarDB-X) instances.
Real-time synchronization of data on which XA ROLLBACK statements are executed is not supported.
For transaction data on which XA PREPARE statements are executed, you can use the real-time synchronization feature to synchronize the data to a destination. If XA ROLLBACK statements are executed later on the data, the rollback changes to the data cannot be synchronized to the destination. If the tables that you want to synchronize include tables on which XA ROLLBACK statements are executed, you must remove the tables on which XA ROLLBACK statements are executed from the real-time synchronization task and add the removed tables back to synchronize data.
Data type mappings
Most DRDS (PolarDB-X) data types are supported. Make sure that the data types of your database are supported.
The following table lists the data type mappings based on which DRDS (PolarDB-X) Reader or DRDS (PolarDB-X) Writer converts data types.
Category | DRDS (PolarDB-X) data type |
Category | DRDS (PolarDB-X) data type |
Integer | INT, TINYINT, SMALLINT, MEDIUMINT, and BIGINT |
Floating point | FLOAT, DOUBLE, and DECIMAL |
String | VARCHAR, CHAR, TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT |
Date and time | DATE, DATETIME, TIMESTAMP, TIME, and YEAR |
Boolean | BIT and BOOLEAN |
Binary | TINYBLOB, MEDIUMBLOB, BLOB, LONGBLOB, and VARBINARY |
Preparations before data synchronization
You need to create an account that is used to log on to the DRDS (PolarDB-X) databases for subsequent operations and grant the required permissions to the account. For more information, see Create an account.
Develop a data synchronization task
For information about the entry point for and the procedure of configuring a data synchronization task, see the following sections. For information about the parameter settings, view the infotip of each parameter on the configuration tab of the task.
Add a data source
Before you configure a data synchronization task to synchronize data from or to a specific data source, you must add the data source to DataWorks. For more information, see Add and manage data sources.
Configure a batch synchronization task to synchronize data of a single table
Configure synchronization settings to implement batch synchronization of all data in a database or real-time synchronization of full and incremental data in a single table or a database
For more information about the configuration procedure, see Configure a synchronization task in Data Integration.
Additional information
Consistent views
DRDS (PolarDB-X) cannot provide a consistent view for multiple tables in multiple databases because it is a distributed database service. DRDS (PolarDB-X) Reader obtains different snapshots from different table shards, but cannot obtain the snapshot of table shards at the same time slice. As a result, DRDS (PolarDB-X) Reader cannot ensure strong consistency for data queries.
Character encoding
DRDS (PolarDB-X) supports flexible encoding configurations. You can specify the encoding format for an entire instance and specific fields, tables, and databases. The configurations at the field, table, database, and instance levels are prioritized in descending order. We recommend that you use UTF-8 for a database.
DRDS (PolarDB-X) Reader uses JDBC to read data. This enables DRDS (PolarDB-X) Reader to automatically convert the encoding formats of characters. Therefore, you do not need to specify the encoding format.
If you specify an encoding format for a DRDS (PolarDB-X) database but data is written to the DRDS (PolarDB-X) database in a different encoding format, DRDS (PolarDB-X) Reader cannot recognize this inconsistency and may export garbled characters.
Incremental data synchronization
DRDS (PolarDB-X) Reader uses JDBC to connect to a database and uses a SELECT statement with a WHERE
clause to read incremental data.
For batch data, incremental add, update, and delete operations (including logically delete operations) are distinguished by timestamps. Specify the WHERE clause based on a specific timestamp. The time indicated by the timestamp must be later than the time indicated by the latest timestamp in the previous synchronization.
For streaming data, specify the WHERE clause based on the ID of a specific record. The ID must be greater than the maximum ID involved in the previous synchronization.
If the data that is added or modified cannot be distinguished, DRDS (PolarDB-X) Reader can read only full data.
You cannot configure filter conditions for physical tables in the WHERE clause.
Appendix: Code and parameters
Appendix: Configure a batch synchronization task by using the code editor
If you use the code editor to configure a batch synchronization task, you must configure parameters for the reader and writer of the related data source based on the format requirements in the code editor. For more information about the format requirements, see Configure a batch synchronization task by using the code editor. The following information describes the configuration details of parameters for the reader and writer in the code editor.
Code for DRDS (PolarDB-X) Reader
{
"type":"job",
"version":"2.0",
"steps":[
{
"stepType":"drds",
"parameter":{
"datasource":"",
"column":[
"id",
"name"
],
"where":"",
"table":"",
"splitPk": ""
},
"name":"Reader",
"category":"reader"
},
{
"stepType":"stream",
"parameter":{},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"
},
"speed":{
"throttle":true,
"concurrent":1,
"mbps":"12"
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}
Parameters in code for DRDS (PolarDB-X) Reader
Parameter | Description | Required | Default value |
Parameter | Description | Required | Default value |
datasource | The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor. | Yes | No default value |
table | The name of the table from which you want to read data. | Yes | No default value |
column | The names of the columns from which you want to read data. Specify the names in a JSON array. The default value is [*], which indicates all the columns in the source table. You can select specific columns to read. The column order can be changed. This indicates that you can specify columns in an order different from the order specified by the schema of the source table. Constants are supported. The column names must be arranged in compliance with the SQL syntax supported by MySQL, such as ["id", "`table`", "1", "'bazhen.csy'", "null", "to_char(a + 1)", "2.3", "true"] . Description of the parameters in the preceding example: id: a column name. table: the name of a column that contains reserved keywords. 1: an integer constant. bazhen.csy: a string constant. null: a null pointer. to_char(a + 1): a function expression that is used to calculate the length of a string. 2.3: a floating-point constant. true: a Boolean value.
The column parameter must explicitly specify all the columns from which you want to read data. The parameter cannot be left empty.
| Yes | No default value |
where | The WHERE clause. DRDS (PolarDB-X) Reader generates an SQL statement based on the settings of the column, table, and where parameters, and uses the generated statement to read data. You can use the WHERE clause to read incremental data. If the where parameter is not configured or is left empty, DRDS (PolarDB-X) Reader reads full data.
For example, you can set this parameter to STRTODATE('${bdp.system.bizdate}','%Y%m%d') <= today AND today < DATEADD(STRTODATE('${bdp.system.bizdate}', '%Y%m%d'), interval 1 day) to read the data that is generated on the current day. | No | No default value |
Code for DRDS (PolarDB-X) Writer
{
"type":"job",
"version":"2.0",
"steps":[
{
"stepType":"stream",
"parameter":{},
"name":"Reader",
"category":"reader"
},
{
"stepType":"drds",
"parameter":{
"postSql":[],
"datasource":"",
"column":[
"id"
],
"writeMode":"insert ignore",
"batchSize":"1024",
"table":"test",
"preSql":[]
},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"
},
"speed":{
"throttle":true,
"concurrent":1,
"mbps":"12"
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}
Parameters in code for DRDS (PolarDB-X) Writer
Parameter | Description | Required | Default value |
Parameter | Description | Required | Default value |
datasource | The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor. | Yes | No default value |
table | The name of the table to which you want to write data. | Yes | No default value |
writeMode | The write mode. Valid values: insert ignore: If a primary key conflict or unique index conflict occurs, the source data cannot be written. replace into: If a primary key conflict or unique index conflict occurs, the original data is deleted, and new data is inserted.
| No | insert ignore |
column | The names of the columns to which you want to write data. Separate the names with commas (,), such as "column": ["id","name","age"]. If you want to write data to all the columns in the destination table, set this parameter to an asterisk (*), such as "column": ["*"]. | Yes | No default value |
preSql | The SQL statement that you want to execute before the synchronization task is run. You can execute only one SQL statement on the codeless UI and multiple SQL statements in the code editor. For example, you can set this parameter to delete * from table xxx; to delete data from the xxx table before data synchronization. You can configure this parameter based on your business requirements. | No | No default value |
postSql | The SQL statement that you want to execute after the synchronization task is run. You can execute only one SQL statement on the codeless UI and multiple SQL statements in the code editor. For example, you can set this parameter to delete * from table xxx where xx=xx; to delete specific data from the xxx table after data synchronization. You can configure this parameter based on your business requirements. | No | No default value |
batchSize | The number of data records to write at a time. Set this parameter to an appropriate value based on your business requirements. This greatly reduces the interactions between Data Integration and DRDS (PolarDB-X) and increases throughput. If you set this parameter to an excessively large value, an out of memory (OOM) error may occur during data synchronization. | No | 1,024 |