DataWorks provides ApsaraDB for OceanBase Reader and ApsaraDB for OceanBase Writer for you to read data from and write data to ApsaraDB for OceanBase data sources. You can configure data synchronization tasks for ApsaraDB for OceanBase data sources. This topic describes the capabilities of synchronizing data from or to ApsaraDB for OceanBase data sources.
Supported ApsaraDB for OceanBase versions
Batch data read and write
ApsaraDB for OceanBase Reader and ApsaraDB for OceanBase Writer connect to ApsaraDB for OceanBase databases by using an OceanBase database driver. Make sure that the driver version is compatible with the version of your ApsaraDB for OceanBase database. ApsaraDB for OceanBase Reader and ApsaraDB for OceanBase Writer use the OceanBase database driver of the following version:
<dependency>
<groupId>com.alipay.OceanBase</groupId>
<artifactId>OceanBase-connector-java</artifactId>
<version>3.1.0</version>
</dependency>
Limits
Batch data read
ApsaraDB for OceanBase supports the Oracle and MySQL tenant modes. Make sure that the WHERE clause and the columns that you specify in the column parameter comply with the SQL syntax constraints that Oracle or MySQL supports. Otherwise, the SQL statement may fail to be executed.
Data of views can be read.
When you use a batch synchronization task to read data from an ApsaraDB for OceanBase data source, do not modify the data that is being synchronized. Otherwise, data quality issues, such as data duplication or data loss, may occur.
Batch data write
INSERT INTO
: If a primary key conflict or unique index conflict occurs, data cannot be written to conflicting rows. In Oracle tenant mode, ApsaraDB for OceanBase supports only theINSERT INTO
write mode.INSERT INTO...ON DUPLICATE KEY UPDATE
: If no primary key conflict or unique index conflict occurs, data is processed in the same way as that inINSERT INTO
mode. If a primary key conflict or unique index conflict occurs, the specified fields in the rows in the destination table are updated. In MySQL tenant mode, ApsaraDB for OceanBase supports theINSERT INTO
andINSERT INTO...ON DUPLICATE KEY UPDATE
write modes.Data can be written to tables stored only in the primary database.
NoteA synchronization task that uses ApsaraDB for OceanBase Writer must have at least the permissions to execute
INSERT INTO
. Whether other permissions are required depends on the SQL statements that you specify in the preSql and postSql parameters when you configure the task.We recommend that you write data to the destination table in batches. ApsaraDB for OceanBase Writer submits a write request when the number of rows reaches a specific threshold.
ApsaraDB for OceanBase supports the Oracle and MySQL tenant modes. Make sure that the SQL statements that you specify in the preSql and postSql parameters comply with the related SQL syntax. Otherwise, the SQL statements may fail to be executed.
Real-time data read
ApsaraDB for OceanBase is a distributed relational database that can integrate data distributed in multiple physical databases into a unified logical database. However, you can synchronize data of only one physical AnalyticDB for MySQL to an AnalyticDB for MySQL cluster in real time.
For a real-time synchronization task used to synchronize data of a database, you cannot select a data source that is added in the connection string mode.
For a real-time synchronization task used to synchronize data of a ApsaraDB for OceanBase data source, the version of your ApsaraDB for OceanBase database must be V3.0 or later.
Prepare an ApsaraDB for OceanBase environment before data synchronization
Before you use DataWorks to synchronize data from or to an ApsaraDB for OceanBase data source, you must prepare an ApsaraDB for OceanBase environment. This ensures that a data synchronization task can be configured and can synchronize data from or to the ApsaraDB for OceanBase data source as expected. The following information describes how to prepare an ApsaraDB for OceanBase environment for data synchronization from or to an ApsaraDB for OceanBase data source.
Preparation 1: Configure an IP address whitelist
Add the elastic IP address (EIP) of the exclusive resource group for Data Integration and the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated to the whitelist of the ApsaraDB for OceanBase data source. To view and add the EIP and CIDR block to the whitelist, perform the following steps:
View and record the EIP and CIDR block of the exclusive resource group for Data Integration.
Log on to the DataWorks console.
In the left-side navigation pane, click Resource Groups.
On the Exclusive Resource Groups tab, find the exclusive resource group for Data Integration and click View Information in the Actions column.
In the Exclusive Resource Groups dialog box, view and record the values of the EIPAddress and CIDR Blocks parameters.
On the Exclusive Resource Groups tab, find the exclusive resource group for Data Integration and click Network Settings in the Actions column.
On the VPC Binding tab of the page that appears, view and record the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated.
Add the EIP of the exclusive resource group for Data Integration and the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated to the whitelist of the ApsaraDB for OceanBase data source. For more information, see Configure a whitelist.
Preparation 2: Prepare an account that has the required permissions
You must create an account to log on to the database of the ApsaraDB for OceanBase cluster. You must grant the required permissions to the account. For more information, see Create an account.
Develop a data synchronization task
For information about the entry point for and the procedure of configuring a data synchronization task, see the following sections. For information about the parameter settings, view the infotip of each parameter on the configuration tab of the task.
Add a data source
Before you configure a data synchronization task to synchronize data from or to a specific data source, you must add the data source to DataWorks. For more information, see Add and manage data sources.
Configure a batch synchronization task to synchronize data of a single table
For more information about the configuration procedure, see Configure a batch synchronization task by using the codeless UI and Configure a batch synchronization task by using the code editor.
For information about all parameters that are configured and the code that is run when you use the code editor to configure a batch synchronization task, see Appendix: Code and parameters.
Configure a real-time synchronization task to synchronize data of a database
For more information about the configuration procedure, see Configure a real-time synchronization task in DataStudio.
Configure synchronization settings to implement (real-time) synchronization of full and incremental data in a single table or a database
For more information about the configuration procedure, see Configure a synchronization task in Data Integration.
Appendix: Code and parameters
Appendix: Configure a batch synchronization task by using the code editor
If you use the code editor to configure a batch synchronization task, you must configure parameters for the reader and writer of the related data source based on the format requirements in the code editor. For more information about the format requirements, see Configure a batch synchronization task by using the code editor. The following information describes the configuration details of parameters for the reader and writer in the code editor.
Code for ApsaraDB for OceanBase Reader
{
"type": "job",
"steps": [
{
"stepType": "apsaradb_for_OceanBase", // The plug-in name.
"parameter": {
"datasource": "", // The name of the data source.
"where": "",
"column": [ // The names of the columns.
"id",
"name"
],
"splitPk": ""
},
"name": "Reader",
"category": "reader"
},
{
"stepType": "stream",
"parameter": {
"print": false,
"fieldDelimiter": ","
},
"name": "Writer",
"category": "writer"
}
],
"version": "2.0",
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
},
"setting": {
"errorLimit": {
"record": "0" // The maximum number of dirty data records allowed.
},
"speed": {
"throttle": true, // Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
"concurrent": 1, // The maximum number of parallel threads.
"mbps":"12"// The maximum transmission rate. Unit: MB/s.
}
}
}
Parameters in code for ApsaraDB for OceanBase Reader
Parameter | Description | Required | Default value |
datasource | The name of the data source. It must be the same as the name of the added data source. You can connect to the ApsaraDB for OceanBase database based on the setting of the jdbcUrl or username parameter. | Yes | No default value |
jdbcUrl | The JDBC URL of the ApsaraDB for OceanBase database. You can specify multiple JDBC URLs in a JSON array for a database. If you specify multiple JDBC URLs, ApsaraDB for OceanBase Reader verifies the connectivity of the URLs in sequence to find a valid URL. If no URL is valid, ApsaraDB for OceanBase Reader returns an error. Note The jdbcUrl parameter must be included in the connection parameter. The value of the jdbcUrl parameter must comply with the standard format that is supported by ApsaraDB for OceanBase. You can also specify the information of the attachment facility. An example JDBC URL is | No | No default value |
username | The username that you can use to connect to the database. | No | No default value |
password | The password that you can use to connect to the database. | No | No default value |
table | The name of the table from which you want to read data. ApsaraDB for OceanBase Reader can read data from multiple tables. The tables are specified in a JSON array. If you specify multiple tables, make sure that the tables have the same schema. ApsaraDB for OceanBase Reader does not check whether the tables have the same schema. Note The table parameter must be included in the connection parameter. | Yes | No default value |
column | The names of the columns from which you want to read data. Specify the names in a JSON array. The default value is [ * ], which indicates all columns in the source table.
| Yes | No default value |
splitPk | The field that is used for data sharding when ApsaraDB for OceanBase Reader reads data. If you specify this parameter, the source table is sharded based on the value of this parameter. Data Integration then runs parallel threads to read data. This way, data can be synchronized more efficiently.
| No | No default value |
where | The WHERE clause. ApsaraDB for OceanBase Reader generates an SQL statement based on the column, table, and where parameters that you have configured and uses the generated statement to read data. For example, when you perform a test, you can set the where parameter to limit 10. To read the data that is generated on the current day, you can set the where parameter to
| No | No default value |
querySql | The SQL statement that is used for refined data filtering. If you specify this parameter, data is filtered based only on the value of this parameter. If you specify this parameter, ApsaraDB for OceanBase Reader ignores the settings of the table, column, where, and splitPk parameters. | No | No default value |
fetchSize | The number of data records to read at a time. This parameter determines the number of interactions between Data Integration and the database and affects read efficiency. Note If you set this parameter to a value greater than 2048, an out of memory (OOM) error may occur during data synchronization. | No | 1,024 |
Code for ApsaraDB for OceanBase Writer
{
"type":"job",
"version":"2.0",// The version number.
"steps":[
{
"stepType":"stream",
"parameter":{},
"name":"Reader",
"category":"reader"
},
{
"stepType":"apsaradb_for_OceanBase",// The plug-in name.
"parameter":{
"datasource": "The name of the data source",
"column": [// The names of the columns.
"id",
"name"
],
"table": "apsaradb_for_OceanBase_table",// The name of the table.
"preSql": [ // The SQL statement that you want to execute before the synchronization task is run.
"delete from @table where db_id = -1"
],
"postSql": [// The SQL statement that you want to execute after the synchronization task is run.
"update @table set db_modify_time = now() where db_id = 1"
],
"writeMode": "insert",
},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"// The maximum number of dirty data records allowed.
},
"speed":{
"throttle":true,// Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
"concurrent":1, // The maximum number of parallel threads.
"mbps":"12"// The maximum transmission rate. Unit: MB/s.
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}
Parameters in code for ApsaraDB for OceanBase Writer
Parameter | Description | Required | Default value |
datasource | The name of the data source. It must be the same as the name of the added data source. You can connect to the ApsaraDB for OceanBase database based on the setting of the jdbcUrl or username parameter. | No | No default value |
jdbcUrl | The JDBC URL of the ApsaraDB for OceanBase database. The jdbcUrl parameter must be included in the connection parameter.
| Yes | No default value |
username | The username that you can use to connect to the database. | Yes | No default value |
password | The password that you can use to connect to the database. | Yes | No default value |
table | The name of the table to which you want to write data. Specify the name in a JSON array. Note The table parameter must be included in the connection parameter. | Yes | No default value |
column | The names of the columns to which you want to write data. Separate the names with commas (,), such as Note The column parameter cannot be left empty. | Yes | No default value |
writeMode | The write mode. Valid values: | Yes | No default value |
preSql | The SQL statement that you want to execute before the synchronization task is run. Use | No | No default value |
postSql | The SQL statement that you want to execute after the synchronization task is run. | No | No default value |
batchSize | The number of data records to write at a time. Set this parameter to an appropriate value based on your business requirements. This greatly reduces the interactions between Data Integration and the database and increases throughput. Note If you set this parameter to a value greater than 2048, an OOM error may occur during data synchronization. | No | 1,024 |