DataWorks provides ApsaraDB for OceanBase Reader and ApsaraDB for OceanBase Writer for you to read data from and write data to ApsaraDB for OceanBase data sources. You can configure data synchronization tasks for ApsaraDB for OceanBase data sources. This topic describes the capabilities of synchronizing data from or to ApsaraDB for OceanBase data sources.
Supported ApsaraDB for OceanBase versions
Batch data read and write
ApsaraDB for OceanBase versions supported by ApsaraDB for OceanBase Reader and ApsaraDB for OceanBase Writer:
OceanBase 2.x
OceanBase 3.x
OceanBase 4.x
Limits
Batch data read
ApsaraDB for OceanBase supports the Oracle and MySQL tenant modes. Make sure that the WHERE clause and the columns that you specify in the column parameter comply with the SQL syntax constraints that Oracle or MySQL supports. Otherwise, the SQL statement may fail to be executed.
Data of views can be read.
When you use a batch synchronization task to read data from an ApsaraDB for OceanBase data source, do not modify the data that is being synchronized. Otherwise, data quality issues, such as data duplication or data loss, may occur.
Batch data write
Note
A synchronization task that uses ApsaraDB for OceanBase Writer must have at least the permissions to execute INSERT INTO
. Whether other permissions are required depends on the SQL statements that you specify in the preSql
and postSql
parameters when you configure the task.
We recommend that you write data to the destination table in batches. ApsaraDB for OceanBase Writer submits a write request when the number of rows reaches a specific threshold.
ApsaraDB for OceanBase supports the Oracle and MySQL tenant modes. Make sure that the SQL statements that you specify in the preSql and postSql parameters comply with the related SQL syntax. Otherwise, the SQL statements may fail to be executed.
Real-time data read
ApsaraDB for OceanBase is a distributed relational database that can integrate data distributed in multiple physical databases into a unified logical database. However, you can synchronize data of only one physical AnalyticDB for MySQL to an AnalyticDB for MySQL cluster in real time.
Note
For a real-time synchronization task used to synchronize data of a database, you cannot select a data source that is added in the connection string mode.
For a real-time synchronization task used to synchronize data of a ApsaraDB for OceanBase data source, the version of your ApsaraDB for OceanBase database must be V3.0 or later.
Prepare an ApsaraDB for OceanBase environment before data synchronization
Before you use DataWorks to synchronize data from or to an ApsaraDB for OceanBase data source, you must prepare an ApsaraDB for OceanBase environment. This ensures that a data synchronization task can be configured and can synchronize data from or to the ApsaraDB for OceanBase data source as expected. The following information describes how to prepare an ApsaraDB for OceanBase environment for data synchronization from or to an ApsaraDB for OceanBase data source.
Preparation 1: Configure an IP address whitelist
Add the elastic IP address (EIP) of the serverless resource group or exclusive resource group for Data Integration and the CIDR block of the vSwitch with which the serverless resource group or exclusive resource group for Data Integration is associated to the whitelist of the ApsaraDB for OceanBase data source. For more information, see Configure an IP address whitelist.
Preparation 2: Prepare an account that has the required permissions
You must create an account to log on to the database of the ApsaraDB for OceanBase cluster. You must grant the required permissions to the account. For more information, see Create an account.
Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Add and manage data sources. You can view the infotips of parameters in the DataWorks console to understand the meanings of the parameters when you add a data source.
Develop a data synchronization task
For information about the entry point for and the procedure of configuring a synchronization task, see the following configuration guides.
Appendix: Code and parameters
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a batch synchronization task by using the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
Code for ApsaraDB for OceanBase Reader
{
"type": "job",
"steps": [
{
"stepType": "apsaradb_for_OceanBase",
"parameter": {
"datasource": "",
"where": "",
"column": [
"id",
"name"
],
"splitPk": ""
},
"name": "Reader",
"category": "reader"
},
{
"stepType": "stream",
"parameter": {
"print": false,
"fieldDelimiter": ","
},
"name": "Writer",
"category": "writer"
}
],
"version": "2.0",
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
},
"setting": {
"errorLimit": {
"record": "0"
},
"speed": {
"throttle": true,
"concurrent": 1,
"mbps":"12"
}
}
}
Parameters in code for ApsaraDB for OceanBase Reader
Parameter | Description | Required | Default value |
datasource | The name of the data source. It must be the same as the name of the added data source. You can connect to the ApsaraDB for OceanBase database based on the setting of the jdbcUrl or username parameter. | Yes | No default value |
jdbcUrl | The JDBC URL of the ApsaraDB for OceanBase database. You can specify multiple JDBC URLs in a JSON array for a database. If you specify multiple JDBC URLs, ApsaraDB for OceanBase Reader verifies the connectivity of the URLs in sequence to find a valid URL. If no URL is valid, ApsaraDB for OceanBase Reader returns an error. Note The jdbcUrl parameter must be included in the connection parameter. The value of the jdbcUrl parameter must comply with the standard format that is supported by ApsaraDB for OceanBase. You can also specify the information of the attachment facility. An example JDBC URL is jdbc:OceanBase://127.0.0.1:3306/database. You must specify either jdbcUrl or username . | No | No default value |
username | The username that you can use to connect to the database. | No | No default value |
password | The password that you can use to connect to the database. | No | No default value |
table | The name of the table from which you want to read data. ApsaraDB for OceanBase Reader can read data from multiple tables. The tables are specified in a JSON array. If you specify multiple tables, make sure that the tables have the same schema. ApsaraDB for OceanBase Reader does not check whether the tables have the same schema. Note The table parameter must be included in the connection parameter. | Yes | No default value |
column | The names of the columns from which you want to read data. Specify the names in a JSON array. The default value is [ * ], which indicates all columns in the source table. You can select specific columns to read. The column order can be changed. This indicates that you can specify columns in an order different from the order specified by the schema of the source table. Constants are supported. Example: '123'. Functions are supported. Example: date('now'). The column parameter must explicitly specify all the columns that you want to synchronize. This parameter cannot be left empty.
| Yes | No default value |
splitPk | The field that is used for data sharding when ApsaraDB for OceanBase Reader reads data. If you specify this parameter, the source table is sharded based on the value of this parameter. Data Integration then runs parallel threads to read data. This way, data can be synchronized more efficiently. We recommend that you set the splitPk parameter to the name of a primary key column of the table. Data can be evenly distributed to different shards based on the primary key column, instead of being intensively distributed only to specific shards. The splitPk parameter supports sharding for data only of integer data types. If you set this parameter to a field of an unsupported data type, such as a string, floating point, or date data type, ApsaraDB for OceanBase Reader returns an error. If you leave the splitPk parameter empty, ApsaraDB for OceanBase Reader uses a single thread to read data.
| No | No default value |
where | The WHERE clause. ApsaraDB for OceanBase Reader generates an SQL statement based on the column, table, and where parameters that you have configured and uses the generated statement to read data. For example, when you perform a test, you can set the where parameter to limit 10. To read the data that is generated on the current day, you can set the where parameter to gmt_create > $bizdate . You can use the WHERE clause to read incremental data. If the where parameter is not provided or is left empty, all data is read.
| No | No default value |
querySql | The SQL statement that is used for refined data filtering. If you specify this parameter, data is filtered based only on the value of this parameter. If you specify this parameter, ApsaraDB for OceanBase Reader ignores the settings of the table, column, where, and splitPk parameters. | No | No default value |
fetchSize | The number of data records to read at a time. This parameter determines the number of interactions between Data Integration and the database and affects read efficiency. Note If you set this parameter to a value greater than 2048, an out of memory (OOM) error may occur during data synchronization. | No | 1,024 |
Code for ApsaraDB for OceanBase Writer
{
"type":"job",
"version":"2.0",
"steps":[
{
"stepType":"stream",
"parameter":{},
"name":"Reader",
"category":"reader"
},
{
"stepType":"apsaradb_for_OceanBase",
"parameter":{
"datasource": "The name of the data source",
"column": [
"id",
"name"
],
"table": "apsaradb_for_OceanBase_table",
"preSql": [
"delete from @table where db_id = -1"
],
"postSql": [
"update @table set db_modify_time = now() where db_id = 1"
],
"obWriteMode": "insert",
},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"
},
"speed":{
"throttle":true,
"concurrent":1,
"mbps":"12"
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}
Parameters in code for ApsaraDB for OceanBase Writer
Parameter | Description | Required | Default value |
datasource | The name of the data source. It must be the same as the name of the added data source. You can connect to the ApsaraDB for OceanBase database based on the setting of the jdbcUrl or username parameter. | No | No default value |
jdbcUrl | The JDBC URL of the ApsaraDB for OceanBase database. The jdbcUrl parameter must be included in the connection parameter. You can configure only one JDBC URL for a database. ApsaraDB for OceanBase Writer cannot write data to a database that has multiple primary databases. The value format of the jdbcUrl parameter must be in accordance with the official specifications of ApsaraDB for OceanBase. You can also specify additional JDBC connection properties in the value of this parameter. Example: jdbc:oceanbase://127.0.0.1:3306/database .
| Yes | No default value |
username | The username that you can use to connect to the database. | Yes | No default value |
password | The password that you can use to connect to the database. | Yes | No default value |
table | The name of the table to which you want to write data. Specify the name in a JSON array. Note The table parameter must be included in the connection parameter. | Yes | No default value |
column | The names of the columns to which you want to write data. Separate the names with commas (,), such as "column": ["id", "name", "age"] . Note The column parameter cannot be left empty. | Yes | No default value |
obWriteMode | Optional. The write mode. Valid values: insert: Corresponds to the insert into ... statement. If a primary key conflict or unique index conflict occurs, data cannot be written to conflicting rows. update: Corresponds to the ... on duplicate key update ... statement. This write mode is used in MySQL tenant mode. When a conflict occurs, data in conflicting rows is updated. merge: Corresponds to the merge into ... matched then update ... statement. This write mode is used in Oracle tenant mode. When a conflict occurs, data in conflicting rows is updated.
| No | insert |
onClauseColumns | Note This parameter is used in Oracle tenant mode and is required when the obWriteMode parameter is set to merge . If this parameter is left empty, the insert write mode is used. Set this parameter to the primary key field or unique constraint field. If you specify multiple fields, separate the fields with commas (,). Example: ID,C1 . | No | No default value |
obUpdateColumns | Note This parameter takes effect when the obWriteMode parameter is set to merge or update . The fields that need to be updated when a data write conflict occurs. If you specify multiple fields, separate the fields with commas (,). Example: c2,c3 . | No | All fields |
preSql | The SQL statement that you want to execute before the synchronization task is run. Use @table to indicate the name of the destination table in the SQL statement. When you execute this SQL statement, @table is replaced by the name of the destination table. | No | No default value |
postSql | The SQL statement that you want to execute after the synchronization task is run. | No | No default value |
batchSize | The number of data records to write at a time. Set this parameter to an appropriate value based on your business requirements. This greatly reduces the interactions between Data Integration and the database and increases throughput. Note If you set this parameter to a value greater than 2048, an OOM error may occur during data synchronization. | No | 1,024 |