All Products
Search
Document Center

DataWorks:ApsaraDB for OceanBase data source

Last Updated:Dec 10, 2024

DataWorks provides ApsaraDB for OceanBase Reader and ApsaraDB for OceanBase Writer for you to read data from and write data to ApsaraDB for OceanBase data sources. You can configure data synchronization tasks for ApsaraDB for OceanBase data sources. This topic describes the capabilities of synchronizing data from or to ApsaraDB for OceanBase data sources.

Supported ApsaraDB for OceanBase versions

Batch data read and write

ApsaraDB for OceanBase Reader and ApsaraDB for OceanBase Writer connect to ApsaraDB for OceanBase databases by using an OceanBase database driver. Make sure that the driver version is compatible with the version of your ApsaraDB for OceanBase database. ApsaraDB for OceanBase Reader and ApsaraDB for OceanBase Writer use the OceanBase database driver of the following version:

<dependency>
    <groupId>com.alipay.OceanBase</groupId>
    <artifactId>OceanBase-connector-java</artifactId>
    <version>3.1.0</version>
</dependency>

Limits

Batch data read

  • ApsaraDB for OceanBase supports the Oracle and MySQL tenant modes. Make sure that the WHERE clause and the columns that you specify in the column parameter comply with the SQL syntax constraints that Oracle or MySQL supports. Otherwise, the SQL statement may fail to be executed.

  • Data of views can be read.

  • When you use a batch synchronization task to read data from an ApsaraDB for OceanBase data source, do not modify the data that is being synchronized. Otherwise, data quality issues, such as data duplication or data loss, may occur.

Batch data write

  • INSERT INTO: If a primary key conflict or unique index conflict occurs, data cannot be written to conflicting rows. In Oracle tenant mode, ApsaraDB for OceanBase supports only the INSERT INTO write mode.

  • INSERT INTO...ON DUPLICATE KEY UPDATE: If no primary key conflict or unique index conflict occurs, data is processed in the same way as that in INSERT INTO mode. If a primary key conflict or unique index conflict occurs, the specified fields in the rows in the destination table are updated. In MySQL tenant mode, ApsaraDB for OceanBase supports the INSERT INTO and INSERT INTO...ON DUPLICATE KEY UPDATE write modes.

  • Data can be written to tables stored only in the primary database.

    Note

    A synchronization task that uses ApsaraDB for OceanBase Writer must have at least the permissions to execute INSERT INTO. Whether other permissions are required depends on the SQL statements that you specify in the preSql and postSql parameters when you configure the task.

  • We recommend that you write data to the destination table in batches. ApsaraDB for OceanBase Writer submits a write request when the number of rows reaches a specific threshold.

  • ApsaraDB for OceanBase supports the Oracle and MySQL tenant modes. Make sure that the SQL statements that you specify in the preSql and postSql parameters comply with the related SQL syntax. Otherwise, the SQL statements may fail to be executed.

Real-time data read

ApsaraDB for OceanBase is a distributed relational database that can integrate data distributed in multiple physical databases into a unified logical database. However, you can synchronize data of only one physical AnalyticDB for MySQL to an AnalyticDB for MySQL cluster in real time.

Note
  • For a real-time synchronization task used to synchronize data of a database, you cannot select a data source that is added in the connection string mode.

  • For a real-time synchronization task used to synchronize data of a ApsaraDB for OceanBase data source, the version of your ApsaraDB for OceanBase database must be V3.0 or later.

Prepare an ApsaraDB for OceanBase environment before data synchronization

Before you use DataWorks to synchronize data from or to an ApsaraDB for OceanBase data source, you must prepare an ApsaraDB for OceanBase environment. This ensures that a data synchronization task can be configured and can synchronize data from or to the ApsaraDB for OceanBase data source as expected. The following information describes how to prepare an ApsaraDB for OceanBase environment for data synchronization from or to an ApsaraDB for OceanBase data source.

Preparation 1: Configure an IP address whitelist

Add the elastic IP address (EIP) of the serverless resource group or exclusive resource group for Data Integration and the CIDR block of the vSwitch with which the serverless resource group or exclusive resource group for Data Integration is associated to the whitelist of the ApsaraDB for OceanBase data source. For more information, see Configure an IP address whitelist.

Preparation 2: Prepare an account that has the required permissions

You must create an account to log on to the database of the ApsaraDB for OceanBase cluster. You must grant the required permissions to the account. For more information, see Create an account.

Add a data source

Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Add and manage data sources. You can view the infotips of parameters in the DataWorks console to understand the meanings of the parameters when you add a data source.

Develop a data synchronization task

For information about the entry point for and the procedure of configuring a synchronization task, see the following configuration guides.

Configure a batch synchronization task to synchronize data of a single table

Configure a real-time synchronization task to synchronize data of a database

For more information about the configuration procedure, see Configure a real-time synchronization task in DataStudio.

Configure synchronization settings to implement (real-time) synchronization of full and incremental data in a single table or a database

For more information about the configuration procedure, see Configure a synchronization task in Data Integration.

Appendix: Code and parameters

Configure a batch synchronization task by using the code editor

If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a batch synchronization task by using the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.

Code for ApsaraDB for OceanBase Reader

{
    "type": "job",
    "steps": [
        {
            "stepType": "apsaradb_for_OceanBase", // The plug-in name.
            "parameter": {
                "datasource": "", // The name of the data source.
                "where": "",
                "column": [ // The names of the columns.
                    "id",
                    "name"
                ],
                "splitPk": ""
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {
                "print": false,
                "fieldDelimiter": ","
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "version": "2.0",
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {
            "record": "0" // The maximum number of dirty data records allowed.
        },
        "speed": {
            "throttle": true, // Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
            "concurrent": 1, // The maximum number of parallel threads.
            "mbps":"12"// The maximum transmission rate. Unit: MB/s. 
        }
    }
}

Parameters in code for ApsaraDB for OceanBase Reader

Parameter

Description

Required

Default value

datasource

The name of the data source. It must be the same as the name of the added data source.

You can connect to the ApsaraDB for OceanBase database based on the setting of the jdbcUrl or username parameter.

Yes

No default value

jdbcUrl

The JDBC URL of the ApsaraDB for OceanBase database. You can specify multiple JDBC URLs in a JSON array for a database.

If you specify multiple JDBC URLs, ApsaraDB for OceanBase Reader verifies the connectivity of the URLs in sequence to find a valid URL.

If no URL is valid, ApsaraDB for OceanBase Reader returns an error.

Note

The jdbcUrl parameter must be included in the connection parameter.

The value of the jdbcUrl parameter must comply with the standard format that is supported by ApsaraDB for OceanBase. You can also specify the information of the attachment facility. An example JDBC URL is jdbc:OceanBase://127.0.0.1:3306/database. You must specify either jdbcUrl or username.

No

No default value

username

The username that you can use to connect to the database.

No

No default value

password

The password that you can use to connect to the database.

No

No default value

table

The name of the table from which you want to read data. ApsaraDB for OceanBase Reader can read data from multiple tables. The tables are specified in a JSON array.

If you specify multiple tables, make sure that the tables have the same schema. ApsaraDB for OceanBase Reader does not check whether the tables have the same schema.

Note

The table parameter must be included in the connection parameter.

Yes

No default value

column

The names of the columns from which you want to read data. Specify the names in a JSON array. The default value is [ * ], which indicates all columns in the source table.

  • You can select specific columns to read.

  • The column order can be changed. This indicates that you can specify columns in an order different from the order specified by the schema of the source table.

  • Constants are supported. Example: '123'.

  • Functions are supported. Example: date('now').

  • The column parameter must explicitly specify all the columns that you want to synchronize. This parameter cannot be left empty.

Yes

No default value

splitPk

The field that is used for data sharding when ApsaraDB for OceanBase Reader reads data. If you specify this parameter, the source table is sharded based on the value of this parameter. Data Integration then runs parallel threads to read data. This way, data can be synchronized more efficiently.

  • We recommend that you set the splitPk parameter to the name of a primary key column of the table. Data can be evenly distributed to different shards based on the primary key column, instead of being intensively distributed only to specific shards.

  • The splitPk parameter supports sharding for data only of integer data types. If you set this parameter to a field of an unsupported data type, such as a string, floating point, or date data type, ApsaraDB for OceanBase Reader returns an error.

  • If you leave the splitPk parameter empty, ApsaraDB for OceanBase Reader uses a single thread to read data.

No

No default value

where

The WHERE clause. ApsaraDB for OceanBase Reader generates an SQL statement based on the column, table, and where parameters that you have configured and uses the generated statement to read data.

For example, when you perform a test, you can set the where parameter to limit 10. To read the data that is generated on the current day, you can set the where parameter to gmt_create > $bizdate.

  • You can use the WHERE clause to read incremental data.

  • If the where parameter is not provided or is left empty, all data is read.

No

No default value

querySql

The SQL statement that is used for refined data filtering. If you specify this parameter, data is filtered based only on the value of this parameter.

If you specify this parameter, ApsaraDB for OceanBase Reader ignores the settings of the table, column, where, and splitPk parameters.

No

No default value

fetchSize

The number of data records to read at a time. This parameter determines the number of interactions between Data Integration and the database and affects read efficiency.

Note

If you set this parameter to a value greater than 2048, an out of memory (OOM) error may occur during data synchronization.

No

1,024

Code for ApsaraDB for OceanBase Writer

{
    "type":"job",
    "version":"2.0",// The version number. 
    "steps":[
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"apsaradb_for_OceanBase",// The plug-in name. 
            "parameter":{
                "datasource": "The name of the data source",
                "column": [// The names of the columns. 
                    "id",
                    "name"
                ],
                "table": "apsaradb_for_OceanBase_table",// The name of the table. 
                "preSql": [ // The SQL statement that you want to execute before the synchronization task is run. 
                    "delete from @table where db_id = -1"
                ],
                "postSql": [// The SQL statement that you want to execute after the synchronization task is run. 
                    "update @table set db_modify_time = now() where db_id = 1"
                ],
                "writeMode": "insert",
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed. 
        },
        "speed":{
            "throttle":true,// Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
            "concurrent":1, // The maximum number of parallel threads. 
            "mbps":"12"// The maximum transmission rate. Unit: MB/s. 
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}

Parameters in code for ApsaraDB for OceanBase Writer

Parameter

Description

Required

Default value

datasource

The name of the data source. It must be the same as the name of the added data source.

You can connect to the ApsaraDB for OceanBase database based on the setting of the jdbcUrl or username parameter.

No

No default value

jdbcUrl

The JDBC URL of the ApsaraDB for OceanBase database. The jdbcUrl parameter must be included in the connection parameter.

  • You can configure only one JDBC URL for a database. ApsaraDB for OceanBase Writer cannot write data to a database that has multiple primary databases.

  • The value format of the jdbcUrl parameter must be in accordance with the official specifications of ApsaraDB for OceanBase. You can also specify additional JDBC connection properties in the value of this parameter. Example: jdbc:oceanbase://127.0.0.1:3306/database.

Yes

No default value

username

The username that you can use to connect to the database.

Yes

No default value

password

The password that you can use to connect to the database.

Yes

No default value

table

The name of the table to which you want to write data. Specify the name in a JSON array.

Note

The table parameter must be included in the connection parameter.

Yes

No default value

column

The names of the columns to which you want to write data. Separate the names with commas (,), such as "column": ["id", "name", "age"].

Note

The column parameter cannot be left empty.

Yes

No default value

writeMode

The write mode. Valid values: INSERT INTO and ON DUPLICATE KEY UPDATE.

Yes

No default value

preSql

The SQL statement that you want to execute before the synchronization task is run. Use @table to indicate the name of the destination table in the SQL statement. When you execute this SQL statement, @table is replaced by the name of the destination table.

No

No default value

postSql

The SQL statement that you want to execute after the synchronization task is run.

No

No default value

batchSize

The number of data records to write at a time. Set this parameter to an appropriate value based on your business requirements. This greatly reduces the interactions between Data Integration and the database and increases throughput.

Note

If you set this parameter to a value greater than 2048, an OOM error may occur during data synchronization.

No

1,024