This topic was translated by AI and is currently in queue for revision by our editors. Alibaba Cloud does not guarantee the accuracy of AI-translated content. Request expedited revision

ApsaraDB for OceanBase data source

Updated at: 2025-02-21 18:05

DataWorks offers ApsaraDB for OceanBase Reader and Writer to facilitate data reading and writing to ApsaraDB for OceanBase data sources. You can set up data synchronization tasks for these sources. This topic describes the capabilities for data synchronization to and from ApsaraDB for OceanBase data sources.

Supported versions

Batch data read and write

The following OceanBase versions are supported by ApsaraDB for OceanBase Reader and Writer.

  • OceanBase 2.x

  • OceanBase 3.x

  • OceanBase 4.x

Limits

Batch data read

  • ApsaraDB for OceanBase operates in Oracle and MySQL tenant modes. When configuring where filter conditions or function columns in column, adhere to the SQL syntax of the corresponding tenant mode to avoid execution failures.

  • Data from views is readable.

  • Do not alter data during a batch synchronization task to prevent issues like data duplication or loss.

Batch data write

Note

The task requires at least the insert into... permission. Additional permissions depend on the statements in preSql and postSql configured for the task.

  • It is recommended to use the batch method for writing data. Initiate a write request when the number of rows meets a set threshold.

  • In Oracle and MySQL tenant modes of ApsaraDB for OceanBase, ensure that preSql and postSql configurations comply with the SQL syntax of the respective tenant mode to prevent execution failures.

Real-time data read

ApsaraDB for OceanBase, a distributed relational database, can integrate data from multiple physical databases into one logical database. However, real-time data synchronization from OceanBase to AnalyticDB for MySQL only supports single physical database data. Logical database data synchronization is not supported.

Note
  • Do not select a data source added in connection string mode for real-time synchronization tasks of a database.

  • The ApsaraDB for OceanBase database version must be V3.0 or later for real-time synchronization tasks of a database.

Preparations before data synchronization

To synchronize data with an ApsaraDB for OceanBase data source using DataWorks, prepare the ApsaraDB for OceanBase environment accordingly. The following steps outline the necessary preparations.

Preparation 1: Configure an IP address whitelist

Add the VPC CIDR block, where the Serverless resource group or the exclusive resource group for Data Integration is located, to OceanBase's whitelist. For more information, see how to add a whitelist.

Preparation 2: Prepare an account that has the required permissions

Plan for a database logon account with the necessary permissions for OceanBase operations. For details, see create an account.

Add a data source

Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Add and manage data sources. You can view the infotips of parameters in the DataWorks console to understand the meanings of the parameters when you add a data source.

Develop a data synchronization task

For information about the entry point for and the procedure of configuring a synchronization task, see the following configuration guides.

Configure a batch synchronization task to synchronize data of a single table

Configure a real-time synchronization task to synchronize data of a database

For operation details, see configure a real-time synchronization task on DataStudio.

Configure synchronization settings to implement (real-time) synchronization of full and incremental data in a single table or a database

For operation details, see configure a synchronization task on Data Integration.

Appendix: code and parameters

Configure a batch synchronization task by using the code editor

If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a batch synchronization task by using the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.

Code for Reader

{
    "type": "job",
    "steps": [
        {
            "stepType": "apsaradb_for_OceanBase", // The plug-in name
            "parameter": {
                "datasource": "", // The name of the data source
                "where": "",
                "column": [ // The names of the columns
                    "id",
                    "name"
                ],
                "splitPk": ""
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {
                "print": false,
                "fieldDelimiter": ","
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "version": "2.0",
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {
            "record": "0" // The number of error records
        },
        "speed": {
            "throttle": true, // Specifies whether to enable bandwidth throttling. The value false indicates that bandwidth throttling is disabled, and the value true indicates that bandwidth throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
            "concurrent": 1, // The maximum number of parallel threads
            "mbps":"12"// The maximum transmission rate. Unit: MB/s.
        }
    }
}

Parameters in code for Reader

Parameter

Description

Required

Default value

datasource

If the version of DataWorks that you use supports adding ApsaraDB for OceanBase data sources, you can reference the ApsaraDB for OceanBase data source that you added based on the data source name.

Includes two configuration methods: jdbcUrl and username.

Yes

None

jdbcUrl

The JDBC URL of the ApsaraDB for OceanBase database. Use a JSON array to describe the information, and support filling in multiple connection addresses for a database.

If you specify multiple JDBC URLs, ApsaraDB for OceanBase Reader verifies the connectivity of the URLs in sequence to find a valid URL.

If no URL is valid, ApsaraDB for OceanBase Reader returns an error.

Note

jdbcUrl must be included in the connection configuration unit.

jdbcUrl According to the official specifications of ApsaraDB for OceanBase, you can fill in connection attachment control information. For example, jdbc:oceanbase://127.0.0.1:3306/database, you need to choose between this and the username configuration method.

No

None

username

The username that you use to connect to the database.

No

None

password

The password that you use to connect to the database.

No

None

table

The name of the table from which you want to read data. Use a JSON array to describe the table names. You can read data from multiple tables.

If you specify multiple tables, make sure that the tables have the same schema. ApsaraDB for OceanBase Reader does not check whether the tables have the same schema.

Note

table must be included in the connection configuration unit.

Yes

None

column

The names of the columns from which you want to read data. Specify the names in a JSON array. The default value is [ * ], which indicates all columns in the source table.

  • The column order can be changed. This indicates that you can specify columns in an order different from the order specified by the schema of the source table.

  • You can select specific columns to read.

  • Constants can be configured. For example, '123'.

  • Function columns are supported. For example, date('now').

  • column must explicitly specify the collection of columns to be synchronized. It cannot be empty.

Yes

None

splitPk

When ApsaraDB for OceanBase Reader retrieves data, if you specify splitPk, it indicates that you want to use the field represented by splitPk for data sharding. As a result, concurrent tasks are started to synchronize data, which improves the efficiency of data synchronization.

  • It is recommended that you use the primary key of the table as splitPk because the primary key is usually evenly distributed. Therefore, the shards obtained are less likely to have data hot spots.

  • Currently, splitPk supports only integer data sharding. String, floating point, and date data types are not supported. If you set this parameter to a field of an unsupported data type, ApsaraDB for OceanBase Reader returns an error.

  • If you set splitPk to an empty value, the system considers that you do not allow sharding for a single table. Therefore, a single channel is used for extraction.

No

Empty

where

ApsaraDB for OceanBase Reader concatenates SQL statements based on the specified column, table, and where conditions, and retrieves data based on the SQL statements.

For example, during testing, you can specify the where condition as limit 10. In actual business scenarios, you usually select the data of the current day for synchronization and specify the where condition as gmt_create>$bizdate.

  • The where condition can effectively perform business incremental synchronization.

  • If the where condition is not configured or is empty, the system considers that the entire table is synchronized.

No

None

querySql

In some business scenarios, the where configuration item is not sufficient to describe the selected conditions. You can use this configuration item to customize the filter SQL. After this item is configured, the data synchronization system ignores the tables, columns, and splitPk configuration items and directly uses the content configured in this item to filter data.

When you configure querySql, ApsaraDB for OceanBase Reader directly ignores the table, column, where, and splitPk conditions.

No

None

fetchSize

This configuration item defines the number of data records that the plug-in and the database server retrieve at a time. This value determines the number of interactions between Data Integration and the database server and can greatly improve data extraction performance.

Note

If the fetchSize value is too large (>2048), the data synchronization process may cause an out-of-memory (OOM) error.

No

1,024

Code for Writer

{
    "type":"job",
    "version":"2.0",// The version number
    "steps":[
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"apsaradb_for_OceanBase",// The plug-in name
            "parameter":{
                "datasource": "The name of the data source",
                "column": [// The names of the columns
                    "id",
                    "name"
                ],
                "table": "apsaradb_for_OceanBase_table",// The name of the table
                "preSql": [ // The SQL statement that you want to execute before the synchronization task is run
                    "delete from @table where db_id = -1"
                ],
                "postSql": [// The SQL statement that you want to execute after the synchronization task is run
                    "update @table set db_modify_time = now() where db_id = 1"
                ],
                "obWriteMode": "insert",
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The number of error records
        },
        "speed":{
            "throttle":true,// Specifies whether to enable bandwidth throttling. The value false indicates that bandwidth throttling is disabled, and the value true indicates that bandwidth throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
            "concurrent":1, // The maximum number of parallel threads
            "mbps":"12"// The maximum transmission rate. Unit: MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}

Parameters in code for Writer

Parameter

Description

Required

Default value

datasource

If the version of DataWorks that you use supports adding ApsaraDB for OceanBase data sources, you can reference the ApsaraDB for OceanBase data source that you added based on the data source name.

Includes two configuration methods: jdbcUrl and username.

No

None

jdbcUrl

The JDBC URL of the ApsaraDB for OceanBase database. jdbcUrl is included in the connection configuration unit.

  • You can configure only one JDBC URL for a database. ApsaraDB for OceanBase Writer cannot write data to a database that has multiple primary databases.

  • The format of jdbcUrl is consistent with the official format of ApsaraDB for OceanBase, and additional connection parameters can be specified. For example, jdbc:oceanbase://127.0.0.1:3306/database.

Yes

None

username

The username that you use to connect to the database.

Yes

None

password

The password that you use to connect to the database.

Yes

None

table

The name of the table to which you want to write data. Specify the name in a JSON array.

Note

table must be included in the connection configuration unit.

Yes

None

column

The names of the columns to which you want to write data. Separate the names with commas (,). For example, "column": ["id", "name", "age"].

Note

The column configuration item must be specified. It cannot be empty.

Yes

None

obWriteMode

Controls the mode used to write data to the target table. Optional configuration.

  • insert: insert into ... When a primary key or UNIQUE constraint conflict occurs, the conflicting row cannot be written.

  • update: ... on duplicate key update ... Used in MySQL tenant mode. When a conflict occurs, the conflicting row is updated.

  • merge: merge into ... matched then update ... Used in Oracle tenant mode. When a conflict occurs, the conflicting row is updated.

Yes

insert

onClauseColumns

Note

Used in Oracle tenant mode. When obWriteMode is set to merge, this parameter is required. If it is not configured, insert is still used to write data.

Configure the primary key field or UNIQUE constraint field. Separate multiple fields with commas (,). For example, ID,C1.

No

None

obUpdateColumns

Note

This parameter takes effect when obWriteMode is set to merge or update.

When a data write conflict occurs, specify the fields to be updated. Separate multiple fields with commas (,). For example, c2,c3.

No

All fields

preSql

The SQL statement that you want to execute before the synchronization task is run. If the SQL statement contains a table name to be operated on, use @table to represent it. This ensures that the variable is replaced with the actual table name when the SQL statement is executed.

No

None

postSql

The SQL statement that you want to execute after the synchronization task is run.

No

None

batchSize

The number of data records to write at a time. Set this parameter to an appropriate value based on your business requirements. This greatly reduces the interactions between Data Integration and the database and increases throughput.

Note

If the fetchSize value is too large (>2048), the data synchronization process may cause an out-of-memory (OOM) error.

No

1,024

  • On this page (1)
  • Supported versions
  • Batch data read and write
  • Limits
  • Batch data read
  • Batch data write
  • Real-time data read
  • Preparations before data synchronization
  • Preparation 1: Configure an IP address whitelist
  • Preparation 2: Prepare an account that has the required permissions
  • Add a data source
  • Develop a data synchronization task
  • Configure a batch synchronization task to synchronize data of a single table
  • Configure a real-time synchronization task to synchronize data of a database
  • Configure synchronization settings to implement (real-time) synchronization of full and incremental data in a single table or a database
  • Appendix: code and parameters
  • Configure a batch synchronization task by using the code editor
  • Code for Reader
  • Parameters in code for Reader
  • Code for Writer
  • Parameters in code for Writer
Feedback
phone Contact Us