ApsaraDB for OceanBase data source

DataWorks provides ApsaraDB for OceanBase Reader and ApsaraDB for OceanBase Writer for you to read data from and write data to ApsaraDB for OceanBase data sources. You can configure data synchronization tasks for ApsaraDB for OceanBase data sources. This topic describes the capabilities of synchronizing data from or to ApsaraDB for OceanBase data sources.

Supported ApsaraDB for OceanBase versions

Batch data read and write

ApsaraDB for OceanBase versions supported by ApsaraDB for OceanBase Reader and ApsaraDB for OceanBase Writer:

OceanBase 2.x
OceanBase 3.x
OceanBase 4.x

Limits

Batch data read

ApsaraDB for OceanBase supports the Oracle and MySQL tenant modes. Make sure that the WHERE clause and the columns that you specify in the column parameter comply with the SQL syntax constraints that Oracle or MySQL supports. Otherwise, the SQL statement may fail to be executed.
Data of views can be read.
When you use a batch synchronization task to read data from an ApsaraDB for OceanBase data source, do not modify the data that is being synchronized. Otherwise, data quality issues, such as data duplication or data loss, may occur.

Batch data write

Note

A synchronization task that uses ApsaraDB for OceanBase Writer must have at least the permissions to execute INSERT INTO. Whether other permissions are required depends on the SQL statements that you specify in the preSql and postSql parameters when you configure the task.

We recommend that you write data to the destination table in batches. ApsaraDB for OceanBase Writer submits a write request when the number of rows reaches a specific threshold.
ApsaraDB for OceanBase supports the Oracle and MySQL tenant modes. Make sure that the SQL statements that you specify in the preSql and postSql parameters comply with the related SQL syntax. Otherwise, the SQL statements may fail to be executed.

Real-time data read

ApsaraDB for OceanBase is a distributed relational database that can integrate data distributed in multiple physical databases into a unified logical database. However, you can synchronize data of only one physical AnalyticDB for MySQL to an AnalyticDB for MySQL cluster in real time.

Note

For a real-time synchronization task used to synchronize data of a database, you cannot select a data source that is added in the connection string mode.
For a real-time synchronization task used to synchronize data of a ApsaraDB for OceanBase data source, the version of your ApsaraDB for OceanBase database must be V3.0 or later.

Prepare an ApsaraDB for OceanBase environment before data synchronization

Before you use DataWorks to synchronize data from or to an ApsaraDB for OceanBase data source, you must prepare an ApsaraDB for OceanBase environment. This ensures that a data synchronization task can be configured and can synchronize data from or to the ApsaraDB for OceanBase data source as expected. The following information describes how to prepare an ApsaraDB for OceanBase environment for data synchronization from or to an ApsaraDB for OceanBase data source.

Preparation 1: Configure an IP address whitelist

Add the elastic IP address (EIP) of the serverless resource group or exclusive resource group for Data Integration and the CIDR block of the vSwitch with which the serverless resource group or exclusive resource group for Data Integration is associated to the whitelist of the ApsaraDB for OceanBase data source. For more information, see Configure an IP address whitelist.

Preparation 2: Prepare an account that has the required permissions

You must create an account to log on to the database of the ApsaraDB for OceanBase cluster. You must grant the required permissions to the account. For more information, see Create an account.

Add a data source

Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Add and manage data sources. You can view the infotips of parameters in the DataWorks console to understand the meanings of the parameters when you add a data source.

Develop a data synchronization task

For information about the entry point for and the procedure of configuring a synchronization task, see the following configuration guides.

Configure a batch synchronization task to synchronize data of a single table

For more information about the configuration procedure, see Configure a batch synchronization task by using the codeless UI and Configure a batch synchronization task by using the code editor.
For information about all parameters that are configured and the code that is run when you use the code editor to configure a batch synchronization task, see Appendix: Code and parameters.

Configure a real-time synchronization task to synchronize data of a database

For more information about the configuration procedure, see Configure a real-time synchronization task in DataStudio.

Configure synchronization settings to implement (real-time) synchronization of full and incremental data in a single table or a database

For more information about the configuration procedure, see Configure a synchronization task in Data Integration.

Appendix: Code and parameters

Configure a batch synchronization task by using the code editor

If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a batch synchronization task by using the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.

Code for ApsaraDB for OceanBase Reader

{
    "type": "job",
    "steps": [
        {
            "stepType": "apsaradb_for_OceanBase", // The plug-in name.
            "parameter": {
                "datasource": "", // The name of the data source.
                "where": "",
                "column": [ // The names of the columns.
                    "id",
                    "name"
                ],
                "splitPk": ""
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {
                "print": false,
                "fieldDelimiter": ","
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "version": "2.0",
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {
            "record": "0" // The maximum number of dirty data records allowed.
        },
        "speed": {
            "throttle": true, // Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
            "concurrent": 1, // The maximum number of parallel threads.
            "mbps":"12"// The maximum transmission rate. Unit: MB/s. 
        }
    }
}

Parameters in code for ApsaraDB for OceanBase Reader

Parameter	Description	Required	Default value
datasource	The name of the data source. It must be the same as the name of the added data source. You can connect to the ApsaraDB for OceanBase database based on the setting of the jdbcUrl or username parameter.	Yes	No default value
jdbcUrl	The JDBC URL of the ApsaraDB for OceanBase database. You can specify multiple JDBC URLs in a JSON array for a database. If you specify multiple JDBC URLs, ApsaraDB for OceanBase Reader verifies the connectivity of the URLs in sequence to find a valid URL. If no URL is valid, ApsaraDB for OceanBase Reader returns an error. Note The jdbcUrl parameter must be included in the connection parameter. The value of the jdbcUrl parameter must comply with the standard format that is supported by ApsaraDB for OceanBase. You can also specify the information of the attachment facility. An example JDBC URL is jdbc:OceanBase://127.0.0.1:3306/database. You must specify either jdbcUrl or `username`.	No	No default value
username	The username that you can use to connect to the database.	No	No default value
password	The password that you can use to connect to the database.	No	No default value
table	The name of the table from which you want to read data. ApsaraDB for OceanBase Reader can read data from multiple tables. The tables are specified in a JSON array. If you specify multiple tables, make sure that the tables have the same schema. ApsaraDB for OceanBase Reader does not check whether the tables have the same schema. Note The table parameter must be included in the connection parameter.	Yes	No default value
column	The names of the columns from which you want to read data. Specify the names in a JSON array. The default value is [ * ], which indicates all columns in the source table. You can select specific columns to read. The column order can be changed. This indicates that you can specify columns in an order different from the order specified by the schema of the source table. Constants are supported. Example: '123'. Functions are supported. Example: date('now'). The column parameter must explicitly specify all the columns that you want to synchronize. This parameter cannot be left empty.	Yes	No default value
splitPk	The field that is used for data sharding when ApsaraDB for OceanBase Reader reads data. If you specify this parameter, the source table is sharded based on the value of this parameter. Data Integration then runs parallel threads to read data. This way, data can be synchronized more efficiently. We recommend that you set the splitPk parameter to the name of a primary key column of the table. Data can be evenly distributed to different shards based on the primary key column, instead of being intensively distributed only to specific shards. The splitPk parameter supports sharding for data only of integer data types. If you set this parameter to a field of an unsupported data type, such as a string, floating point, or date data type, ApsaraDB for OceanBase Reader returns an error. If you leave the splitPk parameter empty, ApsaraDB for OceanBase Reader uses a single thread to read data.	No	No default value
where	The WHERE clause. ApsaraDB for OceanBase Reader generates an SQL statement based on the column, table, and where parameters that you have configured and uses the generated statement to read data. For example, when you perform a test, you can set the where parameter to limit 10. To read the data that is generated on the current day, you can set the where parameter to `gmt_create > $bizdate`. You can use the WHERE clause to read incremental data. If the where parameter is not provided or is left empty, all data is read.	No	No default value
querySql	The SQL statement that is used for refined data filtering. If you specify this parameter, data is filtered based only on the value of this parameter. If you specify this parameter, ApsaraDB for OceanBase Reader ignores the settings of the table, column, where, and splitPk parameters.	No	No default value
fetchSize	The number of data records to read at a time. This parameter determines the number of interactions between Data Integration and the database and affects read efficiency. Note If you set this parameter to a value greater than 2048, an out of memory (OOM) error may occur during data synchronization.	No	1,024

Code for ApsaraDB for OceanBase Writer

{
    "type":"job",
    "version":"2.0",// The version number. 
    "steps":[
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"apsaradb_for_OceanBase",// The plug-in name. 
            "parameter":{
                "datasource": "The name of the data source",
                "column": [// The names of the columns. 
                    "id",
                    "name"
                ],
                "table": "apsaradb_for_OceanBase_table",// The name of the table. 
                "preSql": [ // The SQL statement that you want to execute before the synchronization task is run. 
                    "delete from @table where db_id = -1"
                ],
                "postSql": [// The SQL statement that you want to execute after the synchronization task is run. 
                    "update @table set db_modify_time = now() where db_id = 1"
                ],
                "obWriteMode": "insert",
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed. 
        },
        "speed":{
            "throttle":true,// Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
            "concurrent":1, // The maximum number of parallel threads. 
            "mbps":"12"// The maximum transmission rate. Unit: MB/s. 
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}

Parameters in code for ApsaraDB for OceanBase Writer

Parameter	Description	Required	Default value
datasource	The name of the data source. It must be the same as the name of the added data source. You can connect to the ApsaraDB for OceanBase database based on the setting of the jdbcUrl or username parameter.	No	No default value
jdbcUrl	The JDBC URL of the ApsaraDB for OceanBase database. The jdbcUrl parameter must be included in the connection parameter. You can configure only one JDBC URL for a database. ApsaraDB for OceanBase Writer cannot write data to a database that has multiple primary databases. The value format of the jdbcUrl parameter must be in accordance with the official specifications of ApsaraDB for OceanBase. You can also specify additional JDBC connection properties in the value of this parameter. Example: `jdbc:oceanbase://127.0.0.1:3306/database`.	Yes	No default value
username	The username that you can use to connect to the database.	Yes	No default value
password	The password that you can use to connect to the database.	Yes	No default value
table	The name of the table to which you want to write data. Specify the name in a JSON array. Note The table parameter must be included in the connection parameter.	Yes	No default value
column	The names of the columns to which you want to write data. Separate the names with commas (,), such as `"column": ["id", "name", "age"]`. Note The column parameter cannot be left empty.	Yes	No default value
obWriteMode	Optional. The write mode. Valid values: insert: Corresponds to the `insert into ...` statement. If a primary key conflict or unique index conflict occurs, data cannot be written to conflicting rows. update: Corresponds to the `... on duplicate key update ...` statement. This write mode is used in MySQL tenant mode. When a conflict occurs, data in conflicting rows is updated. merge: Corresponds to the `merge into ... matched then update ...` statement. This write mode is used in Oracle tenant mode. When a conflict occurs, data in conflicting rows is updated.	No	insert
onClauseColumns	Note This parameter is used in Oracle tenant mode and is required when the `obWriteMode` parameter is set to `merge`. If this parameter is left empty, the `insert` write mode is used. Set this parameter to the primary key field or unique constraint field. If you specify multiple fields, separate the fields with commas (,). Example: `ID,C1`.	No	No default value
obUpdateColumns	Note This parameter takes effect when the `obWriteMode` parameter is set to `merge` or `update`. The fields that need to be updated when a data write conflict occurs. If you specify multiple fields, separate the fields with commas (,). Example: `c2,c3`.	No	All fields
preSql	The SQL statement that you want to execute before the synchronization task is run. Use `@table` to indicate the name of the destination table in the SQL statement. When you execute this SQL statement, @table is replaced by the name of the destination table.	No	No default value
postSql	The SQL statement that you want to execute after the synchronization task is run.	No	No default value
batchSize	The number of data records to write at a time. Set this parameter to an appropriate value based on your business requirements. This greatly reduces the interactions between Data Integration and the database and increases throughput. Note If you set this parameter to a value greater than 2048, an OOM error may occur during data synchronization.	No	1,024

Supported ApsaraDB for OceanBase versions

Batch data read and write

Limits

Batch data read

Batch data write

Real-time data read

Prepare an ApsaraDB for OceanBase environment before data synchronization

Preparation 1: Configure an IP address whitelist

Preparation 2: Prepare an account that has the required permissions

Add a data source

Develop a data synchronization task

Configure a batch synchronization task to synchronize data of a single table

Configure a real-time synchronization task to synchronize data of a database

Configure synchronization settings to implement (real-time) synchronization of full and incremental data in a single table or a database

Appendix: Code and parameters

Configure a batch synchronization task by using the code editor

Code for ApsaraDB for OceanBase Reader

Parameters in code for ApsaraDB for OceanBase Reader

Code for ApsaraDB for OceanBase Writer

Parameters in code for ApsaraDB for OceanBase Writer

Sales Support

Technical Support

Connect & Report Abuse

About Alibaba Cloud

Our Global Network

Quick Start

Global Offices

Olympic Games Paris 2024 New

Stade Roland Garros – Glitz from the Past New

Place de la Concorde – “Breaking” the Barriers New

Vaires-sur-Marne Nautical Stadium – Sports with Sustainability New

International Broadcast Center – Images, Sounds, and Data that Captivate Billions New

Customer Success Stories New

Trust Center

Security & Compliance Center

Cloud Compliance Resources

Security Compliance FAQs

Product & Feature Update New

Cloud Forward

Press Room

Alibaba Cloud e-Magazine New

Alibaba Cloud in Analyst Research

Notice

Go Global Service New

Go Global Alliance with Alibaba Cloud

Asia Accelerator Hot

Information Compliance

China Gateway - MLPS 2.0 Compliance New

China Gateway - Networking

China Gateway - Global Application Acceleration New

China Gateway - Security

China Gateway - Data Security New

ICP Support Hot

China Gateway - Omnichannel Data Mid-End New

China Gateway - Organizational Data Mid-End New

China Gateway - Business Mid-End New

China Gateway - AI Service for Conversational Chatbots New

China Gateway - Online Education

China Gateway - Domain Registration

Work at Alibaba Cloud

Experienced Professionals

Students and Graduates

Free Trial

Pricing

Promo Center

Price Reduction

Pay Less and Deploy More

FinOps

Elastic Compute Service (ECS)

Simple Application Server (SAS)

Elastic GPU Service

Elastic Desktop Service (EDS)

Object Storage Service (OSS)

Cloud Enterprise Network (CEN)

Web Application Firewall (WAF)

Domain Names

Container Compute Service (ACS)

Secure Access Service Edge (SASE)

Intelligent Media Services(IMS)

Edge Security Acceleration (ESA)(Original DCDN)

Intelligent Media Management

DingTalk Enterprise

YiDA

Alibaba Cloud Model Studio

Apsara Prime - For Easy Cloud Product Selection

Alibaba Cloud ECS - Cater All Your Cloud Hosting Needs

1TB CDN—Get Free 1 TB Outbound Traffic Plan Now

Security—Under Attack? Get Free Security Support

Short Message Service - Free Testing is Available

Elastic Compute Service (ECS) Hot

CloudBox

Compute Nest

Dedicated Host Hot

ECS Bare Metal Instance

Elastic GPU Service Featured

Simple Application Server (SAS) Hot

Auto Scaling

Cloud Phone Beta

Elastic Desktop Service (EDS) Featured

Batch Compute

Elastic High Performance Computing (E-HPC)

Super Computing Cluster (SCC)

Function Compute (FC)