
DataWorks:Oracle data source

Last Updated:Nov 21, 2024

DataWorks provides Oracle Reader and Oracle Writer for you to read data from and write data to Oracle data sources. You can use the codeless user interface (UI) or code editor to configure data synchronization tasks for Oracle data sources. This topic describes the capabilities of synchronizing data from or to Oracle data sources.

Supported Oracle versions

Batch data read and write

| Version | Oracle Reader for batch data read | Oracle Writer for batch data write |
| --- | --- | --- |
| Oracle 11.2 (11gR2) | Supported | Supported |
| Oracle 12.1 (12cR1) | Supported | Supported |
| Oracle 12.2 (12cR2) | Supported (features introduced in later Oracle versions are not supported) | Supported (features introduced in later Oracle versions are not supported) |
| Oracle 18.3 | Supported (features introduced in later Oracle versions are not supported) | Supported (features introduced in later Oracle versions are not supported) |
| Oracle 19.x | Supported (features introduced in later Oracle versions are not supported) | Supported (features introduced in later Oracle versions are not supported) |
| Oracle 21.1 | Supported (features introduced in later Oracle versions are not supported) | Supported (features introduced in later Oracle versions are not supported) |

Important

Oracle Writer uses the ojdbc6-12.1.1.jar driver.

Real-time data read

  • Supported Oracle database versions:

    12c non-CDB, 18c non-CDB, 19c non-CDB, 19c CDB, and versions later than 11gR2

  • Unsupported Oracle database versions:

    12c CDB and 18c CDB

Note

An Oracle database of Oracle 12c or a later version can serve as a container database (CDB) to host multiple pluggable databases (PDBs).

Limits

  • The following encoding formats are supported for data synchronization: UTF-8, AL32UTF8, AL16UTF16, and ZHS16GBK.

  • A maximum of 500 GB incremental data can be synchronized from an Oracle instance in real time every day.

  • Real-time synchronization of data from an Oracle data source in Data Integration is implemented based on the Oracle LogMiner utility, which analyzes log files. Supplemental logging must be enabled in the primary Oracle database, regardless of whether you synchronize data from the primary or a secondary database.

  • Data of views can be read during batch synchronization.

Data types

| Data type | Oracle Reader for batch data read | Oracle Writer for batch data write | Real-time data read |
| --- | --- | --- | --- |
| NUMBER | Supported | Supported | Supported |
| BINARY_FLOAT | Supported | Supported | Supported |
| BINARY_DOUBLE | Supported | Supported | Supported |
| CHAR | Supported | Supported | Supported |
| NCHAR | Supported | Supported | Supported |
| VARCHAR2 | Supported | Supported | Supported |
| NVARCHAR2 | Supported | Supported | Supported |
| DATE | Supported | Supported | Supported |
| TIMESTAMP | Supported | Supported | Supported |
| TIMESTAMP WITH TIME ZONE | Supported | Supported | Not supported |
| TIMESTAMP WITH LOCAL TIME ZONE | Supported | Supported | Not supported |
| CLOB | Supported | Supported | Supported |
| BLOB | Supported | Supported | Supported |
| RAW | Supported | Supported | Supported |
| ROWID | Not supported | Not supported | Supported |
| UROWID | Not supported | Not supported | Supported |
| FLOAT | Supported | Supported | Supported |
| INTERVAL DAY TO SECOND | Not supported | Not supported | Supported |
| INTERVAL YEAR TO MONTH | Not supported | Not supported | Supported |
| BFILE | Not supported | Not supported | Not supported |
| LONG | Not supported | Not supported | Not supported |
| LONG RAW | Not supported | Not supported | Not supported |
| NCLOB | Supported | Supported | Not supported |
| STRUCT | Supported | Supported | Not supported |
| User-defined types | Not supported | Not supported | Not supported |
| AnyType | Not supported | Not supported | Not supported |
| AnyData | Not supported | Not supported | Not supported |
| AnyDataSet | Not supported | Not supported | Not supported |
| XMLType | Not supported | Not supported | Not supported |
| Spatial types | Not supported | Not supported | Not supported |
| Media types | Not supported | Not supported | Not supported |

The following table lists the data type mappings based on which Oracle Reader converts data types.

| Category | Oracle data types |
| --- | --- |
| Integer | NUMBER, ROWID, INTEGER, INT, SMALLINT |
| Floating point | NUMERIC, DECIMAL, FLOAT, DOUBLE PRECISION, REAL |
| String | LONG, CHAR, NCHAR, VARCHAR, VARCHAR2, NVARCHAR2, CLOB, NCLOB, CHARACTER, CHARACTER VARYING, CHAR VARYING, NATIONAL CHARACTER, NATIONAL CHAR, NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING, NCHAR VARYING |
| Date and time | TIMESTAMP, DATE |
| Boolean | BIT, BOOLEAN |
| Binary | BLOB, BFILE, RAW, LONG RAW |
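
As a quick illustration, the mapping can be expressed as a lookup in code. This is a hypothetical sketch: the dictionary and the category_of helper are illustrative only, and Oracle Reader performs this conversion internally.

```python
# Hypothetical lookup that mirrors the mapping table above.
_CATEGORY_TO_TYPES = {
    "Integer": ["NUMBER", "ROWID", "INTEGER", "INT", "SMALLINT"],
    "Floating point": ["NUMERIC", "DECIMAL", "FLOAT", "DOUBLE PRECISION", "REAL"],
    "String": ["LONG", "CHAR", "NCHAR", "VARCHAR", "VARCHAR2", "NVARCHAR2",
               "CLOB", "NCLOB", "CHARACTER", "CHARACTER VARYING", "CHAR VARYING",
               "NATIONAL CHARACTER", "NATIONAL CHAR", "NATIONAL CHARACTER VARYING",
               "NATIONAL CHAR VARYING", "NCHAR VARYING"],
    "Date and time": ["TIMESTAMP", "DATE"],
    "Boolean": ["BIT", "BOOLEAN"],
    "Binary": ["BLOB", "BFILE", "RAW", "LONG RAW"],
}

# Invert the table so each Oracle type name maps to its category.
ORACLE_TYPE_CATEGORY = {
    t: cat for cat, types in _CATEGORY_TO_TYPES.items() for t in types
}

def category_of(oracle_type: str):
    """Return the reader-side category for an Oracle type, or None if unmapped."""
    return ORACLE_TYPE_CATEGORY.get(oracle_type.upper())
```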

Prepare an Oracle environment before data synchronization

Before you use DataWorks to synchronize data from or to an Oracle data source, you must prepare an Oracle environment. This ensures that a data synchronization task can be configured and can synchronize data from or to the Oracle data source as expected. The following information describes how to prepare an Oracle environment for data synchronization from or to an Oracle data source.

Preparation 1: Check the version of your Oracle database

The supported Oracle database versions vary based on synchronization scenarios. Check whether the version of your Oracle database is supported before you synchronize data from or to the database.

  1. You can execute one of the following statements to check the version of your Oracle database:

    • Statement 1:

      select * from v$version;
    • Statement 2:

      select version from v$instance;
  2. If the version of the Oracle database is 12c or 18c, execute the following statement to check whether the database runs as a CDB. A 12c or 18c database that runs as a CDB is not supported by real-time synchronization tasks in DataWorks.

    select name,cdb,open_mode,con_id from v$database;
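
The version rules above can be combined into a small helper. This is a hypothetical sketch, not a DataWorks API: supports_realtime_read and its simplified version parsing only restate the support list from this topic.

```python
def supports_realtime_read(version: str, cdb_flag: str) -> bool:
    """Apply the real-time read support rules described in this topic.

    version  : value returned by "select version from v$instance", e.g. "19.3.0.0.0"
    cdb_flag : CDB column from "select cdb from v$database", "YES" or "NO"
    """
    parts = version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    if (major, minor) < (11, 2):   # must be 11gR2 or later
        return False
    if major in (12, 18) and cdb_flag.upper() == "YES":
        return False               # 12c and 18c CDBs are not supported
    return True
```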

Preparation 2: Prepare an account that has the required permissions

You must prepare an account that is used to log on to the Oracle database. The account must have the required permissions on the Oracle database.

  1. Create an account. For more information, see Create an account to access an Oracle database.

  2. Grant the required permissions to the account.

    You can execute the following statements to grant the required permissions to the account. Replace 'Account for data synchronization' with the account that you created before you execute the statements.

    grant create session to 'Account for data synchronization';  -- Authorize the synchronization account to access the database.
    grant connect to 'Account for data synchronization';  -- Authorize the synchronization account to connect to the database.
    grant select on nls_database_parameters to 'Account for data synchronization';  -- Authorize the synchronization account to query the settings of nls_database_parameters.
    grant select on all_users to 'Account for data synchronization';  -- Authorize the synchronization account to query all users in the database.
    grant select on all_objects to 'Account for data synchronization';  -- Authorize the synchronization account to query all objects in the database.
    grant select on DBA_MVIEWS to 'Account for data synchronization';  -- Authorize the synchronization account to view the materialized views of the database.
    grant select on DBA_MVIEW_LOGS to 'Account for data synchronization';  -- Authorize the synchronization account to view the materialized view logs of the database.
    grant select on DBA_CONSTRAINTS to 'Account for data synchronization';  -- Authorize the synchronization account to view the constraints on all tables in the database.
    grant select on DBA_CONS_COLUMNS to 'Account for data synchronization';  -- Authorize the synchronization account to view information about the columns that meet the specified constraints on all tables in the database.
    grant select on all_tab_cols to 'Account for data synchronization';  -- Authorize the synchronization account to view information about columns in tables, views, and clusters of the database.
    grant select on sys.obj$ to 'Account for data synchronization';  -- Authorize the synchronization account to view objects in the database. sys.obj$ is the data dictionary base table that contains all objects.
    grant select on SYS.COL$ to 'Account for data synchronization';  -- Authorize the synchronization account to view definitions of columns in tables of the database. SYS.COL$ stores the column definitions.
    grant select on sys.USER$ to 'Account for data synchronization';  -- Authorize the synchronization account to view this system table. sys.USER$ stores user and role information.
    grant select on sys.cdef$ to 'Account for data synchronization';  -- Authorize the synchronization account to view this system table.
    grant select on sys.con$ to 'Account for data synchronization';  -- Authorize the synchronization account to view the constraints of the database. sys.con$ records the constraints.
    grant select on all_indexes to 'Account for data synchronization';  -- Authorize the synchronization account to view all indexes of the database.
    grant select on v_$database to 'Account for data synchronization';  -- Authorize the synchronization account to query the v$database view.
    grant select on V_$ARCHIVE_DEST to 'Account for data synchronization';  -- Authorize the synchronization account to query the V$ARCHIVE_DEST view.
    grant select on v_$log to 'Account for data synchronization';  -- Authorize the synchronization account to query the v$log view, which displays log information from the control file.
    grant select on v_$logfile to 'Account for data synchronization';  -- Authorize the synchronization account to query the v$logfile view, which contains information about redo log files.
    grant select on v_$archived_log to 'Account for data synchronization';  -- Authorize the synchronization account to query the v$archived_log view, which contains information about archived logs.
    grant select on V_$LOGMNR_CONTENTS to 'Account for data synchronization';  -- Authorize the synchronization account to query the V$LOGMNR_CONTENTS view.
    grant select on DUAL to 'Account for data synchronization';   -- Authorize the synchronization account to query the DUAL table, a special one-row table used in SELECT statements.
    grant select on v_$parameter to 'Account for data synchronization';  -- Authorize the synchronization account to query the v$parameter view, a dynamic dictionary view that stores the values of database parameters.
    grant select any transaction to 'Account for data synchronization';  -- Authorize the synchronization account to view transactions of the database.
    grant execute on SYS.DBMS_LOGMNR to 'Account for data synchronization';  -- Authorize the synchronization account to use the LogMiner utility, which helps you analyze transactions and retrieve lost data.
    grant alter session to 'Account for data synchronization';  -- Authorize the synchronization account to modify connection settings of the database.
    grant select on dba_objects to 'Account for data synchronization';  -- Authorize the synchronization account to view all objects in the database.
    grant select on v_$standby_log to 'Account for data synchronization';  -- Authorize the synchronization account to query the v$standby_log view, which contains standby redo log information for the secondary database.
    grant select on v_$ARCHIVE_GAP to 'Account for data synchronization';  -- Authorize the synchronization account to query missing archived logs.

    To implement batch synchronization of full data, you must also execute the following statement to grant the query permissions on all tables to the synchronization account:

    grant select any table to 'Account for data synchronization';

    For an Oracle database of Oracle 12c or a later version, you must execute the following statement to grant the log mining permissions to the synchronization account. The log mining feature is a built-in feature for Oracle databases of versions earlier than 12c. You do not need to execute the following statement in such databases.

    grant LOGMINING TO 'Account for data synchronization';
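
If you script the authorization step, you can substitute the account name into the grant statements programmatically. A minimal sketch, assuming a hypothetical render_grants helper and an abbreviated statement list (the full list is shown above):

```python
# Abbreviated grant templates; {account} is substituted at render time.
GRANT_TEMPLATES = [
    "grant create session to {account}",
    "grant connect to {account}",
    "grant alter session to {account}",
    "grant select any transaction to {account}",
    "grant select any table to {account}",         # needed for batch full-data sync
    "grant execute on SYS.DBMS_LOGMNR to {account}",
    "grant LOGMINING to {account}",                # Oracle 12c and later only
]

def render_grants(account: str, oracle_major_version: int) -> list:
    """Substitute the account name; drop LOGMINING on pre-12c databases."""
    return [
        t.format(account=account)
        for t in GRANT_TEMPLATES
        if "LOGMINING" not in t or oracle_major_version >= 12
    ]
```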

Preparation 3: Enable archive logging and supplemental logging, and switch to another redo log file

  1. Execute the following SQL statements to enable archive logging:

    shutdown immediate;
    startup mount;
    alter database archivelog;
    alter database open;
  2. Enable supplemental logging.

    SQL statements:

    alter database add supplemental log data (primary key) columns; -- Enable supplemental logging for primary key columns.
    alter database add supplemental log data (unique) columns; -- Enable supplemental logging for unique index columns.

  3. Switch to another redo log file.

    After you enable supplemental logging, you must execute the following statement multiple times to switch redo log files. We recommend that you execute it five times:

    alter system switch logfile;
    Note

    Switching log files ensures that data can be written to the next log file after the current log file is full. Records of performed operations are not lost, which facilitates data recovery.

Preparation 4: Check character encoding formats of your Oracle database

Execute the following statement to check character encoding formats of the database:

select * from v$nls_parameters where PARAMETER IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
  • v$nls_parameters stores values of parameters in the database.

  • NLS_CHARACTERSET indicates a database character set. NLS_NCHAR_CHARACTERSET indicates a national character set. These two sets are used to store data of the respective character type.

Note

The following encoding formats are supported for data synchronization: UTF-8, AL32UTF8, AL16UTF16, and ZHS16GBK. If the database contains the character encoding formats that are not supported for data synchronization, change the formats before you synchronize data.
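
You can validate the query result programmatically by comparing the returned character sets against the supported list. A hypothetical sketch (check_charsets is not a DataWorks API; the input is a dict built from the v$nls_parameters rows):

```python
# Character sets supported for data synchronization, per this topic.
SUPPORTED_CHARSETS = {"UTF-8", "AL32UTF8", "AL16UTF16", "ZHS16GBK"}

def check_charsets(nls_params: dict) -> list:
    """Return the unsupported character sets found in the v$nls_parameters rows."""
    keys = ("NLS_CHARACTERSET", "NLS_NCHAR_CHARACTERSET")
    return [nls_params[k] for k in keys
            if k in nls_params and nls_params[k] not in SUPPORTED_CHARSETS]
```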

Preparation 5: Check the data types of tables in your Oracle database

You can execute the SELECT statement to query the data types of tables in the database. Sample statement that is executed to query the data types of the 'tablename' table:

select COLUMN_NAME,DATA_TYPE from all_tab_columns where TABLE_NAME='tablename'; 
  • COLUMN_NAME: the name of the column.

  • DATA_TYPE: the data type of the column.

  • all_tab_columns: the view that stores information about all columns in tables of the database.

  • TABLE_NAME: the name of the table that you want to query. When you execute the preceding statement, replace 'tablename' with the name of the table that you want to query.

You can also execute the select * from tablename; statement (replace tablename with the actual table name) to query information about the desired table and obtain the data types of its columns.
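
If you build the all_tab_columns query in a script, note that Oracle stores unquoted identifiers in uppercase in the data dictionary. A minimal, hypothetical sketch:

```python
def column_types_query(table_name: str) -> str:
    """Build the all_tab_columns query shown above.

    The table name is uppercased because Oracle stores unquoted identifiers
    in uppercase in the data dictionary.
    """
    return ("select COLUMN_NAME, DATA_TYPE from all_tab_columns "
            f"where TABLE_NAME = '{table_name.upper()}'")
```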

Add a data source

Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Add and manage data sources. You can view the infotips of parameters in the DataWorks console to understand the meanings of the parameters when you add a data source.

Develop a data synchronization task

For information about the entry point for and the procedure of configuring a synchronization task, see the following configuration guides.

Configure a batch synchronization task to synchronize data of a single table

Configure a real-time synchronization task to synchronize data of a single table

For more information about the configuration procedure, see Configure a real-time synchronization task in DataStudio.

Configure synchronization settings to implement batch synchronization of all data in a database, real-time synchronization of full data or incremental data in a database, and real-time synchronization of data from sharded tables in a sharded database

For more information about the configuration procedure, see Configure a synchronization task in Data Integration.

Additional information

  • Why are errors repeatedly reported when a real-time synchronization node is run to synchronize data from Oracle, PolarDB, or MySQL?

  • Data synchronization between primary and secondary databases

    A secondary Oracle database can be deployed for disaster recovery. The secondary database continuously applies redo logs to synchronize data from the primary database. Latency between the primary and secondary databases cannot be prevented, which may result in temporary data inconsistency.

  • Data consistency control

    Oracle is a relational database management system (RDBMS) that supports strong consistency for data queries. A database snapshot is created before a synchronization task starts. Oracle Reader reads data from the database snapshot. Therefore, if new data is written to the database during data synchronization, Oracle Reader cannot obtain the new data.

    Data consistency cannot be ensured if you enable Oracle Reader to use parallel threads to read data in a synchronization task.

    Oracle Reader shards the source table based on the value of the splitPk parameter and uses parallel threads to read data. These parallel threads belong to different transactions and read data at different points in time. Therefore, the parallel threads observe different snapshots.

    Data inconsistencies cannot be prevented if parallel threads are used for a synchronization task. The following workarounds can be used:

    • Enable Oracle Reader to use a single thread to read data in a synchronization task by not specifying a shard key for Oracle Reader. This way, data consistency is ensured, but data is synchronized at lower efficiency.

    • Make sure that no data is written to the source table during data synchronization. This ensures that the data in the source table remains unchanged during data synchronization. For example, you can lock the source table or disable data synchronization between primary and secondary databases. This way, data is efficiently synchronized, but your ongoing services may be interrupted.

  • Character encoding

    Oracle Reader uses Java Database Connectivity (JDBC) to read data. This enables Oracle Reader to automatically convert the encoding format of characters. Therefore, you do not need to specify the encoding format.

  • Incremental data synchronization

    Oracle Reader connects to a database by using JDBC and uses a SELECT statement with a WHERE clause to read incremental data.

    • For batch data, incremental add, update, and delete operations (including logical delete operations) are distinguished by timestamps. Specify the WHERE clause based on a timestamp that is later than the latest timestamp in the previous synchronization.

    • For streaming data, specify the WHERE clause based on the ID of a specific record. The ID must be greater than the maximum ID involved in the previous synchronization.

    If the data that is added or modified cannot be distinguished, Oracle Reader can read only full data.
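
The two WHERE-clause strategies above can be sketched as follows. This is a hedged illustration: the column names gmt_modified and id are hypothetical examples, and the helper is not part of DataWorks.

```python
def incremental_where(mode: str, watermark) -> str:
    """Build an incremental-read WHERE clause.

    mode: "timestamp" for batch data keyed on a modification-time column,
          "id" for streaming data keyed on an increasing record ID.
    The column names gmt_modified and id are hypothetical examples.
    """
    if mode == "timestamp":
        return f"gmt_modified > TO_DATE('{watermark}', 'YYYY-MM-DD HH24:MI:SS')"
    if mode == "id":
        return f"id > {watermark}"
    raise ValueError("unknown incremental mode: " + mode)
```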

  • Syntax validation

    Oracle Reader allows you to specify custom SELECT statements by using the querySql parameter but does not verify the syntax of these statements.

Appendix: Code and parameters

Configure a batch synchronization task by using the code editor

If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a batch synchronization task by using the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.

Code for Oracle Reader

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "oracle",
            "parameter": {
                "selectedDatabase": "AUTOTEST",
                "indexes": [],
                "datasource": "oracle_test",
                "envType": 0,
                "useSpecialSecret": true,
                "column": [
                    "id"
                ],
                "where": "",
                "splitPk": "id",
                "encoding": "UTF-8",
                "table": "AUTOTEST.table01"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "odps",
            "parameter": {
            },
            "name": "Writer",
            "category": "writer"
        },
        {
            "name": "Processor",
            "stepType": null,
            "category": "processor",
            "copies": 1,
            "parameter": {
                "nodes": [],
                "edges": [],
                "groups": [],
                "version": "2.0"
            }
        }
    ],
    "setting": {
        "executeMode": null,
        "errorLimit": {
            "record": ""
        },
        "speed": {
            "concurrent": 2,
            "throttle": false
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}

Parameters in code for Oracle Reader

The following parameters are supported. The required status and default value of each parameter are noted.

datasource (required; no default value)

The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor.

selectedDatabase (required; no default value)

The schema of the database from which you want to read data.

table (required; no default value)

The name of the table from which you want to read data, in the schema.tableName format.

Note

For example, if the selectedDatabase parameter is set to AUTOTEST and the table name is table01, set the table parameter to AUTOTEST.table01.

column (required; no default value)

The names of the columns from which you want to read data, specified as a JSON array. The default value is ["*"], which indicates all the columns in the source table.

  • You can select specific columns to read.

  • The column order can be changed. This means that you can specify columns in an order different from the order defined by the schema of the source table.

  • Constants are supported. The column names must be arranged in JSON format:

    ["id", "1", "'mingya.wmy'", "null", "to_char(a + 1)", "2.3" , "true"]
    • id: a column name.

    • 1: an integer constant.

    • 'mingya.wmy': a string constant, which is enclosed in single quotation marks (').

    • null: a null value.

    • to_char(a + 1): a function expression.

    • 2.3: a floating-point constant.

    • true: a Boolean value.

  • The column parameter cannot be left empty.

splitFactor (optional; default value: 5)

The sharding factor, which determines the number of parts into which the data to be synchronized is sharded. If you configure parallelism for your batch synchronization task, the number of parts is calculated as: Number of parallel threads × Sharding factor. For example, if both the number of parallel threads and the sharding factor are 5, the data is sharded into 25 parts.

Note

We recommend a sharding factor in the range of 1 to 100. A sharding factor greater than 100 may cause an out of memory (OOM) error.
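
The formula above can be sketched as a small helper (hypothetical; DataWorks computes this internally):

```python
def shard_count(concurrent_threads: int, split_factor: int = 5) -> int:
    """Number of shards = parallel threads x sharding factor."""
    if not 1 <= split_factor <= 100:
        # Values above 100 risk an OOM error, per the note above.
        raise ValueError("splitFactor should be between 1 and 100")
    return concurrent_threads * split_factor
```

For example, shard_count(5) uses the default sharding factor of 5 and yields 25 parts, matching the example in the description.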

splitMode (optional; default value: randomSampling)

The shard mode. Valid values:

  • averageInterval: average sampling. In this mode, the maximum and minimum values of all data are identified based on the splitPk parameter, and data is evenly distributed across the shards.

  • randomSampling: random sampling. In this mode, data entries are randomly selected as sharding points.

Note

The splitMode parameter must be used together with the splitPk parameter.

  • If the splitPk parameter is set to a numeric field, set the splitMode parameter to averageInterval.

  • If the splitPk parameter is set to a string field, set the splitMode parameter to randomSampling.

splitPk (optional; no default value)

The field that is used for data sharding when Oracle Reader reads data. If you configure this parameter, the source table is sharded based on the value of this parameter, and Data Integration runs parallel threads to read data. This improves data synchronization efficiency.

  • We recommend that you set the splitPk parameter to the name of a primary key column of the table. Data can then be evenly distributed across shards instead of being concentrated in specific shards.

  • The splitPk parameter supports sharding on numeric and string fields. The splitMode parameter must be used together with the splitPk parameter.

    • If the splitPk parameter is set to a numeric field, set the splitMode parameter to averageInterval.

    • If the splitPk parameter is set to a string field, set the splitMode parameter to randomSampling.

  • If you do not configure the splitPk parameter, Oracle Reader uses a single thread to read all data in the source table.

Note

If you use Oracle Reader to read data from a view, you cannot set the splitPk parameter to a field of the ROWID data type.

where (optional; no default value)

The WHERE clause. Oracle Reader generates an SQL statement based on the settings of the table, column, and where parameters and uses the statement to read data. For example, you can set this parameter to row_number() in a test.

  • You can use the WHERE clause to read incremental data.

  • If the where parameter is not provided or is left empty, Data Integration reads all data.

querySql (code editor only; optional; no default value)

The SQL statement that is used for refined data filtering. If you configure this parameter, Data Integration filters data based on its value. For example, to join multiple tables for data synchronization, set this parameter to select a,b from table_a join table_b on table_a.id = table_b.id. If you configure this parameter, Oracle Reader ignores the table, column, and where parameters.

fetchSize (optional; default value: 1024)

The number of data records to read at a time. This parameter determines the number of interactions between Data Integration and the database and affects read efficiency.

Note

If you set this parameter to a value greater than 2048, an OOM error may occur during data synchronization.

Code for Oracle Writer

{
    "type":"job",
    "version":"2.0",// The version number. 
    "steps":[
        { 
            "stepType":"stream",
            "parameter":{},
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"oracle",// The plug-in name. 
            "parameter":{
                "postSql":[],// The SQL statement that you want to execute after the synchronization task is run. 
                "datasource":"",
                "session":[],// The settings of the session to the database. 
                "column":[// The names of the columns. 
                    "id",
                    "name"
                ],
                "encoding":"UTF-8",// The encoding format. 
                "batchSize":1024,// The number of data records to write at a time. 
                "table":"",// The name of the table. 
                "preSql":[]// The SQL statement that you want to execute before the synchronization task is run. 
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed. 
        },
        "speed":{
            "throttle":true,// Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
            "concurrent":1, // The maximum number of parallel threads. 
            "mbps":"12"// The maximum transmission rate. Unit: MB/s. 
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}

Parameters in code for Oracle Writer

The following parameters are supported. The required status and default value of each parameter are noted.

datasource (required; no default value)

The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor.

table (required; no default value)

The name of the table to which you want to write data. If the table uses the default schema of the destination database, set this parameter to the table name. If the table uses a custom schema, set this parameter in the Schema name.Table name format.

writeMode (optional; default value: insert into)

The write mode. Valid value: insert into. If a primary key conflict or unique index conflict occurs, data cannot be written to the conflicting rows, and the data that is not written is regarded as dirty data.

column (required; no default value)

The names of the columns to which you want to write data. Separate the names with commas (,), such as "column": ["id","name","age"]. To write data to all the columns in the destination table, set this parameter to an asterisk (*), such as "column": ["*"].

preSql (optional; no default value)

The SQL statement that you want to execute before the synchronization task is run. For example, you can set this parameter to an SQL statement that deletes outdated data. You can execute only one SQL statement on the codeless UI and multiple SQL statements in the code editor.

postSql (optional; no default value)

The SQL statement that you want to execute after the synchronization task is run. For example, you can set this parameter to an SQL statement that adds a timestamp. You can execute only one SQL statement on the codeless UI and multiple SQL statements in the code editor.

batchSize (optional; default value: 1024)

The number of data records to write at a time. Set this parameter to an appropriate value based on your business requirements. An appropriate value greatly reduces the interactions between Data Integration and Oracle and increases throughput. An excessively large value may cause an OOM error during data synchronization.