The Oracle data source lets you read data from and write data to Oracle. You can configure data synchronization tasks using the codeless UI or the code editor. This topic describes the capabilities of DataWorks for Oracle data synchronization.
Supported versions
Offline read and write
| Version | Offline read (Oracle Reader) | Offline write (Oracle Writer) |
| --- | --- | --- |
| Oracle 11.2 or 11gR2 | Supported | Supported |
| Oracle 12.1 or 12cR1 | Supported | Supported |
| Oracle 12.2 or 12cR2 | Supported (new features are not supported) | Supported (new features are not supported) |
| Oracle 18.3 | Supported (new features are not supported) | Supported (new features are not supported) |
| Oracle 19.x | Supported (new features are not supported) | Supported (new features are not supported) |
| Oracle 21.1 | Supported (new features are not supported) | Supported (new features are not supported) |
The Oracle Writer plugin uses the ojdbc6-12.1.1.jar driver.
Real-time read
Supported: 11gR2, 12c non-CDB, 18c non-CDB, 19c non-CDB, and 19c CDB databases.
Not supported: 12c CDB and 18c CDB databases.
A container database (CDB) is a feature introduced in Oracle Database 12c and later versions. A CDB can contain multiple pluggable databases (PDBs).
Limits
The NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET character sets of the Oracle instance must be one of the following: AL32UTF8, AL16UTF16, ZHS16GBK, or UTF8.
A single Oracle instance supports real-time synchronization of up to 500 GB of incremental data per day.
When DataWorks reads data from a secondary Oracle database in real time, only physical secondary databases are supported. DataWorks pulls only archived logs. Therefore, the latency is higher than synchronization from a primary database. The latency depends on the generation speed of the archived logs.
Real-time synchronization does not support views. Offline synchronization supports reading data from views.
Real-time data reading from Oracle in DataWorks is based on the Oracle LogMiner log analysis tool. To use this feature, you must enable archived logs and supplemental logging in the primary Oracle database, regardless of whether you read from the primary or a secondary database.
Index-organized tables cannot be synchronized.
Tables in which all columns are of a Large Object (LOB) type, such as BLOB, CLOB, or NCLOB, are not supported.
Chinese characters are not supported in Oracle table names and column names.
Oracle table names cannot exceed 30 bytes.
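The last two limits can be pre-checked before a task is configured. The following is a minimal sketch (a hypothetical helper, not part of DataWorks) that validates an identifier against the two naming limits above: no Chinese characters in table or column names, and table names of at most 30 bytes.

```python
from typing import Optional

def is_valid_identifier(name: str, max_bytes: Optional[int] = None) -> bool:
    """Check a table or column name against the synchronization limits."""
    # Reject CJK (Chinese) characters, which are not supported in names.
    if any("\u4e00" <= ch <= "\u9fff" for ch in name):
        return False
    # Enforce the byte-length limit where one applies (30 bytes for table names).
    if max_bytes is not None and len(name.encode("utf-8")) > max_bytes:
        return False
    return True

print(is_valid_identifier("ORDERS_2024", max_bytes=30))  # True
```

Note that the limit is measured in bytes, not characters, so multi-byte encodings shorten the effective name length.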
Supported field types
| Field type | Offline read (Oracle Reader) | Offline write (Oracle Writer) | Real-time read |
| --- | --- | --- | --- |
| NUMBER | Supported | Supported | Supported |
| BINARY_FLOAT | Supported | Supported | Supported |
| BINARY_DOUBLE | Supported | Supported | Supported |
| CHAR | Supported | Supported | Supported |
| NCHAR | Supported | Supported | Supported |
| VARCHAR2 | Supported | Supported | Supported |
| NVARCHAR2 | Supported | Supported | Supported |
| DATE | Supported | Supported | Supported |
| TIMESTAMP | Supported | Supported | Supported |
| TIMESTAMP WITH TIME ZONE | Supported | Supported | Not supported |
| TIMESTAMP WITH LOCAL TIME ZONE | Supported | Supported | Not supported |
| CLOB | Supported | Supported | Supported |
| BLOB | Supported | Supported | Supported |
| RAW | Supported | Supported | Supported |
| ROWID | Not supported | Not supported | Supported |
| UROWID | Not supported | Not supported | Supported |
| FLOAT | Supported | Supported | Supported |
| INTERVAL DAY TO SECOND | Not supported | Not supported | Supported |
| INTERVAL YEAR TO MONTH | Not supported | Not supported | Supported |
| BFILE | Not supported | Not supported | Not supported |
| LONG | Not supported | Not supported | Not supported |
| LONG RAW | Not supported | Not supported | Not supported |
| NCLOB | Supported | Supported | Not supported |
| STRUCT | Supported | Supported | Not supported |
| User-defined types | Not supported | Not supported | Not supported |
| AnyType | Not supported | Not supported | Not supported |
| AnyData | Not supported | Not supported | Not supported |
| AnyDataSet | Not supported | Not supported | Not supported |
| XMLType | Not supported | Not supported | Not supported |
| Spatial types | Not supported | Not supported | Not supported |
| Media types | Not supported | Not supported | Not supported |
The following table lists the Oracle data types and their corresponding classifications in Oracle Reader.
| Type classification | Oracle data type |
| --- | --- |
| Integer | NUMBER, ROWID, INTEGER, INT, and SMALLINT |
| Floating point | NUMERIC, DECIMAL, FLOAT, DOUBLE PRECISION, and REAL |
| String | LONG, CHAR, NCHAR, VARCHAR, VARCHAR2, NVARCHAR2, CLOB, NCLOB, CHARACTER, CHARACTER VARYING, CHAR VARYING, NATIONAL CHARACTER, NATIONAL CHAR, NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING, and NCHAR VARYING |
| Date and time | TIMESTAMP and DATE |
| Boolean | BIT and BOOL |
| Binary | BLOB, BFILE, RAW, and LONG RAW |
Prerequisites: Prepare the Oracle environment
Before you can use DataWorks to synchronize data, you must prepare the Oracle environment as described in the following sections. This preparation ensures that Oracle data synchronization tasks can be configured and run properly in DataWorks.
Confirm the Oracle database version
The supported Oracle database versions vary based on the synchronization scenario. Before you synchronize data, check whether your Oracle database version is supported.
You can run either of the following statements to view the version of your Oracle database.
Statement 1:
select * from v$version;
Statement 2:
select version from v$instance;
If the database version is 12c or 18c, run the following statement to check whether it is a CDB database. Real-time sync tasks in Data Integration do not support 12c and 18c CDB Oracle databases.
select name,cdb,open_mode,con_id from v$database;
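The support rules above can be summarized in code. The following sketch (illustrative only, not DataWorks logic) encodes the real-time read support matrix: 12c and 18c CDB databases are rejected, while 11gR2, the non-CDB versions, and 19c CDB are accepted.

```python
def realtime_read_supported(version: str, is_cdb: bool) -> bool:
    """Decide real-time read support from version and CDB mode."""
    if is_cdb:
        # Among CDB databases, only 19c is supported for real-time read.
        return version == "19c"
    return version in {"11gR2", "12c", "18c", "19c"}

print(realtime_read_supported("12c", True))  # False: 12c CDB is not supported
```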
Create an account and configure permissions
You need to plan a database logon account for subsequent operations. This account requires the necessary Oracle permissions.
Create an account. For more information, see Create an Oracle account.
Configure permissions.
You can use the following statements to grant permissions to the account. When you run the statements, replace sync_account with the account that you created.
grant create session to sync_account; -- Grants the logon permission to the sync account.
grant connect to sync_account; -- Grants the database connection permission to the sync account.
grant select on nls_database_parameters to sync_account; -- Grants the permission to query the nls_database_parameters system configuration.
grant select on all_users to sync_account; -- Grants the permission to query all users in the database.
grant select on all_objects to sync_account; -- Grants the permission to query all objects in the database.
grant select on DBA_MVIEWS to sync_account; -- Grants the permission to view materialized views in the database.
grant select on DBA_MVIEW_LOGS to sync_account; -- Grants the permission to view materialized view logs in the database.
grant select on DBA_CONSTRAINTS to sync_account; -- Grants the permission to view the constraint information of all tables.
grant select on DBA_CONS_COLUMNS to sync_account; -- Grants the permission to view information about all columns in the specified constraints of all tables.
grant select on all_tab_cols to sync_account; -- Grants the permission to view information about columns in tables, views, and clusters.
grant select on sys.obj$ to sync_account; -- Grants the permission to view objects in the database. The sys.obj$ table is the base object table in the Oracle data dictionary and stores all Oracle objects.
grant select on SYS.COL$ to sync_account; -- Grants the permission to view column definition information. SYS.COL$ stores column definitions.
grant select on sys.USER$ to sync_account; -- Grants the permission to view system tables. sys.USER$ stores user and role definitions.
grant select on sys.cdef$ to sync_account; -- Grants the permission to view system tables.
grant select on sys.con$ to sync_account; -- Grants the permission to view constraint information. sys.con$ records Oracle constraint information.
grant select on all_indexes to sync_account; -- Grants the permission to view all indexes in the database.
grant select on v_$database to sync_account; -- Grants the permission to view the v_$database view.
grant select on V_$ARCHIVE_DEST to sync_account; -- Grants the permission to view the V_$ARCHIVE_DEST view.
grant select on v_$log to sync_account; -- Grants the permission to view the v_$log view. v_$log displays log file information from the control file.
grant select on v_$logfile to sync_account; -- Grants the permission to view the v_$logfile view. v_$logfile contains information about redo log files.
grant select on v_$archived_log to sync_account; -- Grants the permission to view the v$archived_log view. v$archived_log contains information about archived logs.
grant select on V_$LOGMNR_CONTENTS to sync_account; -- Grants the permission to view the V_$LOGMNR_CONTENTS view.
grant select on DUAL to sync_account; -- Grants the permission to view the DUAL table. DUAL is a virtual table used to conform to the SELECT syntax rules. In Oracle, DUAL contains only one record.
grant select on v_$parameter to sync_account; -- Grants the permission to view the v_$parameter view. v$parameter is a dynamic dictionary table that stores database parameter settings.
grant select any transaction to sync_account; -- Grants the permission to view any transaction.
grant execute on SYS.DBMS_LOGMNR to sync_account; -- Grants the permission to use the LogMiner tool. LogMiner helps you analyze transactions and recover lost data.
grant alter session to sync_account; -- Grants the permission to alter database sessions.
grant select on dba_objects to sync_account; -- Grants the permission to view all objects in the database.
grant select on v_$standby_log to sync_account; -- Grants the permission to view the v_$standby_log view. v_$standby_log contains archived logs of the secondary database.
grant select on v_$ARCHIVE_GAP to sync_account; -- Grants the permission to query for missing archived logs.
grant select on sys.ICOL$ to sync_account; -- Grants the permission to view index column information. sys.ICOL$ stores the columns that make up each index.
grant select on V$THREAD to sync_account; -- Grants the permission to view instance thread information. The V$THREAD view is a thread state table in Oracle's dynamic performance views. It stores the thread status of each instance in a RAC cluster and is used to determine the instance status and log source during incremental parsing.
To perform offline full data synchronization, you must also run the following statement to grant the query permission on all tables to the sync account.
grant select any table to sync_account;
For Oracle 12c and later versions, run the following statement to grant the log mining permission to the sync account. For versions earlier than Oracle 12c, this statement is not required because the log mining feature is built in.
grant LOGMINING to sync_account;
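Because the account name appears in every statement, it can help to generate the grants from a template. The following sketch (a hypothetical helper with an abbreviated grant list; the complete list is shown above) substitutes the account name in one place and appends the LOGMINING grant only for Oracle 12c and later.

```python
# Abbreviated subset of the grants listed above, for illustration only.
CORE_GRANTS = [
    "create session",
    "connect",
    "select any transaction",
    "alter session",
    "execute on SYS.DBMS_LOGMNR",
]

def render_grants(account: str, oracle_major_version: int) -> list:
    """Render grant statements for the given sync account."""
    stmts = ["grant {} to {};".format(g, account) for g in CORE_GRANTS]
    # LOGMINING is a separate grant only for Oracle 12c and later;
    # earlier versions have log mining built in.
    if oracle_major_version >= 12:
        stmts.append("grant LOGMINING to {};".format(account))
    return stmts
```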
Enable archived logs and supplemental logging, and switch redo log files
Enable archived logs. The SQL statements are as follows.
shutdown immediate;
startup mount;
alter database archivelog;
alter database open;
Enable supplemental logging.
To enable the appropriate supplemental logging, use the following SQL statements.
-- For real-time synchronization to capture complete data changes, you must enable supplemental logging. You can enable it at the database level:
alter database add supplemental log data(primary key) columns; -- Enables supplemental logging for primary key columns of the database.
alter database add supplemental log data(unique) columns; -- Enables supplemental logging for unique index columns of the database.
-- You can also enable it at the table level:
alter table schema.table add supplemental log data(primary key) columns; -- Enables supplemental logging for primary key columns of a specific table.
alter table schema.table add supplemental log data(unique) columns; -- Enables supplemental logging for unique index columns of a specific table.
Switch redo log files.
After you enable supplemental logging, run the following command multiple times (five times is recommended) to switch the redo log files.
alter system switch logfile;
Note: Running the command multiple times ensures that the system switches to the next log file when the current log file is full. This prevents the loss of operation records and helps with subsequent data restoration.
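The switch step above can be scripted. This is a minimal sketch (not DataWorks code) that produces the recommended sequence of five switch statements, which a DBA tool would then execute one by one against the database.

```python
def switch_logfile_statements(times: int = 5) -> list:
    """Return the redo log switch command repeated `times` times.

    Five repetitions are recommended so that the current redo log is
    fully rotated out before log mining begins.
    """
    return ["alter system switch logfile;"] * times

for stmt in switch_logfile_statements():
    print(stmt)
```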
Check the database character encoding
Run the following command in your database to check its character encoding.
select * from v$nls_parameters where PARAMETER IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
v$nls_parameters stores the settings of database parameters.
NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET are the database character set and the national character set, respectively. They indicate the storage class for the two main types of character data in Oracle.
Data Integration supports only the UTF8, AL32UTF8, AL16UTF16, and ZHS16GBK encoding formats for data synchronization. If your database uses an unsupported character encoding, modify it before you synchronize data.
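A quick pre-flight check against the supported encodings can be sketched as follows (illustrative only; the two arguments are the values returned by the query above).

```python
# The only encodings Data Integration supports for synchronization.
SUPPORTED_CHARSETS = {"UTF8", "AL32UTF8", "AL16UTF16", "ZHS16GBK"}

def charsets_supported(nls_characterset: str, nls_nchar_characterset: str) -> bool:
    """Return True if both character-set parameters are supported."""
    return (nls_characterset in SUPPORTED_CHARSETS
            and nls_nchar_characterset in SUPPORTED_CHARSETS)

print(charsets_supported("AL32UTF8", "AL16UTF16"))  # True
```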
Check the data types of database tables
You can use a SELECT statement to query the data types in a database table. The following example shows how to view the data types in the 'tablename' table.
select COLUMN_NAME,DATA_TYPE from all_tab_columns where TABLE_NAME='tablename';
COLUMN_NAME: The name of the table column.
DATA_TYPE: The data type of the corresponding column.
all_tab_columns: A view that stores information about all columns in the database tables.
TABLE_NAME: The name of the target table to query. When you run the statement, replace 'tablename' with the actual name of your table.
You can also run select * from tablename; to query all information about the target table and retrieve the data types.
Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data Source Management. You can view parameter descriptions in the DataWorks console to understand the meanings of the parameters when you add a data source.
Develop a data synchronization task: Oracle synchronization process guide
For information about the entry point and the procedure for configuring a synchronization task, see the following configuration guides.
Guide to configuring an offline sync task for a single table
For more information about the procedure, see Configure a task in the codeless UI and Configure a task in the code editor.
For all parameters and a script demo for the code editor, see Appendix: Script demo and parameter descriptions.
Guide to configuring a real-time sync task for a single table
For more information about the procedure, see Configure a real-time sync task in DataStudio.
Guide to configuring whole-database synchronization
For more information about the procedure, see Configure a real-time sync task for an entire database.
FAQ
Real-time sync tasks for Oracle, PolarDB, and MySQL report repeated errors
Data restoration for active/standby synchronization
Active/standby synchronization issues can occur when you use an active/standby disaster recovery architecture for Oracle. If the primary database fails and the system switches to the secondary database, the secondary database continuously recovers data from the primary database by applying the primary database's redo logs. Due to the time lag in data synchronization between the primary and secondary databases, network latency and other conditions can cause significant discrepancies between the data recovered by the secondary database and the data in the primary database. As a result, the data synchronized from the secondary database is not a complete snapshot of the data at that point in time.
Consistency constraints
Oracle is a relational database management system (RDBMS) that provides strongly consistent data query interfaces. For example, during the execution of a sync task, if another data writer writes data to the database, Oracle Reader does not retrieve the new data because of the snapshot feature of the database.
The consistency described above is achieved when Oracle Reader uses a single-threaded model for data synchronization. If Oracle Reader uses concurrent tasks to pull data based on your configurations, strong data consistency cannot be guaranteed.
When Oracle Reader splits data based on splitPk, it starts multiple concurrent tasks to synchronize the data. These tasks do not belong to the same read transaction and have time intervals between them. Therefore, the synchronized data is not a complete and consistent data snapshot.
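To see why the chunks cannot form one snapshot, consider how a numeric splitPk might be cut into per-task ranges. The following is a sketch of an assumed even-interval splitting scheme, not the actual Oracle Reader implementation; each resulting range is read in its own transaction, at a slightly different time.

```python
def split_ranges(min_pk: int, max_pk: int, tasks: int):
    """Evenly partition [min_pk, max_pk] into half-open ranges, one per task."""
    step = (max_pk - min_pk + 1) / tasks
    bounds = [min_pk + round(i * step) for i in range(tasks)] + [max_pk + 1]
    # Half-open [lo, hi) ranges that cover the key space without overlap.
    return [(bounds[i], bounds[i + 1]) for i in range(tasks)]

print(split_ranges(1, 100, 4))  # [(1, 26), (26, 51), (51, 76), (76, 101)]
```

Rows inserted into an already-read range after its task commits are missed, while rows inserted into a not-yet-read range are picked up, which is exactly the inconsistency described above.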
A technical solution for consistent snapshots in a multi-threaded environment is not currently available. You can address this issue only from an engineering perspective. Engineering solutions involve trade-offs. The following are possible solutions. You can choose one based on your needs.
Use single-threaded synchronization. This means that data is not split into chunks. The disadvantage is that the speed is slow, but it ensures consistency.
Disable other data writers to ensure that the data is static. For example, you can lock tables or disable synchronization to the secondary database. The disadvantage is that this may affect online services.
Database encoding issues
Oracle Reader uses Java Database Connectivity (JDBC) to pull data. JDBC is compatible with various encodings and performs encoding conversion at the underlying layer. Therefore, you do not need to specify an encoding for Oracle Reader. It can automatically obtain and transcode the encoding.
Methods for incremental data synchronization
Oracle Reader uses SELECT…WHERE… statements to pull data. Therefore, you can perform incremental data synchronization in the following ways:
When an online application writes data to the database, it populates a `modify` field with a timestamp for changes, such as additions, updates, and logical deletions. For these applications, Oracle Reader only needs to add a `WHERE` clause with the timestamp of the last synchronization.
For new, stream-like data, Oracle Reader can add a `WHERE` clause with the maximum auto-increment ID from the previous stage.
If your business logic does not use a field to distinguish between new and modified data, Oracle Reader cannot perform incremental data synchronization. It can only synchronize the full data.
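The two incremental patterns above can be sketched as WHERE-clause builders. The column names (`gmt_modified`, `id`) are placeholders for illustration, not fields that DataWorks requires.

```python
def where_since_timestamp(last_sync: str, column: str = "gmt_modified") -> str:
    """Filter rows modified after the previous synchronization run."""
    return "{} > TO_DATE('{}', 'YYYY-MM-DD HH24:MI:SS')".format(column, last_sync)

def where_since_max_id(last_max_id: int, column: str = "id") -> str:
    """Filter append-only rows beyond the previous maximum auto-increment ID."""
    return "{} > {}".format(column, last_max_id)

print(where_since_max_id(10000))  # id > 10000
```

Either string would be supplied as the `where` parameter of the reader, with the cursor value (last timestamp or maximum ID) persisted between runs.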
SQL security
Oracle Reader provides the querySql feature, which lets you customize SELECT statements for data pulling. Oracle Reader does not perform any security checks on the querySql.
Appendix: Script demo and parameter descriptions
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a task in the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
Reader script demo
{
"type": "job",
"version": "2.0",
"steps": [
{
"stepType": "oracle",
"parameter": {
"selectedDatabase": "AUTOTEST",
"indexes": [],
"datasource": "oracle_test",
"envType": 0,
"useSpecialSecret": true,
"column": [
"id"
],
"where": "",
"splitPk": "id",
"encoding": "UTF-8",
"table": "AUTOTEST.table01"
},
"name": "Reader",
"category": "reader"
},
{
"stepType": "odps",
"parameter": {
},
"name": "Writer",
"category": "writer"
},
{
"name": "Processor",
"stepType": null,
"category": "processor",
"copies": 1,
"parameter": {
"nodes": [],
"edges": [],
"groups": [],
"version": "2.0"
}
}
],
"setting": {
"executeMode": null,
"errorLimit": {
"record": ""
},
"speed": {
"concurrent": 2,
"throttle": false
}
},
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
}
}
Reader script parameters
| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| datasource | The name of the data source. The code editor supports adding data sources. The value of this parameter must be the same as the name of the added data source. | Yes | None |
| selectedDatabase | The schema of the database to be synchronized. | Yes | None |
| table | The name of the table to be synchronized, in the schema.table format. For example, if selectedDatabase is AUTOTEST and the table is table01, set table to AUTOTEST.table01. | Yes | None |
| column | The set of columns to be synchronized, described as a JSON array of field names. By default, all columns are used, for example, ["*"]. | Yes | None |
| splitFactor | The splitting factor, which controls the number of chunks that the data is split into for synchronization. If you configure multiple concurrent tasks, the data is split into concurrent × splitFactor chunks. For example, if concurrent is 5 and splitFactor is 5, the data is split into 5 × 5 = 25 chunks and processed by five concurrent threads. Note: The recommended value range is 1 to 100. A value that is too large may cause an out-of-memory (OOM) error. | No | 5 |
| splitMode | The splitting mode. The splitMode parameter must be used with the splitPk parameter. | No | randomSampling |
| splitPk | The field used to shard data when Oracle Reader extracts data. If you specify splitPk, data synchronization starts concurrent tasks, which improves the efficiency of data synchronization. Note: The splitPk field cannot be ROWID for views. | No | None |
| where | The filter condition. Oracle Reader constructs an SQL statement based on the specified column, table, and where configuration, and then pulls data based on that SQL statement. For example, you can specify row_number() as the where condition during testing. | No | None |
| querySql (advanced mode, not supported in the codeless UI) | In some business scenarios, the where parameter is not sufficient to describe the filter conditions, and you can use this parameter to customize the filter SQL. When you configure this parameter, the data synchronization system ignores configurations such as table and column and uses this SQL to filter data, for example, to join multiple tables before synchronization. | No | None |
| fetchSize | The number of records to fetch in each batch from the database server. This value determines the number of network interactions between the data synchronization system and the server and can significantly affect data pulling performance. Note: A fetchSize value that is too large (greater than 2,048) may cause an OOM error during data synchronization. | No | 1,024 |
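The splitFactor arithmetic from the table above can be expressed directly. This sketch (illustrative only) also guards the recommended 1 to 100 range.

```python
def chunk_count(concurrent: int, split_factor: int = 5) -> int:
    """Number of chunks the data is split into: concurrent x splitFactor."""
    if not 1 <= split_factor <= 100:
        # Values above 100 risk out-of-memory errors, per the note above.
        raise ValueError("splitFactor should stay in the 1-100 range")
    return concurrent * split_factor

print(chunk_count(5, 5))  # 25, processed by five concurrent threads
```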
Writer script demo
{
"type":"job",
"version":"2.0",// Version number.
"steps":[
{
"stepType":"stream",
"parameter":{},
"name":"Reader",
"category":"reader"
},
{
"stepType":"oracle",// Plugin name.
"parameter":{
"postSql":[],// SQL statement to execute after the data synchronization task.
"datasource":"",
"session":[],// Database connection session parameters.
"column":[// Fields.
"id",
"name"
],
"encoding":"UTF-8",// Encoding format.
"batchSize":1024,// The number of records in a batch.
"table":"",// Table name.
"preSql":[]// SQL statement to execute before the data synchronization task.
},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"// Number of error records.
},
"speed":{
"throttle":true,// If throttle is false, mbps is ignored and throttling is disabled. If throttle is true, throttling is enabled.
"concurrent":1, // Job concurrency.
"mbps":"12"// Throttling rate. 1 mbps = 1 MB/s.
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}
Writer script parameters
| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| datasource | The name of the data source. The code editor supports adding data sources. The value of this parameter must be the same as the name of the added data source. | Yes | None |
| table | The name of the destination table. If the schema of the table is different from the username that you configured, specify the table in the schema.table format. | Yes | None |
| writeMode | The import mode. Only insert into is supported. If a primary key or unique index conflict occurs, the conflicting rows are not written and are treated as dirty data. | No | insert into |
| column | The fields in the destination table to which data is written. Separate field names with commas, for example, "id" and "name" as in the script demo. | Yes | None |
| preSql | The SQL statement to execute before the data synchronization task starts. The codeless UI supports only one SQL statement. The code editor supports multiple SQL statements, for example, to clear old data. | No | None |
| postSql | The SQL statement to execute after the data synchronization task is complete. The codeless UI supports only one SQL statement. The code editor supports multiple SQL statements, for example, to add a timestamp. | No | None |
| batchSize | The number of records in a batch. A larger value can significantly reduce the number of network interactions between the data synchronization system and Oracle, and improve overall throughput. If this value is set too high, it may cause an OOM error in the data synchronization process. | No | 1,024 |
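To illustrate how batchSize bounds each network round trip, the following sketch (not writer internals) slices a record set into batches of at most batch_size rows; each slice would correspond to one write interaction with Oracle.

```python
def batches(records, batch_size=1024):
    """Yield successive slices of `records`, one per network round trip."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]

# 2,500 rows at the default batchSize produce three round trips.
parts = list(batches(list(range(2500))))
print([len(p) for p in parts])  # [1024, 1024, 452]
```

Fewer, larger batches reduce round trips but increase the memory held per batch, which is why an oversized batchSize can cause OOM errors.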