DataWorks provides Oracle Reader and Oracle Writer for you to read data from and write data to Oracle data sources. You can use the codeless user interface (UI) or code editor to configure data synchronization tasks for Oracle data sources. This topic describes the capabilities of synchronizing data from or to Oracle data sources.
Supported Oracle versions
Batch data read and write
Version | Oracle Reader for batch data read | Oracle Writer for batch data write |
Oracle 11.2 or 11gR2 | Supported | Supported |
Oracle 12.1 or 12cR1 | Supported | Supported |
Oracle 12.2 or 12cR2 | Supported (Not supported by features of Oracle of later versions) | Supported (Not supported by features of Oracle of later versions) |
Oracle 18.3 | Supported (Not supported by features of Oracle of later versions) | Supported (Not supported by features of Oracle of later versions) |
Oracle 19.x | Supported (Not supported by features of Oracle of later versions) | Supported (Not supported by features of Oracle of later versions) |
Oracle 21.1 | Supported (Not supported by features of Oracle of later versions) | Supported (Not supported by features of Oracle of later versions) |
Oracle Writer uses the ojdbc6-12.1.1.jar driver.
Real-time data read
Supported Oracle database versions:
12c non cdb
,18c non cdb
,19c non cdb
,19c cdb
, andversions later than 11g R2
Unsupported Oracle database versions:
12c cdb
and18c cdb
An Oracle database of Oracle 12c or a later version can serve as a container database (CDB) to host multiple pluggable databases (PDBs).
Limits
The following encoding formats are supported for data synchronization: UTF-8, AL32UTF8, AL16UTF16, and ZHS16GBK.
A maximum of 500 GB incremental data can be synchronized from an Oracle instance in real time every day.
Real-time synchronization of data from an Oracle data source in Data Integration is implemented based on the Oracle LogMiner utility that is used to analyze log files. You can enable the supplemental logging feature for a primary or secondary database only in a primary Oracle database.
Data of views can be read during batch synchronization.
Data types
Data type | Oracle Reader for batch data read | Oracle Writer for batch data write | Real-time data read |
NUMBER | Supported | Supported | Supported |
BINARY FLOAT | Supported | Supported | Supported |
BINARY DOUBLE | Supported | Supported | Supported |
CHAR | Supported | Supported | Supported |
NCHAR | Supported | Supported | Supported |
VARCHAR2 | Supported | Supported | Supported |
NVARCHAR2 | Supported | Supported | Supported |
DATE | Supported | Supported | Supported |
TIMESTAMP | Supported | Supported | Supported |
TIMESTAMP WITH TIME ZONE | Supported | Supported | Not supported |
TIMESTAMP WITH LOCAL TIME ZONE | Supported | Supported | Not supported |
CLOB | Supported | Supported | Supported |
BLOB | Supported | Supported | Supported |
RAW | Supported | Supported | Supported |
ROWID | Not supported | Not supported | Supported |
UROWID | Not supported | Not supported | Supported |
FLOAT | Supported | Supported | Supported |
INTERVAL DAY TO SECOND | Not supported | Not supported | Supported |
INTERVAL YEAR TO MONTH | Not supported | Not supported | Supported |
BFILE | Not supported | Not supported | Not supported |
LONG | Not supported | Not supported | Not supported |
LONG RAW | Not supported | Not supported | Not supported |
NCLOB | Supported | Supported | Not supported |
STRUCT | Supported | Supported | Not supported |
User-Defined Types | Not supported | Not supported | Not supported |
AnyType | Not supported | Not supported | Not supported |
AnyData | Not supported | Not supported | Not supported |
AnyDataSet | Not supported | Not supported | Not supported |
XmlType | Not supported | Not supported | Not supported |
Spatial Types | Not supported | Not supported | Not supported |
Media Types | Not supported | Not supported | Not supported |
The following table lists the data type mappings based on which Oracle Reader converts data types.
Category | Oracle data type |
Integer | NUMBER, ROWID, INTEGER, INT, and SMALLINT |
Floating point | NUMERIC, DECIMAL, FLOAT, DOUBLE PRECISION, and REAL |
String | LONG, CHAR, NCHAR, VARCHAR, VARCHAR2, NVARCHAR2, CLOB, NCLOB, CHARACTER, CHARACTER VARYING, CHAR VARYING, NATIONAL CHARACTER, NATIONAL CHAR, NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING, and NCHAR VARYING |
Date and time | TIMESTAMP and DATE |
Boolean | BIT and BOOLEAN |
Binary | BLOB, BFILE, RAW, and LONG RAW |
Prepare an Oracle environment before data synchronization
Before you use DataWorks to synchronize data from or to an Oracle data source, you must prepare an Oracle environment. This ensures that a data synchronization task can be configured and can synchronize data from or to the Oracle data source as expected. The following information describes how to prepare an Oracle environment for data synchronization from or to an Oracle data source.
Preparation 1: Check the version of your Oracle database
The supported Oracle database versions vary based on synchronization scenarios. Check whether the version of your Oracle database is supported before you synchronize data from or to the database.
You can execute one of the following statements to check the version of your Oracle database:
Statement 1:
select * from v$version;
Statement 2:
select version from v$instance;
If the version of the Oracle database is
12c
or18c
, you must execute the following statement to check whether the database can serve as aCDB
. If the Oracle database can serve as aCDB
, this Oracle database is not supported by real-time synchronization tasks in DataWorks.select name,cdb,open_mode,con_id from v$database;
Preparation 2: Prepare an account that has the required permissions
You must prepare an account that is used to log on to the Oracle database. The account must have the required permissions on the Oracle database.
Create an account. For more information, see Create an account to access an Oracle database.
Grant the required permissions to the account.
You can execute the following statements to grant the required permissions to the account. Replace
Account for data synchronization
with the created account when you execute a specific statement.grant create session to 'Account for data synchronization'; // Authorize the synchronization account to access the database. grant connect to 'Account for data synchronization'; // Authorize the synchronization account to connect to the database. grant select on nls_database_parameters to 'Account for data synchronization'; // Authorize the synchronization account to query the settings of nls_database_parameters. grant select on all_users to 'Account for data synchronization'; // Authorize the synchronization account to query all users in the database. grant select on all_objects to 'Account for data synchronization'; // Authorize the synchronization account to query all objects in the database. grant select on DBA_MVIEWS to 'Account for data synchronization'; // Authorize the synchronization account to check the materialized view of the database. grant select on DBA_MVIEW_LOGS to 'Account for data synchronization'; // Authorize the synchronization account to view the materialized view logs of the database. grant select on DBA_CONSTRAINTS to 'Account for data synchronization'; // Authorize the synchronization account to view the constraints on all tables in the database. grant select on DBA_CONS_COLUMNS to 'Account for data synchronization'; // Authorize the synchronization account to view information about the columns that meet the specified constraints on all tables in the database. grant select on all_tab_cols to 'Account for data synchronization'; // Authorize the synchronization account to view information about columns in tables, views, and clusters of the database. grant select on sys.obj$ to 'Account for data synchronization'; // Authorize the synchronization account to view objects in the database. sys.obj$ indicates an object table that is contained in the data dictionary table. The object table contains all objects. grant select on SYS.COL$ to 'Account for data synchronization'; // Authorize the synchronization account to view definitions of columns in tables of the database. SYS.COL$ stores definitions of columns in tables. grant select on sys.USER$ to 'Account for data synchronization'; // Authorize the synchronization account to view the system table of the database. sys.USER$ indicates a default user session service. grant select on sys.cdef$ to 'Account for data synchronization'; // Authorize the synchronization account to view the system table of the database. grant select on sys.con$ to 'Account for data synchronization'; // Authorize the synchronization account to view the constraints of the database. sys.con$ records the constraints. grant select on all_indexes to 'Account for data synchronization'; // Authorize the synchronization account to view all indexes of the database. grant select on v_$database to 'Account for data synchronization'; // Authorize the synchronization account to check the v_$database view of the database. grant select on V_$ARCHIVE_DEST to 'Account for data synchronization'; // Authorize the synchronization account to check the V_$ARCHIVE_DEST view of the database. grant select on v_$log to 'Account for data synchronization'; // Authorize the synchronization account to check the v_$log view of the database. v_$log displays log information about control files. grant select on v_$logfile to 'Account for data synchronization'; // Authorize the synchronization account to check the v_$logfile view of the database. v_$logfile contains information about redo log files. grant select on v_$archived_log to 'Account for data synchronization'; // Authorize the synchronization account to check the v$archived_log view of the database. v$archived_log contains information about archived logs. grant select on V_$LOGMNR_CONTENTS to 'Account for data synchronization'; // Authorize the synchronization account to check the V_$LOGMNR_CONTENTS view of the database. grant select on DUAL to 'Account for data synchronization'; // Authorize the synchronization account to view the DUAL table of the database. DUAL is a virtual table that contains SELECT syntax rules. In Oracle, only one record is retained in the DUAL table. grant select on v_$parameter to 'Account for data synchronization'; // Authorize the synchronization account to check the v_$parameter view of the database. v$parameter is a dynamic dictionary table that stores the values of parameters in the database. grant select any transaction to 'Account for data synchronization'; // Authorize the synchronization account to view transactions of the database. grant execute on SYS.DBMS_LOGMNR to 'Account for data synchronization'; // Authorize the synchronization account to use the LOGMNR tool. The LOGMNR tool helps you analyze transactions and retrieve lost data. grant alter session to 'Account for data synchronization'; // Authorize the synchronization account to modify connection settings of the database. grant select on dba_objects to 'Account for data synchronization'; // Authorize the synchronization account to view all objects in the database. grant select on v_$standby_log to 'Account for data synchronization'; // Authorize the synchronization account to check the v_$standby_log view of the database. v_$standby_log contains archived logs of the secondary database. grant select on v_$ARCHIVE_GAP to 'Account for data synchronization'; // Authorize the synchronization account to query missing archived logs.
To implement batch synchronization of full data, you must also execute the following statement to grant the query permissions on all tables to the synchronization account:
grant select any table to 'Account for data synchronization';
For an Oracle database of Oracle 12c or a later version, you must execute the following statement to grant the log mining permissions to the synchronization account. The log mining feature is a built-in feature for Oracle databases of versions earlier than 12c. You do not need to execute the following statement in such databases.
grant LOGMINING TO 'Account for data synchronization';
Preparation 3: Enable archive logging and supplemental logging, and switch to another redo log file
Execute the following SQL statements to enable archive logging:
shutdown immediate; startup mount; alter database archivelog; alter database open;
Enable supplemental logging.
SQL statements:
alter database add supplemental log data(primary key) columns; // Enable supplemental logging for primary key columns. alter database add supplemental log data(unique) columns; // Enable supplemental logging for unique index columns.
Switch to another redo log file.
After you enable supplemental logging, you must execute the following statement multiple times to switch to another redo log file. We recommend that you execute the following statement for five times:
alter system switch logfile;
NoteThis ensures that data can be written to the next log file after the current log file is full. Records of performed operations will not be lost. This facilitates data recovery.
Preparation 4: Check character encoding formats of your Oracle database
Execute the following statement to check character encoding formats of the database:
select * from v$nls_parameters where PARAMETER IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
v$nls_parameters stores values of parameters in the database.
NLS_CHARACTERSET indicates a database character set. NLS_NCHAR_CHARACTERSET indicates a national character set. These two sets are used to store data of the respective character type.
The following encoding formats are supported for data synchronization: UTF-8, AL32UTF8, AL16UTF16, and ZHS16GBK. If the database contains the character encoding formats that are not supported for data synchronization, change the formats before you synchronize data.
Preparation 5: Check the data types of tables in your Oracle database
You can execute the SELECT statement to query the data types of tables in the database. Sample statement that is executed to query the data types of the 'tablename'
table:
select COLUMN_NAME,DATA_TYPE from all_tab_columns where TABLE_NAME='tablename';
COLUMN_NAME: the name of the column.
DATA_TYPE: the data type of the column.
all_tab_columns: the view that stores information about all columns in tables of the database.
TABLE_NAME: the name of the table that you want to query. When you execute the preceding statement, replace 'tablename' with the name of the table that you want to query.
You can also execute the select * from 'tablename';
statement to query the information about the desired table and obtain the data types of columns in the table.
Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Add and manage data sources. You can view the infotips of parameters in the DataWorks console to understand the meanings of the parameters when you add a data source.
Develop a data synchronization task
For information about the entry point for and the procedure of configuring a synchronization task, see the following configuration guides.
Configure a batch synchronization task to synchronize data of a single table
For more information about the configuration procedure, see Configure a batch synchronization task by using the codeless UI and Configure a batch synchronization task by using the code editor.
For information about all parameters that are configured and the code that is run when you use the code editor to configure a batch synchronization task, see Appendix: Code and parameters.
Configure a real-time synchronization task to synchronize data of a single table
For more information about the configuration procedure, see Configure a real-time synchronization task in DataStudio.
Configure synchronization settings to implement batch synchronization of all data in a database, real-time synchronization of full data or incremental data in a database, and real-time synchronization of data from sharded tables in a sharded database
For more information about the configuration procedure, see Configure a synchronization task in Data Integration.
Additional information
Data synchronization between primary and secondary databases
A secondary Oracle database can be deployed for disaster recovery. The secondary database continuously synchronizes data from the primary database based on binary logs. Data latency between the primary and secondary databases cannot be prevented. This may result in data inconsistency.
Data consistency control
Oracle is a relational database management system (RDBMS) that supports strong consistency for data queries. A database snapshot is created before a synchronization task starts. Oracle Reader reads data from the database snapshot. Therefore, if new data is written to the database during data synchronization, Oracle Reader cannot obtain the new data.
Data consistency cannot be ensured if you enable if you enable Oracle Reader to use parallel threads to read data in a synchronization task.
Oracle Reader shards the source table based on the value of the splitPk parameter and uses parallel threads to read data. These parallel threads belong to different transactions and read data at different points in time. Therefore, the parallel threads observe different snapshots.
Data inconsistencies cannot be prevented if parallel threads are used for a synchronization task. The following workarounds can be used:
Enable Oracle Reader to use a single thread to read data in a synchronization task. This indicates that you do not need to specify a shard key for Oracle Reader. This way, data consistency is ensured, but data is synchronized at low efficiency.
Make sure that no data is written to the source table during data synchronization. This ensures that the data in the source table remains unchanged during data synchronization. For example, you can lock the source table or disable data synchronization between primary and secondary databases. This way, data is efficiently synchronized, but your ongoing services may be interrupted.
Character encoding
Oracle Reader uses Java Database Connectivity (JDBC) to read data. This enables Oracle Reader to automatically convert the encoding format of characters. Therefore, you do not need to specify the encoding format.
Incremental data synchronization
Oracle Reader connects to a database by using JDBC and uses a
SELECT statement with a WHERE clause
to read incremental data.For batch data, incremental add, update, and delete operations (including logically delete operations) are distinguished by timestamps. Specify the WHERE clause based on a specific timestamp. The time indicated by the timestamp must be later than the time indicated by the latest timestamp in the previous synchronization.
For streaming data, specify the WHERE clause based on the ID of a specific record. The ID must be greater than the maximum ID involved in the previous synchronization.
If the data that is added or modified cannot be distinguished, Oracle Reader can read only full data.
Syntax validation
Oracle Reader allows you to specify custom SELECT statements by using the querySql parameter but does not verify the syntax of these statements.
Appendix: Code and parameters
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a batch synchronization task by using the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.