Overview of the full and incremental synchronization feature

DataWorks allows you to create different types of full and incremental synchronization tasks to synchronize data between different data sources in various data synchronization scenarios, such as real-time synchronization, batch full synchronization, and batch incremental synchronization. Full and incremental synchronization tasks help you migrate your business data to the cloud in an efficient and convenient manner.

Background information

In some business scenarios, data cannot be synchronized by using only one or more simple batch or real-time synchronization tasks. Instead, multiple batch synchronization tasks, real-time synchronization tasks, and data processing tasks are required to synchronize data. In this case, complex configurations are required.

To resolve this issue, DataWorks provides a scenario-specific synchronization solution that allows you to synchronize data between different types of data sources by using simple configurations. For example, you can create a one-click real-time synchronization task to easily synchronize data to Elasticsearch, Hologres, or MaxCompute. This simplifies data synchronization.

Note

For example, a large amount of data is stored in your database system, and you want to synchronize full and incremental data from your database to MaxCompute for analysis. The traditional data synchronization method allows you to perform full synchronization or perform incremental synchronization based on fields such as modify_time in database tables. However, these fields may not exist in database tables in an actual scenario. Therefore, you cannot use the Java Database Connectivity (JDBC) driver to extract data for incremental synchronization. To synchronize the full and incremental data to MaxCompute, you can create a one-click real-time synchronization task. After the synchronization is complete, the full and incremental data is automatically merged in MaxCompute. This simplifies data synchronization.

A full and incremental synchronization task has the following benefits:

Synchronizes full data at a time.
Synchronizes incremental data in real time.
Automatically merges incremental and full data on a regular basis and writes the merged data to the related partition in a table that is used to store full data.

Overview

The following figure shows the capabilities of the full and incremental synchronization feature. Architecture

Capability	Description

Capability	Description
Data synchronization from or to data sources that are deployed in complex network environments	The full and incremental synchronization feature supports data synchronization from or to Alibaba Cloud databases, data centers, self-managed data sources that are hosted on Elastic Compute Service (ECS) instances, and data sources that do not belong to Alibaba Cloud. You can select appropriate network connectivity solutions to establish network connections between your resource group and data sources based on the network environments in which the data sources are deployed. Before you configure a full and incremental synchronization task, you must make sure that network connections are established between your resource group for Data Integration and data sources. For more information about how to establish a network connection between a resource group and a data source, see Establish a network connection between a resource group and a data source.
Data synchronization scenarios	The full and incremental synchronization feature supports the synchronization of data from a single table to another single table, from tables in sharded databases to a single table, and from multiple tables in a database to multiple tables. DataWorks allows you to create the following types of full and incremental synchronization tasks: batch synchronization of all data in a database (one-time full synchronization, periodic full synchronization, one-time full synchronization and periodic incremental synchronization, one-time incremental synchronization, and periodic incremental synchronization), and one-click real-time synchronization (one-time full synchronization and real-time incremental synchronization). For more information, see Supported data source types and data synchronization solutions.
Configurations for full and incremental synchronization tasks	For information about how to configure a full and incremental synchronization task, see Configure a synchronization task in Data Integration. For information about the capabilities that you can use when you configure a full and incremental synchronization task, see the Capabilities that you can use when you configure a full and incremental synchronization task section in this topic.
O&M for full and incremental synchronization tasks	The full and incremental synchronization feature allows you to easily add tables to or remove tables from a full and incremental synchronization task that is run. For more information, see Manage a full and incremental synchronization task. Data Integration allows you to view the status and resource usage of a full and incremental synchronization task within a specified period of time. Data Integration also allows you to view the status distribution and data synchronization speed of the batch and real-time synchronization subtasks generated by the synchronization task, the amount of data and number of data records synchronized by the subtasks, and the latency of the real-time synchronization subtask that is running. For more information, see View the status overview of synchronization tasks. A full and incremental synchronization task can generate a real-time synchronization subtask or multiple batch synchronization subtasks based on the attributes that you configure for the synchronization task. You can configure monitoring and alerting rules for the subtasks. For information about how to configure alert rules for a real-time synchronization subtask, see O&M for real-time synchronization nodes. For information about how to configure alert rules for a batch synchronization subtask, see O&M for batch synchronization nodes.

Capabilities that you can use when you configure a full and incremental synchronization task

Capability	Description

Capability	Description
Refresh mappings	Click Refresh Mappings Between Source Tables and Destination Tables. Then, the system displays the mappings between source tables and destination tables. In the preceding figure, `hudi_b.tb_order_1` is an existing destination Hologres table, and `hudi_b.tb_order_2` and `hudi_b.tb_order_3` are destination Hologres tables that are automatically created.
View or modify the schema of a destination table	Find a mapping record and click the name of the destination table to open the dialog box that displays the schema of the table. In the dialog box, you can modify the schema of the table based on your business requirements. In the preceding figure, the `add_col` field is added to the automatically created destination Hologres table `hudi_b.tb_order_3`, the data type of the field is set to TEXT, and the description of the field is set to Add a field to an automatically created destination table. After the modification is complete, click Apply and Refresh Mapping to save the modifications. Important When you modify the schema of an automatically created destination table, you must take note of the following items about the fields that have the same names as fields in the source table: You cannot delete these fields or adjust their order. You cannot change the names of these fields. You can modify the configurations such as the data types and descriptions of these fields. You can also specify whether to use one of the fields as a primary key. In the preceding figure, the `add_col` field is added to the existing destination Hologres table `hudi_b.tb_order_1`, the data type of the field is set to TEXT, and the description of the field is set to Add a field to an existing destination table. After the modification is complete, click Apply and Refresh Mapping to save the modifications. Important When you modify the schema of an existing destination table, you must take note of the following items: For existing fields in the destination table, you can modify only the descriptions of the fields. You can add fields to the destination table.
Modify the schemas of multiple destination tables at a time	Select multiple mapping records and click Batch Modify Table Schema. In the dialog box that appears, you can modify the schemas of the destination tables at a time. After the modification is complete, click Apply and Refresh Mapping to save the modifications. Important After you click Batch Modify Table Schema, the system displays the fields contained in all the selected destination tables in the dialog box that appears. In the dialog box that appears, you can add fields. The preceding figure shows that the `batch_add_col` field is added to all the selected destination tables. Then, click the name of a destination table on which the batch operation is performed to view the new schema of the table.
Specify the name of a destination schema or table	By default, data is written to the destination schema and table that are named the same as the source database and table. If no such destination schema or table exists, the system automatically creates the schema or table in the destination. You can specify the name of the destination schema or table to which you want to write data. For more information, see Configure rules for mapping databases or tables. Note You can use a regular expression to map the names of the destination schemas or tables to the names of source schemas or tables. You can also use built-in variables to specify the names of the destination tables and add a prefix or a suffix to the names of the destination tables.
Assign values to the fields that are newly added to a destination table	By default, source fields specified in a full and incremental synchronization task are mapped to destination fields that are named the same as the source fields. The values of the source fields are written to the destination fields that are named the same as the source fields. You can add fields to a destination table and assign constants or variables to the fields as values. You can find the desired mapping record and click Edit in the Value Assignment for Destination Field column. In the dialog box that appears, the new schema of the destination table to which fields are added is displayed. The id and name fields in the hudi_b.tb_order_3 destination Hologres table are also contained in the mapped source table. In this case, the fields are automatically assigned the same values as the values of the mapped source fields, and the values assigned to the destination fields cannot be changed. `add_col` and `batch_add_col` are the fields that are newly added to the hudi_b.tb_order_3 destination Hologres table. You can assign values to the fields. Manually assign a value to a field: You can enter a constant string in the Assignment column of the add_col field. This way, the constant string is written to the add_col field during data synchronization. Use a variable as the value of a field: You can select a built-in variable that represents specific business meaning from the Assignment column of the batch_add_col field. This way, the value that is indicated by the variable is written to the batch_add_col field during data synchronization. Note Descriptions for the supported variables: DATASOURCE_NAME_SRC: the name of a source DB_NAME_SRC: the name of a source database TABLE_NAME_SRC: the name of a source table DATASOURCE_NAME_DEST: the name of a destination DB_NAME_DEST: the name of a destination database TABLE_NAME_DEST: the name of a destination table DB_NAME_SRC_TRANSED: the name of a source database after conversion EXECUTE_TIME: the time when the event for a change operation on a source occurs
Configure rules to process DDL or DML messages	When you configure a full and incremental synchronization task, you can configure rules to process different types of DDL messages for destination tables. For information about the DDL and DML operations supported by different types of destinations, see Supported DML and DDL operations. Note When you configure a batch synchronization task used to synchronize all data in a database, you cannot configure rules to process DDL or DML messages. DataWorks allows you to configure a logical deletion rule for a destination table to process DML messages for deletion operations. After you configure such a rule for a destination table, the system automatically adds the `_data_integration_deleted_` field to the table and sets the data type of the field to BOOLEAN. If a `deletion` operation is performed on a source table in a synchronization task used to synchronize data to Hologres, Hologres automatically converts the original SQL statement that is used to write data to the mapped destination Hologres table to `update tb_xxx set _data_integration_deleted_=true where id=xxx` and executes the new statement to write data to the mapped destination Hologres table. Logical deletion is implemented during the data write process. After you configure a logical deletion rule for a destination Hologres table, you can view the schema of the table.

Background information

Overview

Capabilities that you can use when you configure a full and incremental synchronization task

Sales Support

Technical Support

Connect & Report Abuse

About Alibaba Cloud

Our Global Network

Quick Start

Global Offices

Olympic Games Paris 2024 New

Stade Roland Garros – Glitz from the Past New

Place de la Concorde – “Breaking” the Barriers New

Vaires-sur-Marne Nautical Stadium – Sports with Sustainability New

International Broadcast Center – Images, Sounds, and Data that Captivate Billions New

Customer Success Stories New

Trust Center

Security & Compliance Center

Cloud Compliance Resources

Security Compliance FAQs

Product & Feature Update New

Cloud Forward

Press Room

Alibaba Cloud e-Magazine New

Alibaba Cloud in Analyst Research

Notice

Go Global Service New

Go Global Alliance with Alibaba Cloud

China Gateway Hot

Information Compliance

China Gateway - MLPS 2.0 Compliance New

China Gateway - Networking

China Gateway - Global Application Acceleration New

China Gateway - Security

China Gateway - Data Security New

ICP Support Hot

China Gateway - Omnichannel Data Mid-End New

China Gateway - Organizational Data Mid-End New

China Gateway - Business Mid-End New

China Gateway - AI Service for Conversational Chatbots New

China Gateway - Online Education

China Gateway - Domain Registration

Work at Alibaba Cloud

Experienced Professionals

Students and Graduates

Free Trial

Pricing

Promo Center

Price Reduction

Pay Less and Deploy More

FinOps

Elastic Compute Service (ECS)

Simple Application Server (SAS)

Elastic GPU Service

Elastic Desktop Service (EDS)

Object Storage Service (OSS)

Cloud Enterprise Network (CEN)

Web Application Firewall (WAF)

Domain Names

Container Compute Service (ACS)

Secure Access Service Edge (SASE)

Intelligent Media Services(IMS)

Edge Security Acceleration (ESA)(Original DCDN)

Intelligent Media Management

DingTalk Enterprise

YiDA

Alibaba Cloud Model Studio

Apsara Prime - For Easy Cloud Product Selection

Alibaba Cloud ECS - Cater All Your Cloud Hosting Needs

1TB CDN—Get Free 1 TB Outbound Traffic Plan Now

Security—Under Attack? Get Free Security Support

Short Message Service - Free Testing is Available

Elastic Compute Service (ECS) Hot

CloudBox

Compute Nest

Dedicated Host Hot

ECS Bare Metal Instance

Elastic Desktop Service (EDS) Featured

Cloud Phone Beta

Elastic GPU Service Featured

Simple Application Server (SAS) Hot

Auto Scaling

Batch Compute

Elastic High Performance Computing (E-HPC)

Super Computing Cluster (SCC)

Function Compute (FC)