DataWorks allows you to create different types of full and incremental synchronization tasks to synchronize data between different data sources in various data synchronization scenarios, such as real-time synchronization, batch full synchronization, and batch incremental synchronization. Full and incremental synchronization tasks help you migrate your business data to the cloud in an efficient and convenient manner.
Background information
In some business scenarios, data cannot be synchronized by using only one or more simple batch or real-time synchronization tasks. Instead, multiple batch synchronization tasks, real-time synchronization tasks, and data processing tasks are required to synchronize data. In this case, complex configurations are required.
To resolve this issue, DataWorks provides a scenario-specific synchronization solution that allows you to synchronize data between different types of data sources by using simple configurations. For example, you can create a one-click real-time synchronization task to easily synchronize data to Elasticsearch, Hologres, or MaxCompute. This simplifies data synchronization.
For example, a large amount of data is stored in your database system, and you want to synchronize full and incremental data from your database to MaxCompute for analysis. The traditional data synchronization method allows you to perform full synchronization or perform incremental synchronization based on fields such as modify_time in database tables. However, these fields may not exist in database tables in an actual scenario. Therefore, you cannot use the Java Database Connectivity (JDBC) driver to extract data for incremental synchronization. To synchronize the full and incremental data to MaxCompute, you can create a one-click real-time synchronization task. After the synchronization is complete, the full and incremental data is automatically merged in MaxCompute. This simplifies data synchronization.
A full and incremental synchronization task has the following benefits:
Synchronizes full data at a time.
Synchronizes incremental data in real time.
Automatically merges incremental and full data on a regular basis and writes the merged data to the related partition in a table that is used to store full data.
Overview
The following figure shows the capabilities of the full and incremental synchronization feature.
Capability | Description |
Data synchronization from or to data sources that are deployed in complex network environments | The full and incremental synchronization feature supports data synchronization from or to Alibaba Cloud databases, data centers, self-managed data sources that are hosted on Elastic Compute Service (ECS) instances, and data sources that do not belong to Alibaba Cloud. You can select appropriate network connectivity solutions to establish network connections between your resource group and data sources based on the network environments in which the data sources are deployed. Before you configure a full and incremental synchronization task, you must make sure that network connections are established between your resource group for Data Integration and data sources. For more information about how to establish a network connection between a resource group and a data source, see Establish a network connection between a resource group and a data source. |
Data synchronization scenarios | The full and incremental synchronization feature supports the synchronization of data from a single table to another single table, from tables in sharded databases to a single table, and from multiple tables in a database to multiple tables. DataWorks allows you to create the following types of full and incremental synchronization tasks: batch synchronization of all data in a database (one-time full synchronization, periodic full synchronization, one-time full synchronization and periodic incremental synchronization, one-time incremental synchronization, and periodic incremental synchronization), and one-click real-time synchronization (one-time full synchronization and real-time incremental synchronization). For more information, see Supported data source types and data synchronization solutions. |
Configurations for full and incremental synchronization tasks | For information about how to configure a full and incremental synchronization task, see Configure a synchronization task in Data Integration. For information about the capabilities that you can use when you configure a full and incremental synchronization task, see the Capabilities that you can use when you configure a full and incremental synchronization task section in this topic. |
O&M for full and incremental synchronization tasks |
|
Capabilities that you can use when you configure a full and incremental synchronization task
Capability | Description |
Refresh mappings | Click Refresh Mappings Between Source Tables and Destination Tables. Then, the system displays the mappings between source tables and destination tables. In the preceding figure, |
View or modify the schema of a destination table | Find a mapping record and click the name of the destination table to open the dialog box that displays the schema of the table. In the dialog box, you can modify the schema of the table based on your business requirements. In the preceding figure, the Important When you modify the schema of an automatically created destination table, you must take note of the following items about the fields that have the same names as fields in the source table:
In the preceding figure, the Important When you modify the schema of an existing destination table, you must take note of the following items:
|
Modify the schemas of multiple destination tables at a time | Select multiple mapping records and click Batch Modify Table Schema. In the dialog box that appears, you can modify the schemas of the destination tables at a time. After the modification is complete, click Apply and Refresh Mapping to save the modifications. Important
Then, click the name of a destination table on which the batch operation is performed to view the new schema of the table. |
Specify the name of a destination schema or table | By default, data is written to the destination schema and table that are named the same as the source database and table. If no such destination schema or table exists, the system automatically creates the schema or table in the destination. You can specify the name of the destination schema or table to which you want to write data. For more information, see Configure rules for mapping databases or tables. Note
|
Assign values to the fields that are newly added to a destination table | By default, source fields specified in a full and incremental synchronization task are mapped to destination fields that are named the same as the source fields. The values of the source fields are written to the destination fields that are named the same as the source fields. You can add fields to a destination table and assign constants or variables to the fields as values. You can find the desired mapping record and click Edit in the Value Assignment for Destination Field column. In the dialog box that appears, the new schema of the destination table to which fields are added is displayed.
Note Descriptions for the supported variables:
|
Configure rules to process DDL or DML messages | When you configure a full and incremental synchronization task, you can configure rules to process different types of DDL messages for destination tables. For information about the DDL and DML operations supported by different types of destinations, see Supported DML and DDL operations. Note
|