Configure a batch synchronization node to synchronize only incremental data

0.0.201

Data Integration allows you to specify a filter condition for a batch synchronization node to enable the node to synchronize only incremental data. If you specify a filter condition when you configure a batch synchronization node, only the data that meets the filter condition can be synchronized. You can use a filter condition together with scheduling parameters. This way, the filter condition can dynamically change with the settings of the scheduling parameters, and incremental data can be synchronized. This topic describes how to configure a batch synchronization node to synchronize only incremental data.

Usage notes

Incremental synchronization is not supported for some types of data sources, such as HBase and OTSStream data sources. You can refer to the topics that introduce the Reader plug-ins of the related data sources to check whether incremental synchronization is supported.

The parameters that you must configure vary based on the Reader plug-ins that you use to synchronize incremental data. For more information, see Supported data source types, readers, and writers. The following table provides examples.

Reader plug-in	Parameter required for incremental synchronization	Supported syntax


Reader plug-in	Parameter required for incremental synchronization	Supported syntax
MySQL Reader	where Note If you use the codeless UI to configure a batch synchronization node that uses MySQL Reader, you must configure the Filter parameter.	Use the syntax of the related database. Note You can use the filter condition together with scheduling parameters to read the data that is generated during a specified period of time every day.
MongoDB Reader	query Note If you use the codeless UI to configure a batch synchronization node that uses MongoDB Reader, you must configure the Search Condition parameter.	Use the syntax of the related database. Note You can use the filter condition together with scheduling parameters to read the data that is generated during a specified period of time every day.
OSS Reader	Object	Specify the object path. Note You can use the filter condition together with scheduling parameters to read data from specified objects every day.
...	...	...

If you use a batch synchronization node to synchronize data, you can configure scheduling parameters for the node to specify the path and scope of the data that you want to synchronize and the location to which you want to write the data. The method used to configure scheduling parameters for a batch synchronization node is the same as the method used to configure scheduling parameters for other types of nodes.

When a batch synchronization node is run, the scheduling parameters configured for the node are replaced with the actual values based on the value formats of the scheduling parameters. Then, the batch synchronization node synchronizes data based on the values.

In the example in this section, a batch synchronization node is configured to synchronize data from a MySQL data source.

If you do not specify a filter condition when you configure the node, the node automatically synchronizes all data from the source to the destination.
If you specify a filter condition when you configure the node, the node synchronizes only the data that meets the filter condition to the destination.

The partition in the destination MaxCompute table to which data is written is also specified by a scheduling parameter. $bizdate specifies the data timestamp of the node. When the node is run, the partition filter expression configured for the node is replaced with the data timestamp specified by the scheduling parameter. For information about how to configure and use scheduling parameters, see Configure and use scheduling parameters. Example of incremental synchronization

Before you configure a batch synchronization node to synchronize only incremental data, take note of the following items:

If you want to synchronize incremental data of a time data type, you can use scheduling parameters when you specify a filter condition for the node. This way, when the node is scheduled, the scheduling parameters are replaced with the actual values based on the data timestamp of the node. For more information about scheduling parameters, see Overview of scheduling parameters.
If you want to synchronize incremental data of a data type other than time, you can use an assignment node to convert the data type of the source fields to the data type supported by the destination and then transmit the processed data to Data Integration for data synchronization. For more information about how to use an assignment node, see Configure an assignment node.

Sample scenarios

Synchronize historical incremental data: If you want to synchronize historical incremental data from a source table to the related time partition in a destination table, you can use the data backfill feature provided in Operation Center. For more information about the data backfill feature, see Backfill data for an auto triggered node and view data backfill instances generated for the node.
Synchronize incremental data from ApsaraDB RDS to MaxCompute

Feedback