Capability | Description |
Data synchronization between various data sources | The real-time synchronization feature allows you to combine multiple types of data sources to form a star-shaped data synchronization link. You can synchronize data between different types of data sources. For more information, see Data source types that support real-time synchronization. |
Data synchronization from or to data sources that are deployed in complex network environments | The real-time synchronization feature supports data synchronization from or to Alibaba Cloud data sources, data sources in data centers, data sources that are hosted on Elastic Compute Service (ECS) instances, and data sources that do not belong to Alibaba Cloud. Select a network connectivity solution based on the network environment in which each data source is deployed. Before you configure a data synchronization node, make sure that network connections are established between your resource group for Data Integration and the data sources. For more information about how to establish a network connection between a resource group and a data source, see Network connectivity solutions. |
Data synchronization scenarios | The real-time synchronization feature allows you to synchronize incremental data in real time from a single table to another single table, from tables in sharded databases to a single table, and from multiple tables in a database to multiple tables.
Real-time synchronization of incremental data from a single table: Supports real-time extract, transform, and load (ETL) of incremental data from a single table.
Real-time synchronization of incremental data from tables in one or more databases: Supports synchronization of change logs from all tables in a source database to a destination. In most cases, this synchronization mode is used to collect real-time logs. Also supports synchronization of data from multiple tables in multiple databases of the same source at a time. You can specify a maximum of 3,000 source tables in a data synchronization node.
Note: The real-time synchronization feature synchronizes only incremental data. If you want to synchronize full data from a source at one time and then synchronize incremental data from the source in real time, use the solution-based synchronization feature. Solution-based synchronization continuously synchronizes data from a source to a destination, which helps keep the data in the destination consistent with the data in the source in real time. For more information about how to select a data synchronization feature, see Overview. |
Configurations for real-time synchronization nodes | You can configure a real-time synchronization node without writing code. Simple configurations are enough for the node to perform real-time ETL of incremental data from a single table or real-time synchronization of incremental data from multiple tables in a database. For more information, see Configure a real-time synchronization node to synchronize incremental data from a single table and Create a real-time synchronization node to synchronize all incremental data from a database. |
O&M for real-time synchronization nodes | Resumable uploads are supported.
Configure alerting and monitoring settings for a real-time synchronization node: You can configure alerting and monitoring settings based on one of the following conditions: business delay, failover, support for DDL statements, and heartbeat check. For more information, see O&M for real-time synchronization nodes.
You can configure DataWorks to send alert notifications by email, text message, or DingTalk message to the specified alert recipient. This helps the alert recipient identify and troubleshoot exceptions at the earliest opportunity.
You can control alerting frequency. To prevent a large number of alerts from being generated within a short period of time, you can configure DataWorks to send only one alert notification for an alert rule within a specified period of time.
Specify the maximum number of dirty data records allowed and the impacts of dirty data records on a real-time synchronization node:
If you do not allow dirty data and dirty data records are generated during data synchronization, the real-time synchronization node fails.
If you allow dirty data and specify the maximum number of dirty data records that are allowed, the number of generated dirty data records determines whether the node fails. If the number does not exceed the specified limit, the dirty data is ignored and the node continues to run. If the number exceeds the specified limit, the node fails.
Note: For more information about dirty data records, see Terms.
|
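
The dirty data rules in the O&M row can be read as a small decision function. The following Python snippet is only an illustrative sketch of that logic, not DataWorks code or a DataWorks API: the function name `node_fails` and its parameters are hypothetical, and the snippet assumes that a limit is always supplied when dirty data is allowed.

```python
def node_fails(dirty_count: int, allow_dirty_data: bool, max_dirty_records: int = 0) -> bool:
    """Return True if the node would fail under the dirty data rules described in the table.

    All names here are hypothetical and exist only for illustration.
    """
    if dirty_count < 0 or max_dirty_records < 0:
        raise ValueError("counts must be non-negative")
    if not allow_dirty_data:
        # Dirty data is not allowed: any dirty record causes the node to fail.
        return dirty_count > 0
    # Dirty data is allowed: records up to the limit are ignored,
    # and the node fails only when the limit is exceeded.
    return dirty_count > max_dirty_records
```

With a limit of 5, for example, `node_fails(5, True, 5)` returns `False` because the limit is not exceeded, and `node_fails(6, True, 5)` returns `True`.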