This topic describes how to use the Data Integration service of DataWorks to synchronize data from databases to Hologres in real time.
Prerequisites
DataWorks is activated. For more information, see Overview.
The Alibaba Cloud database service from which you want to synchronize data is activated.
If the preceding services are activated in different regions, check how to synchronize data across regions. For more information, see Establish a network connection between a resource group and a data source.
Background information
Hologres is a real-time interactive analytics engine that seamlessly integrates with the big data ecosystem. Hologres integrates with DataWorks, an intelligent R&D platform, to support data queries and analysis with high concurrency and low latency. You can use the real-time sync nodes provided by the Data Integration service of DataWorks to synchronize data from databases to Hologres, and then query, analyze, and process the data in Hologres.
Common types of databases from which you can synchronize data by using real-time sync nodes include Oracle, PolarDB, and PolarDB for MySQL.
For more information about supported database types, see Data source types that support real-time synchronization.
For more information about how data is synchronized, see MySQL Reader, Oracle Reader, PolarDB Reader, SQL Server Reader, and Hologres Writer.
Process
To use the Data Integration service of DataWorks to synchronize data from different types of databases to Hologres in real time, perform the following steps. This synchronization process is stable and efficient.
Configure a connection to the source database.
Before you start the synchronization process, you must configure a connection to the source database. For example, if you want to synchronize data from a MySQL database to Hologres in real time, you must configure a MySQL connection. You can customize a connection based on your business requirements. For more information, see Connection configuration.
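As a rough illustration of the information a source connection typically captures, the following minimal sketch models the settings of a MySQL connection and derives a JDBC-style URL from them. The class name, field names, and sample values are hypothetical and are not part of any DataWorks API. The actual fields are the ones shown on the connection configuration page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MySQLConnection:
    """Hypothetical container for the settings of a MySQL connection."""

    host: str
    database: str
    username: str
    password: str
    port: int = 3306  # default MySQL port

    def jdbc_url(self) -> str:
        # Common MySQL JDBC URL shape: jdbc:mysql://host:port/database
        return f"jdbc:mysql://{self.host}:{self.port}/{self.database}"

    def validate(self) -> None:
        # Fail early on settings that can never work.
        if not self.host or not self.database:
            raise ValueError("host and database are required")
        if not 1 <= self.port <= 65535:
            raise ValueError(f"invalid port: {self.port}")


# Placeholder values for illustration only.
conn = MySQLConnection(
    host="mysql.example.internal",
    database="sales",
    username="sync_user",
    password="***",
)
conn.validate()
print(conn.jdbc_url())
```

Validating the settings before you save them helps catch typos such as an empty database name before the connectivity test runs.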
Configure a connection to Hologres.
Note: The connection to Hologres must use an exclusive resource group for Data Integration.
Before you start the synchronization process, you must configure a connection to Hologres. For more information, see Add a Hologres data source.
Configure a real-time sync node.
After you complete the preceding two steps, configure a real-time sync node. The following table describes the three modes of real-time synchronization supported by Data Integration. You can select a synchronization mode based on your business requirements.
| Synchronization mode | Scenario | Supported types of data sources | References for configuring connections | References for configuring sync nodes |
| --- | --- | --- | --- | --- |
| Single-table real-time synchronization | Synchronize the changes in partial data from the source database to the destination Hologres instance in real time. This keeps the data in the destination Hologres instance updated. | MySQL Binlog, DataHub, LogHub, Kafka, PolarDB, SQL Server | | Create a real-time synchronization node to synchronize incremental data from a single table |
| Real-time database synchronization | Synchronize the changes in full data from the source database to the destination Hologres instance in real time. This keeps the data in the destination Hologres instance updated. | PolarDB MySQL, PolarDB, MySQL | | |
| Data synchronization solution | DataWorks provides solutions for various data synchronization scenarios, such as real-time synchronization, offline full synchronization, and offline incremental synchronization. These solutions help enterprises migrate data to the cloud in a more efficient and convenient manner. The following data synchronization solutions are provided: (1) Initialize full data. (2) Write incremental data in real time. (3) Automatically merge the full and incremental data at a scheduled time and write the data to the partitions of a new table. | PolarDB MySQL, Oracle, MySQL, PolarDB-X, PostgreSQL | | |
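The idea behind the data synchronization solution (initialize full data, write incremental data, then merge the two) can be sketched conceptually as follows. This is not how DataWorks implements the merge. The operation names, the change-event shape, and the sample rows are assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Dict[str, object]
# Hypothetical change event: (operation, primary key, row).
# For a delete, the row is None.
Change = Tuple[str, int, Optional[Row]]


def merge_full_and_incremental(
    full: Dict[int, Row], changes: Iterable[Change]
) -> Dict[int, Row]:
    """Apply ordered change events on top of a full snapshot keyed by primary key."""
    merged = dict(full)  # copy so that the snapshot itself is not modified
    for op, key, row in changes:
        if op in ("insert", "update"):
            merged[key] = row  # the latest version of the row wins
        elif op == "delete":
            merged.pop(key, None)  # ignore deletes for keys that are absent
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged


# Step 1: full data that was initialized earlier.
full_snapshot = {
    1: {"id": 1, "name": "alice"},
    2: {"id": 2, "name": "bob"},
}
# Step 2: incremental data written in real time, in commit order.
change_log = [
    ("update", 2, {"id": 2, "name": "bobby"}),
    ("insert", 3, {"id": 3, "name": "carol"}),
    ("delete", 1, None),
]
# Step 3: merge the full and incremental data.
merged = merge_full_and_incremental(full_snapshot, change_log)
print(merged)
```

Because the events are applied in order and keyed by primary key, the merged result reflects the latest state of each row, which is the property a scheduled merge of full and incremental data needs to preserve.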
Note: When you use real-time sync nodes of DataWorks to synchronize data from databases to Hologres, you can add fields to the destination table of Hologres. For example, you can add the UPDATE_TIME field. For more information, see Configure and manage a real-time synchronization node.
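To make the effect of an added field concrete, the following minimal sketch stamps each synchronized row with an UPDATE_TIME value before it is written. The helper name and the choice to fill the field with the write time in UTC are assumptions. How the field is actually populated depends on how you configure the real-time sync node.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def with_update_time(
    row: Dict[str, object], now: Optional[datetime] = None
) -> Dict[str, object]:
    """Return a copy of the row with an extra UPDATE_TIME field (hypothetical helper)."""
    stamped = dict(row)  # leave the source row untouched
    timestamp = now or datetime.now(timezone.utc)
    stamped["UPDATE_TIME"] = timestamp.isoformat()
    return stamped


# A fixed timestamp keeps the example deterministic.
fixed = datetime(2024, 1, 1, tzinfo=timezone.utc)
stamped_row = with_update_time({"id": 1, "name": "alice"}, now=fixed)
print(stamped_row)
```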