Before you synchronize data from a single PolarDB table in real time, complete the operations described in this topic to configure the network environment, IP address whitelist, and account permissions for the data source.
Prerequisites
- Prepare data sources: A PolarDB for MySQL cluster and a destination are prepared. The destination can be MaxCompute, Hologres, Elasticsearch, DataHub, or Kafka. In this topic, a PolarDB for MySQL cluster is used as the source.
- Plan and prepare resources: An exclusive resource group for Data Integration is purchased and configured. For more information, see Plan and configure resources.
- Evaluate and plan the network environment: Before you perform data integration, you must select a network connection method based on your business requirements and use the method to connect the data sources to the exclusive resource group for Data Integration. After the data sources and the exclusive resource group for Data Integration are connected, you can refer to the operations described in this topic to configure access settings such as vSwitches and whitelists.
- If the data sources and the exclusive resource group for Data Integration reside in the same region and virtual private cloud (VPC), they are automatically connected.
- If the data sources and the exclusive resource group for Data Integration reside in different network environments, you must connect the data sources and the resource group by using methods such as a VPN gateway.
Background information
- Configure whitelists for the data sources
If the data sources and the exclusive resource group for Data Integration reside in the same VPC, you must add the CIDR block of the exclusive resource group for Data Integration to the whitelists of the data sources. This ensures that the exclusive resource group for Data Integration can be used to access the data sources.
- Create an account and authorize the account
You must create an account that can be used to access the data sources, read data from the source, and write data to the destination during the data synchronization process.
- Enable the binary logging feature
If the source is a PolarDB for MySQL cluster, you must enable the binary logging feature for the cluster. Alibaba Cloud PolarDB for MySQL is fully compatible with MySQL and uses high-level physical logs to replace binary logs. To facilitate the integration between PolarDB and the MySQL ecosystem, you can enable the binary logging feature for PolarDB clusters.
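The whitelist requirement described above can be checked offline before you run a synchronization task. The following is a minimal sketch that uses only the Python standard library to test whether an address falls inside any whitelist entry. The CIDR block and IP addresses are hypothetical placeholders, and the function name is_covered is introduced here for illustration only; it is not part of PolarDB or DataWorks.

```python
import ipaddress
from typing import Iterable


def is_covered(client_ip: str, whitelist: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any whitelist entry.

    Each entry is either a CIDR block ("192.168.0.0/24") or a
    single address ("10.0.0.8"), which is treated as a /32 network.
    """
    ip = ipaddress.ip_address(client_ip)
    return any(
        ip in ipaddress.ip_network(entry, strict=False)
        for entry in whitelist
    )


if __name__ == "__main__":
    # Hypothetical values: replace them with the CIDR block of your
    # resource group and the entries in your cluster whitelist.
    cluster_whitelist = ["192.168.0.0/24", "10.0.0.8"]
    for address in ("192.168.0.17", "172.16.0.5"):
        status = "covered" if is_covered(address, cluster_whitelist) else "NOT covered"
        print(address, status)
```

A check like this only confirms that an address range is listed. It does not verify that the network connection between the resource group and the cluster actually works.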
Limits
- Only PolarDB for MySQL clusters can be used as sources in data synchronization solutions. Other types of PolarDB data sources are not supported. In this topic, PolarDB indicates PolarDB for MySQL data sources.
- Only data stored on the primary node of a PolarDB for MySQL cluster can be synchronized.
- The real-time synchronization feature cannot synchronize the changes made by XA ROLLBACK statements. Transaction data on which XA PREPARE statements are executed is synchronized to the destination. If XA ROLLBACK statements are executed on that data later, the rollback changes are not synchronized to the destination. If the tables that you want to synchronize include tables on which XA ROLLBACK statements are executed, you must remove those tables and then add them again so that full data is initialized again and incremental data is then synchronized.
Procedure
- Configure a whitelist for the PolarDB for MySQL cluster. Add the CIDR block of the VPC where the exclusive resource group for Data Integration resides to a whitelist of the PolarDB for MySQL cluster.
- Create an account and grant the required permissions to the account. You must create an account to log on to the database of the PolarDB for MySQL cluster. You must grant the SELECT, REPLICATION SLAVE, and REPLICATION CLIENT permissions to the account.
- Enable the binary logging feature for the PolarDB for MySQL cluster. For more information, see Enable binary logging.
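The account step above can be illustrated with standard MySQL statements. The following is a minimal sketch, assuming that you create the account with SQL instead of in the console. It only builds the statement text and does not connect to a database. The account name sync_user, the host pattern %, and the password are hypothetical, and the helper build_account_statements is introduced here for illustration. REPLICATION SLAVE and REPLICATION CLIENT are global privileges in MySQL, so the sketch grants them ON *.*.

```python
from typing import List


def quote(value: str) -> str:
    """Quote a value as a MySQL string literal (escape \\ and ')."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def build_account_statements(user: str, password: str, host: str = "%") -> List[str]:
    """Build the CREATE USER and GRANT statements for a sync account.

    The privilege list matches the permissions named in this topic:
    SELECT, REPLICATION SLAVE, and REPLICATION CLIENT.
    """
    account = f"{quote(user)}@{quote(host)}"
    return [
        f"CREATE USER {account} IDENTIFIED BY {quote(password)};",
        f"GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO {account};",
    ]


if __name__ == "__main__":
    # Hypothetical account name and password: replace them with your own.
    for statement in build_account_statements("sync_user", "Example-Passw0rd"):
        print(statement)
```

You can review the generated statements and run them with a privileged account in a MySQL client. Narrowing the host pattern from % to the CIDR block of the resource group is a reasonable hardening step where your environment allows it.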
What to do next
After the data sources are configured, the source, the destination, and the exclusive resource group for Data Integration are connected, and you can use the authorized account to access the data sources. You can then add the source and destination to DataWorks and associate them with a data synchronization solution when you create the solution. For more information about how to add a data source, see Add data sources.