When you configure a task to synchronize data to a Kafka cluster, you can specify the policy that determines which Kafka partition each record is synchronized to. Choosing a suitable policy can improve synchronization performance. For example, you can distribute data across partitions based on hash values.
Hash algorithm
Data Transmission Service (DTS) uses the hashCode() method in Java to calculate hash values.
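As a rough illustration of how a hashCode()-based value can be turned into a partition number, the following sketch hashes a string key and reduces it to a partition index. The class name, the `database.table` key format, and the modulo step are assumptions made for this example. This topic states only that DTS uses hashCode(), not how the key is built or how the hash value is mapped to a partition.

```java
final class PartitionHashSketch {

    // Hypothetical helper: maps a string key to an index in [0, partitionCount).
    // The modulo reduction is an assumption, not the documented DTS behavior.
    static int partitionFor(String key, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        // String.hashCode() can be negative; Math.floorMod keeps the result non-negative.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        String key = "orders_db.orders"; // assumed key format, for illustration only
        int partitionCount = 3;          // assumed number of partitions in the topic
        System.out.println(key + " -> partition " + partitionFor(key, partitionCount));
    }
}
```

In a scheme like this one, the index depends on the partition count, so the same key can map to a different partition if the count changes.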
Configuration method
In the Select Objects to Synchronize step of the task creation wizard, you can specify the policy for synchronizing data to Kafka partitions. For more information, see Synchronize data from an ApsaraDB RDS for MySQL instance to a self-managed Kafka cluster and Overview of data synchronization scenarios.
Warning: After a data synchronization task is started, do not change the number of partitions in the destination topic. Otherwise, data synchronization fails.
Synchronization policies
Policy | Description | Advantages and disadvantages |
---|---|---|
Synchronize All Data to Partition 0 | DTS synchronizes all data and DDL statements to Partition 0 of the destination topic. | |
Synchronize Data to Separate Partitions Based on Hash Values of Database and Table Names | DTS uses the database and table names as the partition key to calculate the hash value. Then, DTS synchronizes the data and DDL statements of each table to the corresponding partition of the destination topic. | |
Synchronize Data to Separate Partitions Based on Hash Values of Primary Keys | DTS uses a table column as the partition key to calculate the hash value. By default, the primary key is used. If a table does not have a primary key, the unique key is used as the partition key. You can also specify one or more columns as partition keys. DTS synchronizes each row to the corresponding partition of the destination topic. | |
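The three policies can be compared side by side with a small sketch. It is illustrative only: the enum, the method name, the way keys are joined into a string, and the modulo step are all assumptions, and the actual DTS implementation is not described in this topic.

```java
import java.util.Arrays;
import java.util.List;

final class PartitionPolicySketch {

    // Hypothetical names for the three policies in the table above.
    enum Policy { PARTITION_0, DATABASE_AND_TABLE, PRIMARY_KEY }

    // Hypothetical helper: picks a partition for one row under the given policy.
    static int choosePartition(Policy policy, String database, String table,
                               List<String> keyColumnValues, int partitionCount) {
        switch (policy) {
            case PARTITION_0:
                // Everything goes to Partition 0.
                return 0;
            case DATABASE_AND_TABLE:
                // All rows of one table share a key, so they share a partition.
                return Math.floorMod((database + "." + table).hashCode(), partitionCount);
            case PRIMARY_KEY:
                // Rows are spread by the values of the partition key columns.
                return Math.floorMod(String.join("|", keyColumnValues).hashCode(), partitionCount);
            default:
                throw new IllegalArgumentException("Unknown policy: " + policy);
        }
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("1001"); // assumed primary key value
        for (Policy policy : Policy.values()) {
            int partition = choosePartition(policy, "orders_db", "orders", row, 3);
            System.out.println(policy + " -> partition " + partition);
        }
    }
}
```

Under these assumptions, the table-based policy keeps all rows of one table in a single partition, whereas the primary-key policy can send rows of the same table to different partitions.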