Data Transmission Service (DTS) enables two-way synchronization between ApsaraDB for MongoDB instances with sharded cluster architecture, suitable for a range of scenarios including active geo-redundancy and geo-disaster recovery. This topic describes the steps to configure two-way data synchronization.
Prerequisites
- The source and destination ApsaraDB for MongoDB sharded cluster instances are created. For more information, see how to create a sharded cluster instance.
- Connection addresses are obtained for all shard nodes of the source ApsaraDB for MongoDB sharded cluster instance (including the source instance of the reverse task), and the account password is the same on every shard. For more information, see how to obtain connection addresses for Shard or ConfigServer nodes.
- For a list of supported versions, see Overview of synchronization solutions.
- We recommend that the storage space of the destination ApsaraDB for MongoDB instance be at least 10% larger than the space used by the source ApsaraDB for MongoDB instance.
- The replication.oplogGlobalIdEnabled parameter is set to true on both the Shard and ConfigServer nodes of the source and destination instances. For more information, see Set database parameters. If the parameter is not set to true, the instance precheck fails or the error two-way mongo must have gid is reported.
- The databases and collections to be sharded are created in both the source and destination ApsaraDB for MongoDB instances based on your business requirements, data sharding is configured, the Balancer is enabled, and pre-sharding is performed. The Balancer of the source ApsaraDB for MongoDB instance must remain enabled during incremental synchronization. For more information, see how to optimize shard performance through data sharding and how to address uneven data distribution in a MongoDB sharded cluster.
- Configuring data sharding helps prevent all data from being synchronized to the same shard, which could degrade cluster performance. Enabling the Balancer and pre-sharding can mitigate data skew.
- In this example, the DTS task is configured before the instance is purchased, so you do not need to specify the number of shards in the source ApsaraDB for MongoDB sharded cluster instance. If you purchase the instance before you configure the DTS task, you must enter the correct number of shards when you purchase it.
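The sharding prerequisite above can be sketched as a set of MongoDB administrative commands. This is a minimal illustration, not the procedure prescribed by DTS or ApsaraDB for MongoDB: the database, collection, and shard key names are hypothetical, and a hashed shard key is assumed as one common way to spread an empty collection across shards. The function only builds the command documents; running them against a mongos node is left to a driver of your choice.

```python
from typing import Any, Dict, List


def build_sharding_commands(
    database: str, collection: str, shard_key_field: str
) -> List[Dict[str, Any]]:
    """Build admin commands that enable sharding, shard one collection,
    and start the Balancer. All names passed in are hypothetical examples."""
    namespace = f"{database}.{collection}"
    return [
        # Enable sharding for the database.
        {"enableSharding": database},
        # Shard the collection on a hashed key (an assumption of this sketch).
        {"shardCollection": namespace, "key": {shard_key_field: "hashed"}},
        # Start the Balancer so that chunks can be redistributed.
        {"balancerStart": 1},
    ]


if __name__ == "__main__":
    for command in build_sharding_commands("appdb", "orders", "customer_id"):
        # With a driver such as PyMongo, each document could be sent to the
        # admin database of a mongos node, e.g. client.admin.command(command).
        print(command)
```

Run the same commands on both the source and destination instances so that the collections are sharded identically before the task starts.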
Precautions
Category | Description |
Limits on the source and destination databases |
|
Other limits |
|
Billing
Synchronization type | Task configuration fee |
Schema synchronization and full data synchronization | Free of charge. |
Incremental data synchronization | Charged. For more information, see Billing overview. |
Supported two-way synchronization architectures
Currently, DTS supports two-way synchronization exclusively between two ApsaraDB for MongoDB instances with sharded cluster architecture. It does not support two-way synchronization across multiple ApsaraDB for MongoDB instances.
Supported conflict detection
To ensure data consistency, make sure that data records with the same primary key, business primary key, or unique key are updated only on one of the synchronization nodes.
DTS checks and fixes conflicts to maximize the stability of two-way synchronization instances. DTS can detect the following types of conflicts:
Uniqueness conflicts caused by INSERT operations
If the record that you want to insert into the destination instance by executing the INSERT statement conflicts with an existing record, DTS automatically ignores the INSERT operation.
Inconsistent records caused by UPDATE operations
If the record that you want to update by executing the UPDATE statement does not exist in the destination instance or conflicts with another record, DTS automatically ignores the UPDATE operation.
Non-existent records to be deleted
If the record that you want to delete from the destination instance by executing the DELETE statement does not exist, DTS automatically ignores the DELETE operation.
During two-way synchronization, the system time of the source and destination instances may be different and synchronization latency may occur. Therefore, DTS does not ensure that the conflict detection mechanism can prevent all data conflicts. To perform two-way synchronization, make sure that records with the same primary key, business primary key, or unique key are updated only on one of the synchronization nodes.
By default, DTS sets Conflict Resolution Policy to Ignore to resolve the preceding conflicts during data synchronization. You cannot change the value of Conflict Resolution Policy.
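The Ignore behavior described above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the DTS implementation: the destination is modeled as a dictionary keyed by `_id`, the operation tuples and the function name `apply_with_ignore` are hypothetical, and an UPDATE that conflicts with another record's unique key is not modeled.

```python
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]
Op = Tuple[str, Doc]  # (kind, document); kind is "insert", "update", or "delete"


def apply_with_ignore(dest: Dict[Any, Doc], ops: List[Op]) -> List[Op]:
    """Apply operations to dest and return the operations that were ignored."""
    ignored: List[Op] = []
    for kind, doc in ops:
        key = doc["_id"]
        if kind == "insert":
            if key in dest:
                ignored.append((kind, doc))  # uniqueness conflict: keep existing record
            else:
                dest[key] = dict(doc)
        elif kind == "update":
            if key not in dest:
                ignored.append((kind, doc))  # record to update does not exist
            else:
                dest[key].update(doc)
        elif kind == "delete":
            if key not in dest:
                ignored.append((kind, doc))  # record to delete does not exist
            else:
                del dest[key]
        else:
            raise ValueError(f"unsupported operation: {kind}")
    return ignored


if __name__ == "__main__":
    destination = {1: {"_id": 1, "v": "a"}}
    skipped = apply_with_ignore(
        destination,
        [
            ("insert", {"_id": 1, "v": "b"}),
            ("update", {"_id": 2, "v": "c"}),
            ("delete", {"_id": 3}),
            ("insert", {"_id": 4, "v": "d"}),
        ],
    )
    print(len(skipped), destination)
```

Because ignored operations leave the destination record unchanged, the two instances can diverge silently, which is why records with the same key should be written on only one node.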
Task step description
Synchronization type | Description |
Schema synchronization | Synchronizes the schema of the selected objects from the source ApsaraDB for MongoDB instance to the destination ApsaraDB for MongoDB instance. |
Full synchronization | Synchronizes all existing data of the selected objects from the source ApsaraDB for MongoDB instance to the destination ApsaraDB for MongoDB instance. Full synchronization is supported for databases and collections. |
Incremental synchronization | After full synchronization is complete, synchronizes incremental updates from the source ApsaraDB for MongoDB instance to the destination ApsaraDB for MongoDB instance. Databases that are created after the task starts running are not synchronized. The following incremental updates are supported:
|
Procedure
In this example, the DTS task is configured before the instance is purchased, so you do not need to specify the number of shards in the source ApsaraDB for MongoDB sharded cluster instance.
If you purchase the instance before you configure the DTS task, you must enter the correct number of shards when you purchase it.
Use one of the following methods to go to the Data Synchronization page and select the region in which the data synchronization instance resides.
DTS console
Log on to the DTS console.
In the left-side navigation pane, click Data Synchronization.
In the upper-left corner of the page, select the region in which the data synchronization instance resides.
DMS console
The actual operations may vary based on the mode and layout of the DMS console. For more information, see Simple mode and Customize the layout and style of the DMS console.
Log on to the DMS console.
In the top navigation bar, move the pointer over Data Development and choose the data synchronization menu item.
From the drop-down list to the right of Data Synchronization Tasks, select the region in which the data synchronization instance resides.
Click Create Task to go to the task configuration page.
Optional. Click New Configuration Page in the upper-right corner of the page.
Skip this step if the Back to Previous Version button is displayed in the upper-right corner of the page.
Specific parameters in the new and previous versions of the configuration page may be different. We recommend that you use the new version of the configuration page.
Configure the source and destination databases. The following table describes the parameters.
After you configure the source and destination databases, we recommend that you read the Limits that are displayed on the page. Otherwise, the task may fail or data inconsistency may occur.
Category | Configuration | Description |
None | Task Name | The name of the DTS task. DTS automatically generates a task name. We recommend that you specify a descriptive name that makes it easy to identify the task. You do not need to specify a unique task name. |
Source Database | Select Existing Connection | The database that you want to use. You can choose whether to use an existing database based on your business requirements. If you select an existing database, DTS automatically populates the parameters for the database. If you do not select an existing database, you must configure the following database information. In the DTS console, register a database with DTS on the Database Connections page or the new configuration page. For more information, see Manage database connections. In the DMS console, you can select an existing database from the Select a DMS database instance drop-down list. You can also click Add DMS Database Instance or go back to the homepage of the DMS console to register a database with DMS. For more information, see Register an Alibaba Cloud database instance and Register a database hosted on a third-party cloud service or a self-managed database. |
Source Database | Database Type | Select MongoDB. |
Source Database | Access Method | Select Cloud Instance. |
Source Database | Instance Region | Select the region where the source ApsaraDB for MongoDB instance resides. |
Source Database | Replicate Data Across Alibaba Cloud Accounts | In this example, a database of the current Alibaba Cloud account is used. Select No. |
Source Database | Architecture Type | Select Sharded Cluster Architecture. |
Source Database | Migration Method | Select Oplog. |
Source Database | Instance ID | Select the ID of the source ApsaraDB for MongoDB instance. |
Source Database | Authentication Database Name | Enter the name of the database to which the database account of the source ApsaraDB for MongoDB instance belongs. If you have not changed it, the default is admin. |
Source Database | Database Account | Enter the database account of the source ApsaraDB for MongoDB instance. The account must have read permissions on the databases to be synchronized, the config database, the admin database, and the local database. |
Source Database | Database Password | The password that is used to access the database. |
Source Database | Shard account | Enter the Shard account of the source ApsaraDB for MongoDB instance. |
Source Database | Shard password | Enter the Shard password of the source ApsaraDB for MongoDB instance. |
Source Database | Encryption | Specifies whether to encrypt the connection to the source database. You can select Non-encrypted, SSL-encrypted, or Mongo Atlas SSL based on your business requirements. The options available for the Encryption parameter are determined by the values selected for the Access Method and Architecture parameters. The options displayed in the DTS console prevail. If the Architecture parameter is set to Sharded Cluster and the Migration Method parameter is set to Oplog for the ApsaraDB for MongoDB database, the SSL-encrypted option is unavailable. If the source database is a self-managed MongoDB database that uses the Replica Set architecture, the Access Method parameter is not set to Alibaba Cloud Instance, and the Encryption parameter is set to SSL-encrypted, you can upload a certification authority (CA) certificate to verify the connection to the source database. |
Destination Database | Select Existing Connection | The database that you want to use. You can choose whether to use an existing database based on your business requirements. If you select an existing database, DTS automatically populates the parameters for the database. If you do not select an existing database, you must configure the following database information. In the DTS console, register a database with DTS on the Database Connections page or the new configuration page. For more information, see Manage database connections. In the DMS console, you can select an existing database from the Select a DMS database instance drop-down list. You can also click Add DMS Database Instance or go back to the homepage of the DMS console to register a database with DMS. For more information, see Register an Alibaba Cloud database instance and Register a database hosted on a third-party cloud service or a self-managed database. |
Destination Database | Database Type | Select MongoDB. |
Destination Database | Access Method | Select Cloud Instance. |
Destination Database | Instance Region | Select the region where the destination ApsaraDB for MongoDB instance resides. |
Destination Database | Replicate Data Across Alibaba Cloud Accounts | In this example, a database of the current Alibaba Cloud account is used. Select No. |
Destination Database | Architecture Type | Select Sharded Cluster Architecture. |
Destination Database | Instance ID | Select the ID of the destination ApsaraDB for MongoDB instance. |
Destination Database | Authentication Database Name | Enter the name of the database to which the database account of the destination ApsaraDB for MongoDB instance belongs. If you have not changed it, the default is admin. |
Destination Database | Database Account | Enter the database account of the destination ApsaraDB for MongoDB instance. The account must have dbAdminAnyDatabase permissions, readWrite permissions on the destination database, and read permissions on the local database. |
Destination Database | Database Password | The password that is used to access the database. |
Destination Database | Encryption | Specifies whether to encrypt the connection to the destination database. You can select Non-encrypted, SSL-encrypted, or Mongo Atlas SSL based on your business requirements. The options available for the Encryption parameter are determined by the values selected for the Access Method and Architecture parameters. The options displayed in the DTS console prevail. If the destination database is an ApsaraDB for MongoDB instance and the Architecture parameter is set to Sharded Cluster, the SSL-encrypted option is unavailable. If the destination database is a self-managed MongoDB database that uses the Replica Set architecture, the Access Method parameter is not set to Alibaba Cloud Instance, and the Encryption parameter is set to SSL-encrypted, you can upload a CA certificate to verify the connection to the destination database. |
In the lower part of the page, click Test Connectivity and Proceed.
Make sure that the CIDR blocks of DTS servers can be automatically or manually added to the security settings of the source and destination databases to allow access from DTS servers. For more information, see Add the CIDR blocks of DTS servers.
Configure the objects to be synchronized.
In the Configure Objects step, configure the objects that you want to synchronize.
Configuration | Description |
Synchronization Types | The synchronization types. By default, Incremental Data Synchronization is selected. You must also select Schema Synchronization and Full Data Synchronization. After the precheck is complete, DTS synchronizes the historical data of the selected objects from the source database to the destination cluster. The historical data is the basis for subsequent incremental synchronization. |
Processing Mode of Conflicting Tables | Precheck and Report Errors: checks whether the destination database contains collections that have the same names as the collections in the source database. If the source and destination databases do not contain collections that have identical collection names, the precheck is passed. Otherwise, an error is returned during the precheck, and the data synchronization task cannot be started. If the source and destination databases have collections with identical names and the collections in the destination database cannot be deleted or renamed, you can use the object name mapping feature to rename the collections that are synchronized to the destination database. For more information, see Rename an object to be synchronized. Ignore Errors and Proceed: skips the precheck for identical collection names in the source and destination databases. If you select Ignore Errors and Proceed, data inconsistency may occur and your business may be exposed to potential risks. If a data record in the destination database has the same primary key value or unique key value as a data record in the source database, DTS does not synchronize the data record to the destination database. The existing data record in the destination database is retained. Data may fail to be initialized, only specific columns may be synchronized, or the data synchronization task may fail. |
Synchronization Topology | Select Two-way Synchronization. |
Filter DDL | Yes: does not synchronize DDL operations. No: synchronizes DDL operations. To ensure the stability of the two-way synchronization link, only the forward synchronization task can synchronize DDL operations. The reverse synchronization task cannot synchronize DDL operations. |
Conflict Resolution Policy | The policy that is applied when a conflict described in the Supported conflict detection section occurs. TaskFailed: If a conflict occurs during data synchronization, the data synchronization task reports an error and exits the process. The task enters a failed state, and you must manually resolve the conflict. Ignore: If a conflict occurs during data synchronization, the data synchronization task ignores the current statement and continues the process. The conflicting records in the destination database are used. Overwrite: If a conflict occurs during data synchronization, the conflicting records in the destination database are overwritten. This scenario supports only Ignore. |
Source Objects | Select one or more objects from the Source Objects section and click the icon to add the objects to the Selected Objects section. Objects can be selected at the database or collection level. |
Selected Objects | To rename an object to be synchronized in the destination database, right-click the object in the Selected Objects section. For more information, see Map object names. To remove a selected object, click the object in the Selected Objects section and then click the icon to move the object to the Source Objects section. To select the incremental synchronization operations at the database or collection level, right-click the object in the Selected Objects section and make selections in the dialog box that appears. To set filter conditions for data, right-click the collection in the Selected Objects section and set the conditions in the dialog box that appears. Filter conditions take effect during full synchronization but not during incremental synchronization. For more information, see Set filter conditions. If you use the object name mapping feature to specify the database or collection that receives data, other objects that depend on the mapped object may fail to be synchronized. |
Click Next: Advanced Settings to configure advanced settings.
Configuration | Description |
Dedicated Cluster for Task Scheduling | By default, DTS schedules the task to the shared cluster if you do not specify a dedicated cluster. If you want to improve the stability of data synchronization tasks, purchase a dedicated cluster. For more information, see What is a DTS dedicated cluster. |
Retry Time for Failed Connections | The retry time range for failed connections. If the source or destination database fails to be connected after the data synchronization task is started, DTS immediately retries a connection within the time range. Valid values: 10 to 1440. Unit: minutes. Default value: 720. We recommend that you set this parameter to a value greater than 30. If DTS reconnects to the source and destination databases within the specified time range, DTS resumes the data synchronization task. Otherwise, the data synchronization task fails. If you specify different retry time ranges for multiple data synchronization tasks that have the same source or destination database, the shortest retry time range takes precedence. When DTS retries a connection, you are charged for the DTS instance. We recommend that you specify the retry time range based on your business requirements. You can also release the DTS instance at your earliest opportunity after the source and destination instances are released. |
Retry Time for Other Issues | The retry time range for other issues. For example, if the DDL or DML operations fail to be performed after the data synchronization task is started, DTS immediately retries the operations within the time range. Valid values: 1 to 1440. Unit: minutes. Default value: 10. We recommend that you set this parameter to a value greater than 10. If the failed operations are successfully performed within the specified time range, DTS resumes the data synchronization task. Otherwise, the data synchronization task fails. The value of the Retry Time for Other Issues parameter must be smaller than the value of the Retry Time for Failed Connections parameter. |
Enable Throttling for Full Data Migration | During full data synchronization, DTS uses the read and write resources of the source and destination databases. This may increase the load on the database servers. You can configure the Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s) parameters for full data synchronization tasks to reduce the load on the destination database server. This parameter is displayed only if Full Data Synchronization is selected for the Synchronization Types parameter. |
Only one data type for primary key _id in a single table | Specifies whether the primary key _id uses only one data type within each collection to be synchronized. Valid values: Yes: The data type is unique. During full data synchronization, DTS does not scan the data types of the primary key _id in the data to be synchronized from the source database. No: The data type is not unique. During full data synchronization, DTS scans the data types of the primary key _id in the data to be synchronized from the source database. This parameter is displayed only if Full Data Synchronization is selected for the Synchronization Types parameter. |
Enable Throttling for Incremental Data Synchronization | Specifies whether to enable throttling for incremental data synchronization. You can enable throttling for incremental data synchronization based on your business requirements. To configure throttling, you must configure the RPS of Incremental Data Synchronization and Data synchronization speed for incremental synchronization (MB/s) parameters. This reduces the load on the destination database server. |
Environment Tag | You can select an environment tag to identify the instance based on your business requirements. In this example, no tag is selected. |
Configure ETL | Specifies whether to enable the extract, transform, and load (ETL) feature. For more information, see What is ETL? Valid values: Yes: configures the ETL feature. You can enter data processing statements in the code editor. For more information, see Configure ETL in a data migration or data synchronization task. No: does not configure the ETL feature. |
Monitoring and Alerting | Specifies whether to configure alerting for the data synchronization task. If the task fails or the synchronization latency exceeds the specified threshold, alert contacts receive notifications. Valid values: No: does not enable alerting. Yes: configures alerting. In this case, you must also configure the alert threshold and alert notification settings. For more information, see the "Configure monitoring and alerting when you create a DTS task" section of the Configure monitoring and alerting topic. |
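The constraints on the two retry parameters described above (10 to 1440 minutes for failed connections, 1 to 1440 minutes for other issues, and the second value smaller than the first) can be checked before you enter them in the console. The following is a minimal sketch; the function name `validate_retry_times` is hypothetical and is not part of any DTS SDK.

```python
from typing import List


def validate_retry_times(connection_minutes: int, other_minutes: int) -> List[str]:
    """Return a list of problems with the two retry settings (empty if valid)."""
    problems: List[str] = []
    if not 10 <= connection_minutes <= 1440:
        problems.append("Retry Time for Failed Connections must be 10 to 1440 minutes.")
    if not 1 <= other_minutes <= 1440:
        problems.append("Retry Time for Other Issues must be 1 to 1440 minutes.")
    if other_minutes >= connection_minutes:
        problems.append(
            "Retry Time for Other Issues must be smaller than "
            "Retry Time for Failed Connections."
        )
    return problems


if __name__ == "__main__":
    # The documented defaults (720 and 10) satisfy all three constraints.
    print(validate_retry_times(720, 10))
    # Equal values violate the "smaller than" rule.
    print(validate_retry_times(30, 30))
```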
Click Next Step: Data Verification to configure data verification.
For more information about how to use the data verification feature, see Configure a data verification task.
Save the task settings and run a precheck.
To view the parameters to be specified when you call the relevant API operation to configure the DTS task, move the pointer over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.
If you do not need to view or have viewed the parameters, click Next: Save Task Settings and Precheck in the lower part of the page.
Before you can start the data synchronization task, DTS performs a precheck. You can start the data synchronization task only after the task passes the precheck.
If the data synchronization task fails the precheck, click View Details next to each failed item. After you analyze the causes based on the check results, troubleshoot the issues. Then, rerun the precheck.
If an alert is triggered for an item during the precheck:
If an alert item cannot be ignored, click View Details next to the failed item and troubleshoot the issue. Then, run a precheck again.
If an alert item can be ignored, click Confirm Alert Details. In the View Details dialog box, click Ignore. In the message that appears, click OK. Then, click Precheck Again to run a precheck again. If you ignore the alert item, data inconsistency may occur, and your business may be exposed to potential risks.
Purchase an instance.
Wait until the Success Rate becomes 100%. Then, click Next: Purchase Instance.
On the buy page, configure the Billing Method and Instance Class parameters for the data synchronization instance. The following table describes the parameters.
Section | Parameter | Description |
New Instance Class | Billing Method | Subscription: You pay for a subscription when you create a data synchronization instance. The subscription billing method is more cost-effective than the pay-as-you-go billing method for long-term use. Pay-as-you-go: A pay-as-you-go instance is billed on an hourly basis. The pay-as-you-go billing method is suitable for short-term use. If you no longer require a pay-as-you-go data synchronization instance, you can release the instance to reduce costs. |
New Instance Class | Resource Group Settings | The resource group to which the data synchronization instance belongs. Default value: default resource group. For more information, see What is Resource Management? |
New Instance Class | Instance Class | DTS provides instance classes that vary in synchronization speed. You can select an instance class based on your business requirements. For more information, see Instance classes of data synchronization instances. |
New Instance Class | Subscription Duration | If you select the subscription billing method, specify the subscription duration and the number of data synchronization instances that you want to create. The subscription duration can be one to nine months, one year, two years, three years, or five years. This parameter is available only if you select the Subscription billing method. |
Read and select Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start. In the dialog box that appears, click OK.
You can view the progress of the task in the task list.
Configure the reverse synchronization task.
- Wait until the forward synchronization task completes initialization and its Status changes to Running.
- Find the reverse synchronization task and click Configure Task.
- Configure the reverse synchronization task by following Step 4 through Step 7.
- When you configure the reverse task, select the correct source and destination instances. The source instance of the reverse task is the destination instance of the forward task, and the destination instance of the reverse task is the source instance of the forward task. Make sure that the instance information, such as the database name, account, and password, is consistent.
- We recommend that you do not change the object name mapping when you configure the reverse task. Otherwise, data inconsistency may occur.
- The Instance Region of the source and destination databases in a reverse synchronization task is fixed and cannot be changed. A reverse synchronization task also requires fewer parameters than a forward synchronization task. The parameters displayed in the console prevail.
- The Processing Mode of Conflicting Tables setting of the reverse synchronization task does not check the collections that the forward synchronization task has synchronized to the destination instance.
- The reverse synchronization task cannot synchronize the objects that are listed under Selected Objects of the forward task.
- The reverse synchronization task automatically filters DDL operations.
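The relationship between the forward and reverse tasks in the notes above can be summarized in a short model. This is an illustrative sketch only: the `Endpoint` and `SyncTask` classes, their fields, and the instance IDs are hypothetical and do not correspond to a DTS API. It captures two documented rules: the endpoints are swapped, and the reverse task always filters DDL operations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    instance_id: str      # hypothetical instance ID
    auth_database: str    # authentication database name, admin by default
    account: str


@dataclass(frozen=True)
class SyncTask:
    source: Endpoint
    destination: Endpoint
    filter_ddl: bool


def reverse_of(forward: SyncTask) -> SyncTask:
    """Derive the reverse task: swap the endpoints and always filter DDL."""
    return SyncTask(
        source=forward.destination,
        destination=forward.source,
        filter_ddl=True,
    )


if __name__ == "__main__":
    forward = SyncTask(
        source=Endpoint("dds-example-a", "admin", "sync_user"),
        destination=Endpoint("dds-example-b", "admin", "sync_user"),
        filter_ddl=False,
    )
    reverse = reverse_of(forward)
    print(reverse.source.instance_id, "->", reverse.destination.instance_id)
```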
- Wait until Success Rate displays 100%, and then click Back.
After you configure the reverse synchronization task, wait until the Status of both synchronization tasks is Running. The two-way data synchronization is then configured.