This topic describes how to use Data Transmission Service (DTS) to migrate data from an ApsaraDB for MongoDB instance with a replica set architecture to another ApsaraDB for MongoDB instance that uses either a replica set or sharded cluster architecture.
Supported source and destination databases
Source database (replica set architecture) | Destination database (replica set or sharded cluster architecture) |
ApsaraDB for MongoDB | ApsaraDB for MongoDB |
Self-managed database hosted on ECS | Self-managed database hosted on ECS |
Self-managed database connected over a leased line, VPN Gateway, or Smart Access Gateway | Self-managed database connected over a leased line, VPN Gateway, or Smart Access Gateway |
Self-managed database with a public IP address | Self-managed database with a public IP address |
This topic uses an ApsaraDB for MongoDB instance (replica set architecture) as the source and an ApsaraDB for MongoDB instance (replica set or sharded cluster architecture) as the destination. The configuration is similar for other data sources.
Prerequisites
Create the source ApsaraDB for MongoDB instance (replica set architecture) and the destination ApsaraDB for MongoDB instance (replica set or sharded cluster architecture). For more information, see Create a replica set instance and Create a sharded cluster instance.
Note: For supported versions, see Overview of migration solutions.
Ensure that the storage capacity of the destination ApsaraDB for MongoDB instance is at least 10% larger than that of the source ApsaraDB for MongoDB instance.
If the destination ApsaraDB for MongoDB instance is a sharded cluster, create the databases and collections to be sharded, configure data sharding, enable the Balancer, and perform pre-sharding in the destination instance as needed. For more information, see Configure data sharding to maximize shard performance and How to handle uneven data distribution in a MongoDB sharded cluster.
Note: Configuring data sharding prevents all migrated data from being stored on a single shard, which would limit cluster performance. Enabling the Balancer and performing pre-sharding help avoid data skew.
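The storage prerequisite above is simple arithmetic and can be checked before you create the task. The following is a minimal sketch; the function name and the sample sizes are illustrative assumptions and are not part of DTS or ApsaraDB for MongoDB.

```python
def destination_has_headroom(source_gb, destination_gb, margin_percent=10):
    """Return True if the destination storage is at least `margin_percent`
    larger than the source storage (the prerequisite states 10%)."""
    if source_gb < 0 or destination_gb < 0:
        raise ValueError("storage sizes must be non-negative")
    # Compare in whole percent units to avoid floating-point rounding
    # surprises such as 500 * 1.1 != 550.
    return destination_gb * 100 >= source_gb * (100 + margin_percent)


if __name__ == "__main__":
    print(destination_has_headroom(500, 550))  # True: exactly 10% larger
    print(destination_has_headroom(500, 520))  # False: only 4% larger
```

Read the actual storage figures from the instance details pages. The values above are placeholders.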
Notes
Type | Description |
Source database limits |
|
Other limits |
|
Special cases | If the source is a self-managed MongoDB database:
Note: If you choose to migrate the entire database, you can also create a heartbeat table that is updated or written to every second. |
Billing
Migration type | Instance configuration fee | Internet traffic fee |
Schema migration and full data migration | Free of charge. | When the Access Method parameter of the destination database is set to Public IP Address, you are charged for Internet traffic. For more information, see Billing overview. |
Incremental data migration | Charged. For more information, see Billing overview. |
Migration types
Migration type | Description |
Schema migration | Migrates the structure of objects from the source ApsaraDB for MongoDB instance to the destination ApsaraDB for MongoDB instance. Note: Supported objects for schema migration include DATABASE, COLLECTION, and INDEX. |
Full migration | Migrates all historical data of the selected objects from the source ApsaraDB for MongoDB instance to the destination ApsaraDB for MongoDB instance. Note: Full migration supports data in DATABASE and COLLECTION objects. |
Incremental migration | After full migration completes, migrates incremental updates from the source ApsaraDB for MongoDB instance to the destination ApsaraDB for MongoDB instance. Using Oplog: Incremental migration does not support databases created after the task starts. Supported incremental updates include the following:
Using ChangeStream: Supported incremental updates include the following:
|
Database account permissions
Database | Schema migration | Full migration | Incremental migration |
Source ApsaraDB for MongoDB | Read permission on the databases to be migrated and the config database. | Read permission on the databases to be migrated, the admin database, and the local database. | |
Destination ApsaraDB for MongoDB | dbAdminAnyDatabase permission, readWrite permission on the destination database, and read permission on the local database. | ||
For instructions on creating and authorizing database accounts for the source and destination ApsaraDB for MongoDB instances, see Manage MongoDB database users in DMS.
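The permissions in the table above can be written down as role lists. The following is a minimal sketch that only builds documents in the shape of the MongoDB createUser command and does not connect to any instance. The account names, the password placeholder, and the database name orders are hypothetical. The source role list is the union of the read permissions listed for all migration types, which is an assumption made for brevity.

```python
def _unique(items):
    """Remove duplicates while keeping the original order."""
    return list(dict.fromkeys(items))


def source_roles(migrated_dbs):
    """Read access on the migrated databases plus config, admin, and local."""
    dbs = _unique(list(migrated_dbs) + ["config", "admin", "local"])
    return [{"role": "read", "db": db} for db in dbs]


def destination_roles(destination_dbs):
    """dbAdminAnyDatabase, readWrite on each destination database, read on local."""
    roles = [{"role": "dbAdminAnyDatabase", "db": "admin"}]
    roles += [{"role": "readWrite", "db": db} for db in _unique(destination_dbs)]
    roles.append({"role": "read", "db": "local"})
    return roles


def create_user_command(user, password, roles):
    """Build a createUser command document; nothing is executed here."""
    return {"createUser": user, "pwd": password, "roles": roles}


if __name__ == "__main__":
    # Hypothetical account names and database name, for illustration only.
    print(create_user_command("dts_source", "<password>", source_roles(["orders"])))
    print(create_user_command("dts_dest", "<password>", destination_roles(["orders"])))
```

You can create the accounts in DMS as described above. The sketch is only meant to make the required role set explicit.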
Procedure
- Navigate to the migration task list page for the destination region using one of the following methods.
From the DTS console
  - Log on to the Data Transmission Service (DTS) console.
  - In the navigation pane on the left, click Data Migration.
  - In the upper-left corner of the page, select the region where the migration instance is located.
From the DMS console
Note: The actual operations may vary based on the mode and layout of the DMS console. For more information, see Simple mode console and Customize the layout and style of the DMS console.
  - Log on to the Data Management (DMS) console.
  - In the top menu bar, choose .
  - To the right of Data Migration Tasks, select the region where the migration instance is located.
- Click Create Task to navigate to the task configuration page.
- Configure the source and destination databases.
Warning: After you select the source and destination instances, we recommend that you carefully read the limits displayed at the top of the page. Otherwise, the task may fail or data inconsistency may occur.
Category | Configuration | Description |
None | Task Name | DTS automatically generates a task name. We recommend that you specify a descriptive name for easy identification. The name does not need to be unique. |
Source Database | Select Existing Connection | - To use a database instance that has been added to the system (created or saved), select it from the drop-down list. The database information below is then configured automatically.
Note: In the DMS console, this parameter is named Select a DMS database instance.
- If you have not registered the database instance with the system, or do not want to use a registered instance, configure the database information below manually. |
Source Database | Database Type | Select MongoDB. |
Source Database | Connection Type | Select Cloud Instance. |
Source Database | Instance Region | Select the region where the source ApsaraDB for MongoDB instance resides. |
Source Database | Replicate Data Across Alibaba Cloud Accounts | In this example, a database instance under the current Alibaba Cloud account is used. Select No. |
Source Database | Architecture Type | Select Replica Set Architecture.
- Replica Set Architecture: Achieves high availability and read/write splitting through multiple node types. For more information, see Replica set architecture.
- Sharded Cluster Architecture: Provides three components (Mongos, Shard, and ConfigServer) and allows flexible selection of Mongos and Shard counts and configurations. For more information, see Sharded cluster architecture. |
Source Database | Migration Method | Select an incremental data migration method based on your situation.
- Oplog (recommended): Available if Oplog is enabled on the source database.
Note: Oplog is enabled by default for self-managed MongoDB and ApsaraDB for MongoDB. Oplog retrieves logs faster, which lowers the latency of incremental migration tasks, so we recommend that you select Oplog.
- ChangeStream: Available if Change Streams are enabled on the source database.
Note: If the source database is Amazon DocumentDB (non-elastic cluster), you can only select ChangeStream. If Architecture is set to Sharded Cluster, you do not need to enter Shard account and Shard password. |
Source Database | Instance ID | Select the instance ID of the source ApsaraDB for MongoDB instance. |
Source Database | Authentication Database Name | Enter the name of the database to which the source ApsaraDB for MongoDB database account belongs. The default value is admin if you have not changed it. |
Source Database | Database Account | Enter the database account for the source ApsaraDB for MongoDB instance. For permission requirements, see Database account permissions. |
Source Database | Database Password | Enter the password for the database account. |
Source Database | Encryption | DTS supports three connection types: Non-encrypted, SSL-encrypted, and Mongo Atlas SSL. The options available for the Encryption parameter are determined by the values selected for the Access Method and Architecture parameters. The options displayed in the DTS console prevail.
Note: SSL-encrypted is not supported for MongoDB databases whose Architecture is Sharded Cluster and whose Migration Method is Oplog.
If the source database is a self-managed MongoDB database that uses the Replica Set architecture, the Access Method is not Alibaba Cloud Instance, and you select SSL-encrypted, you can also upload a certificate authority (CA) certificate to verify the connection to the source database. |
Destination Database | Select Existing Connection | - To use a database instance that has been added to the system (created or saved), select it from the drop-down list. The database information below is then configured automatically.
Note: In the DMS console, this parameter is named Select a DMS database instance.
- If you have not registered the database instance with the system, or do not want to use a registered instance, configure the database information below manually. |
Destination Database | Database Type | Select MongoDB. |
Destination Database | Connection Type | Select Cloud Instance. |
Destination Database | Instance Region | Select the region where the destination ApsaraDB for MongoDB instance resides. |
Destination Database | Replicate Data Across Alibaba Cloud Accounts | In this example, a database instance under the current Alibaba Cloud account is used. Select No. |
Destination Database | Architecture Type | Select an architecture based on your business needs:
- Replica Set Architecture: Achieves high availability and read/write splitting through multiple node types. For more information, see Replica set architecture.
- Sharded Cluster Architecture: Provides three components (Mongos, Shard, and ConfigServer) and allows flexible selection of Mongos and Shard counts and configurations. For more information, see Sharded cluster architecture. |
Destination Database | Instance ID | Select the instance ID of the destination ApsaraDB for MongoDB instance. |
Destination Database | Authentication Database Name | Enter the name of the database to which the destination ApsaraDB for MongoDB database account belongs. The default value is admin if you have not changed it. |
Destination Database | Database Account | Enter the database account for the destination ApsaraDB for MongoDB instance. For permission requirements, see Database account permissions. |
Destination Database | Database Password | Enter the password for the database account. |
Destination Database | Encryption | DTS supports three connection types: Non-encrypted, SSL-encrypted, and Mongo Atlas SSL. The options available for the Encryption parameter are determined by the values selected for the Access Method and Architecture parameters. The options displayed in the DTS console prevail.
Note: SSL-encrypted is not supported for MongoDB databases whose Architecture is Sharded Cluster.
If the destination database is a self-managed MongoDB database that uses the Replica Set architecture, the Access Method is not Alibaba Cloud Instance, and you select SSL-encrypted, you can also upload a CA certificate to verify the connection. |
- After completing the configuration, click Test Connectivity and Proceed at the bottom of the page.
Note: Ensure that the CIDR blocks of the DTS servers are added (either automatically or manually) to the security settings of both the source and destination databases to allow access. For more information, see Add the IP address whitelist of DTS servers.
If the source or destination is a self-managed database (that is, the Access Method is not Alibaba Cloud Instance), you must also click Test Connectivity in the CIDR Blocks of DTS Servers dialog box.
- Configure the task objects.
On the Configure Objects page, configure the objects that you want to migrate.
Configuration | Description |
Migration Types | - If you only need to perform a full migration, select both Schema Migration and Full Data Migration.
- To perform a migration with no downtime, select Schema Migration, Full Data Migration, and Incremental Data Migration.
Note:
- If you do not select Schema Migration, ensure that the databases and collections that receive the data exist in the destination database. You can also use the object name mapping feature in the Selected Objects box as needed.
- If you do not select Incremental Data Migration, do not write new data to the source instance during data migration to ensure data consistency.
For more information about task steps, see Migration types. |
Processing Mode of Conflicting Tables | - Precheck and Block on Error: Checks whether a collection with the same name exists in the destination database. If no such collection exists, the check passes. If a collection with the same name exists, an error is reported during the precheck, and the migration task does not start.
Note: If you cannot delete or rename the collection with the same name in the destination database, use object name mapping to give the migrated collection a different name in the destination database. For more information, see Object name mapping.
- Ignore Errors and Continue: Skips the check for collections with the same name in the destination database.
Warning: Selecting Ignore Errors and Continue may cause data inconsistency and business risks, such as:
  - If a record with the same primary key value as in the source database exists in the destination database, the existing record in the destination database is retained, and the record from the source database is not migrated.
  - Data initialization may fail, only partial data may be migrated, or the migration may fail. |
Capitalization of Object Names in Destination Instance | You can configure the case policy for database and collection names in the destination instance. By default, DTS Default Policy is selected. You can also choose to align with the source or destination database default policies. For more information, see Case conversion policy for destination object names. |
Source Objects | In the Source Objects box, click the objects to be migrated, and then click the icon that moves them to the Selected Objects box.
Note: You can select migration objects at the DATABASE or COLLECTION granularity. |
Selected Objects | - To set the name of a migration object in the destination instance, or to specify the object that receives data in the destination instance, right-click the migration object in the Selected Objects box to make changes. For more information, see Object name mapping.
- To remove a selected migration object, click the object in the Selected Objects box, and then click the icon that moves it back to the Source Objects box.
Note:
- To select incremental migration operations at the database or collection level, right-click the object in the Selected Objects box and make selections in the dialog box that appears.
- To set filter conditions (supported only during full migration, not incremental migration), right-click the collection in the Selected Objects box and configure the conditions in the dialog box that appears. For instructions, see Set filter conditions.
- If you use object name mapping (specifying a database or collection to receive data), migration of other objects that depend on this object may fail. |
- Click Next: Advanced Settings to configure advanced parameters.
Configuration | Description |
Dedicated Cluster for Task Scheduling | By default, DTS schedules tasks on a shared cluster, and you do not need to select one. If you want more stable tasks, you can purchase a dedicated cluster to run DTS migration tasks. |
Retry Time for Failed Connections | After the migration task starts, if the connection to the source or destination database fails, DTS reports an error and immediately begins to retry the connection. The default retry duration is 720 minutes. You can customize the retry time to a value from 10 to 1440 minutes. We recommend that you set the duration to more than 30 minutes. If DTS reconnects to the source and destination databases within the specified duration, the migration task automatically resumes. Otherwise, the task fails.
Note:
- For multiple DTS instances that share the same source or destination, the network retry time is determined by the setting of the last created task.
- Because you are charged for the task during the connection retry period, we recommend that you customize the retry time based on your business needs, or release the DTS instance as soon as possible after the source and destination database instances are released. |
Retry Time for Other Issues | After the migration task starts, if a non-connectivity issue, such as a DDL or DML execution exception, occurs in the source or destination database, DTS reports an error and immediately begins to retry the operation. The default retry duration is 10 minutes. You can customize the retry time to a value from 1 to 1440 minutes. We recommend that you set the duration to more than 10 minutes. If the related operations succeed within the specified retry duration, the migration task automatically resumes. Otherwise, the task fails.
Important: The value of Retry Time for Other Issues must be less than the value of Retry Time for Failed Connections. |
Enable Throttling for Full Data Migration | During full migration, DTS consumes read and write resources on the source and destination databases, which may increase the database load. If required, you can enable throttling for the full migration task. You can set Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s) to reduce the load on the destination database.
Note:
- This configuration item is available only if you select Full Data Migration for Migration Types.
- You can also adjust the full migration speed after the migration instance is running. |
Only one data type for primary key _id in a table of the data to be synchronized | Indicates whether the primary key _id has only one data type within each collection to be migrated.
Important: Select the option that matches your data. Otherwise, data loss may occur.
This configuration is available only if Migration Types includes Full Data Migration.
- Yes: The _id field has a single data type in each collection. During full migration, DTS does not scan the data types of primary keys in the source database. For each collection, DTS migrates data corresponding to only one primary key data type.
- No: The _id field has more than one data type in a collection. During full migration, DTS scans the data types of primary keys in the source database and migrates all data. |
Enable Throttling for Incremental Data Migration | If required, you can also throttle the incremental migration task. You can set RPS of Incremental Data Migration and Data migration speed for incremental migration (MB/s) to reduce the load on the destination database.
Note:
- This configuration item is available only if you select Incremental Data Migration for Migration Types.
- You can also adjust the incremental migration speed after the migration instance is running. |
Environment Tag | Based on your situation, select an environment tag to identify the instance. No selection is needed for this example. |
Configure ETL | Based on your business needs, select whether to configure the ETL feature to process data.
- Yes: Configures the ETL feature. You must also enter data processing statements in the text box.
- No: Does not configure the ETL feature. |
Monitoring and Alerting | Select whether to set alerts and receive alert notifications based on your business needs.
- No: Does not set an alert.
- Yes: Configures alerts. You must set an alert threshold and alert notifications. If a migration fails or the latency exceeds the threshold, the system sends an alert notification. |
- Click Next: Data Validation to configure a data validation task.
For more information about the data validation feature, see Configure data validation.
- Save the task and run a precheck.
  - To view the parameters for configuring this instance when you call the API operation, move the pointer over the Next: Save Task Settings and Precheck button and click Preview OpenAPI parameters in the tooltip that appears.
  - If you do not need to view the API parameters, or have finished viewing them, click Next: Save Task Settings and Precheck at the bottom of the page.
Note:
- Before the migration task starts, DTS performs a precheck. The task starts only after it passes the precheck.
- If the precheck fails, click View Details next to the failed check item, fix the issue based on the prompt, and then run the precheck again.
- If a warning is reported during the precheck:
  - For check items that cannot be ignored, click View Details next to the failed item, fix the issue based on the prompt, and then run the precheck again.
  - For check items that can be ignored, click Confirm Alert Details, Ignore, OK, and then Precheck Again to skip the alert item and run the precheck again. Ignoring a warning may cause issues such as data inconsistency and pose risks to your business.
- Purchase the instance.
  - When the Success Rate is 100%, click Next: Purchase Instance.
  - On the Purchase page, select the instance class for the data migration instance. For more information, see the following table.
Category | Parameter | Description |
New Instance Class | Resource Group Settings | Select the resource group to which the instance belongs. The default value is default resource group. For more information, see What is Resource Management? |
New Instance Class | Instance Class | DTS provides instance classes with different performance levels. The instance class affects the migration speed. Select one based on your business scenario. For more information, see Data migration link specifications. |
  - After the configuration is complete, read and select Data Transmission Service (Pay-as-you-go) Service Terms.
  - Click Buy and Start. In the dialog box that appears, click OK.
You can view the progress of the migration task on the Data Migration Tasks list page.
Note:
- If the migration task does not include incremental migration, it stops automatically after the full migration is complete. After the task stops, its Status changes to Completed.
- If the migration task includes incremental migration, it does not stop automatically. The incremental migration task continues to run, and the Status of the task is Running.
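The ranges and the ordering constraint for Retry Time for Failed Connections and Retry Time for Other Issues in the advanced settings step can be checked before you enter them in the console. The following is a minimal sketch under that reading of the settings. The function name and the returned message format are illustrative assumptions, not a DTS API.

```python
def validate_retry_times(connection_minutes=720, other_minutes=10):
    """Return a list of problems with the two retry settings (empty if valid).

    Rules taken from the advanced settings table:
    - Retry Time for Failed Connections: 10 to 1440 minutes (default 720).
    - Retry Time for Other Issues: 1 to 1440 minutes (default 10).
    - Retry Time for Other Issues must be less than
      Retry Time for Failed Connections.
    """
    problems = []
    if not 10 <= connection_minutes <= 1440:
        problems.append("Retry Time for Failed Connections must be 10 to 1440 minutes")
    if not 1 <= other_minutes <= 1440:
        problems.append("Retry Time for Other Issues must be 1 to 1440 minutes")
    if other_minutes >= connection_minutes:
        problems.append(
            "Retry Time for Other Issues must be less than "
            "Retry Time for Failed Connections"
        )
    return problems


if __name__ == "__main__":
    print(validate_retry_times())        # defaults: no problems
    print(validate_retry_times(30, 60))  # ordering constraint violated
```

An empty list means that the pair of values satisfies the documented ranges. The recommendations (more than 30 minutes and more than 10 minutes) are not enforced by this sketch.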
FAQ
Why do task latency and data inconsistency occur even when no data is written to the database?
Cause: A conflict between the automatic deletion mechanism of TTL indexes in MongoDB collections and the data synchronization mechanism of DTS can cause latency and data inconsistency in synchronization or migration tasks.
Missed DELETE operations during incremental writes reduce efficiency: When the TTL index on the source instance deletes expired data, it generates a DELETE record in the Oplog. DTS then synchronizes this DELETE operation. If the TTL index on the destination instance has already deleted the same data, the DELETE operation from DTS will not find the data to delete. The MongoDB engine then returns an unexpected number of affected rows. This triggers an exception handling process and reduces migration efficiency.
Data inconsistency caused by asynchronous deletion of expired data: A TTL index does not delete data in real time. Expired data might still exist on the source instance when it has already been deleted on the destination instance. This causes data inconsistency.
Example:
The MongoDB Oplog or ChangeStream records only the updated fields for an UPDATE operation. It does not record the full document before and after the update. Therefore, if an UPDATE operation cannot find the target data on the destination, DTS ignores the operation.
Timing | Source instance | Destination instance |
1 | Service inserts data | |
2 | | DTS synchronizes the INSERT operation |
3 | Data has expired but is not yet deleted by the TTL index | |
4 | Service updates the data (for example, updates the TTL index field to change the expiration time) | |
5 | | TTL index deletes the data |
6 | | DTS synchronizes the UPDATE, but the data is not found. The operation is ignored. |
As a result, this document is missing from the destination MongoDB instance.
Solution: Temporarily modify the expiration time of the TTL index on the destination instance during synchronization or migration to ensure efficiency and consistency. For more information, see Best practices for synchronizing/migrating collections with TTL indexes when MongoDB is the source.
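The timeline in this FAQ can be replayed with plain dictionaries to see why the document ends up missing on the destination. The following is a simplified simulation, not DTS or MongoDB code. The helper names, the expire_at field, and the integer timestamps are assumptions made for illustration.

```python
def apply_update(store, doc_id, changes):
    """Apply a field-level update. Return False if the document is missing,
    which models an UPDATE that finds no target and is ignored."""
    doc = store.get(doc_id)
    if doc is None:
        return False
    doc.update(changes)
    return True


def ttl_sweep(store, now):
    """Delete every document whose expire_at is not later than `now`."""
    expired = [key for key, doc in store.items() if doc["expire_at"] <= now]
    for key in expired:
        del store[key]
    return expired


def replay_timeline():
    source, destination = {}, {}
    # 1. The service inserts a document on the source that expires at t=10.
    source["doc1"] = {"expire_at": 10}
    # 2. The INSERT is synchronized to the destination.
    destination["doc1"] = dict(source["doc1"])
    # 3. At t=11 the document has expired, but the source TTL has not run yet.
    now = 11
    # 4. The service updates the document on the source and extends its lifetime.
    apply_update(source, "doc1", {"expire_at": 100})
    # 5. The TTL index on the destination deletes its stale copy.
    ttl_sweep(destination, now)
    # 6. The UPDATE carries only the changed field and finds no document.
    applied = apply_update(destination, "doc1", {"expire_at": 100})
    # The source TTL sweep now finds nothing to delete.
    ttl_sweep(source, now)
    return source, destination, applied


if __name__ == "__main__":
    source, destination, applied = replay_timeline()
    print("source:", source)
    print("destination:", destination)
    print("update applied on destination:", applied)
```

In this replay the source keeps the document and the destination does not. Delaying TTL expiration on the destination, as the solution suggests, keeps the document there long enough for the UPDATE in step 6 to find it.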