This topic describes how to use Data Transmission Service (DTS) to migrate a self-managed MongoDB database that uses the sharded cluster architecture to an ApsaraDB for MongoDB sharded cluster instance. DTS allows you to migrate the existing and incremental data of on-premises databases to Alibaba Cloud without service interruptions.
For more information about data migration and synchronization solutions, see Overview.
Prerequisites
The versions of the source and destination MongoDB databases are supported by DTS. For more information, see Overview of data migration scenarios.
Each shard in the destination sharded cluster instance has sufficient storage space.
Note: For example, a self-managed MongoDB database has three shards, and one of these shards occupies a maximum storage space of 500 GB. In this case, the storage space of each shard in the destination instance must be larger than 500 GB.
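The storage prerequisite can be checked with simple arithmetic. The following sketch is illustrative only: the function name `shard_storage_ok` is hypothetical, and adding a 10% margin on top of the largest source shard is an assumption based on the 5% to 10% storage overhead described in the Usage notes section, not a rule stated by DTS.

```shell
#!/bin/sh
# Succeeds when the destination per-shard storage (GB) is larger than the
# largest source shard (GB) plus an overhead percentage (default: 10).
# Arguments: <largest source shard GB> <destination shard GB> [overhead %]
shard_storage_ok() {
  required=$(( $1 * (100 + ${3:-10}) / 100 ))
  [ "$2" -gt "$required" ]
}

# Example from this topic: the largest source shard occupies 500 GB.
if shard_storage_ok 500 600; then
  echo "600 GB per destination shard is enough"
else
  echo "600 GB per destination shard is too small"
fi
```

Integer arithmetic is used, so sizes are expected in whole gigabytes.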
How it works
DTS migrates a self-managed MongoDB database by migrating each shard in the database. You must create a data migration task for each shard.
The distribution of migrated data in the destination ApsaraDB for MongoDB instance is based on the shard key that you specify. For more information, see Configure sharding to maximize the performance of shards.
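Because one migration task is required per shard, it helps to list the shards of the source cluster first. The sketch below assumes that you can connect to a mongos node of the self-managed database; `<Mongoshost>` and `<Mongosport>` are placeholders introduced here for illustration, and `shard_members` is a hypothetical helper. The `listShards` command is a standard MongoDB administrative command that reports each shard as `<replica set name>/<host:port>,<host:port>,...`.

```shell
#!/bin/sh
# Splits a shard "host" value as reported by listShards, for example
# "shard1/172.16.1.10:27018,172.16.1.13:27018", into one host:port per line.
shard_members() {
  echo "${1#*/}" | tr ',' '\n'
}

# To print the shards of the source cluster, run the following command
# against a mongos node (all values in angle brackets are placeholders):
#
# mongo --host <Mongoshost> --port <Mongosport> \
#   --authenticationDatabase <database> -u <username> -p <password> --quiet \
#   --eval 'db.adminCommand({ listShards: 1 }).shards.forEach(function (s) { print(s._id + " " + s.host) })'

# Example with a made-up shard entry:
shard_members "shard1/172.16.1.10:27018,172.16.1.13:27018"
```

The helper only lists the members of a shard. Which member is currently the primary node must still be checked separately on the shard itself.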
Usage notes
DTS uses the resources of the source and destination databases during full data migration. This may increase the loads on the database servers. If you migrate a large volume of data or if the server specifications do not meet your requirements, database services may become unavailable. Before you migrate data, evaluate the impact of data migration on the performance of the source and destination databases. We recommend that you migrate data during off-peak hours.
If the source and destination MongoDB databases run different MongoDB versions or use different storage engines, make sure that the MongoDB versions or storage engines are compatible. For more information, see MongoDB versions and storage engines.
The data is concurrently written to the destination database. Therefore, the storage space occupied in the destination database is 5% to 10% larger than the size of the data in the source database.
Make sure that the destination ApsaraDB for MongoDB instance does not contain documents that have the same primary key as documents in the source database. The default primary key is _id. Otherwise, data may be lost. If the destination instance contains documents with the same _id values as documents in the source database, delete these documents from the destination instance, provided that this does not affect your business.
The admin and local databases cannot be used as the source or destination database.
The number of mongos nodes in the source MongoDB database that uses the sharded cluster architecture cannot exceed 10.
Billing
Migration type | Task configuration fee | Internet traffic fee |
Full data migration | Free of charge. | Charged only when data is migrated from Alibaba Cloud over the Internet. For more information, see Data Transmission Service Pricing. |
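Incremental data migration | Charged. For more information, see Data Transmission Service Pricing. | Charged only when data is migrated from Alibaba Cloud over the Internet. For more information, see Data Transmission Service Pricing. |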
Migration types
Full data migration: DTS migrates all existing data of objects in the source MongoDB database to the destination MongoDB database.
Note: DTS can migrate the following types of objects: database, collection, and index.
Incremental data migration: After full data migration is complete, DTS migrates incremental data of the source MongoDB database to the destination MongoDB database.
Note: DTS can migrate the create and delete operations that are performed on databases, collections, and indexes.
DTS can migrate the create, delete, and update operations that are performed on documents.
Permissions required for database accounts
Database | Full data migration | Incremental data migration |
Self-managed MongoDB database | Read permissions on the source database | Read permissions on the source database, the admin database, and the local database |
ApsaraDB for MongoDB instance | Read and write permissions on the destination database | Read and write permissions on the destination database |
For more information about how to create a database account and grant permissions to the database account, see the following topics:
Self-managed MongoDB database: db.createUser()
ApsaraDB for MongoDB instance: Manage user permissions on MongoDB databases
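As an illustration of the source-side permissions for incremental data migration, the following sketch prints a `db.createUser()` statement that grants the built-in `read` role on the database to be migrated, the admin database, and the local database. The helper name `build_create_user_js` and the database name `mydb` are hypothetical, and the account name and password reuse the example values from this topic. Creating the account in the admin database is an assumption. Adjust it to the authentication database that you actually use.

```shell
#!/bin/sh
# Prints a db.createUser() statement that grants the read role on the
# database to migrate, the admin database, and the local database.
# Arguments: <username> <password> <source database>
# Note: the values are inserted into the statement as-is, so they must not
# contain double quotation marks or backslashes.
build_create_user_js() {
  cat <<EOF
db.getSiblingDB("admin").createUser({
  user: "$1",
  pwd: "$2",
  roles: [
    { role: "read", db: "$3" },
    { role: "read", db: "admin" },
    { role: "read", db: "local" }
  ]
})
EOF
}

# Example: print the statement for the account dtstest and the database mydb.
build_create_user_js dtstest 'Test123456' mydb
```

The printed statement can then be run in a mongo shell session on the self-managed database by an account that is allowed to create users. For full data migration only, the first role entry is sufficient.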
Preparations
Required: Disable the balancer of the self-managed MongoDB database. This prevents the impact of chunk migration on data consistency. For more information, see Manage the ApsaraDB for MongoDB balancer.
Warning: If the balancer is not disabled, chunk migration affects the consistency of the data read by DTS.
Delete the orphaned documents that are generated due to chunk migration failures from the self-managed MongoDB database.
Note: If you do not delete the orphaned documents, the migration performance will be compromised. In addition, some documents may have duplicate _id values, and data that you do not want to migrate may be migrated.
Download the cleanupOrphaned.js file.
wget "https://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/attach/120562/cn_zh/1564451237979/cleanupOrphaned.js"
Replace test in the cleanupOrphaned.js file with the name of the database from which you want to delete orphaned documents.
Note: If you want to delete orphaned documents from multiple databases, repeat this substep and the next substep for each database.
Run the following command on a shard to delete the orphaned documents from all collections in the specified database:
Note: You must repeat this step on each shard.
mongo --host <Shardhost> --port <Primaryport> --authenticationDatabase <database> -u <username> -p <password> cleanupOrphaned.js
Note: The placeholders in the command have the following meanings:
<Shardhost>: the IP address of the shard.
<Primaryport>: the service port of the primary node in the shard.
<database>: the name of the database to which the database account belongs.
<username>: the account that is used to log on to the self-managed MongoDB database.
<password>: the password that is used to log on to the self-managed MongoDB database.
Example:
In this example, a self-managed MongoDB database has three shards, and you must delete the orphaned documents on each shard.
mongo --host 172.16.1.10 --port 27018 --authenticationDatabase admin -u dtstest -p 'Test123456' cleanupOrphaned.js
mongo --host 172.16.1.11 --port 27021 --authenticationDatabase admin -u dtstest -p 'Test123456' cleanupOrphaned.js
mongo --host 172.16.1.12 --port 27024 --authenticationDatabase admin -u dtstest -p 'Test123456' cleanupOrphaned.js
Create databases and collections to be sharded in the destination ApsaraDB for MongoDB instance, and configure data sharding based on your business requirements. For more information, see Configure sharding to maximize the performance of shards.
Note: If you configure data sharding before you start data migration, data in the self-managed MongoDB database is evenly migrated to the shards in the destination sharded cluster instance. This prevents the overloading of a single shard.
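The three preparation steps can be sketched as a single dry-run script. This is a minimal illustration under stated assumptions, not a definitive procedure: the mongos addresses, the destination port, the account root, the database mydb, the collection orders, and the hashed shard key customer_id are all hypothetical, while the shard addresses and the dtstest account come from the example above. `sh.stopBalancer()`, `sh.getBalancerState()`, `sh.enableSharding()`, and `sh.shardCollection()` are standard mongo shell helpers. By default, the script only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of the three preparation steps.
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical values for illustration only.
SRC_MONGOS_HOST=172.16.1.20
SRC_MONGOS_PORT=27017
DST_MONGOS_HOST=dst-mongos.example.com
DST_MONGOS_PORT=3717
DST_AUTH="--authenticationDatabase admin -u root -p <password>"  # placeholder password

# Values taken from the example in this topic.
SRC_SHARDS="172.16.1.10:27018 172.16.1.11:27021 172.16.1.12:27024"
SRC_AUTH="--authenticationDatabase admin -u dtstest -p Test123456"

# Prints the command in dry-run mode, otherwise executes it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: disable the balancer through a mongos node and print its state.
run mongo --host "$SRC_MONGOS_HOST" --port "$SRC_MONGOS_PORT" $SRC_AUTH \
  --eval 'sh.stopBalancer(); print(sh.getBalancerState())'

# Step 2: delete orphaned documents on the primary node of every shard.
for shard in $SRC_SHARDS; do
  run mongo --host "${shard%%:*}" --port "${shard##*:}" $SRC_AUTH cleanupOrphaned.js
done

# Step 3: configure sharding in the destination instance before migration.
run mongo --host "$DST_MONGOS_HOST" --port "$DST_MONGOS_PORT" $DST_AUTH \
  --eval 'sh.enableSharding("mydb"); sh.shardCollection("mydb.orders", { customer_id: "hashed" })'
```

Review the printed commands before you set DRY_RUN=0. The shard key in Step 3 in particular must be chosen based on your own query patterns.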
Procedure
Log on to the DTS console.
In the left-side navigation pane, click Data Migration.
On the upper part of the Migration Tasks page, select the region in which the destination ApsaraDB for MongoDB instance resides.
In the upper-right corner of the page, click Create Migration Task.
Configure the source and destination databases for the data migration task.
Section | Setting | Description |
N/A | Task Name | The task name that DTS automatically generates. We recommend that you specify a name that indicates your business requirements for easy identification. You do not need to use a unique name. |
Source Database | Instance Type | The instance type of the source database. In this example, User-Created Database with Public IP Address is selected. Note: If you select other instance types, you must deploy the network environment for the self-managed database. For more information, see Preparation overview. |
Source Database | Instance Region | If you select User-Created Database with Public IP Address as the instance type, you do not need to set the Instance Region parameter. Note: If a whitelist is configured for the self-managed MongoDB database, you must add the CIDR blocks of DTS servers to the whitelist. You can click Get IP Address Segment of DTS next to Instance Region to obtain the CIDR blocks of DTS servers. |
Source Database | Database Type | The type of the source database. Select MongoDB. |
Source Database | Hostname or IP Address | The endpoint or IP address of a shard in the self-managed MongoDB database. In this example, the public IP address of a shard is used. Note: DTS migrates each shard of the source database until the whole cluster is migrated. In this example, enter the endpoint or IP address of the first shard. When you configure the second migration task, enter the endpoint or IP address of the second shard. You must repeat this procedure until all shards are migrated. |
Source Database | Port Number | The service port number of the shard. Note: In this example, the service port of each shard in the self-managed MongoDB database must be accessible over the Internet. |
Source Database | Database Name | The name of the authentication database. The database account is created in this database. |
Source Database | Database Account | The account that is used to log on to the self-managed MongoDB database. For information about the permissions that are required for the account, see the Permissions required for database accounts section of this topic. |
Source Database | Database Password | The password of the database account. Note: After you specify the information about the source database, you can click Test Connectivity next to Database Password to check whether the information is valid. If the information is correct, the Passed message appears. If the information is incorrect, the Failed message appears and you must click Check next to the Failed message to modify the information. |
Source Database | Encryption | Specifies whether to encrypt the connection to the database. In this example, Non-encrypted is selected. Note: You can select SSL-encrypted only when you migrate data from MongoDB Atlas. |
Destination Database | Instance Type | The instance type of the destination database. Select MongoDB Instance. |
Destination Database | Instance Region | The region in which the destination ApsaraDB for MongoDB instance resides. |
Destination Database | MongoDB Instance ID | The ID of the destination sharded cluster instance. |
Destination Database | Database Name | The name of the authentication database. The database account is created in this database. |
Destination Database | Database Account | The database account of the destination ApsaraDB for MongoDB instance. For information about the permissions that are required for the account, see the Permissions required for database accounts section of this topic. |
Destination Database | Database Password | The password of the database account. Note: After you specify the information about the destination instance, you can click Test Connectivity next to Database Password to check whether the information is correct. If the information is correct, the Passed message is displayed. If the information is incorrect, the Failed message is displayed and you must click Check next to the Failed message to modify the information. |
In the lower-right corner of the page, click Set Whitelist and Next.
If the source or destination database instance is an Alibaba Cloud database instance, such as an ApsaraDB RDS for MySQL or ApsaraDB for MongoDB instance, or is a self-managed database hosted on ECS, DTS automatically adds the CIDR blocks of DTS servers to a whitelist of the database instance or ECS security group rules. If the source or destination database is a self-managed database on data centers or is from other cloud service providers, you must manually add the CIDR blocks of DTS servers to allow DTS to access the database. For more information, see the "CIDR blocks of DTS servers" section of the Add the CIDR blocks of DTS servers to the security settings of on-premises databases topic.
Warning: If the CIDR blocks of DTS servers are automatically or manually added to the whitelist of the database or instance, or to the ECS security group rules, security risks may arise. Therefore, before you use DTS to migrate data, you must understand and acknowledge the potential risks and take preventive measures, including but not limited to the following measures: enhance the security of your username and password, limit the ports that are exposed, authenticate API calls, regularly check the whitelist or ECS security group rules and forbid unauthorized CIDR blocks, or connect the database to DTS by using Express Connect, VPN Gateway, or Smart Access Gateway.
Select the migration types and the objects to be migrated.
Setting | Description |
Select the migration types | To perform only full data migration, select only Full Data Migration. To ensure service continuity during data migration, select Full Data Migration and Incremental Data Migration. Note: If Incremental Data Migration is not selected, we recommend that you do not write data to the self-managed MongoDB database during full data migration. This ensures data consistency between the source and destination databases. |
Select the objects to be migrated | Select one or more objects from the Available section. Click the icon and add the objects to the Selected Objects section. Note: DTS cannot migrate data from the admin, local, or config database. You can select databases, collections, or functions as the objects to be migrated. By default, after an object is migrated to the destination instance, the name of the object remains unchanged. You can use the object name mapping feature to rename the objects that are migrated to the destination instance. For more information, see Object name mapping. |
Specify whether to rename objects | You can use the object name mapping feature to rename the objects that are migrated to the destination instance. For more information, see Object name mapping. |
Specify the retry time range for a failed connection to the source or destination database | By default, if DTS fails to connect to the source and destination databases, DTS retries within the following 12 hours. You can specify the retry time range based on your business requirements. If DTS is reconnected to the source and destination databases within the specified time range, DTS resumes the data migration task. Otherwise, the data migration task fails. Note: Within the retry time range in which DTS attempts to reconnect to the source and destination databases, you are charged for the DTS instance. We recommend that you specify the retry time range based on your business requirements. You can also release the DTS instance at the earliest opportunity after the source and destination instances are released. |
Click Precheck.
Note: A precheck is performed before the migration task starts. The migration task starts only after the precheck succeeds.
If the precheck fails, click the icon next to each failed check item to view the related details. Fix the issues as instructed and run the precheck again.
After the data migration task passes the precheck, click Next.
In the Confirm Settings dialog box, specify the Instance Class parameter and select Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start to start the data migration task.
Repeat Steps 1 to 11 to create data migration tasks for the remaining shards.
Stop the data migration tasks.
Full data migration
We recommend that you do not manually stop a task during full data migration. Otherwise, the data migrated to the destination database will be incomplete. You can wait until the full data migration task automatically stops.
Incremental data migration
An incremental data migration task does not automatically stop. You must manually stop the task.
Note: We recommend that you select an appropriate time to manually stop a data migration task. For example, you can stop the task during off-peak hours or before you switch your workloads to the ApsaraDB for MongoDB instance.
Wait until Incremental Data Migration and The migration task is not delayed are displayed in the progress bar of the data migration task. Then, stop writing data to the source database for a few minutes. The latency of incremental data migration may be displayed in the progress bar.
After the status of incremental data migration changes to The migration task is not delayed again, manually stop the migration tasks for all shards.
Switch your workloads to the destination ApsaraDB for MongoDB instance.