Data Transmission Service: Migrate data from a self-managed MongoDB database that uses the sharded cluster architecture to ApsaraDB for MongoDB by using DTS

Last Updated: Sep 14, 2024

This topic describes how to use Data Transmission Service (DTS) to migrate a self-managed MongoDB database that uses the sharded cluster architecture to an ApsaraDB for MongoDB sharded cluster instance. DTS allows you to migrate the existing and incremental data of on-premises databases to Alibaba Cloud without service interruptions.

For more information about data migration and synchronization solutions, see Overview.

Prerequisites

  • The versions of the source and destination MongoDB databases are supported by DTS. For more information, see Overview of data migration scenarios.

  • Each shard in the destination sharded cluster instance has sufficient storage space.

    Note

    For example, a self-managed MongoDB database has three shards, and one of these shards occupies a maximum storage space of 500 GB. In this case, the storage space of each shard in the destination instance must be larger than 500 GB.
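
The sizing rule in the preceding note can be checked with simple integer arithmetic. The following is a minimal sketch, not a DTS feature: `dest_shard_fits` is a hypothetical helper name, and the sizes are assumed to be whole gigabytes that you collect yourself. The headroom percentage is an assumption that you can use to account for the 5% to 10% storage growth described in the usage notes of this topic. Pass 0 to apply only the rule from the note.

```shell
# Hypothetical helper, not part of DTS or MongoDB.
# Usage: dest_shard_fits <dest_shard_gb> <headroom_pct> <source_shard_gb>...
# Succeeds (exit status 0) only if one destination shard is strictly larger
# than the largest source shard plus the given headroom percentage.
dest_shard_fits() {
  fit_dest_gb=$1
  fit_headroom_pct=$2
  shift 2
  fit_max_gb=0
  for fit_size_gb in "$@"; do
    if [ "$fit_size_gb" -gt "$fit_max_gb" ]; then
      fit_max_gb=$fit_size_gb
    fi
  done
  # Integer form of: dest > max * (1 + headroom / 100)
  [ $((fit_dest_gb * 100)) -gt $((fit_max_gb * (100 + fit_headroom_pct))) ]
}
```

For the three-shard example in the note, `dest_shard_fits 600 10 200 500 350` succeeds, whereas `dest_shard_fits 500 0 200 500 350` fails because the destination shard is not larger than 500 GB.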

How it works

DTS migrates a self-managed MongoDB database by migrating each shard in the database. You must create a data migration task for each shard.
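
Because each shard needs its own task, it can help to list the shards and their member addresses before you start. The following sketch assumes that the legacy `mongo` shell is installed and can reach a mongos node of the source cluster. The function names `list_shards` and `shard_members` are hypothetical, and authentication options are omitted and must be added to match your deployment.

```shell
# Prints one "<shard id> <replica set>/<host:port,...>" line per shard.
# Assumption: $1 and $2 are the host and port of a mongos node, and any
# authentication options that your deployment needs are added by you.
list_shards() {
  mongo --host "$1" --port "$2" --quiet --eval '
    db.adminCommand({ listShards: 1 }).shards.forEach(function (s) {
      print(s._id + " " + s.host);
    });'
}

# Splits each line from list_shards into one "<shard id> <host:port>" line
# per member, so that you can pick the address to use for each task.
shard_members() {
  awk '{
    slash = index($2, "/")
    hosts = (slash > 0) ? substr($2, slash + 1) : $2
    n = split(hosts, member, ",")
    for (i = 1; i <= n; i++) print $1, member[i]
  }'
}
```

A possible invocation is `list_shards <mongos host> <mongos port> | shard_members`. The output only lists members. It does not tell you which member is the primary node of a shard.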

Note

The distribution of migrated data in the destination ApsaraDB for MongoDB instance is based on the shard key that you specify. For more information, see Configure sharding to maximize the performance of shards.

[Figure: Migration principle]

Usage notes

  • DTS uses the resources of the source and destination databases during full data migration. This may increase the loads on the database servers. If you migrate a large volume of data or if the server specifications do not meet your requirements, database services may become unavailable. Before you migrate data, evaluate the impact of data migration on the performance of the source and destination databases. We recommend that you migrate data during off-peak hours.

  • If the source and destination MongoDB databases run different MongoDB versions or use different storage engines, make sure that the MongoDB versions or storage engines are compatible. For more information, see MongoDB versions and storage engines.

  • The data is concurrently written to the destination database. Therefore, the storage space occupied in the destination database is 5% to 10% larger than the size of the data in the source database.

  • Make sure that the destination ApsaraDB for MongoDB instance does not contain documents that have the same primary key as documents in the source database. The default primary key is _id. Otherwise, data may be lost. If the destination instance contains such documents, delete them from the destination instance at a time when the deletion does not affect your business.

  • Do not use the admin or local database as the source or destination database.

  • The number of mongos nodes in the source MongoDB database that uses the sharded cluster architecture cannot exceed 10.
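
One way to look for primary key conflicts before migration is to compare exported _id values. The following sketch assumes that you have already exported the _id values of a collection, one value per line, from the source database and from the destination instance into two text files, for example with an export tool such as mongoexport. The helper name `common_ids` and the file layout are assumptions made for illustration.

```shell
# Hypothetical helper: prints the _id values that appear in both files.
# Each file is assumed to hold one _id value per line.
common_ids() {
  ids_src_sorted=$(mktemp) || return 1
  ids_dst_sorted=$(mktemp) || { rm -f "$ids_src_sorted"; return 1; }
  # comm requires sorted input; -u also removes duplicate lines.
  LC_ALL=C sort -u "$1" > "$ids_src_sorted"
  LC_ALL=C sort -u "$2" > "$ids_dst_sorted"
  # -12 suppresses lines unique to either file, leaving the intersection.
  LC_ALL=C comm -12 "$ids_src_sorted" "$ids_dst_sorted"
  rm -f "$ids_src_sorted" "$ids_dst_sorted"
}
```

If `common_ids source_ids.txt destination_ids.txt` prints nothing, the two exports share no _id values. Any value that is printed identifies a document to review in the destination instance.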

Billing

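| Migration type | Task configuration fee | Internet traffic fee |
| --- | --- | --- |
| Full data migration | Free of charge. | Charged only when data is migrated from Alibaba Cloud over the Internet. For more information, see Data Transmission Service Pricing. |
| Incremental data migration | Charged. For more information, see Data Transmission Service Pricing. | Charged only when data is migrated from Alibaba Cloud over the Internet. For more information, see Data Transmission Service Pricing. |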

Migration types

  • Full data migration: DTS migrates all existing data of objects in the source MongoDB database to the destination MongoDB database.

    Note

    DTS can migrate the following types of objects: database, collection, and index.

  • Incremental data migration: After full data migration is complete, DTS migrates incremental data of the source MongoDB database to the destination MongoDB database.

    Note
    • DTS can migrate the create and delete operations that are performed on databases, collections, and indexes.

    • DTS can migrate the create, delete, and update operations that are performed on documents.

Permissions required for database accounts

| Database | Full data migration | Incremental data migration |
| --- | --- | --- |
| Self-managed MongoDB database | Read permissions on the source database | Read permissions on the source database, the admin database, and the local database |
| ApsaraDB for MongoDB instance | Read and write permissions on the destination database | Read and write permissions on the destination database |

For more information about how to create a database account and grant permissions to the database account, see the following topics:

Preparations

  1. Required: Disable the balancer of the self-managed MongoDB database. This prevents the impact of chunk migration on data consistency. For more information, see Manage the ApsaraDB for MongoDB balancer.

    Warning

    If the balancer is not disabled, chunk migration affects the consistency of the data read by DTS.

  2. Delete the orphaned documents that are generated due to chunk migration failures from the self-managed MongoDB database.

    Note

    If you do not delete the orphaned documents, the migration performance will be compromised. In addition, some documents may have duplicate _id values and data that you do not want to migrate may be migrated.

    1. Download the cleanupOrphaned.js file.

      wget "https://docs-aliyun.cn-hangzhou.oss.aliyun-inc.com/assets/attach/120562/cn_zh/1564451237979/cleanupOrphaned.js"
    2. Replace test in the cleanupOrphaned.js file with the name of the database from which you want to delete orphaned documents.

      Note

      If you want to delete orphaned documents from multiple databases, repeat Substeps 2 and 3 of Step 2 for each database.

    3. Run the following command on a shard to delete the orphaned documents from all collections in the specified database:

      Note

      You must repeat this step on each shard.

      mongo --host <Shardhost> --port <Primaryport> --authenticationDatabase <database> -u <username> -p <password> cleanupOrphaned.js
      Note
      • <Shardhost>: the IP address of the shard.

      • <Primaryport>: the service port of the primary node in the shard.

      • <database>: the name of the database to which the database account belongs.

      • <username>: the account that is used to log on to the self-managed MongoDB database.

      • <password>: the password that is used to log on to the self-managed MongoDB database.

      Example:

      In this example, a self-managed MongoDB database has three shards, and you must delete the orphaned documents on each shard.

      mongo --host 172.16.1.10 --port 27018 --authenticationDatabase admin -u dtstest -p 'Test123456' cleanupOrphaned.js
      mongo --host 172.16.1.11 --port 27021 --authenticationDatabase admin -u dtstest -p 'Test123456' cleanupOrphaned.js
      mongo --host 172.16.1.12 --port 27024 --authenticationDatabase admin -u dtstest -p 'Test123456' cleanupOrphaned.js
  3. Create databases and collections to be sharded in the destination ApsaraDB for MongoDB instance, and configure data sharding based on your business requirements. For more information, see Configure sharding to maximize the performance of shards.

    Note

    If you configure data sharding before you start data migration, data in the self-managed MongoDB database is evenly migrated to the shards in the destination sharded cluster instance. This prevents the overloading of a single shard.
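
The three preparation steps can be scripted so that they are applied to every shard in the same way. The following is a sketch under several assumptions: the legacy `mongo` shell is installed, cleanupOrphaned.js is in the current directory, the helper names (`run`, `stop_balancer`, `cleanup_all_shards`, `shard_collection`) and the environment variables are hypothetical, authentication options for the mongos connections are omitted, and the hashed shard key is only an illustration and not a recommendation. By default, the sketch prints the commands instead of running them.

```shell
# DRY_RUN=1 (the default in this sketch) prints each command instead of
# running it. Note that printed commands include the password.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: disable the balancer through a mongos node of the source cluster
# and print the resulting balancer state (false means disabled).
stop_balancer() {
  run mongo --host "$1" --port "$2" --quiet \
    --eval 'sh.stopBalancer(); print(sh.getBalancerState());'
}

# Step 2: run cleanupOrphaned.js on the primary node of every shard.
# Reads "<Shardhost> <Primaryport>" pairs from standard input and expects
# AUTH_DB, MONGO_USER, and MONGO_PASSWORD to be set by the caller.
cleanup_all_shards() {
  while read -r shard_host shard_port; do
    # </dev/null keeps mongo from consuming the remaining input lines.
    run mongo --host "$shard_host" --port "$shard_port" \
      --authenticationDatabase "$AUTH_DB" -u "$MONGO_USER" \
      -p "$MONGO_PASSWORD" cleanupOrphaned.js </dev/null
  done
}

# Step 3: enable sharding for a database and shard one collection on the
# destination instance. Arguments: host, port, database, collection, key.
shard_collection() {
  run mongo --host "$1" --port "$2" --quiet \
    --eval "sh.enableSharding('$3'); sh.shardCollection('$3.$4', { '$5': 'hashed' });"
}
```

For the three-shard example in Step 2, you could pipe the lines `172.16.1.10 27018`, `172.16.1.11 27021`, and `172.16.1.12 27024` into `cleanup_all_shards`, review the printed commands, and then set `DRY_RUN=0` to run them.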

Procedure

  1. Log on to the DTS console.

  2. In the left-side navigation pane, click Data Migration.

  3. On the upper part of the Migration Tasks page, select the region in which the destination ApsaraDB for MongoDB instance resides.

  4. In the upper-right corner of the page, click Create Migration Task.

  5. Configure the source and destination databases for the data migration task.

    | Section | Setting | Description |
    | --- | --- | --- |
    | N/A | Task Name | The task name that DTS automatically generates. We recommend that you specify a name that indicates your business requirements for easy identification. You do not need to use a unique name. |
    | Source Database | Instance Type | The instance type of the source database. In this example, User-Created Database with Public IP Address is selected. Note: If you select other instance types, you must deploy the network environment for the self-managed database. For more information, see Preparation overview. |
    | Source Database | Instance Region | If you select User-Created Database with Public IP Address as the instance type, you do not need to set the Instance Region parameter. Note: If a whitelist is configured for the self-managed MongoDB database, you must add the CIDR blocks of DTS servers to the whitelist. You can click Get IP Address Segment of DTS next to Instance Region to obtain the CIDR blocks of DTS servers. |
    | Source Database | Database Type | The type of the source database. Select MongoDB. |
    | Source Database | Hostname or IP Address | The endpoint or IP address of a shard in the self-managed MongoDB database. In this example, the public IP address of a shard is used. Note: DTS migrates each shard of the source database until the whole cluster is migrated. In this example, enter the endpoint or IP address of the first shard. When you configure the second migration task, enter the endpoint or IP address of the second shard. You must repeat this procedure until all shards are migrated. |
    | Source Database | Port Number | The service port number of the shard. Note: In this example, the service port of each shard in the self-managed MongoDB database must be accessible over the Internet. |
    | Source Database | Database Name | The name of the authentication database. The database account is created in this database. |
    | Source Database | Database Account | The account that is used to log on to the self-managed MongoDB database. For information about the permissions that are required for the account, see the Permissions required for database accounts section of this topic. |
    | Source Database | Database Password | The password of the database account. Note: After you specify the information about the source database, you can click Test Connectivity next to Database Password to check whether the information is valid. If the information is correct, the Passed message appears. If the information is incorrect, the Failed message appears and you must click Check next to the Failed message to modify the information. |
    | Source Database | Encryption | Specifies whether to encrypt the connection to the database. In this example, Non-encrypted is selected. Note: You can select SSL-encrypted only when you migrate data from MongoDB Atlas. |
    | Destination Database | Instance Type | The instance type of the destination database. Select MongoDB Instance. |
    | Destination Database | Instance Region | The region in which the destination ApsaraDB for MongoDB instance resides. |
    | Destination Database | MongoDB Instance ID | The ID of the destination sharded cluster instance. |
    | Destination Database | Database Name | The name of the authentication database. The database account is created in this database. |
    | Destination Database | Database Account | The database account of the destination ApsaraDB for MongoDB instance. For information about the permissions that are required for the account, see the Permissions required for database accounts section of this topic. |
    | Destination Database | Database Password | The password of the database account. Note: After you specify the information about the destination instance, you can click Test Connectivity next to Database Password to check whether the information is correct. If the information is correct, the Passed message is displayed. If the information is incorrect, the Failed message is displayed and you must click Check next to the Failed message to modify the information. |

  6. In the lower-right corner of the page, click Set Whitelist and Next.

    If the source or destination database instance is an Alibaba Cloud database instance, such as an ApsaraDB RDS for MySQL or ApsaraDB for MongoDB instance, or is a self-managed database hosted on ECS, DTS automatically adds the CIDR blocks of DTS servers to a whitelist of the database instance or ECS security group rules. If the source or destination database is a self-managed database on data centers or is from other cloud service providers, you must manually add the CIDR blocks of DTS servers to allow DTS to access the database. For more information, see the "CIDR blocks of DTS servers" section of the Add the CIDR blocks of DTS servers to the security settings of on-premises databases topic.

    Warning

    If the CIDR blocks of DTS servers are automatically or manually added to the whitelist of the database or instance, or to the ECS security group rules, security risks may arise. Therefore, before you use DTS to migrate data, you must understand and acknowledge the potential risks and take preventive measures, including but not limited to the following measures: enhance the security of your username and password, limit the ports that are exposed, authenticate API calls, regularly check the whitelist or ECS security group rules and forbid unauthorized CIDR blocks, or connect the database to DTS by using Express Connect, VPN Gateway, or Smart Access Gateway.

  7. Select the migration types and the objects to be migrated.

    | Setting | Description |
    | --- | --- |
    | Select the migration types | To perform only full data migration, select only Full Data Migration. To ensure service continuity during data migration, select Full Data Migration and Incremental Data Migration. Note: If Incremental Data Migration is not selected, we recommend that you do not write data to the self-managed MongoDB database during full data migration. This ensures data consistency between the source and destination databases. |
    | Select the objects to be migrated | Select one or more objects from the Available section. Click the Rightwards arrow icon and add the objects to the Selected Objects section. Note: DTS cannot migrate data from the admin, local, or config database. You can select databases, collections, or functions as the objects to be migrated. By default, after an object is migrated to the destination instance, the name of the object remains unchanged. You can use the object name mapping feature to rename the objects that are migrated to the destination instance. For more information, see Object name mapping. |
    | Specify whether to rename objects | You can use the object name mapping feature to rename the objects that are migrated to the destination instance. For more information, see Object name mapping. |
    | Specify the retry time range for a failed connection to the source or destination database | By default, if DTS fails to connect to the source and destination databases, DTS retries within the following 12 hours. You can specify the retry time range based on your business requirements. If DTS is reconnected to the source and destination databases within the specified time range, DTS resumes the data migration task. Otherwise, the data migration task fails. Note: Within the retry time range in which DTS attempts to reconnect to the source and destination databases, you are charged for the DTS instance. We recommend that you specify the retry time range based on your business requirements. You can also release the DTS instance at the earliest opportunity after the source and destination instances are released. |

  8. Click Precheck.

    Note
    • A precheck is performed before the migration task starts. The migration task starts only after the precheck succeeds.

    • If the precheck fails, click the Note icon next to each failed check item to view the related details. Fix the issues as instructed and run the precheck again.

  9. After the data migration task passes the precheck, click Next.

  10. In the Confirm Settings dialog box, specify the Instance Class parameter and select Data Transmission Service (Pay-as-you-go) Service Terms.

  11. Click Buy and Start to start the data migration task.

  12. Repeat Steps 1 to 11 to create data migration tasks for the remaining shards.

  13. Stop the data migration tasks.

    • Full data migration

      We recommend that you do not manually stop a task during full data migration. Otherwise, the data migrated to the destination database will be incomplete. You can wait until the full data migration task automatically stops.

    • Incremental data migration

      An incremental data migration task does not automatically stop. You must manually stop the task.

      Note

      We recommend that you select an appropriate time to manually stop a data migration task. For example, you can stop the task during off-peak hours or before you switch your workloads to the ApsaraDB for MongoDB instance.

      1. Wait until Incremental Data Migration and The migration task is not delayed are displayed in the progress bar of the data migration task. Then, stop writing data to the source database for a few minutes. The latency of incremental data migration may be displayed in the progress bar.

      2. After the status of incremental data migration changes to The migration task is not delayed again, manually stop the migration tasks for all shards.

  14. Switch your workloads to the destination ApsaraDB for MongoDB instance.
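
Before you switch workloads, you may want an extra sanity check that is not part of the DTS procedure in this topic: a comparison of per-collection document counts between the source and destination. The following sketch assumes that you have collected the counts yourself into two text files that contain one `<namespace> <count>` pair per line, for example by counting documents through a mongos node on each side. The helper name `compare_counts` and the file layout are hypothetical.

```shell
# Hypothetical helper: compares two files of "<namespace> <count>" lines and
# prints every namespace whose count differs or that exists on one side only.
# Returns a non-zero status if any difference is found.
compare_counts() {
  cmp_diff=$(awk '
    FILENAME == ARGV[1] { src[$1] = $2; next }
    {
      seen[$1] = 1
      if (!($1 in src)) print $1, "source=missing", "destination=" $2
      else if (src[$1] != $2) print $1, "source=" src[$1], "destination=" $2
    }
    END {
      for (ns in src)
        if (!(ns in seen)) print ns, "source=" src[ns], "destination=missing"
    }' "$1" "$2")
  if [ -n "$cmp_diff" ]; then
    printf '%s\n' "$cmp_diff"
    return 1
  fi
}
```

If `compare_counts source_counts.txt destination_counts.txt` prints nothing and succeeds, the recorded counts match. Matching counts alone do not prove that the document contents are identical.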