ApsaraDB for MongoDB:Defragment the disks of an instance to increase disk utilization

Last Updated:Nov 21, 2024

This topic describes how to run the compact command to defragment the disks of an ApsaraDB for MongoDB instance to increase disk utilization. Disk fragments are generated by a large number of insert, update, and delete operations. The command is used to defragment the disks of the primary and secondary nodes in the instance.

We recommend that you use the storage analysis feature in the ApsaraDB for MongoDB console to defragment the disks of the instance. This simplifies defragmentation and minimizes the impact on your business. The feature can defragment only the disks of hidden nodes in the instance. To defragment the disks of the primary and secondary nodes, perform a primary/secondary switchover. For more information, see Storage analysis.

Prerequisites

The storage engine of the instance is WiredTiger.

Background information

  • After data is deleted from the instance, the storage used by the deleted data is marked as free storage. Newly written data may be stored directly in the free storage, or stored at the end of a file after the file is expanded. As a result, part of the free storage is not used. Such unused free storage constitutes disk fragments. The more disk fragments there are, the lower the disk utilization.

  • You can connect to a node and run the db.runCommand({collStats: <collection_name>}) command to view the storage usage of a specified collection. The output includes the following fields:

    • size: the logical storage size of the collection.

    • storageSize: the physical storage size of the collection.

    • freeStorageSize: the size of the free storage that can be reclaimed by defragmentation. This field is returned only for instances that run MongoDB 4.4 or later.

    If you run the remove command to delete documents, the size value decreases, but the storageSize value does not necessarily decrease. In this case, divide the freeStorageSize value by the storageSize value. A high ratio indicates a high fragmentation rate.

Note

For more information about the size, storageSize, and freeStorageSize fields, see collStats-Output.

  • The compact command is used to defragment the disks of an instance in ApsaraDB for MongoDB. For more information about the compact command, see compact.
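
The ratio described above can be computed from the collStats output. The following is a minimal sketch in mongo shell JavaScript, assuming a stats document that contains the storageSize and freeStorageSize fields as plain numbers (MongoDB 4.4 or later). The helper name fragmentationRatio and the sample values are hypothetical and are not part of ApsaraDB for MongoDB.

```javascript
// Hypothetical helper: returns freeStorageSize / storageSize for a collStats document.
// Returns null if either field is missing (for example, on versions earlier than MongoDB 4.4).
// Assumption: both fields are plain JavaScript numbers.
function fragmentationRatio(stats) {
  if (typeof stats.freeStorageSize !== "number" || typeof stats.storageSize !== "number") {
    return null;
  }
  if (stats.storageSize === 0) {
    return 0; // an empty collection has nothing to defragment
  }
  return stats.freeStorageSize / stats.storageSize;
}

// Sample values for illustration only. In the shell, pass the result of
// db.runCommand({collStats: "<collection_name>"}) instead.
const sampleStats = { size: 524288000, storageSize: 1073741824, freeStorageSize: 268435456 };
const sampleRatio = fragmentationRatio(sampleStats); // 0.25
```

In this sample, a quarter of the physical storage of the collection is free storage, which may make the collection a candidate for defragmentation.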

Usage notes

  • Data backup: Before you defragment the disks of the instance, we recommend that you back up the instance data. For more information, see Configure manual backup for an instance.

  • Impacts of the compact command:

    • Blocked read/write operations and compromised performance

      • If you run the compact command on an instance that runs a MongoDB version earlier than MongoDB 4.4, the database to which a specified collection belongs is locked and read/write operations performed on the database are blocked. We recommend that you perform this operation during off-peak hours or upgrade your instance to MongoDB 4.4 or later. For more information about how to upgrade the version of an instance, see Upgrade the major version of an instance. For more information about the blocking behavior, see Blocking.

      • If you run the compact command on an instance that runs MongoDB 4.4 or later, read/write operations performed on the instance are not blocked. However, the instance performance may be compromised. We recommend that you perform this operation during off-peak hours.

    • Node rebuilding

      • In an instance that runs MongoDB 3.4.x, MongoDB 4.0.x, MongoDB 4.0.22 or earlier, or MongoDB 5.0.6 or earlier, a node on which the compact command is running enters the RECOVERING state. If the node remains in this state for a long period of time, the instance detection component identifies the node as unhealthy and triggers a rebuild of the node. For more information, see Version Specific Considerations for Secondary Nodes. For more information about MongoDB versions, see Minor versions of ApsaraDB for MongoDB.

      • In an instance that runs a MongoDB version later than the preceding versions, a node on which the compact command is running remains in the SECONDARY state. This does not trigger rebuilding operations.

  • Invalid execution of the compact command:

    The compact command cannot be executed in the following scenarios. For more information, see Open source code.

    • The physical size of the collection is less than 1 MB.

    • In the first 80% of the file, the free storage is less than 20%, and in the first 90% of the file, the free storage is less than 10%.

  • Defragmentation duration: The time required to defragment the disks of the instance by running the compact command depends on factors such as the amount of data in the collections and the system load.

  • Others:

    • After you run the compact command, the storage that is released may be less than the free storage. If you run the command again, start the next execution only after the previous execution is complete. This avoids frequent, repeated executions.

    • The compact command can be run when an instance is locked because its disk is full.
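
The skip conditions listed under "Invalid execution of the compact command" can be sketched as a simple check. This is an illustration under stated assumptions, not the storage engine's implementation: it assumes that the 20% and 10% thresholds are measured against the total file size, and that the command is skipped only when neither threshold is met. The function name and its inputs are hypothetical, and the sketch does not show how to obtain the free storage in the first 80% or 90% of a file.

```javascript
const MB = 1024 * 1024;

// Hypothetical helper that mirrors the documented skip conditions.
// fileSize: physical size of the collection file, in bytes.
// freeInFirst80 / freeInFirst90: free bytes located in the first 80% / 90% of the file.
// Assumption: the 20% and 10% thresholds are relative to the total file size.
function compactLikelySkipped(fileSize, freeInFirst80, freeInFirst90) {
  if (fileSize < MB) {
    return true; // the collection is smaller than 1 MB
  }
  const enoughInFirst80 = freeInFirst80 >= fileSize * 0.2;
  const enoughInFirst90 = freeInFirst90 >= fileSize * 0.1;
  // Skipped only if neither region holds enough free storage.
  return !(enoughInFirst80 || enoughInFirst90);
}

// Illustrative values only.
const tinyCollection = compactLikelySkipped(512 * 1024, 0, 0);          // true
const fragmentedFile = compactLikelySkipped(100 * MB, 25 * MB, 25 * MB); // false
const denseFile = compactLikelySkipped(100 * MB, 5 * MB, 8 * MB);        // true
```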

Estimate disk storage to be defragmented

  1. Use the mongo shell to connect to the instance. To reduce business impacts, we recommend that you connect to a secondary node in a replica set instance. Connection methods vary based on the instance architecture. For more information, see the following topics:

  2. Switch to the database where a specified collection is stored.

    Syntax:

    use <database_name>

    Parameter in the preceding command: <database_name>: the name of the database to which the collection belongs.

    Note

    You can run the show dbs command to list the databases in the instance.

    Example:

    Switch to the test_database database.

    use test_database
  3. View the disk storage to be defragmented for the collection.

    Syntax:

    db.<collection_name>.stats().wiredTiger["block-manager"]["file bytes available for reuse"]

    Parameter in the preceding command: <collection_name>: the name of the collection.

    Note

    You can run the show tables command to list the collections in the current database.

    Example:

    db.test_database_collection.stats().wiredTiger["block-manager"]["file bytes available for reuse"]

    The following result is returned.

    207806464

    This result indicates that the estimated disk storage to be defragmented is 207,806,464 bytes.
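
To make the returned byte count easier to read, the value can be extracted and converted to MiB. The following is a minimal sketch in mongo shell JavaScript. The helper name reusableStorage and the sample document are hypothetical, and the sample contains only the part of the stats output that the command above reads.

```javascript
// Hypothetical helper: reads the reusable byte count from a collection stats document
// and returns it in bytes and in MiB (rounded to two decimal places).
// Returns null if the WiredTiger block-manager section is missing.
function reusableStorage(stats) {
  const blockManager = stats.wiredTiger && stats.wiredTiger["block-manager"];
  if (!blockManager) {
    return null;
  }
  const bytes = blockManager["file bytes available for reuse"];
  return { bytes: bytes, mib: Number((bytes / (1024 * 1024)).toFixed(2)) };
}

// Sample document shaped like the relevant part of db.<collection_name>.stats().
const sampleCollStats = {
  wiredTiger: { "block-manager": { "file bytes available for reuse": 207806464 } }
};
const estimate = reusableStorage(sampleCollStats); // { bytes: 207806464, mib: 198.18 }
```

In the shell, you can pass the output of db.<collection_name>.stats() to the helper instead of the sample document.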

Defragment the disks of a standalone or replica set instance

  • A standalone instance has only one node. You can connect to the primary node and run the compact command to defragment the disks of the primary node.

  • A replica set instance has multiple nodes. You must defragment the disks of the primary and secondary nodes in the instance. To reduce business impacts, we recommend that you defragment the disks of the secondary nodes first, perform a primary/secondary switchover to switch the primary node to a secondary node, and then defragment the disks of the new secondary node. For more information about how to perform a primary/secondary switchover, see Configure a primary/secondary switchover for a replica set instance.

    Note

    If the replica set instance has read-only nodes, defragment the disks of these nodes in the same way as the disks of the primary and secondary nodes.

  1. Connect to a standalone or replica set instance by using the mongo shell. Connection methods vary based on the instance architecture. For more information, see the following topics:

  2. Switch to the database where a specified collection is stored.

    Syntax:

    use <database_name>

    Parameter in the preceding command: <database_name>: the name of the database to which the collection belongs.

    Note

    You can run the show dbs command to list the databases in the instance.

    Example:

    Switch to the replica_database database.

    use replica_database
  3. View the disk storage occupied by the database before defragmentation.

    db.stats()
    Note

    This command can be used without any changes.

  4. Defragment the disks of the collection.

    Syntax:

    db.runCommand({compact:"<collection_name>",force:true})

    Parameters in the preceding command:

    • <collection_name>: the name of the collection.

      Note

      You can run the show tables command to list the collections in the current database.

    • force: Optional. Set the value to true.

      This parameter is required if you run the command on the primary node of an instance that runs MongoDB 4.2 or earlier.

    Example:

    db.runCommand({compact:"sharded_collection"})

    The following result is returned:

    { "ok" : 1 }
  5. View the disk storage occupied by the database after defragmentation.

    db.stats()
    Note

    This command can be used without any changes.
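
To see how much storage the compact command released, you can compare the db.stats() output from before and after defragmentation. The following is a minimal sketch in mongo shell JavaScript that uses the storageSize field of the db.stats() output. The helper name storageReleased and the sample values are hypothetical.

```javascript
// Hypothetical helper: compares two db.stats() documents captured before and after compact.
// Returns the released storage in bytes and as a percentage of the original storageSize.
function storageReleased(before, after) {
  const releasedBytes = before.storageSize - after.storageSize;
  const releasedPercent = before.storageSize > 0
    ? Number(((releasedBytes / before.storageSize) * 100).toFixed(1))
    : 0;
  return { releasedBytes: releasedBytes, releasedPercent: releasedPercent };
}

// Sample values for illustration only. In the shell, capture db.stats()
// in steps 3 and 5 and pass both results to the helper.
const statsBefore = { storageSize: 1073741824 };
const statsAfter = { storageSize: 805306368 };
const released = storageReleased(statsBefore, statsAfter);
// released.releasedBytes is 268435456 and released.releasedPercent is 25
```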

Defragment the disks of a sharded cluster instance

For a sharded cluster instance, you only need to defragment the disks of the shard components in the instance. The mongos and ConfigServer components do not store user data. In addition, these components handle more insert and update operations and fewer delete operations, so few disk fragments are generated. Therefore, you do not need to defragment the disks of the mongos and ConfigServer components.

Note

The compact command is not supported on the read-only nodes of the instance. Therefore, the disks of read-only nodes cannot be defragmented.

  1. Use the mongo shell to connect to the instance. For more information, see Connect to a sharded cluster instance by using the mongo shell.

  2. Switch to the database where a specified collection is stored.

    Syntax:

    use <database_name>

    Parameter in the preceding command: <database_name>: the name of the database to which the collection belongs.

    Note

    You can run the show dbs command to list the databases in the instance.

    Example:

    Switch to the sharded_database database.

    use sharded_database
  3. View the disk storage occupied by the database before defragmentation.

    db.stats()
    Note

    This command can be used without any changes.

  4. Defragment the disks of the collection.

    You must defragment the disks of the primary and secondary nodes in a shard component in the instance. To reduce business impacts, we recommend that you defragment the disks of secondary nodes in the component, perform a primary/secondary switchover to switch the primary node to a secondary node, and then defragment the disks of the new secondary node. For more information about how to perform a primary/secondary switchover, see Configure a primary/secondary switchover for a sharded cluster instance.

    • Defragment the disks of the primary node in the shard component.

      Syntax:

      db.runCommand({runCommandOnShard:"<Shard ID>","command":{compact:"<collection_name>",force:true}})

      Parameters in the preceding command:

      • <Shard ID>: the ID of the shard component.

        Note

        You can log on to the ApsaraDB for MongoDB console and view the ID of the shard component in the Shard List section on the Basic Information page.

      • <collection_name>: the name of the collection.

        Note

        You can run the show tables command to list the collections in the current database.

      • force: Optional. Set the value to true.

        This parameter is required if you run the command on a sharded cluster instance that runs MongoDB 4.2 or earlier.

      Example:

      db.runCommand({runCommandOnShard:"shard01","command":{compact:"sharded_collection",force:true}})
    • Defragment the disks of secondary nodes in the shard component.

      The procedure in the mongo shell differs from the procedure in mongosh. Follow the steps that match your client.

      mongo shell

      Syntax:

      db.runCommand({runCommandOnShard:"<Shard ID>","command":{compact:"<collection_name>"},$queryOptions: {$readPreference: {mode: 'secondary'}}})

      Parameters in the preceding command:

      • <Shard ID>: the ID of the shard component.

        Note

        You can log on to the ApsaraDB for MongoDB console and view the ID of the shard component in the Shard List section on the Basic Information page.

      • <collection_name>: the name of the collection.

        Note

        You can run the show tables command to list the collections in the current database.

      Example:

      db.runCommand({runCommandOnShard:"shard01","command":{compact:"sharded_collection"},$queryOptions: {$readPreference: {mode: 'secondary'}}})

      mongosh

      Note

      The runCommandOnShard command is not supported in mongosh 2.x. Run this command in mongosh 1.x.

      Syntax:

      db.getMongo().setReadPref('secondary')
      db.runCommand({runCommandOnShard:"<Shard ID>","command":{compact:"<collection_name>"}})

      Parameters in the preceding command:

      • <Shard ID>: the ID of the shard component.

        Note

        You can log on to the ApsaraDB for MongoDB console and view the ID of the shard component in the Shard List section on the Basic Information page.

      • <collection_name>: the name of the collection.

        Note

        You can run the show tables command to list the collections in the current database.

      Example:

      db.getMongo().setReadPref('secondary')
      db.runCommand({runCommandOnShard:"d-2ze91ae9d55d6604","command":{compact:"test"}})
  5. View the disk storage occupied by the database after defragmentation.

    db.stats()
    Note

    This command can be used without any changes.
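
The command documents used in step 4 differ only in the force option and the read preference. The following is a minimal sketch in mongo shell JavaScript that assembles them from the same inputs. The helper name buildShardCompactCommand and its options parameter are hypothetical. The field names runCommandOnShard, command, compact, force, and $queryOptions are taken from the syntax in this topic.

```javascript
// Hypothetical helper: builds the runCommandOnShard document shown in step 4.
// options.force adds force:true (described in this topic as required on MongoDB 4.2 or earlier).
// options.secondary adds the $queryOptions read preference used in the mongo shell syntax.
function buildShardCompactCommand(shardId, collectionName, options) {
  const opts = options || {};
  const compact = { compact: collectionName };
  if (opts.force) {
    compact.force = true;
  }
  const command = { runCommandOnShard: shardId, command: compact };
  if (opts.secondary) {
    command.$queryOptions = { $readPreference: { mode: "secondary" } };
  }
  return command;
}

// Command documents that match the primary node and mongo shell secondary node examples.
const primaryCommand = buildShardCompactCommand("shard01", "sharded_collection", { force: true });
const secondaryCommand = buildShardCompactCommand("shard01", "sharded_collection", { secondary: true });
```

In the shell, pass the resulting document to db.runCommand(). In mongosh 1.x, call db.getMongo().setReadPref('secondary') first and build the command without the secondary option.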