You can configure sharding for each collection in a sharded cluster instance to make full use of the storage space of shards in the instance and maximize the computing performance of these shards.
Background information
If a collection is not sharded, all data in the collection is stored on the same shard node. In this case, the storage space of other shards cannot be fully used and the computing performance of these shards is compromised.
Prerequisites
The instance is a sharded cluster instance.
Usage notes
You cannot change or delete the configured shard key after sharding.
After you configure sharding, the balancer shards existing data that meets the specified criteria, which consumes the resources of an instance. We recommend that you perform this operation during off-peak hours.
Note: Before configuring sharding, you can set an active time window to limit the effective period of the balancer to off-peak hours. For more information, see Set an active time window for the balancer.
The choice of a shard key affects the performance of a sharded cluster instance. For more information about how to choose a shard key, see Shard Keys.
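The active time window mentioned in the note above can be sketched as follows. This is a minimal sketch, not the documented procedure for this service: it assumes that the balancer settings can be written through the config database as in open source MongoDB, and the helper name setBalancerWindow and the sample times are hypothetical. See Set an active time window for the balancer for the supported method.

```javascript
// Sketch: restrict the balancer to an off-peak window.
// Assumption: `configDb` is a handle to the `config` database, for example
// db.getSiblingDB("config") in a mongo shell connected to a mongos node.
function setBalancerWindow(configDb, start, stop) {
  // Times must use the 24-hour HH:MM format, for example "02:00".
  const hhmm = /^([01]\d|2[0-3]):[0-5]\d$/;
  if (!hhmm.test(start) || !hhmm.test(stop)) {
    throw new Error("start and stop must use the HH:MM 24-hour format");
  }
  // The balancer reads its window from the document with _id "balancer".
  return configDb.settings.updateOne(
    { _id: "balancer" },
    { $set: { activeWindow: { start: start, stop: stop } } },
    { upsert: true }
  );
}

// Usage in the mongo shell (the times are example values only):
// setBalancerWindow(db.getSiblingDB("config"), "02:00", "06:00");
```

Passing the database handle as a parameter keeps the helper free of shell globals, so the validation logic can be checked before it is run against an instance.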
Sharding strategies
Sharding strategy | Description | Scenario |
Ranged sharding | MongoDB divides data into contiguous ranges determined by the shard key values. Each chunk represents a contiguous range of data. | The shard key value is not monotonically increasing or decreasing. The shard key has large cardinality and low frequency. Range-based queries are required. |
Hashed sharding | MongoDB computes the hash value of a single field as the index value and divides data into chunks based on the range of hash values. | The shard key value is monotonically increasing or decreasing. The shard key has large cardinality and low frequency. Data writes are randomly distributed to shards. Data is read with high randomness. |
In addition to the preceding two sharding strategies, you can also configure a compound shard key. For example, configure both a key with low cardinality and a monotonically increasing key. For more information, see Shard key selection.
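To illustrate why the table pairs monotonically increasing shard keys with hashed sharding, the following toy model routes 10,000 consecutive key values under both strategies. It is a sketch under stated assumptions: four shards with one chunk each, a fixed key space, and a simple integer mixer named toyHash that stands in for the hash function. None of these reflect the actual chunk layout or hash function that MongoDB uses.

```javascript
// Toy model only: 4 shards, one chunk per shard.
const SHARDS = 4;
const KEY_SPACE = 1000000; // assumed upper bound of the shard key values

// Ranged sharding: each shard owns one contiguous quarter of the key space.
function rangedShard(key) {
  return Math.min(SHARDS - 1, Math.floor((key / KEY_SPACE) * SHARDS));
}

// Illustrative 32-bit integer mixer, NOT the hash function MongoDB uses.
function toyHash(key) {
  let h = key >>> 0;
  h = Math.imul(h ^ (h >>> 16), 0x45d9f3b);
  h = Math.imul(h ^ (h >>> 16), 0x45d9f3b);
  return (h ^ (h >>> 16)) >>> 0;
}

// Hashed sharding: hash the key first, then split the hash range evenly.
function hashedShard(key) {
  return Math.floor((toyHash(key) / 4294967296) * SHARDS);
}

// Count how many of `count` consecutive keys each shard receives.
function countWrites(route, firstKey, count) {
  const perShard = new Array(SHARDS).fill(0);
  for (let k = firstKey; k < firstKey + count; k++) {
    perShard[route(k)]++;
  }
  return perShard;
}

// Simulate a burst of monotonically increasing keys, such as an auto-increment ID.
const ranged = countWrites(rangedShard, 900000, 10000);
const hashed = countWrites(hashedShard, 900000, 10000);
console.log("ranged:", ranged);
console.log("hashed:", hashed);
```

In this model, ranged sharding sends every new write to the shard that owns the highest range, whereas hashing should spread the same writes roughly evenly. The trade-off is that neighboring key values no longer sit in the same chunk, which is why range-based queries favor ranged sharding.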
Procedure
The following procedure uses the database named mongodbtest and the collection named customer as an example.
Connect to an ApsaraDB for MongoDB sharded cluster instance by using the mongo shell.
Enable sharding for the database where the collection to be sharded resides.
Important: If your instance runs MongoDB 6.0 or later, skip this step. For more information, see sh.enableSharding().
sh.enableSharding("<database>")
<database>: the name of the database.
Example:
sh.enableSharding("mongodbtest")
Note: You can run the sh.status() command to check whether sharding is enabled.
Create an index on the shard key field.
db.<collection>.createIndex(<keyPatterns>,<options>)
Parameters in the preceding command:
<collection>: the name of the collection.
<keyPatterns>: the field used for indexing and the index type. Common index types are as follows:
1: an ascending index
-1: a descending index
"hashed": a hashed index
<options>: the optional parameters. For more information, see db.collection.createIndex(). This field is not used in this example.
Sample command for creating an ascending index:
db.customer.createIndex({name:1})
Sample command for creating a hashed index:
db.customer.createIndex({name:"hashed"})
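Before you move on, you may want to confirm that the index exists. The following sketch checks the output of db.customer.getIndexes() for an index that starts with the shard key field. The helper name hasShardKeyIndex is hypothetical, and the check assumes that index key values are returned as plain numbers or strings.

```javascript
// `indexes` is the array returned by db.<collection>.getIndexes().
// Returns true if an index starts with `field` and uses the given type
// (1 for an ascending index, "hashed" for a hashed index).
function hasShardKeyIndex(indexes, field, type) {
  return indexes.some(function (index) {
    const fields = Object.keys(index.key);
    return fields.length > 0 && fields[0] === field && index.key[field] === type;
  });
}

// Usage in the mongo shell:
// hasShardKeyIndex(db.customer.getIndexes(), "name", 1)
// hasShardKeyIndex(db.customer.getIndexes(), "name", "hashed")
```

The helper compares only the first field of each index because a compound index can back a shard key when the shard key is a prefix of that index.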
Configure sharding for the collection.
sh.shardCollection("<database>.<collection>",{ "<key>":<value> } )
Parameters in the preceding command:
<database>: the name of the database.
<collection>: the name of the collection.
<key>: the shard key that MongoDB uses to shard data.
<value>: the sharding strategy. Valid values:
1: ranged sharding. This strategy supports efficient range-based queries based on the shard key.
"hashed": hashed sharding. This strategy distributes data writes evenly among shards.
Sample command for configuring ranged sharding:
sh.shardCollection("mongodbtest.customer",{"name":1})
Sample command for configuring hashed sharding:
sh.shardCollection("mongodbtest.customer",{"name":"hashed"})
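The steps of this procedure can be combined into one helper, sketched below. The function name shardWithStrategy and its parameters are hypothetical. The sketch assumes that db.version() returns a string such as "5.0.14", and it receives the shell objects sh and db as arguments rather than reading them as globals.

```javascript
// Sketch of the whole procedure as one function.
// `sh` and `db` are the mongo shell objects, passed in explicitly.
// `strategy` is "ranged" or "hashed".
function shardWithStrategy(sh, db, dbName, collName, field, strategy) {
  if (strategy !== "ranged" && strategy !== "hashed") {
    throw new Error('strategy must be "ranged" or "hashed"');
  }
  // 1 selects ranged sharding, "hashed" selects hashed sharding.
  const keyPattern = {};
  keyPattern[field] = strategy === "hashed" ? "hashed" : 1;

  // Enabling sharding for the database is skipped on MongoDB 6.0 or later.
  const majorVersion = parseInt(db.version().split(".")[0], 10);
  if (majorVersion < 6) {
    sh.enableSharding(dbName);
  }

  // Create the index on the shard key field, then shard the collection.
  db.getSiblingDB(dbName).getCollection(collName).createIndex(keyPattern);
  return sh.shardCollection(dbName + "." + collName, keyPattern);
}

// Usage in the mongo shell:
// shardWithStrategy(sh, db, "mongodbtest", "customer", "name", "hashed");
```

Because the shard key cannot be changed after sharding, the helper rejects any strategy value other than the two supported ones instead of falling back to a default.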
What to do next
After the instance has been running and data has been written for a while, you can run the sh.status() command in the mongo shell to view data distribution among shards.
You can also run the db.stats() command to view the size of data stored on each shard.
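The output of db.stats() can be condensed into a per-shard summary, as in the following sketch. It assumes that, on a sharded cluster, the returned document lists per-shard results under a raw field keyed by shard, each with a dataSize value in bytes. The helper name dataSizeByShard is hypothetical; check the output of your instance before you rely on these field names.

```javascript
// `stats` is the document returned by db.stats() when connected to a mongos node.
// Assumption: stats.raw maps each shard to its own result, including dataSize (bytes).
function dataSizeByShard(stats) {
  const raw = stats.raw || {};
  const shards = Object.keys(raw);
  const total = shards.reduce(function (sum, shard) {
    return sum + raw[shard].dataSize;
  }, 0);
  // Report each shard's data size and its share of the total.
  return shards.map(function (shard) {
    return {
      shard: shard,
      dataSize: raw[shard].dataSize,
      share: total > 0 ? raw[shard].dataSize / total : 0
    };
  });
}

// Usage in the mongo shell:
// dataSizeByShard(db.getSiblingDB("mongodbtest").stats())
```

A share that stays far above the others after the balancer has run may indicate that the shard key does not distribute data evenly. For a single collection, the db.collection.getShardDistribution() shell helper reports similar information.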