Without sharding, all data in a collection resides on a single shard while other shards remain idle. Sharding distributes data across all shards so each shard handles a portion of the workload.
Prerequisites
The instance must be a sharded cluster instance.
Shard key selection
A shard key determines how MongoDB distributes data across shards. Choose the shard key carefully because it directly affects query performance and data distribution. For more information about shard keys, see Shard keys and How to select a shard key.
A good shard key has three properties:
High cardinality: The key has many distinct values, enabling fine-grained data distribution.
Low frequency: No single key value dominates. Frequent values create large, unmovable chunks.
Non-monotonic change: The key does not steadily increase or decrease. Monotonic keys direct all writes to one shard.
Shard key mutability depends on the MongoDB version:
Before 4.4: Shard keys cannot be modified or deleted after creation.
4.4 and later: refineCollectionShardKey adds a suffix field to an existing shard key.
5.0 and later: reshardCollection completely changes the shard key.
Sharding strategies
MongoDB supports range sharding, hashed sharding, and compound shard keys.
| Property | Range sharding | Hashed sharding |
|---|---|---|
| Mechanism | Divides data into chunks based on shard key value ranges | Hashes a single field value and distributes data by hash ranges |
| Distribution | Can be uneven; may create hot spots with no write distribution | Even distribution across all shard nodes; provides write distribution |
| Range queries | Efficient. The mongos router locates the target shard directly. | Inefficient. Queries broadcast to all shard nodes. |
| Best for | Non-monotonic keys with high cardinality and low frequency; workloads that rely on range queries | Monotonically increasing or decreasing keys with high cardinality; workloads that require random reads and writes |
| Shard key value | 1 (ascending) or -1 (descending) | "hashed" |
Compound shard keys combine multiple fields. Use compound keys when no single field provides sufficient cardinality or when queries filter on multiple fields. For example, combine a low-cardinality key with a monotonically increasing key.
Choose a strategy
Use range sharding when queries frequently scan a range of shard key values. Make sure the key is non-monotonic to avoid hot spots.
Use hashed sharding when write volume is high and data must be distributed evenly. Accept the trade-off that range queries hit all shard nodes.
Shard a collection
The following procedure uses the mongodbtest database and customer collection as examples.
Step 1: Connect to the sharded cluster
Connect to the sharded cluster instance through the mongo shell. For connection methods, see Log on to a sharded cluster instance.
Step 2: Enable sharding for the database
Skip this step if the instance runs MongoDB 6.0 or later. Starting from MongoDB 6.0, sh.enableSharding() is no longer required.
Run the following command to enable sharding on the target database:
sh.enableSharding("mongodbtest")Step 3: Create an index on the shard key
Before sharding a collection, create an index that supports the shard key.
Syntax:
db.<collection>.createIndex(<keyPatterns>, <options>)For more information, see db.collection.createIndex().
Range sharding example (ascending index on the name field):
db.customer.createIndex({ name: 1 })Hashed sharding example (hashed index on the name field):
db.customer.createIndex({ name: "hashed" })Index type values:
| Value | Type |
|---|---|
1 | Ascending |
-1 | Descending |
"hashed" | Hashed |
Step 4: Shard the collection
Run sh.shardCollection() to shard the collection.
Syntax:
sh.shardCollection("<database>.<collection>", { "<key>": <value> })Range sharding example:
sh.shardCollection("mongodbtest.customer", { "name": 1 })Hashed sharding example:
sh.shardCollection("mongodbtest.customer", { "name": "hashed" })The <value> parameter determines the sharding strategy:
| Value | Strategy |
|---|---|
1 | Range sharding |
"hashed" | Hashed sharding |
Step 5: Wait for the balancer to distribute data
After you configure sharding, the balancer automatically splits data that meets the criteria and migrates chunks across shard nodes.
The balancer consumes instance resources during data migration. Perform initial sharding during off-peak hours. Set an active window for the balancer to restrict balancer activity to specific time periods and prevent performance impact during peak hours.
Verify the sharding configuration
Check data distribution
Run sh.status() to view the sharding status and data distribution across shard nodes:
sh.status()Check storage per shard
Run db.stats() to view storage statistics for each shard node:
db.stats()