This repository contains the complete setup for a MongoDB sharded cluster deployed on a Kubernetes environment using kind
(Kubernetes in Docker). The configuration includes multiple replica sets, sharded data, and routing, all designed to achieve high availability, scalability, and fault tolerance.
- Introduction
- Prerequisites
- Architecture
- Setup Guide
- Scaling and Load Testing
- Use Case: Sharding a User Collection by userId
- Troubleshooting
- Contributing
- License
This project demonstrates how to deploy a fully functional MongoDB sharded cluster on Kubernetes using StatefulSets for MongoDB components, including config servers, shard servers, and mongos routing services.
The cluster architecture consists of:
- 4 shards, each running as a 3-member replica set
- 3 config servers (replica set)
- 1 mongos routing server
The project is ideal for anyone looking to implement a scalable, distributed database in Kubernetes for real-world applications or testing.
To set up and deploy this MongoDB sharded cluster, ensure you have the following installed:
- Docker
- kind (Kubernetes in Docker)
- kubectl
- mongosh (optional; the examples in this README run mongosh inside the pods via kubectl exec)
The cluster is deployed as follows:
- Shard 1: Replica set with instances on ports 27301, 27302, 27303
- Shard 2: Replica set with instances on ports 27401, 27402, 27403
- Shard 3: Replica set with instances on ports 27501, 27502, 27503
- Shard 4: Replica set with instances on ports 27601, 27602, 27603
- Config servers: 3 instances forming a replica set on ports 27201, 27202, 27203
- mongos router: Instance on port 27100
git clone https://github.com/Mx-Bx/k8smongodb-shardedcluster.git
cd k8smongodb-shardedcluster
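The manifests need a running kind cluster to apply to. If you don't have one yet, a default cluster is enough for local testing (the cluster name below is arbitrary; if the repo ships its own kind configuration, prefer that):

kind create cluster --name mongo-shard-demo
kubectl cluster-info --context kind-mongo-shard-demo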
Deploy the MongoDB shards and config servers using the provided Kubernetes manifests:
kubectl apply -R -f manifests/.
## OR
./start-all.sh
Verify that everything is running correctly:
./check-status.sh
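If you prefer to check by hand (or want to see roughly what the script inspects), the standard kubectl views cover it. The label selector is an assumption mirroring the app=mongos selector used later in this README; adjust it to the labels in the manifests:

kubectl get statefulsets,pods,svc,pvc -o wide
kubectl get pods -l app=mongos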
Initialize the sharding configuration using the provided script:
./init-sharding.sh
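For reference, here is a hypothetical sketch of what such an initialization boils down to: initiating each replica set, then registering the shards through mongos. The pod and replica-set names below are illustrative assumptions, not the actual values from the manifests; check init-sharding.sh for the real ones.

# Initiate one shard's replica set (repeat for each shard and for the config servers).
kubectl exec -it shard1-0 -- mongosh --port 27301 --eval '
  rs.initiate({
    _id: "shard1ReplSet",
    members: [
      { _id: 0, host: "shard1-0.shard1:27301" },
      { _id: 1, host: "shard1-1.shard1:27302" },
      { _id: 2, host: "shard1-2.shard1:27303" }
    ]
  })
'

# Register each shard with the cluster through the mongos router.
kubectl exec -it $(kubectl get pods -l app=mongos -o jsonpath="{.items[0].metadata.name}") -- \
  mongosh --port 27100 --eval 'sh.addShard("shard1ReplSet/shard1-0.shard1:27301,shard1-1.shard1:27302,shard1-2.shard1:27303")'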
Enable sharding for a database (run from a mongosh session connected to the mongos router):
sh.enableSharding("dbTest")
To verify that the sharding is working correctly, check the status:
kubectl exec -it $(kubectl get pods -l app=mongos -o jsonpath="{.items[0].metadata.name}") -- mongosh --port 27100 --eval "sh.status()"
You can scale the replica sets or shards by modifying the StatefulSet configuration files.
For load testing, consider using one of MongoDB's sample datasets, or generating custom data and loading it with MongoDB's mongoimport tool, as sketched below.
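A rough sketch of the mongoimport route follows. It assumes the mongos pod's image ships the MongoDB database tools (mongoimport); adjust the database and collection names to whatever you are testing against.

# Generate 1,000 sample user documents, one JSON document per line.
for i in $(seq 1 1000); do
  echo "{\"userId\": $i, \"name\": \"User $i\", \"age\": $(( (RANDOM % 50) + 18 ))}"
done > users.json

# Copy the file into the mongos pod and import it through the router.
MONGOS_POD=$(kubectl get pods -l app=mongos -o jsonpath="{.items[0].metadata.name}")
kubectl cp users.json "$MONGOS_POD":/tmp/users.json
kubectl exec -it "$MONGOS_POD" -- mongoimport --port 27100 --db testdb --collection users --file /tmp/users.json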
First, we will create a database called testdb and enable sharding on it.
- Connect to the mongos instance:

  kubectl exec -it $(kubectl get pods -l app=mongos -o jsonpath="{.items[0].metadata.name}") -- mongosh --port 27100

- Enable sharding on the testdb database:

  use testdb
  sh.enableSharding("testdb")
Next, we will create a collection called users and shard it based on the userId field, which will act as our shard key.
db.createCollection("users")
sh.shardCollection("testdb.users", { "userId": 1 })
// or use a hashed shard key:
db.users.createIndex({ "userId": "hashed" })
sh.shardCollection("testdb.users", { "userId": "hashed" })
// To verify that the indexes were created:
db.users.getIndexes()
Now, we’ll generate a significant dataset of users to observe the sharding behavior. We’ll use a simple loop to insert a large number of documents with userId values that will be distributed across the shards (a hashed shard key spreads sequential IDs evenly from the start; with a ranged key, the balancer splits and migrates chunks over time).
In the mongos shell, run the following script to insert 100,000 user documents:
let batch = [];
for (let i = 1; i <= 100000; i++) {
batch.push({ userId: i, name: "User " + i, age: Math.floor(Math.random() * 50) + 18 });
if (batch.length === 1000) { // Insert every 1000 documents
db.users.insertMany(batch);
batch = [];
}
}
if (batch.length > 0) {
db.users.insertMany(batch); // Insert remaining documents
}
This will insert 100,000 users into the users collection with random ages. The userId field is used as the shard key, which will help distribute the documents across your shards.
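To sanity-check the load, a quick document count through mongos is enough (handy if you have already exited the shell):

kubectl exec -it $(kubectl get pods -l app=mongos -o jsonpath="{.items[0].metadata.name}") -- \
  mongosh --port 27100 --eval 'db.getSiblingDB("testdb").users.countDocuments({})'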
Once the dataset is inserted, you can verify how the chunks have been distributed across the shards. Use the following command in the mongos shell:
db.adminCommand({ balancerStatus: 1 })
This will show whether the balancer is actively distributing chunks across shards.
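A related helper, sh.getBalancerState(), reports whether the balancer is enabled at all (as opposed to actively migrating chunks right now):

kubectl exec -it $(kubectl get pods -l app=mongos -o jsonpath="{.items[0].metadata.name}") -- \
  mongosh --port 27100 --eval 'sh.getBalancerState()'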
Next, you can check the chunk distribution for the users collection:
db.printShardingStatus()
db.users.getShardDistribution()
Look for the testdb.users section in the output, which will display the chunk distribution across your shards. Each chunk should represent a range of userId values, and you should see how many chunks are assigned to each shard.
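You can also count chunks per shard directly from the config database. This is a hedged sketch: on MongoDB 5.0+ config.chunks references the collection by UUID (looked up from config.collections first); on older versions the chunks are matched by the "ns" field instead.

MONGOS_POD=$(kubectl get pods -l app=mongos -o jsonpath="{.items[0].metadata.name}")
kubectl exec -it "$MONGOS_POD" -- mongosh --port 27100 --eval '
  const configDB = db.getSiblingDB("config");
  const coll = configDB.collections.findOne({ _id: "testdb.users" });
  configDB.chunks.aggregate([
    { $match: { uuid: coll.uuid } },
    { $group: { _id: "$shard", chunks: { $sum: 1 } } }
  ]).forEach(printjson);
'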
You can run some test queries to see how sharding affects query execution. For example, to query users in a specific userId range and see how MongoDB handles it across shards:
db.users.find({ userId: { $gte: 1000, $lt: 2000 } }).explain("executionStats")
The output will show how many shards were involved in the query.
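For comparison, an equality match on the shard key can be routed to a single shard (this holds for both the ranged and the hashed key), so the shards section of its explain output should list only one shard. A quick check from outside the shell:

kubectl exec -it $(kubectl get pods -l app=mongos -o jsonpath="{.items[0].metadata.name}") -- \
  mongosh --port 27100 --eval 'db.getSiblingDB("testdb").users.find({ userId: 12345 }).explain("executionStats")'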
In summary, the steps are:
- Enable sharding on the testdb database.
- Shard the users collection by userId.
- Insert 100,000 user documents with incrementing userId values.
- Check chunk distribution using db.printShardingStatus().
- Run queries to observe how the data is distributed across shards.
If you run into any issues, you can inspect the logs of the MongoDB components:
kubectl logs <pod-name>
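Label selectors save you from looking up pod names one by one (app=mongos is the label used elsewhere in this README; labels for the other components depend on the manifests), and pod events often explain scheduling or storage problems:

kubectl logs -l app=mongos --tail=100
kubectl describe pod <pod-name>
kubectl get events --sort-by=.metadata.creationTimestamp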
To debug networking between pods, you can run a simple busybox or alpine pod that includes tools such as telnet or nc. Here’s how to create a test pod:
kubectl run test-network-a --image=busybox --restart=Never -- sh -c "sleep 3600"
Once the pod is running, exec into it and check connectivity from there:
kubectl exec -it test-network-a -- sh
Inside the pod, run the following command to test connectivity to config-server-2:
telnet config-server-2.config-server.default.svc.cluster.local 27201
or
nc -zv config-server-2.config-server.default.svc.cluster.local 27201
For common issues related to networking, persistent storage, or pod scheduling, refer to the Kubernetes documentation or MongoDB sharding documentation.
If you'd like to contribute, feel free to submit a pull request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License. See the LICENSE file for details.