SHARDED CLUSTER MANAGEMENT
Quiz:
1. Which of the following are essential questions to answer in planning a strategy for scaling out with shards? Check all that apply.
- What tolerance do I have for data loss?
- By what factors should I measure the capacity of my system? [T]
- How many shards do I need to add to keep my capacity under 80%? [T]
- What is the age of my existing infrastructure and should I consider updating to more reliable hardware?
- What growth trend does my application follow along my primary capacity metric? [T]
2. Which of the following accurately describe config servers in a sharded cluster? Check all that apply.
- You can consider config servers to be identical with regard to the data they contain. [T]
- The config servers form a replica set.
- Requests from mongos processes are distributed across the config servers.
- Config servers are critical to the health of your sharded cluster. [T]
- Config servers provide automatic failover such that if one goes down, meta-data updates can continue.
3. True or false? When upgrading from MongoDB 2.4 to 2.6 there is a strict order in which you must upgrade the mongod, mongos, and config servers.
- True. [T]
- False.
4. Which of the following are services a mongos performs? Check all that apply.
- Coordinates the process of keeping shards balanced [T]
- Provides an abstraction layer that frees clients from a need to know how the data is sharded [T]
- Maintains all cluster meta data
- Merges results from individual shards for queries involving sort() [T]
- Maintains indexes for shard keys
5. Which of the following occur when a chunk is split?
- A small split token is placed in the data file
- The data file is turned into two files, but they are at the same location as they were before
- A split alters meta data only, there is no change to the data itself [T]
- The data file is split in two and possibly moved
- A new chunk is created and new writes go to the new chunk
6. Which of the following is the most likely cause of MongoDB marking a chunk as jumbo?
- The balancer is not running
- Too many large documents in your collection
- Too few shards
- A shard key with a cardinality that is too low [T]
- Too many documents in the collection
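Note: chunks that have been flagged as jumbo carry a "jumbo" flag in the cluster metadata, so you can list them from a mongos with:
mongos> use config
mongos> db.chunks.find({"jumbo": true})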
===============================================================================================================
H-w 5.1:
1. Start the config server(s):
mongod --configsvr --dbpath /data/configdb --port 27019
2. Start a mongos instance:
mongos --configdb HP:27019 --port 27099
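Note: one config server is enough for this exercise, but in production you would run 3 config servers and pass all of them to mongos as a comma-separated list, e.g. (assuming two more config servers on ports 27020 and 27021):
mongos --configdb HP:27019,HP:27020,HP:27021 --port 27099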
3. Start 3 mongod instances:
mongod --port 27031 --dbpath=/data/db
mongod --port 27032 --dbpath=/data/db1
mongod --port 27033 --dbpath=/data/db2
4. You should now have 5 mongo processes running on your machine: 4 mongod instances (counting the config server) plus the mongos.
5. Now we need to add the 3 mongod instances to the cluster as 3 shards. Go to the mongos shell:
mongo --port 27099
mongos> db.adminCommand({addShard: "HP:27031", "name":"shard0"})
mongos> db.adminCommand({addShard: "HP:27032", "name":"shard1"})
mongos> db.adminCommand({addShard: "HP:27033", "name":"shard2"})
6. Before doing anything to the cluster, we should stop the balancer to avoid automatic chunk migrations:
mongos> sh.stopBalancer()
mongos> sh.getBalancerState() //should say false here
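Note: sh.stopBalancer() also waits for any in-progress balancing round to finish; sh.setBalancerState(false) is the non-waiting equivalent, and sh.isBalancerRunning() tells you whether a migration is currently in progress.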
7. Now we need to create a database m202 with a collection named presplit in it:
mongos> use m202
mongos> db.createCollection("presplit")
8. Now we need to enable sharding on the m202 database and shard the collection using "a" as the shard key:
mongos> sh.enableSharding("m202")
mongos> sh.shardCollection("m202.presplit",{"a":1})
9. Now we need to split the collection into chunks:
mongos> for ( var x=0; x<25; x++ ){ db.adminCommand({ split : "m202.presplit" , middle : { a : x } })}
The command above creates 26 chunks, all on shard0, since shard0 is the primary shard for m202.
10. sh.status() will show you the chunks under the m202 database.
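You can also count the chunks directly in the config database (it should report 26 at this point):
mongos> use config
mongos> db.chunks.find({"ns": "m202.presplit"}).count()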
11. In order to move chunks from one shard to another, you can use the following command:
db.adminCommand( { moveChunk : "m202.presplit", find : {a:14}, to : "shard1"})
Move the chunks first according to the question; afterwards we need to merge them into the ranges the question asks for.
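If you need to move a contiguous range of chunks, a small loop saves typing. A sketch, assuming you want the chunks from a:14 up to a:20 on shard1:
mongos> for ( var x=14; x<21; x++ ){ db.adminCommand({ moveChunk : "m202.presplit", find : { a : x }, to : "shard1" })}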
12. For merging:
db.adminCommand( { mergeChunks: "m202.presplit", bounds: [ { "a":0},{"a":2} ] })
The command above merges the a:0 --> a:1 and a:1 --> a:2 chunks into a single a:0 --> a:2 chunk.
Use the command with the appropriate bounds to produce the ranges the question asks for; sometimes you may need to move chunks again first.
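Keep in mind that mergeChunks only works on contiguous chunks that all live on the same shard, which is why you may have to move chunks before merging. A wider merge looks like this (a sketch, assuming the chunks covering a:0 up to a:7 are all on shard0, as in the final status below):
mongos> db.adminCommand( { mergeChunks: "m202.presplit", bounds: [ { "a":0},{"a":7} ] })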
mongos> sh.status(true)
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "version" : 4,
    "minCompatibleVersion" : 4,
    "currentVersion" : 5,
    "clusterId" : ObjectId("53899ab5505ac666aae1e316")
  }
  shards:
    { "_id" : "shard0", "host" : "HP:27031" }
    { "_id" : "shard1", "host" : "HP:27032" }
    { "_id" : "shard2", "host" : "HP:27033" }
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "m202", "partitioned" : true, "primary" : "shard0" }
      m202.presplit
        shard key: { "a" : 1 }
        chunks:
          shard0 4
          shard1 3
          shard2 4
        { "a" : { "$minKey" : 1 } } -->> { "a" : 0 } on : shard0 Timestamp(18, 1)
        { "a" : 0 } -->> { "a" : 7 } on : shard0 Timestamp(20, 7)
        { "a" : 7 } -->> { "a" : 10 } on : shard0 Timestamp(20, 9)
        { "a" : 10 } -->> { "a" : 14 } on : shard0 Timestamp(20, 12)
        { "a" : 14 } -->> { "a" : 15 } on : shard1 Timestamp(20, 1)
        { "a" : 15 } -->> { "a" : 20 } on : shard1 Timestamp(20, 16)
        { "a" : 20 } -->> { "a" : 21 } on : shard1 Timestamp(15, 0)
        { "a" : 21 } -->> { "a" : 22 } on : shard2 Timestamp(20, 0)
        { "a" : 22 } -->> { "a" : 23 } on : shard2 Timestamp(17, 0)
        { "a" : 23 } -->> { "a" : 24 } on : shard2 Timestamp(18, 0)
        { "a" : 24 } -->> { "a" : { "$maxKey" : 1 } } on : shard2 Timestamp(19, 0)
mongos>
Note: cluster = new ShardingTest({"shards" : 3, "chunksize" : 1, "config": 3, other:{rs:true} }), run from a mongo --nodb shell, can be used to create a sharded cluster with 3 shards (each backed by a replica set) and 3 config servers.
===============================================================================================================
H-w 5.2:
Which of the following are advantages of pre-splitting the data that is being loaded into a sharded cluster, rather than throwing all of the data in and waiting for it to migrate?
- You can decide which shard has which data range initially if you pre-split the data
- Migration takes time, especially when the system is under load
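This is exactly what step 9 of 5.1 did: split the (still empty) collection up front and place the chunks where you want them before loading the data, so the bulk load writes to all shards in parallel instead of hammering one shard and migrating afterwards.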
===============================================================================================================
H-w 5.3:
$ mongo --nodb
Now we will spin up a sharded cluster with 2 shards, each of which is a replica set with 3 members (a primary, a secondary, and an arbiter), plus one config server. We will do this using ShardingTest as follows.
> cluster = new ShardingTest({shards: 2, chunksize: 1, rs : {nodes : [{}, {}, {arbiter: true}]} });
Give that a minute to spin up, and you should see all of your mongod processes plus a mongos at port 30999. At this point, you can initialize MongoProc and begin your homework.
Start by initializing with MongoProc. Once you have initialized, your "testDB.testColl" collection will be populated (it will be dropped at the beginning of initialization if present), split into chunks, and those chunks will be distributed across your shards.
$ mongo --port 30999
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "version" : 4,
    "minCompatibleVersion" : 4,
    "currentVersion" : 5,
    "clusterId" : ObjectId("538b7b7d9f91188f8156c2b0")
  }
  shards:
    { "_id" : "test-rs0", "host" : "test-rs0/HP:31100,HP:31101" }
    { "_id" : "test-rs1", "host" : "test-rs1/HP:31200,HP:31201" }
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "testDB", "partitioned" : true, "primary" : "test-rs0" }
      testDB.testColl
        shard key: { "_id" : 1 }
        chunks:
          test-rs1 12
          test-rs0 9
        too many chunks to print, use verbose if you want to force print
mongos> show dbs
admin (empty)
config 0.016GB
testDB 0.063GB
mongos> use testDB
switched to db testDB
mongos> show collections
system.indexes
testColl
mongos> db.testColl.find({"_id": 90001})
{ "_id" : 90001, "otherID" : ObjectId("5389f8fa9a85db1c5827f5dc"), "string" : "testStringForPadding0000000000000000000000000000000000000000" }
mongos>
mongos>
mongos> use admin
switched to db admin
mongos> db.adminCommand({flushRouterConfig:1})
{ "flushed" : true, "ok" : 1 }
mongos> use testDB
switched to db testDB
mongos> db.testColl.find({"_id": 90001})
Done.
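The point of this homework: the mongos had stale chunk metadata, and flushRouterConfig forces it to reload the metadata from the config servers, after which queries are routed correctly again. To see how the documents are spread across the shards, you can also run (from testDB):
mongos> db.testColl.getShardDistribution()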
===============================================================================================================
H-w 5.4:
In a sharded cluster, which of the following can keep large chunks from being split as part of MongoDB's balancing process? Check all that apply.
- Frequent restarts of mongos processes
- If there are not enough distinct shard key values
- If one of the config servers is down when a mongos tries to do a split
===============================================================================================================
H-w 5.5:
Which of the following are REQUIRED when upgrading a sharded cluster to MongoDB 2.6? Check all that apply.
- If your MongoDB deployment is not already running MongoDB 2.4, upgrade to 2.4 first.
- Upgrade all mongos instances before upgrading mongod instances.
- Disable the balancer
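For the balancer step, disable it before you begin and re-enable it once every component is upgraded:
mongos> sh.stopBalancer()
... upgrade the mongos instances, config servers, and mongod instances ...
mongos> sh.startBalancer()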