-
Notifications
You must be signed in to change notification settings - Fork 26
Sharding
To define your database instance connections, simply define:
instance:
local:
dsn: 'mysql:host=localhost'
user: root
Please note that the instance definition doesn't include database name.
A shard mapping defines how do we allocate data on different shards.
A shard mapping config may consist of:
-
tablestable names that use this shard rule. -
keythe shard key on each table we defined intables. -
shardsthe data nodes that used for this shard rule. -
chunksthe slices cross the shards used in this rule. -
hashorrangethe shard method used in this rule.
One SG (Shard Group) consists of nodes for write and nodes for read. one SG (Shard Group) may contains one or more chunks.
e.g., one shard may contain one or more chunks:
Shard1 [ Chunk1, Chunk2, Chunk3, ... ]
Shard2 [ Chunk513, Chunk514, Chunk515, ... ]
...
Each chunk has its own chunk name (for its database) and the shard ID:
c1 => ['shard' => 's1', 'dbname' => 'bossnet_c1'],
c2 => ['shard' => 's1', 'dbname' => 'bossnet_c2'],
c3 => ['shard' => 's1', 'dbname' => 'bossnet_c3'],
Every chunk maybe applied to the different nodes in the same shard. e.g.
c1 on the node for read in shard s1.
The above operation select the read node from c1's shard s1. for read.
And so, when dispatching a record, we select the chunk first, and then select the node for read/write on the related shard.
A internal shard mapping config looks like this:
'M_store_id' => [
'key' => 'store_id',
'shards' => [ 's1', 's2' ],
'method' => 'hash',
'chunks' => [
536870912 => [ 'shard' => 'node1'],
1073741824 => [ 'shard' => 'node1'],
1610612736 => [ 'shard' => 'node1'],
2147483648 => [ 'shard' => 'node2'],
2684354560 => [ 'shard' => 'node2'],
3221225472 => [ 'shard' => 'node2'],
3758096384 => [ 'shard' => 'node3'],
4294967296 => [ 'shard' => 'node3'],
],
],
By default, the 'chunks' list is empty. So you have to create chunks first.
$shards = Book::shards(); // returns Shards of the model.
foreach ($shards as $shardId => $shard) {
$shard; // instance of Maghead\Sharding\Shard
}
Shard Manager manages the connections to shards. the connections used by shards
are defined in the databases section in the config file.
To create the shard manager, you need two arguments: Config object and ConnectionManager instance.
use Maghead\Sharding\Manager\ShardManager;
$shardManager = new ShardManager($config, $dataSourceManager);
To get the shards used by one shard mapping, simply call loadShardCollectionOf with the
related shard mapping ID:
$shards = $shardManager->loadShardCollectionOf('M_store_id');
The returned $shards is a Maghead\Sharding\ShardCollection instance.
To create the shard dispatcher, simply invoke createDispatcher on the shard
collection instance:
$dispatcher = $shards->createDispatcher();
To add a new shard mapping config:
maghead shard mapping add [mappingId] \
--key store_id \
--hash \
--shards "s1,s2,s3" \
--chunks 32
To remove a shard mapping config:
maghead shard mapping remove [mappingId]
Create an empty shard with the corresponding schema.
maghead shard allocate \
--mapping [mappingIds] \
--instance [instanceId] \
[shardId]
To use the allocate operation:
$config = ConfigLoader::loadFromFile('.../config.yml');
$o = new AllocateShard($config);
$o->allocate('local', 't1', 'M_store_id');
The above command allocate a new node t1 on local instance and initialize
the schemas related to the shard mapping.
Where local is the instance ID, t1 is the node ID for the new shard, and
M_store_id is the shard mapping ID defined in the config file.
Clone an existing shard on the same instance.
maghead shard clone --instance [instanceId] [source shard] [dest shard]
Before cloning the shard, be sure to have the mysql instance defined in the config.
The ShardCloning operation uses mysqldbcopy to copy the database.
To clone a shard in PHP code:
$config = ConfigLoader::loadFromFile('.../config.yml');
$o = new CloneShard($config);
$o->setDropFirst(true);
$o->clone('local', 't2', 'master');
The above code creates a new node t2 and copy the data from master.
Move a shard to an instance:
maghead shard move --instance [instanceId] nodeId
Prune all rows that doesn't belong to the shard.
maghead shard prune --mapping [mappingId] [shard]
Shard pruning finds all schema related to the shard mapping, and then iterate each collection to prune the rows that does not belong to the shard itself.
$config = ConfigLoader::loadFromFile('.../config.yml');
$o = new PruneShard($config, $logger);
$o->prune('M_store_id', $schemas, 't1');
To create a new record on the global table, the repository will first find the master node to insert the record to get the primary key of the record.
The second step will be: inserting the newly created record with primary key into all shards used by the shard mapping defined in the schema.
The API is the same as when the ORM doesn't use sharding:
$ret = Store::create([ 'name' => 'Shop III', 'code' => 'BS001' ]);
To update a record on the global table, the repository will first find the master node to update the record with the primary key.
The second step will be: updating the record with new values by primary key in all shards used by the shard mapping defined in the schema.
The API is the same as when the ORM doesn't use sharding:
$ret = $store->update([ 'code' => 'BS002' ]);
To delete a record on the global table, the repository will first find the master node to insert the record to get the primary key of the record.
The second step will be: delete all records existed in all shards used by the shard mapping defined in the schema.
Initially, one shard might only have one chunk (the shard itself). To split
the existing one chunk into many chunks, you must run chunks:init command to
initialize the chunks for a shard mapping:
maghead shard chunks:init M_store_id 1024
the command above will get the existing shards from M_store_id,
'M_store_id' => [
'tables' => ['orders'], // This is something that we will define in the schema.
'key' => 'store_id',
'shards' => [ 's1', 's2' ],
'chunks' => [ ],
'hash' => [],
],
The chunk manager then iterate the shards, and the create chunks on each shard, a new chunk list will be allocated in the shard mapping config:
'M_store_id' => [
'tables' => ['orders'], // This is something that we will define in the schema.
'key' => 'store_id',
'shards' => [ 's1', 's2' ],
'chunks' => [
10000 => ['shard' => 's1' ], // the dbname may override the dbname in the DSN of the shard.
20000 => ['shard' => 's1' ],
30000 => ['shard' => 's1' ],
40000 => ['shard' => 's2' ],
50000 => ['shard' => 's2' ],
]
],
The chunk manager will then create the database schema on each chunk.
Here is the steps of creating a new chunk:
- Get shards from the shard mapping
- recaluclate the chunks for expanding chunks.
- For each new chunk, we give it a new dbname.
- Get the connection DSN from the shard.
- For each connection, create the database.
- Run schema initializer on each connection.
Splitting is a process that keeps chunks from growing too large.
When a chunk grows beyond a specified chunk size, or if the number of documents in the chunk exceeds Maximum Number of Documents Per Chunk to Migrate, Maghead ORM splits the chunk based on the shard key values the chunk represent.
A chunk may be split into multiple chunks where necessary.
Splitting chunk is a relatively heavier task. It splits one chunks into two or more chunks to reduce the chunk size.
CREATE TABLE new_chunk.orders LIKE orig_chunk.orders;
INSERT INTO new_chunk.orders SELECT * from orig_chunk.orders;
Chunk Split must be done on one mysql connection. (cross databases on the same machine)
To split a chunk, please run the command below:
maghead shard chunks:split c1 4
The command above will split the chunk c1 into 4 chunks by its shard targets.
- dailymotion chash https://github.com/dailymotion/chash
- flexihash https://github.com/pda/flexihash