Open
Labels: bug, critical, router, storage
Description
Yesterday I encountered 150 doubled buckets in the cluster and all of them were caused by master changes. The following happened:
- The cluster is rebalancing buckets
- Due to the high load on the storages during rebalancing, they don't respond to cartridge failover's pings, so failover constantly changes masters in the cluster
- The current master sends the bucket and marks it `SENT`. Right after that, the master changes
- The new master still has the bucket in the `ACTIVE` state. Replication from the old master to the new one either becomes "stopped" (which is what we had) or is very slow
- We have 2 buckets in the `ACTIVE` state at the same time on different replicaset masters
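
To make the sequence above easier to follow, here is a toy Python model of it. This is only an illustration under stated assumptions, not vshard or cartridge code: the `Instance` and `Replicaset` classes, the `replication_ok` flag and the `simulate` function are all hypothetical names, and replication is reduced to copying a bucket state to the replica when the flag is set.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    # bucket_id -> state as seen by this instance (hypothetical model)
    buckets: dict = field(default_factory=dict)


@dataclass
class Replicaset:
    name: str
    master: Instance
    replica: Instance
    replication_ok: bool = True

    def set_state(self, bucket_id, state):
        # The write lands on the master; the replica sees it only
        # while replication is healthy.
        self.master.buckets[bucket_id] = state
        if self.replication_ok:
            self.replica.buckets[bucket_id] = state

    def failover(self):
        # Failover swaps roles without waiting for replication to catch up.
        self.master, self.replica = self.replica, self.master


def simulate():
    rs_a = Replicaset("A", Instance("a1", {1: "ACTIVE"}), Instance("a2", {1: "ACTIVE"}))
    rs_b = Replicaset("B", Instance("b1"), Instance("b2"))

    rs_a.replication_ok = False   # replication stalls under rebalancing load
    rs_b.set_state(1, "ACTIVE")   # destination replicaset receives bucket 1
    rs_a.set_state(1, "SENT")     # old master a1 marks the bucket SENT
    rs_a.failover()               # a2 becomes master with a stale ACTIVE

    # What each replicaset's current master reports for bucket 1.
    return {rs.name: rs.master.buckets.get(1) for rs in (rs_a, rs_b)}


if __name__ == "__main__":
    print(simulate())
```

In this model both current masters end up reporting bucket 1 as `ACTIVE`, which is the doubled-bucket state described in the list.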
This is dangerous, since a router doesn't know which bucket is real, so some of the routers will send RW requests to one replicaset and others to another. As soon as replication is fixed and the new master gets the correct state of the bucket, the new master either breaks replication, similar to #573 (if there are RW refs on that bucket), or it just silently deletes the bucket, losing the data it received while it didn't have correct information about the bucket.
It seems we should teach the router to detect such broken masters and not route requests to them at all.
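
The issue does not say how the router should detect such masters, so the following Python sketch shows only one hypothetical approach: compare the bucket states reported by every current master and refuse RW routing for any bucket that more than one replicaset claims as `ACTIVE`. The names `find_doubled_buckets`, `route_rw` and the `master_views` structure are assumptions made for illustration and are not part of the vshard API.

```python
from collections import defaultdict


def find_doubled_buckets(master_views):
    """Return {bucket_id: [replicaset names]} for buckets ACTIVE on several masters.

    master_views is a hypothetical snapshot: {replicaset_name: {bucket_id: state}},
    as reported by the current master of each replicaset.
    """
    owners = defaultdict(list)
    for rs_name, buckets in master_views.items():
        for bucket_id, state in buckets.items():
            if state == "ACTIVE":
                owners[bucket_id].append(rs_name)
    return {b: sorted(names) for b, names in owners.items() if len(names) > 1}


def route_rw(bucket_id, master_views):
    """Pick a replicaset for an RW request, refusing ambiguous buckets."""
    doubled = find_doubled_buckets(master_views)
    if bucket_id in doubled:
        # Two masters claim the bucket; routing to either could lose data.
        raise RuntimeError(
            f"bucket {bucket_id} is ACTIVE on several masters: {doubled[bucket_id]}"
        )
    for rs_name, buckets in master_views.items():
        if buckets.get(bucket_id) == "ACTIVE":
            return rs_name
    raise LookupError(f"bucket {bucket_id} is not ACTIVE on any master")


if __name__ == "__main__":
    views = {"A": {1: "ACTIVE", 2: "ACTIVE"}, "B": {1: "ACTIVE"}}
    print(find_doubled_buckets(views))
    print(route_rw(2, views))
```

This sketch only blocks requests for the ambiguous bucket. It does not decide which master is the stale one, which would need extra information such as replication status.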