Doubled buckets after master switch #576

Yesterday I encountered 150 doubled buckets in the cluster, all of them caused by master changes. The following happened:

1. The cluster is rebalancing buckets.
2. Due to high load on the storages during rebalancing, they don't respond to cartridge failover's pings, so failover constantly changes masters in the cluster.
3. The current master sends the bucket and marks it `SENT`. Right after that, the master changes.
4. The new master still has the bucket in the `ACTIVE` state. Replication from the old master to the new one either becomes "stopped" (which is what we had) or is very slow.
5. The same bucket is now in the `ACTIVE` state at the same time on the masters of two different replicasets.
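
The sequence above can be reproduced in a toy model. This is only a sketch in plain Python, not vshard code: the `Instance` and `Replicaset` classes, the `simulate` function, and the instance names are all hypothetical, and replication is reduced to a single flag that says whether a state change reaches the replica.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    # bucket_id -> state, as this instance currently sees it
    buckets: dict = field(default_factory=dict)


@dataclass
class Replicaset:
    name: str
    master: Instance
    replica: Instance
    replication_ok: bool = True  # hypothetical stand-in for replication health

    def set_state(self, bucket_id, state):
        # The master applies the change; the replica sees it only
        # if replication is working.
        self.master.buckets[bucket_id] = state
        if self.replication_ok:
            self.replica.buckets[bucket_id] = state

    def switch_master(self):
        # Failover promotes the replica without waiting for it to catch up.
        self.master, self.replica = self.replica, self.master


def active_owners(replicasets, bucket_id):
    """Names of replicasets whose current master holds the bucket as ACTIVE."""
    return [
        rs.name
        for rs in replicasets
        if rs.master.buckets.get(bucket_id) == "ACTIVE"
    ]


def simulate(replication_ok):
    rs_a = Replicaset(
        "A",
        Instance("a1", {1: "ACTIVE"}),
        Instance("a2", {1: "ACTIVE"}),
        replication_ok=replication_ok,
    )
    rs_b = Replicaset("B", Instance("b1"), Instance("b2"))

    rs_a.set_state(1, "SENT")    # step 3: old master sends the bucket
    rs_b.set_state(1, "ACTIVE")  # the receiving replicaset activates it
    rs_a.switch_master()         # step 3: the master changes right after
    return active_owners([rs_a, rs_b], 1)


print(simulate(replication_ok=False))  # bucket 1 is ACTIVE on both A and B
print(simulate(replication_ok=True))   # bucket 1 is ACTIVE only on B
```

With working replication the promoted instance already knows the bucket is `SENT`, so only one owner remains. With stalled replication the promoted instance still reports `ACTIVE`, which is the doubled bucket.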

This is dangerous, because a router doesn't know which bucket is the real one, so some routers will send RW requests to one replicaset and others to the other. As soon as replication is fixed and the new master gets the correct state of the bucket, the new master either breaks replication, similar to https://github.com/tarantool/vshard/issues/573 (if there are RW refs on that bucket), or silently deletes the bucket, losing the data it received while it didn't have correct information about the bucket.

It seems we should teach the router to detect such broken masters and not route requests to them at all.
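
As a rough illustration of what such a check could look like, here is a minimal Python sketch. It is not vshard's router API: `find_doubled_buckets`, `route_rw`, and the `master_states` map (replicaset name to the bucket states its master reports) are hypothetical. It also takes the most conservative policy, refusing RW requests for any bucket with more than one `ACTIVE` owner, because deciding which master is the stale one is exactly the open question.

```python
from collections import defaultdict


def find_doubled_buckets(master_states):
    """Return {bucket_id: [replicasets]} for buckets ACTIVE on several masters.

    master_states is a hypothetical map:
    replicaset name -> {bucket_id: state} as reported by that master.
    """
    owners = defaultdict(list)
    for replicaset, buckets in master_states.items():
        for bucket_id, state in buckets.items():
            if state == "ACTIVE":
                owners[bucket_id].append(replicaset)
    return {b: sorted(rs) for b, rs in owners.items() if len(rs) > 1}


def route_rw(bucket_id, master_states):
    """Pick the replicaset for an RW request, refusing doubled buckets."""
    doubled = find_doubled_buckets(master_states)
    if bucket_id in doubled:
        raise RuntimeError(
            f"bucket {bucket_id} is ACTIVE on several masters: {doubled[bucket_id]}"
        )
    for replicaset, buckets in master_states.items():
        if buckets.get(bucket_id) == "ACTIVE":
            return replicaset
    raise LookupError(f"bucket {bucket_id} has no ACTIVE owner")


states = {
    "A": {1: "ACTIVE", 2: "ACTIVE"},
    "B": {1: "ACTIVE", 3: "ACTIVE"},
}
print(find_doubled_buckets(states))  # bucket 1 is reported by both A and B
print(route_rw(2, states))           # bucket 2 has a single owner
```

Refusing the request trades availability for safety: writes to a doubled bucket fail instead of landing on a master that may later delete them.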
