Doubled buckets after master switch #576

@Serpentian

Yesterday I encountered 150 doubled buckets in the cluster and all of them were caused by master changes. The following happened:

  1. The cluster is rebalancing buckets.
  2. Because of the high load on the storages during rebalancing, they don't respond to cartridge failover's pings, so failover constantly changes masters in the cluster.
  3. The current master sends a bucket and marks it SENT. Right after that, the master changes.
  4. The new master still has the bucket in the ACTIVE state. Replication from the old master to the new one is either "stopped" (which is what we had) or very slow.
  5. The same bucket is now in the ACTIVE state on the masters of two different replicasets at the same time.
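
The steps above can be replayed with a small toy model. This is only a sketch of the scenario, not vshard code: the `Instance` class, the `doubled_buckets` helper, and the instance names are hypothetical, and bucket state is reduced to a plain dictionary.

```python
from dataclasses import dataclass, field

ACTIVE, SENT = "ACTIVE", "SENT"


@dataclass
class Instance:
    """Hypothetical stand-in for a storage instance."""
    name: str
    buckets: dict = field(default_factory=dict)  # bucket_id -> state


def doubled_buckets(masters):
    """Return {bucket_id: [master names]} for buckets ACTIVE on several masters."""
    owners = {}
    for master in masters:
        for bucket_id, state in master.buckets.items():
            if state == ACTIVE:
                owners.setdefault(bucket_id, []).append(master.name)
    return {b: names for b, names in owners.items() if len(names) > 1}


# Replicaset A: the old master and its replica both see bucket 1 as ACTIVE.
a_old = Instance("a_old_master", {1: ACTIVE})
a_new = Instance("a_replica", {1: ACTIVE})
b_master = Instance("b_master")

# Step 3: the old master sends bucket 1 to replicaset B and marks it SENT.
a_old.buckets[1] = SENT
b_master.buckets[1] = ACTIVE

# Step 4: replication a_old -> a_new is stopped, so a_new never learns about
# SENT. Failover then promotes a_new to master.
masters_after_switch = [a_new, b_master]

# Step 5: bucket 1 is ACTIVE on the masters of two different replicasets.
result = doubled_buckets(masters_after_switch)
print(result)  # {1: ['a_replica', 'b_master']}
```

Without the master switch, the same check over `[a_old, b_master]` reports nothing, because the old master already holds the bucket as SENT.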

This is dangerous because a router doesn't know which copy of the bucket is the real one, so some routers will send RW requests to one replicaset and others to the other. As soon as replication is fixed and the new master gets the correct state of the bucket, it either breaks replication, similar to #573 (if there are RW refs on that bucket), or silently deletes the bucket, losing the data that was written while it didn't have the correct information about the bucket.

It seems we should teach the router to detect such broken masters and not route requests to them at all.
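
One possible shape of such a check, as a rough sketch rather than a proposed implementation: treat a master as suspect when any of its replication upstreams is not following or lags too far behind. The `status` and `lag` fields are modeled on the upstream information Tarantool reports in `box.info.replication`; the helper names, the threshold, and the way the router would obtain this data are all assumptions.

```python
def is_suspect_master(upstreams, max_lag_seconds=1.0):
    """Return True if the master may hold stale bucket state.

    `upstreams` is a list of dicts with "status" and "lag" keys, one per
    upstream the master replicates from. How the router collects them is
    left open here. A master with no upstreams is not considered suspect.
    """
    for upstream in upstreams:
        if upstream["status"] != "follow":
            return True  # e.g. "stopped", as in the incident described above
        if upstream["lag"] > max_lag_seconds:
            return True  # replication works but is very slow
    return False


def routable_masters(masters):
    """Keep only the masters that the router should send requests to."""
    return [name for name, ups in masters.items() if not is_suspect_master(ups)]


# Hypothetical snapshot of three masters and their upstreams.
masters = {
    "a_new_master": [{"status": "stopped", "lag": 0.0}],
    "b_master": [{"status": "follow", "lag": 0.02}],
    "c_master": [{"status": "follow", "lag": 7.5}],
}

print(routable_masters(masters))  # ['b_master']
```

A check like this trades availability for safety: during a failover storm it may refuse to route to a replicaset at all, which is arguably better than writing into a bucket that is about to be deleted.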

Labels

bug, critical, router, storage
