-
Notifications
You must be signed in to change notification settings - Fork 864
feat: [Shard-Distributor] Implementation of onboarding logic #7321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat: [Shard-Distributor] Implementation of onboarding logic #7321
Conversation
Signed-off-by: edigregorio <[email protected]>
Signed-off-by: edigregorio <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
Looking at this I think we need to consider the switch to fully onboarded again.
Doing a gradual rollout of this will cause inconsistency in shard ownership, I think.
return nil, fmt.Errorf("delete executors: %w", err) | ||
} | ||
for shard := range request.GetShardStatusReports() { | ||
err = h.storage.AssignShard(ctx, request.GetNamespace(), request.GetExecutorID(), shard) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is Ok for now, but slightly worried about transactionallity - maybe we should have an "assignShardsToExecutor" function in the storage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we just deleted the executor from the store I'm concerned if this call will just fail since the executor is not there?
I think a new store function that's transactional sound good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points, I moved the multiple assignments as a transaction, relying on the existing functionalities of the store module.
continue | ||
} | ||
if p.namespaceCfg.Mode != config.MigrationModeONBOARDED { | ||
p.logger.Info("Namespace not onboarded, rebalance not triggered", tag.ShardNamespace(p.namespaceCfg.Name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might log a lot, but we can always adjust if it's too much
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it could be, we also have already log out of this if condition so we can consider to remove both if it is too noisy
…tion Signed-off-by: edigregorio <[email protected]>
If we use the incrementally as implemented right now yes, it will be inconsistent but I am thinking to a different way of enabling the feature and still be safe from the global impact. Writing it down in the onboarding plan. |
Signed-off-by: edigregorio <[email protected]>
What changed?
Logic implemented in the shard distributor:
when receiving the heartbeat
Logic that will be implemented in a followup pr for executor library:
NextNext PR
For each of the cases create a test in the canary
Why?
How did you test it?
Unit tests and local execution
Potential risks
the only service using it, it's the canary. No real impact in production
Release notes
Documentation Changes