-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update rule management to avoid sporadic 503 errors #4039
base: main
Are you sure you want to change the base?
Conversation
"resourceID", resLR.ID(), | ||
"arn", awssdk.ToString(sdkLR.ListenerRule.RuleArn)) | ||
return nil | ||
func (m *defaultListenerRuleManager) SetRulePriorities(ctx context.Context, unmatchedSDKLRs []ListenerRuleWithTags, lastAvailablePriority int32) (int32, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unmatchedSDKLRs seems to be the wrong name. Aren't these listener rules that are matched but just at the wrong priority?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read further on in the implementation seems we use this function for matched and unmatched rules.
docs/install/iam_policy.json
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move "elasticloadbalancing:SetRulePriorities" to where we have condition, like other modify calls
"Condition": {
"Null": {
"aws:ResourceTag/elbv2.k8s.aws/cluster": "false"
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I agree. Nice catch fixed now. :)
|
||
matchedResAndSDKLRs, unmatchedResLRs, unmatchedSDKLRs := matchResAndSDKListenerRules(resLRs, sdkLRs) | ||
// matchedResAndSDKLRsBySettings : A slice of matched resLR and SDKLR rule pairs that have matching settings like actions and conditions | ||
// unmatchedResLRs : A slice of resLR) that do not have a corresponding match in the sdkLRs. These rules need to be created on the load balancer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resLR)
typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed. Thanks :)
if err != nil { | ||
return err | ||
} | ||
resLR.SetStatus(lrStatus) | ||
} | ||
for _, resAndSDKLR := range matchedResAndSDKLRs { | ||
// Update existing listener rules on the load balancer for their tags | ||
for _, resAndSDKLR := range matchedResAndSDKLRsBySettings { | ||
lsStatus, err := s.lrManager.Update(ctx, resAndSDKLR.resLR, resAndSDKLR.sdkLR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this lrManager.Update
is used only for update tags with your change ? should we change the method name to make it clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Updated.
sdkLR: sdkLR, | ||
}) | ||
} | ||
unmatchedSDKLRs = append(unmatchedSDKLRs[:i], unmatchedSDKLRs[i+1:]...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I am not a fan of modifying the input of a function like this. I see that we do this is in other parts of the code but overall I think it's an anti-pattern. Using this pattern also means you have to do multiple modification to i
to get the index correct.
I would prefer to just allocate a new array to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have done this to avoid any duplicate desired res rules matching the settings to be paired to same sdk rule.
break | ||
} | ||
} | ||
if !found { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will still result in 503s (please correct me if I'm wrong)
Basically, from my read of the code, when a user changes their conditions or actions we will delete the ListenerRule and re-create the rule. If we're unlucky, it's possible for the delete to get propagated and the create to be delayed a bit which causes the 503 errors.
Ideally, we can use modify-rule
https://docs.aws.amazon.com/cli/latest/reference/elbv2/modify-rule.html to detect these rules with the changed actions / conditions and modify the rule in place, rather than delete and create the rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this. I have updated the code to avoid such scenarios. We will first push down all the unmatched SDK LRs which are not matching any priority or settings down. We will reprioritize the rules which are matching the settings to match their desired priorities and then we will either modify/create the rules at their desired priorities and in the end delete all the unwanted pushed down rules. This will we will have have rules on the listener before we get a newer/updated one at higher priority.
sdkLRs = append(sdkLRs, sdkLR) | ||
lastAvailablePriority-- | ||
} | ||
//Reprioratize matched rules by settings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Reprioritize
desiredActions, err := buildSDKActions(resLR.Spec.Actions, m.featureGates) | ||
if err != nil { | ||
func (m *defaultListenerRuleManager) SetRulePriorities(ctx context.Context, matchedResAndSDKLRsBySettings []resAndSDKListenerRulePair, unmatchedSDKLRs []ListenerRuleWithTags) error { | ||
var lastAvailablePriority int32 = 50000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just for my own understanding, it's impossible for a user to model a rule with 50k priority in K8s. It would have to be done via an out of band approach therefore would also be eligible for removal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed this before, but we should have some unit tests here.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: shraddhabang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Done. Please take a look. |
Issue
#3816
Description
The root cause for the above issue was identified as a race condition in the ALB's rule management process. The controller uses ModifyRule API sequentially to add any new rule at the existing priority which results in loss of the existing older rule at that priority. When the first API call to modify rules is received, the ALB waits 10 seconds to batch all subsequent changes before deploying them to the data plane. If some rule modification APIs arrive after the initial 10-second window, they are implemented in the next batch, leading to a brief period where a random rule may not exist, causing 404 errors.
The proposed improvement in the rule management logic make use of SetRulePriorities API to first push the unwanted rules down in the listener instead of deleting them. This creates the gaps in the rules order to add any new/updated rule. Once the new rule is successfully added, the controller will then delete all the unwanted rules. This way we ensure that no rules will be lost before the new/updated rules are present in the controller. This will avoid any sporadic 404 errors caused due to race conditions.
Checklist
README.md
, or thedocs
directory)BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯