-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Alternate approach to solve #8482 based on ideas from @moko-poi in PR #8547 #8684
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for karpenter-docs-prod ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
pkg/providers/instance/instance.go
Outdated
| "capacity-type", karpv1.CapacityTypeSpot, | ||
| "ttl", awscache.UnavailableOfferingsTTL, | ||
| "fleet-id", fleetID, | ||
| ).V(1).Info("removing offering from offerings") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I love having two logs. Whats the reason for not using the previous mechanism?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DerekFrank The primary motivation is to keep the order of the log items consistent i.e. backward compatible with order reason, instance-type, zone, capacity-type and only in case of"reason":"InsufficientInstanceCapacity" we add we add fleet-id like:
{"level":"DEBUG","time":"2025-08-11T11:22:26.753Z","logger":"controller","caller":"cache/unavailableofferings.go:73","message":"removing offering from offerings","commit":"434f54c","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":""},"namespace":"","name":"<redacted","reconcileID":"1569d46a-22ad-4d50-be67-f2e0392df3dd","reason":"InsufficientInstanceCapacity","instance-type":"r6a.32xlarge","zone":"eu-west-1b","capacity-type":"on-demand","ttl":"3m0s","fleet-id":"fleet-XXX"}
This is important, because they are Cx relying on the order, using regex pattern to query or tools like LogParserForKarpenter or other custom log parsing.
You rejected @moko-poi approach of adding just the fleet-id as last argument to func MarkUnavailable, because in case of spot interruption, there is no fleet-id and we would have an empty " fleet-id:"" attribute., which could confuse users.
Using the second approach with a map unavailableReason with key reason and fleet-id would move reason to the end of the log line, breaking backwards compatibility, and we have to add sorting for this map, which doesn't look nice as well.
To be clear: This PR does not create two log lines for the same event, it's just the call to log.FromContext in two different locations, either in instance.go for InsufficientInstanceCapacity event or one in controller.go for Spot interruptions.
So my approach keeps the func MarkUnavailable clean and just have a func signature with all attributes stored in cache offeringCache, because reasonand fllet-idare not values stored in this cache. In addition the logging is done where the corresponding event happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the reasoning behind having the log line be in two places. I think having the sorting is cleaner for backwards compatibility, especially if we intend to add more log information to this field in the future. We can discuss offline and get a path forward
|
Thanks @youwalther65 and @DerekFrank for following up on this! I'm totally fine with continuing this work in #8684 — it keeps the intent of #8547 while making the log order deterministic and preserving backward compatibility. Happy to close #8547 once this PR becomes the final implementation. |
Alternate approach to solve #8482 based on ideas from @moko-poi in PR #8547
Fixes #8482
Description
Adds FleetID to Karpenter controller logs in case of
InsufficientInstanceCapacityerror in EC2 CreateFleet API call.Moved logging out ofpkg/cache/unavailableofferings.gofuncMarkUnavailable.After offline discussion with @DerekFrank I moved logging back to
pkg/cache/unavailableofferings.gofuncMarkUnavailable, but now keep log item order backward compatible using approach here.How was this change tested?
Does this change impact docs?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.