Alternate approach to solve #8482 based on ideas from @moko-poi in PR #8547 #8684

youwalther65 · 2025-10-27T09:04:01Z

Alternate approach to solve #8482 based on ideas from @moko-poi in PR #8547

Description
Adds the FleetID to the Karpenter controller logs when the EC2 CreateFleet API call returns an InsufficientInstanceCapacity error.

~~Moved logging out of pkg/cache/unavailableofferings.go func MarkUnavailable.~~

After an offline discussion with @DerekFrank, I moved the logging back into func MarkUnavailable in pkg/cache/unavailableofferings.go, but it now keeps the log item order backward compatible using the approach here.

How was this change tested?

Does this change impact docs?

Yes, PR includes docs updates
Yes, issue opened: #
No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

aws#8547

netlify · 2025-10-27T09:04:22Z

✅ Deploy Preview for karpenter-docs-prod ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | `114a892` |
| 🔍 Latest deploy log | https://app.netlify.com/projects/karpenter-docs-prod/deploys/69033af31ddacc00087ea613 |
| 😎 Deploy Preview | https://deploy-preview-8684--karpenter-docs-prod.netlify.app |

DerekFrank · 2025-10-28T17:19:32Z

pkg/providers/instance/instance.go

+					"capacity-type", karpv1.CapacityTypeSpot,
+					"ttl", awscache.UnavailableOfferingsTTL,
+					"fleet-id", fleetID,
+				).V(1).Info("removing offering from offerings")


I'm not sure I love having two logs. What's the reason for not using the previous mechanism?

youwalther65 · 2025-10-28

@DerekFrank The primary motivation is to keep the order of the log items consistent, i.e. backward compatible with the existing order reason, instance-type, zone, capacity-type. Only in the case of "reason":"InsufficientInstanceCapacity" do we add fleet-id, like:

{"level":"DEBUG","time":"2025-08-11T11:22:26.753Z","logger":"controller","caller":"cache/unavailableofferings.go:73","message":"removing offering from offerings","commit":"434f54c","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":""},"namespace":"","name":"<redacted","reconcileID":"1569d46a-22ad-4d50-be67-f2e0392df3dd","reason":"InsufficientInstanceCapacity","instance-type":"r6a.32xlarge","zone":"eu-west-1b","capacity-type":"on-demand","ttl":"3m0s","fleet-id":"fleet-XXX"}

This is important because there are customers relying on the order, whether through regex patterns in queries, tools like LogParserForKarpenter, or other custom log parsing.
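To illustrate why field order matters to such consumers, here is a minimal sketch of an order-dependent regex parser. The pattern and the `parseOffering` helper are hypothetical and not taken from LogParserForKarpenter or any other tool; they only assume that reason, instance-type, zone and capacity-type appear consecutively in that order.

```go
package main

import (
	"fmt"
	"regexp"
)

// offeringPattern is a hypothetical pattern of the kind a log consumer might
// use. It only matches when the four fields appear back to back in this order.
var offeringPattern = regexp.MustCompile(
	`"reason":"([^"]+)","instance-type":"([^"]+)","zone":"([^"]+)","capacity-type":"([^"]+)"`,
)

// parseOffering returns the four captured values, or ok=false when the
// fields are missing or appear in a different order.
func parseOffering(line string) (fields [4]string, ok bool) {
	m := offeringPattern.FindStringSubmatch(line)
	if m == nil {
		return fields, false
	}
	copy(fields[:], m[1:])
	return fields, true
}

func main() {
	// Shortened variants of the log line above, for illustration only.
	current := `{"message":"removing offering from offerings","reason":"InsufficientInstanceCapacity","instance-type":"r6a.32xlarge","zone":"eu-west-1b","capacity-type":"on-demand","ttl":"3m0s","fleet-id":"fleet-XXX"}`
	reordered := `{"message":"removing offering from offerings","instance-type":"r6a.32xlarge","zone":"eu-west-1b","capacity-type":"on-demand","ttl":"3m0s","fleet-id":"fleet-XXX","reason":"InsufficientInstanceCapacity"}`

	_, ok := parseOffering(current)
	fmt.Println("current order matches:", ok)
	_, ok = parseOffering(reordered)
	fmt.Println("reordered line matches:", ok)
}
```

Appending fleet-id after ttl leaves such a pattern working, while moving reason to the end of the line would silently stop it from matching.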

You rejected @moko-poi's approach of adding just the fleet-id as the last argument to func MarkUnavailable because, in the case of a spot interruption, there is no fleet-id and we would end up with an empty "fleet-id":"" attribute, which could confuse users.
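One way to avoid the empty attribute, sketched here under assumptions, is to append the fleet-id key only when a value is present. The `offeringLogValues` helper below is hypothetical and is not the code in this PR; it just builds the key/value pairs in the existing order, the kind of list a logr-style `WithValues` call accepts. The ttl field is left out for brevity.

```go
package main

import "fmt"

// offeringLogValues is a hypothetical helper, not Karpenter's actual code.
// It builds key/value pairs in the existing order and appends "fleet-id"
// only when a Fleet ID is known, so no empty attribute is ever logged.
func offeringLogValues(reason, instanceType, zone, capacityType, fleetID string) []any {
	kv := []any{
		"reason", reason,
		"instance-type", instanceType,
		"zone", zone,
		"capacity-type", capacityType,
	}
	if fleetID != "" {
		kv = append(kv, "fleet-id", fleetID)
	}
	return kv
}

func main() {
	// CreateFleet failure: a Fleet ID is available and gets appended last.
	fmt.Println(offeringLogValues("InsufficientInstanceCapacity", "r6a.32xlarge", "eu-west-1b", "on-demand", "fleet-XXX"))

	// Spot interruption: no Fleet ID, so the key is omitted entirely.
	// "SpotInterruption" is a placeholder reason string for this sketch.
	fmt.Println(offeringLogValues("SpotInterruption", "r6a.32xlarge", "eu-west-1b", "spot", ""))
}
```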

Using the second approach, with a map unavailableReason holding the keys reason and fleet-id, would move reason to the end of the log line, breaking backward compatibility, and we would have to add sorting for this map, which doesn't look nice either.
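For reference, the sorting this would require could look roughly like the sketch below. It is an illustration under assumptions, not the implementation in this PR: `legacyOrder` and `orderedLogValues` are hypothetical names, and the legacy key list is taken from the log line above. Known keys are emitted first in their fixed order, and any additional keys follow in alphabetical order so the result does not depend on Go's randomized map iteration.

```go
package main

import (
	"fmt"
	"sort"
)

// legacyOrder lists the keys existing log consumers expect, in today's
// order. The list is an assumption made for this sketch.
var legacyOrder = []string{"reason", "instance-type", "zone", "capacity-type", "ttl"}

// orderedLogValues is a hypothetical helper: it emits legacy keys first, in
// their fixed order, then any remaining keys sorted alphabetically.
func orderedLogValues(fields map[string]string) []any {
	kv := make([]any, 0, 2*len(fields))
	seen := make(map[string]bool, len(legacyOrder))
	for _, k := range legacyOrder {
		seen[k] = true
		if v, ok := fields[k]; ok {
			kv = append(kv, k, v)
		}
	}
	extra := make([]string, 0, len(fields))
	for k := range fields {
		if !seen[k] {
			extra = append(extra, k)
		}
	}
	sort.Strings(extra)
	for _, k := range extra {
		kv = append(kv, k, fields[k])
	}
	return kv
}

func main() {
	fmt.Println(orderedLogValues(map[string]string{
		"fleet-id":      "fleet-XXX",
		"zone":          "eu-west-1b",
		"reason":        "InsufficientInstanceCapacity",
		"capacity-type": "on-demand",
		"instance-type": "r6a.32xlarge",
		"ttl":           "3m0s",
	}))
}
```

The trade-off is an extra helper and an explicit key list to maintain, in exchange for a stable order when more fields are added later.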

To be clear: this PR does not create two log lines for the same event. There are simply calls to log.FromContext in two different locations: one in instance.go for the InsufficientInstanceCapacity event and one in controller.go for Spot interruptions.
So my approach keeps func MarkUnavailable clean, with a signature that contains only the attributes stored in the offeringCache cache, because reason and fleet-id are not values stored in this cache. In addition, the logging is done where the corresponding event happens.

DerekFrank · 2025-10-29

I understand the reasoning behind having the log line be in two places. I think having the sorting is cleaner for backwards compatibility, especially if we intend to add more log information to this field in the future. We can discuss offline and find a path forward.

moko-poi · 2025-10-31T03:19:33Z

Thanks @youwalther65 and @DerekFrank for following up on this!

I'm totally fine with continuing this work in #8684 — it keeps the intent of #8547 while making the log order deterministic and preserving backward compatibility.
The main goal from my side was always to surface the FleetID for CloudTrail correlation, without adding noise (like empty fleet-id fields) or changing the existing log structure.

Happy to close #8547 once this PR becomes the final implementation.

Alternate approach to solve aws#8482 based on ideas from moko-poi in PR

d9a5335

aws#8547

youwalther65 requested a review from a team as a code owner October 27, 2025 09:04

youwalther65 requested a review from rschalo October 27, 2025 09:04

youwalther65 mentioned this pull request Oct 27, 2025

feat: add Fleet ID to error logs for CloudTrail correlation #8547

Open

3 tasks

youwalther65 changed the title ~~Alternate approach to solve #8482 based on ideas from moko-poi in PR #8547~~ Alternate approach to solve #8482 based on ideas from @moko-poi in PR #8547 Oct 27, 2025

DerekFrank reviewed Oct 28, 2025

youwalther65 added 2 commits October 30, 2025 10:56

Changed PR according to referenced PR while keeping log item order

23f53c9

Merge branch 'main' into fleetid

114a892
