fix(managedobserve): refresh install token per flush ENG-446#297
Open
michiosw wants to merge 1 commit into
Open
fix(managedobserve): refresh install token per flush ENG-446#297michiosw wants to merge 1 commit into
michiosw wants to merge 1 commit into
Conversation
Greptile SummaryThis PR refreshes managed install credentials during daemon background work. The main changes are:
Confidence Score: 5/5This looks safe to merge.
Important Files Changed
Reviews (1): Last reviewed commit: "fix(managedobserve): refresh install tok..." | Re-trigger Greptile |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Where We Are
ENG-446 started from a concrete retry storm: hosted ingest returned terminal
400responses for an org mismatch, and the daemon treated them like size-related batch failures. With four records in a batch, the client could send the same bad payload shape at batch size 4, then 2, then 1. That wastes API capacity and floods logs while the server is correctly rejecting the request.PR #296 tried to fix that and token rotation together, but it also advanced the cursor after a hosted one-record
413. Greptile correctly flagged that as data loss risk because a hosted413can be generic or transient.Where We Want To Go
For this PR, terminal hosted validation errors should stop the current flush immediately. A
400should not shrink the batch, should not advance the cursor, and should leave the data retryable after the operator fixes stale config or token/org mismatch.A running managed daemon should also pick up rotated install tokens without restart for both GitHub policy refresh and ledger streaming. That prevents stale startup credentials from causing repeated hosted failures after token rotation.
This is the minimal safe slice for ENG-446. It does not add quarantine, long cooldown state, or doctor UX. Those are larger follow-ups if we want the full Linear acceptance criteria.
How do we get there
Reload managed config and resolve the install token before each policy refresh and before each stream flush. Keep the existing local oversized-record cursor advance, because that path is based on client-side payload limits we can prove locally.
Change hosted retry behavior so only
413reduces batch size. A hosted400or422now returns after one request. A hosted one-record413also returns without advancing the cursor, so Greptile's data-loss concern is addressed.Tests cover the reviewer-critical cases:
400gets one request and no cursor advance413does not advance the cursorVerified with:
go test ./internal/managedobserve ./internal/managedstreamgo test -race ./internal/managedobserve ./internal/managedstreamgo test ./...go vet ./...git diff --check