fix(managedobserve): refresh rotated install tokens #296
Greptile Summary
This PR refreshes managed install tokens during long-running daemon work. The main changes are:
Confidence Score: 3/5
This is close, but the single-record 413 handling should be fixed before merging.
Important Files Changed
internal/managedstream/stream.go
Reviews (1): Last reviewed commit: "fix(managedobserve): refresh rotated ins..."
      if limit == 1 {
    -     return err
    +     return advancePastMinimumBatch(statePath, batch, fmt.Sprintf("hosted status %d", hostedErr.StatusCode), err)
      }
This branch saves the cursor past a one-record batch for any HTTP 413. A 413 can come from a proxy, load balancer, or misconfigured route rather than from the ledger service proving this specific record can never be accepted. In that case the daemon records the cursor before returning the error, so the next flush starts after this action and the ledger entry is never retried or streamed. Please only advance here when the client has a local payload-limit violation or a structured hosted response that identifies the single record as permanently too large.
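A minimal sketch of the guard this comment asks for, written in Go to match the file under review. Everything except `StatusCode` is an assumption for illustration: the `HostedError` shape, its `Code` field, the `record_too_large` value, and the `ErrLocalPayloadLimit` sentinel are hypothetical and are not taken from the repository.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// HostedError is a hypothetical stand-in for the client's hosted error type.
// Only StatusCode appears in the diff; Code is an assumed structured field.
type HostedError struct {
	StatusCode int
	Code       string
}

func (e *HostedError) Error() string {
	return fmt.Sprintf("hosted status %d (code %q)", e.StatusCode, e.Code)
}

// ErrLocalPayloadLimit is an assumed sentinel for a record that the client
// itself measured as larger than its own payload limit.
var ErrLocalPayloadLimit = errors.New("record exceeds local payload limit")

// codeRecordTooLarge is a hypothetical structured code with which the ledger
// service would identify one record as permanently too large.
const codeRecordTooLarge = "record_too_large"

// canSkipSingleRecord reports whether the cursor may advance past a
// one-record batch. A bare 413 is not enough, because a proxy or load
// balancer can produce it without the ledger service seeing the record.
func canSkipSingleRecord(err error) bool {
	if errors.Is(err, ErrLocalPayloadLimit) {
		return true
	}
	var hostedErr *HostedError
	if errors.As(err, &hostedErr) {
		return hostedErr.StatusCode == http.StatusRequestEntityTooLarge &&
			hostedErr.Code == codeRecordTooLarge
	}
	return false
}

func main() {
	bare := &HostedError{StatusCode: http.StatusRequestEntityTooLarge}
	structured := &HostedError{StatusCode: http.StatusRequestEntityTooLarge, Code: codeRecordTooLarge}

	fmt.Println(canSkipSingleRecord(bare))                                          // false
	fmt.Println(canSkipSingleRecord(structured))                                    // true
	fmt.Println(canSkipSingleRecord(fmt.Errorf("flush: %w", ErrLocalPayloadLimit))) // true
}
```

With a check like this, the `limit == 1` branch would return the error unchanged unless the check passes, so an ambiguous 413 leaves the cursor where it was and the record is retried on the next flush.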
Closing this version. The replacement will keep the token-refresh fix but avoid the single-record 413 cursor skip that Greptile flagged.

Where We Are
PR #295 was closed because it fixed ledger token rotation but left GitHub policy refresh on the startup install token. It also kept shrinking batches for generic hosted 400/422 validation errors, which can retry the same invalid payload shape instead of stopping.
Where We Want To Go
A running managed daemon should use a rotated install token for both policy refresh and ledger streaming. Hosted validation errors should fail once without advancing the cursor. Real 413 payload-size responses should still shrink batches, and a single rejected record should not be resent forever.
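The error-handling goal above could look roughly like the following Go sketch. It is an illustration under stated assumptions, not the code in `internal/managedstream/stream.go`: `flushBatch`, `isPayloadTooLarge`, and the reduced `HostedError` type are hypothetical names. The returned count is how far the cursor may advance. What to do with a one-record 413 is deliberately left to the caller.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// HostedError is a hypothetical, reduced version of the hosted error type.
type HostedError struct{ StatusCode int }

func (e *HostedError) Error() string { return fmt.Sprintf("hosted status %d", e.StatusCode) }

// isPayloadTooLarge reports whether err is a hosted 413 response.
func isPayloadTooLarge(err error) bool {
	var hostedErr *HostedError
	return errors.As(err, &hostedErr) && hostedErr.StatusCode == http.StatusRequestEntityTooLarge
}

// flushBatch sends a prefix of pending and returns how many records were
// accepted. Only a 413 halves the batch; any other error (400, 422, ...)
// is returned after a single attempt with a count of 0, so the cursor does
// not move. A 413 on a one-record batch is also returned with a count of 0.
func flushBatch(pending []string, send func([]string) error) (int, error) {
	limit := len(pending)
	for limit > 0 {
		err := send(pending[:limit])
		if err == nil {
			return limit, nil
		}
		if !isPayloadTooLarge(err) || limit == 1 {
			return 0, err
		}
		limit /= 2
	}
	return 0, nil
}

func main() {
	pending := []string{"a", "b", "c", "d", "e"}

	// A fake endpoint that rejects batches of more than two records.
	tooBig := func(batch []string) error {
		if len(batch) > 2 {
			return &HostedError{StatusCode: http.StatusRequestEntityTooLarge}
		}
		return nil
	}
	n, err := flushBatch(pending, tooBig)
	fmt.Println(n, err) // 2 <nil>

	// A validation failure stops immediately and leaves the cursor alone.
	invalid := func([]string) error { return &HostedError{StatusCode: http.StatusUnprocessableEntity} }
	n, err = flushBatch(pending, invalid)
	fmt.Println(n, err) // 0 hosted status 422
}
```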
How do we get there
Reload managed config and the install token before each policy refresh and stream flush. Limit hosted shrink retries to 413, and advance past a one-record 413 after recording the diagnostic error.
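The reload step can be sketched as below. This is only an illustration of "load the token right before each use": `Daemon`, `TokenLoader`, and `withFreshToken` are hypothetical names, and the loader here reads an in-memory variable where a real daemon would presumably re-read its managed config.

```go
package main

import (
	"errors"
	"fmt"
)

// TokenLoader returns the install token that is current right now.
// Hypothetical: a real loader would re-read managed config rather than
// return a value cached at startup.
type TokenLoader func() (string, error)

// Daemon is a hypothetical long-running managed daemon.
type Daemon struct {
	loadToken TokenLoader
}

// withFreshToken reloads the install token and passes it to op, so every
// policy refresh and stream flush sees a token rotated since startup.
func (d *Daemon) withFreshToken(op func(token string) error) error {
	token, err := d.loadToken()
	if err != nil {
		return fmt.Errorf("reload install token: %w", err)
	}
	if token == "" {
		return errors.New("reload install token: empty token")
	}
	return op(token)
}

func main() {
	current := "token-v1"
	d := &Daemon{loadToken: func() (string, error) { return current, nil }}

	use := func(label string) func(string) error {
		return func(token string) error {
			fmt.Println(label, "uses", token)
			return nil
		}
	}

	_ = d.withFreshToken(use("policy refresh")) // policy refresh uses token-v1
	current = "token-v2"                        // token rotated while the daemon runs
	_ = d.withFreshToken(use("stream flush"))   // stream flush uses token-v2
}
```

Routing both code paths through one helper keeps policy refresh from drifting back to the startup token, which is the gap described for PR #295.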
Verified with: