fix(managedobserve): refresh install token per flush ENG-446 #297

Open
michiosw wants to merge 1 commit into main from feat/refresh-managed-observe-install-token-lean

michiosw commented Jun 15, 2026

Where We Are

ENG-446 started from a concrete retry storm: hosted ingest returned terminal 400 responses for an org mismatch, and the daemon treated them like size-related batch failures. With four records in a batch, the client could send the same bad payload shape at batch size 4, then 2, then 1. That wastes API capacity and floods logs while the server is correctly rejecting the request.

PR #296 tried to fix that and token rotation together, but it also advanced the cursor after a hosted one-record 413. Greptile correctly flagged that as a data-loss risk, because a hosted 413 can be generic or transient.

Where We Want To Go

For this PR, terminal hosted validation errors should stop the current flush immediately. A 400 should not shrink the batch, should not advance the cursor, and should leave the data retryable after the operator fixes stale config or token/org mismatch.

A running managed daemon should also pick up rotated install tokens without restart for both GitHub policy refresh and ledger streaming. That prevents stale startup credentials from causing repeated hosted failures after token rotation.

This is the minimal safe slice for ENG-446. It does not add quarantine, long cooldown state, or doctor UX. Those are larger follow-ups if we want to meet the full acceptance criteria in Linear.

How We Get There

Reload managed config and resolve the install token before each policy refresh and before each stream flush. Keep the existing local oversized-record cursor advance, because that path is based on client-side payload limits we can prove locally.
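The per-flush token resolution described above can be sketched roughly as follows. This is a minimal illustration, not the daemon's actual code: `TokenResolver`, `FlushFunc`, and `flushOnce` are hypothetical names, and the real config loader in `internal/managedobserve` is not shown in this PR description.

```go
package main

import (
	"errors"
	"fmt"
)

// TokenResolver is a hypothetical hook that reloads managed config and
// returns the current install token. It is called on every tick.
type TokenResolver func() (string, error)

// FlushFunc is a hypothetical sender that performs one flush with a token.
type FlushFunc func(token string) error

// flushOnce resolves the token immediately before sending, so a token
// rotated between ticks is picked up without restarting the process.
func flushOnce(resolve TokenResolver, flush FlushFunc) error {
	token, err := resolve()
	if err != nil {
		return fmt.Errorf("resolve install token: %w", err)
	}
	if token == "" {
		return errors.New("install token is empty")
	}
	return flush(token)
}

func main() {
	current := "token-v1"
	resolve := func() (string, error) { return current, nil }

	var used []string
	flush := func(token string) error {
		used = append(used, token)
		return nil
	}

	_ = flushOnce(resolve, flush)
	current = "token-v2" // simulate a rotation between two ticks
	_ = flushOnce(resolve, flush)

	fmt.Println(used) // prints [token-v1 token-v2]
}
```

The point of the sketch is that nothing caches the token across ticks; whether the real loop also caches config between reloads is not stated here.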

Change hosted retry behavior so only 413 reduces batch size. On a hosted 400 or 422, the flush now returns after one request. On a hosted one-record 413, the flush also returns without advancing the cursor, so Greptile's data-loss concern is addressed.

Tests cover the reviewer-critical cases:

  • hosted 400 gets one request and no cursor advance
  • hosted one-record 413 does not advance the cursor
  • stream flushes pick up a refreshed install token
  • GitHub policy refresh picks up a refreshed install token

Verified with:

  • go test ./internal/managedobserve ./internal/managedstream
  • go test -race ./internal/managedobserve ./internal/managedstream
  • go test ./...
  • go vet ./...
  • git diff --check

michiosw commented Jun 15, 2026

This stack of pull requests is managed by Graphite.

greptile-apps Bot commented Jun 15, 2026

Greptile Summary

This PR refreshes managed install credentials during daemon background work. The main changes are:

  • Reloads managed config and install token before each GitHub policy refresh.
  • Reloads managed config and install token before each ledger stream flush.
  • Keeps hosted validation errors terminal instead of shrinking batches.
  • Limits hosted shrink retries to 413 Request Entity Too Large.
  • Preserves local oversized-record cursor advancement while leaving hosted one-record 413 retryable.

Confidence Score: 5/5

This looks safe to merge.

  • No blocking issues found in the changed code.
  • The token refresh loops preserve the prior flush and policy refresh behavior while resolving fresh config each tick.
  • The hosted retry change matches the stated contract and keeps cursor state unchanged on hosted failures.

Important Files Changed

| Filename | Overview |
|---|---|
| internal/managedobserve/daemon.go | Adds per-refresh config and install-token resolution for policy and stream background loops. |
| internal/managedstream/stream.go | Narrows hosted batch-size retries to 413 and avoids cursor advancement for hosted minimum-batch failures. |

Reviews (1): Last reviewed commit: "fix(managedobserve): refresh install tok..."

michiosw requested a review from hasandemirkiran June 15, 2026 16:49
michiosw changed the title from "fix(managedobserve): refresh install token per flush" to "fix(managedobserve): refresh install token per flush ENG-446" Jun 15, 2026