feat(hitl): environment scout — negotiate scoped credentials before architecture#78
Manual validation (local, end-to-end, one simulated hop)
Ran a full build against a local control plane + Hax with a repo containing service config (
Simulated hop: hosted Hax can't reach the local control plane, so the Hax→CP webhook was relayed by POSTing a correctly HMAC-signed payload to the CP. Real signature verification was exercised (a wrongly-shaped payload was rejected with 400; the correct one accepted with 200). A real human submit + full Hax→CP delivery should be confirmed on the shared/Railway stack.
Merge note: this PR is stacked on #77 (base = |
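For anyone repeating the simulated hop, the signing and verification step can be sketched with the standard library. This is a minimal illustration, not the control plane's actual verifier: the hex-encoded HMAC-SHA256 scheme, the function names, the payload fields, and the shared secret are all assumptions, and the sketch only covers a signature mismatch (not payload-shape validation).

```python
import hashlib
import hmac
import json


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, signature: str) -> int:
    """Return an HTTP-style status: 200 on a matching signature, 400 otherwise."""
    expected = sign_payload(secret, body)
    # compare_digest avoids leaking timing information about the expected value.
    return 200 if hmac.compare_digest(expected, signature) else 400


# Hypothetical secret and payload, for illustration only.
secret = b"hypothetical-shared-secret"
body = json.dumps({"run_id": "run-123", "status": "submitted"}).encode()
good_sig = sign_payload(secret, body)

accepted = verify_webhook(secret, body, good_sig)           # signature matches the body
rejected = verify_webhook(secret, body + b"x", good_sig)    # body was altered after signing
```

Signing the raw bytes rather than a re-serialized dict matters here: any re-encoding between relay and verification changes the digest.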
…re + schema)
Three new modules under swe_af/hitl/:
- services.py — knowledge base of 9 common third-party services (Railway,
Fly.io, Vercel, Supabase, Sentry, Datadog, GitHub, OpenAI, Anthropic)
with their env var conventions, mint URLs, permissions hints, and
signal files. Plus detect_services_from_repo() for a deterministic
static pre-pass the LLM scout can build on.
- credentials_store.py — process-local, execution-scoped dict for the
credentials the scout negotiates. Keyed by run_id, thread-safe,
isolates concurrent builds, NEVER persists. The full discussion of
why this is in-memory (not BuildConfig, not app.memory, not the
filesystem) lives in the module docstring.
- scout_schema.py — ScoutResult Pydantic model used as the harness
schema. Includes an explicit "scoped_credentials must NEVER round-
trip through model_dump unless excluded" comment for callers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ness env
Adds the new reasoner that runs once between PM and Architect when HAX is
enabled. The scout reads the PRD + repo, identifies third-party services
whose absence would block the work, and asks the user for scoped /
temporary tokens via a single Hax mega-form. Submitted values are stashed
in the in-memory credentials store keyed by run_id; the scout's return
payload OMITS scoped_credentials so the secrets never reach the control-
plane workflow_execution row.
- swe_af/prompts/environment_scout.py — system prompt + task-prompt
builder. Strong guidance on when NOT to ask (purely local PRD, prior
answers already cover the question, no genuine PRD-blocking
requirement).
- swe_af/reasoners/pipeline.py — @router.reasoner async def
run_environment_scout. Same wrapper shape as the three reasoners
from PR #77; uses run_with_ask_user with budget=2.
- swe_af/app.py:
* plan() — Phase 1.5 calls run_environment_scout via app.call BETWEEN
PM and architect; guarded so it runs only when HAX_API_KEY is set.
* build() body wrapped in try/finally so clear_scoped_credentials
ALWAYS runs on exit (success or exception). Eliminates secret
leakage across builds within the same agent process.
* app.harness is monkey-patched once at module load to auto-inject
stored credentials as env vars on EVERY harness call across the
pipeline. Avoids touching the 25+ existing call sites.
Backwards-compatible: with HAX_API_KEY unset, plan() skips the scout and
the monkey-patched harness passes os.environ through unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three pillars covered:
- services.py — KNOWN_SERVICES inventory bounds, missing-path safety,
file + directory signal detection, prompt-summary rendering.
- credentials_store.py — round-trip, blank/None filtering, isolation
between execution_ids, get-returns-copy, concurrent thread safety,
inject-into-env layering rules.
- scout closure round-trip — pass 1 emits ask_user_form via the
wrapper, pass 2 sees prior_user_responses and returns
scoped_credentials; no-services-detected short-circuits the pause;
model_dump(exclude={"scoped_credentials"}) actually strips the field.
All tests mock HaxClient + app.pause; no real network, no real harness.
Pin a baseline of 8+ services so future trimming is visible in diff.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
A new reasoner, run_environment_scout, runs once between PM and Architect when HAX is enabled. It reads the PRD + repo, identifies third-party services whose absence would block the work, and asks the user via a single Hax mega-form for scoped, temporary tokens. Negotiated credentials are injected into every downstream harness call (Architect, SprintPlanner, Coder, QA, Reviewer, CI Fixer, …) as env vars. They are never written to disk and never persisted through the control plane.
Example flow: the scout sees railway.toml, recognizes the PRD requires adding a new endpoint that queries the DB, and asks the user for a scoped Railway token. The user mints a 1-day token and pastes it, and the build resumes with RAILWAY_TOKEN available to the coder.
Architecture
What's new
Substrate (commit 1)
- swe_af/hitl/services.py: knowledge base of 9 common services (Railway, Fly.io, Vercel, Supabase, Sentry, Datadog, GitHub, OpenAI, Anthropic) with env var conventions, mint URLs, permissions hints, and signal files. Plus detect_services_from_repo() for a deterministic static pre-pass the LLM scout can build on.
- swe_af/hitl/credentials_store.py: process-local, run-scoped dict. Thread-safe, isolates concurrent builds, never persists. Full design rationale in the module docstring.
- swe_af/hitl/scout_schema.py: ScoutResult Pydantic model with explicit guidance that scoped_credentials must be excluded from any serialization that crosses a logging boundary.
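A signal-file pre-pass of the kind services.py describes might look roughly like this. It is a sketch under stated assumptions: ServiceSpec, KNOWN_SERVICES_SKETCH, and detect_services_sketch are hypothetical names, the two-entry inventory is illustrative (the real one covers 9 services), and the real detect_services_from_repo() signature is not shown in this PR text.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ServiceSpec:
    name: str
    env_vars: tuple[str, ...]
    signal_paths: tuple[str, ...]  # files or directories, relative to the repo root


# Hypothetical two-entry inventory for illustration.
KNOWN_SERVICES_SKETCH = (
    ServiceSpec("Railway", ("RAILWAY_TOKEN",), ("railway.toml",)),
    ServiceSpec("Vercel", ("VERCEL_TOKEN",), ("vercel.json", ".vercel")),
)


def detect_services_sketch(repo_root: str) -> list[str]:
    """Return names of services whose signal file or directory exists in the repo."""
    root = Path(repo_root)
    if not root.is_dir():
        return []  # missing-path safety: an absent repo detects nothing
    return [
        spec.name
        for spec in KNOWN_SERVICES_SKETCH
        if any((root / rel).exists() for rel in spec.signal_paths)
    ]


# Usage: a throwaway repo containing only railway.toml.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "railway.toml").write_text("[build]\n")
    detected = detect_services_sketch(tmp)

missing = detect_services_sketch("/nonexistent/path/for/sketch")
```

Because the pass is purely file-based, it is deterministic and cheap, which is what lets the LLM scout treat it as a starting point rather than rediscovering the same signals.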
Reasoner + integration (commit 2)
- swe_af/prompts/environment_scout.py: system prompt + task-prompt builder. Strong guidance on when NOT to ask.
- swe_af/reasoners/pipeline.py: @router.reasoner async def run_environment_scout. Same wrapper shape as the three reasoners in #77 (feat(hitl): generic ask_user_via_form capability for selected reasoners); uses run_with_ask_user with budget=2. Stores credentials directly in the in-memory dict; returns a dump that EXCLUDES scoped_credentials so the control plane workflow_execution row never sees secret values.
- swe_af/app.py:
  * plan(): Phase 1.5 calls the scout between PM and Architect. Guarded by HAX_API_KEY check so it's a no-op when Hax is disabled.
  * build() body wrapped in try/finally so clear_scoped_credentials(run_id) ALWAYS runs on exit (success or exception). Eliminates leakage across builds in the same process.
  * app.harness monkey-patched once at module load to auto-merge stored credentials into the harness env on EVERY call. Avoids touching the 25+ existing call sites.
Tests (commit 3)
17 new unit tests covering services detection, credentials store
(round-trip, filtering, isolation, thread safety, get-returns-copy,
inject-into-env layering), and the scout closure pass-1 → pause →
pass-2 round-trip. All mocked; no real Hax, no real harness.
Security boundary
The scout's negotiated credentials live in one place only: a module-level dict in swe_af.hitl.credentials_store, keyed by the build's run_id. They are:
- … never values. app.note lines from the scout count negotiated + skipped, no secrets.
- … app.call: the scout's return dict excludes scoped_credentials; downstream reasoners look up via get_scoped_credentials(run_id), not via kwarg.
- try/finally in build() guarantees clear_scoped_credentials(run_id) runs even on uncaught exception.
- … same agent process cannot see each other's credentials.
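The properties listed above (keyed by run_id, thread-safe, copy-on-read, explicit clear) fit in a few lines. This is a sketch, not the module itself: get_scoped_credentials and clear_scoped_credentials are named in this PR, but their signatures here are assumed, and set_scoped_credentials is a hypothetical name for the write path.

```python
import threading

_lock = threading.Lock()
_store: dict[str, dict[str, str]] = {}  # run_id -> {ENV_VAR: value}; never persisted


def set_scoped_credentials(run_id: str, creds: dict[str, str]) -> None:
    """Hypothetical writer: replace the credentials for one run."""
    with _lock:
        _store[run_id] = dict(creds)


def get_scoped_credentials(run_id: str) -> dict[str, str]:
    """Return a copy so callers cannot mutate the stored dict."""
    with _lock:
        return dict(_store.get(run_id, {}))


def clear_scoped_credentials(run_id: str) -> None:
    """Drop one run's credentials; a no-op if the run is unknown."""
    with _lock:
        _store.pop(run_id, None)


def _worker(run_id: str, token: str) -> None:
    set_scoped_credentials(run_id, {"RAILWAY_TOKEN": token})


# Four concurrent "builds", each writing under its own run_id.
threads = [
    threading.Thread(target=_worker, args=(f"run-{i}", f"tok-{i}")) for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

snapshot = get_scoped_credentials("run-2")
snapshot["RAILWAY_TOKEN"] = "mutated"  # only the copy changes
clear_scoped_credentials("run-0")
```

Keying every read and write by run_id is what keeps concurrent builds from seeing each other's values; the lock only protects the dict itself.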
This is in-memory only. If the agent process restarts mid-build, the credentials are lost, same as any other in-memory state. We chose this over app.memory(scope=run) (persists to control-plane DB) and over BuildConfig (serializes through app.call).
Backwards compatibility
With HAX_API_KEY unset:
- plan() skips the scout call entirely (HAX guard in app.py).
- app.harness monkey-patch passes os.environ through unchanged.
Test plan
- ruff check on touched files: clean
- pytest tests/test_environment_scout.py: 17/17 pass
- pytest suite: net zero new failures vs. baseline (12 pre-existing failures from Python 3.10 / project's >=3.12 mismatch are unchanged)
- HAX_API_KEY set, trigger a build with a goal that requires Railway access, verify scout pauses with a form, submit a token, confirm coder receives it as RAILWAY_TOKEN env (deferred; same kind of stack-level verification we ran for #77)
Suggested future work (out of scope here)
- HaxClient(encryption_key=...) is not yet plumbed through. Today the credential travels Hax → control plane → agent via HTTPS but isn't end-to-end encrypted at the Hax payload level. Add this when secrets cross trust boundaries.
- … later auth failure is the user's problem. A future reasoner could emit a sentinel from inside the coder ("need GH_TOKEN to call X") and re-trigger a targeted ask.
🤖 Generated with Claude Code