feat(hitl): environment scout — negotiate scoped credentials before architecture by AbirAbbas · Pull Request #78 · Agent-Field/SWE-AF

AbirAbbas · 2026-05-26T19:10:06Z

Stacked on top of #77. Merge that one first (or rebase this onto main after it lands).

Summary

A new reasoner — run_environment_scout — runs once between PM and
Architect when HAX is enabled. It reads the PRD + repo, identifies third-
party services whose absence would block the work, and asks the user via
a single Hax mega-form for scoped, temporary tokens. Negotiated
credentials are injected into every downstream harness call (Architect,
SprintPlanner, Coder, QA, Reviewer, CI Fixer, …) as env vars — never
written to disk, never persisted through the control plane.

Example flow: the scout sees railway.toml, recognizes that the PRD
requires adding a new endpoint that queries the DB, and asks:

"I detected Railway. The PRD requires DB queries. Mint a token at
https://railway.com/account/tokens (project token, read-only, 1-day expiry)
and paste below. Leave blank to skip."

The user mints a 1-day token and pastes it; the build resumes with
RAILWAY_TOKEN available to the coder.

Architecture

build()
 ├─ try:
 │   ├─ Phase 1: PM ──> PRD
 │   ├─ Phase 1.5: ENVIRONMENT SCOUT ←── new ──┐
 │   │   • reads PRD + repo                   │
 │   │   • LLM emits AskUserForm              │ negotiated values
 │   │   • Hax pause → user submits           │ stashed in
 │   │   • LLM extracts values + stores       │ process-local
 │   │   ├─> store_scoped_credentials(run_id) ┘ dict keyed by run_id
 │   ├─ Phase 2: Architect ┐
 │   ├─ Phase 3: TechLead  │  every harness call goes through
 │   ├─ Phase 4: Sprint    │  app.harness (monkey-patched once)
 │   ├─ Phase 5: Execute   │  which merges os.environ + stored creds
 │   ├─ Phase 6: PR        │  into the subprocess env
 │   ...                   ┘
 └─ finally:
     clear_scoped_credentials(run_id)

What's new

Substrate (commit 1)

- swe_af/hitl/services.py: knowledge base of 9 common services
  (Railway, Fly.io, Vercel, Supabase, Sentry, Datadog, GitHub, OpenAI,
  Anthropic) with env var conventions, mint URLs, permissions hints, and
  signal files. Also provides detect_services_from_repo(), a deterministic
  static pre-pass the LLM scout can build on.
- swe_af/hitl/credentials_store.py: process-local, run-scoped dict.
  Thread-safe, isolates concurrent builds, never persists. Full design
  rationale in the module docstring.
- swe_af/hitl/scout_schema.py: ScoutResult Pydantic model with explicit
  guidance that scoped_credentials must be excluded from any
  serialization that crosses a logging boundary.
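A minimal sketch of the static pre-pass described above, assuming one frozen dataclass per service. The field names, the three example entries, and the signature of detect_services_from_repo() are illustrative guesses (only the Railway mint URL comes from this PR); the real module covers 9 services and also renders a prompt summary. A signal can be a file or a directory, and Path.exists() matches both.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class KnownService:
    name: str
    env_vars: tuple[str, ...]      # env var names the service's tooling conventionally reads
    mint_url: str | None           # where to mint a scoped token (None = omitted in this sketch)
    signal_files: tuple[str, ...]  # repo-relative files or directories that imply the service


# Illustrative subset; the real knowledge base has 9 entries.
KNOWN_SERVICES: tuple[KnownService, ...] = (
    KnownService("Railway", ("RAILWAY_TOKEN",),
                 "https://railway.com/account/tokens", ("railway.toml", "railway.json")),
    KnownService("Fly.io", ("FLY_API_TOKEN",), None, ("fly.toml",)),
    KnownService("GitHub", ("GH_TOKEN",), None, (".github",)),
)


def detect_services_from_repo(repo_root: str | Path) -> list[KnownService]:
    """Deterministically list services whose signal files exist under repo_root."""
    root = Path(repo_root)
    if not root.is_dir():
        return []  # missing or non-directory path: detect nothing instead of raising
    return [
        svc for svc in KNOWN_SERVICES
        if any((root / signal).exists() for signal in svc.signal_files)
    ]
```

Keeping this pass deterministic gives the LLM scout a fixed, testable starting point; the model then decides which detected services the PRD actually needs.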

Reasoner + integration (commit 2)

- swe_af/prompts/environment_scout.py: system prompt + task-prompt
  builder, with strong guidance on when NOT to ask.
- swe_af/reasoners/pipeline.py: @router.reasoner async def
  run_environment_scout. Same wrapper shape as the three reasoners in
  #77; uses run_with_ask_user with budget=2. Stores credentials
  directly in the in-memory dict and returns a dump that EXCLUDES
  scoped_credentials, so the control-plane workflow_execution row
  never sees secret values.
- swe_af/app.py:
  - plan(): Phase 1.5 calls the scout between PM and Architect.
    Guarded by a HAX_API_KEY check, so it is a no-op when Hax is disabled.
  - build() body wrapped in try/finally so
    clear_scoped_credentials(run_id) ALWAYS runs on exit (success or
    exception). Eliminates leakage across builds in the same process.
  - app.harness monkey-patched once at module load to auto-merge
    stored credentials into the harness env on EVERY call. Avoids
    touching the 25+ existing call sites.

Tests (commit 3)

17 new unit tests covering services detection, credentials store
(round-trip, filtering, isolation, thread safety, get-returns-copy,
inject-into-env layering), and the scout closure pass-1 → pause →
pass-2 round-trip. All mocked; no real Hax, no real harness.
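The pass-1 → pause → pass-2 round-trip these tests exercise can be modeled in a few lines. This is a hypothetical sketch, not the reasoner itself: the real scout is LLM-driven, uses run_with_ask_user and a Pydantic ScoutResult, and pauses through Hax. Here a dataclass stands in for ScoutResult, the form is a plain dict, and pause is any callable that returns the submitted values; loggable_dump mirrors the model_dump(exclude={"scoped_credentials"}) idea.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Any, Callable


@dataclass
class ScoutResult:  # stand-in for the Pydantic model in scout_schema.py
    ask_user_form: dict[str, Any] | None = None
    scoped_credentials: dict[str, str] = field(default_factory=dict)
    skipped: list[str] = field(default_factory=list)


def scout_pass(needed: list[str], prior_user_responses: dict[str, str] | None) -> ScoutResult:
    if not needed:
        return ScoutResult()  # nothing PRD-blocking detected: no form, no pause
    if prior_user_responses is None:
        # Pass 1: ask once, with one field per env var, in a single form.
        return ScoutResult(ask_user_form={"fields": [{"name": v} for v in needed]})
    # Pass 2: read what the user submitted; a blank value means "skip".
    creds = {v: prior_user_responses[v] for v in needed
             if (prior_user_responses.get(v) or "").strip()}
    return ScoutResult(scoped_credentials=creds,
                       skipped=[v for v in needed if v not in creds])


def run_scout(needed: list[str],
              pause: Callable[[dict[str, Any]], dict[str, str]]) -> ScoutResult:
    first = scout_pass(needed, None)
    if first.ask_user_form is None:
        return first  # short-circuit: never pause when there is nothing to ask
    return scout_pass(needed, pause(first.ask_user_form))


def loggable_dump(result: ScoutResult) -> dict[str, Any]:
    """What may cross a logging boundary: everything except the secrets."""
    data = asdict(result)
    data.pop("scoped_credentials")
    return data
```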

Security boundary

The scout's negotiated credentials live in one place only:
a module-level dict in swe_af.hitl.credentials_store, keyed by the
build's run_id. They are:

- Never logged: every notification message lists env var names only,
  never values. app.note lines from the scout report only how many
  credentials were negotiated and skipped, never the secrets themselves.
- Never serialized through app.call: the scout's return dict
  excludes scoped_credentials; downstream reasoners look them up via
  get_scoped_credentials(run_id), not via a kwarg.
- Never written to disk: no artifacts file, no log file, no DB.
- Cleared on every exit: the try/finally in build() guarantees
  clear_scoped_credentials(run_id) runs even on an uncaught exception.
- Isolated per build: keyed by run_id, so concurrent builds in the
  same agent process cannot see each other's credentials.
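A minimal sketch of such a store, assuming a single lock guarding a dict of dicts. The function names come from this PR, but the signatures, the return value of store_scoped_credentials, and the blank-value filtering rule are assumptions modeled on the behaviors the tests list (round-trip, blank/None filtering, isolation, get-returns-copy, thread safety).

```python
from __future__ import annotations

import threading
from collections.abc import Mapping

_lock = threading.Lock()
_store: dict[str, dict[str, str]] = {}  # run_id -> {ENV_VAR_NAME: value}; memory only


def store_scoped_credentials(run_id: str, creds: Mapping[str, str | None]) -> int:
    """Keep only non-blank values for run_id; return how many were kept."""
    kept = {k: v for k, v in creds.items() if v is not None and v.strip()}
    with _lock:
        _store.setdefault(run_id, {}).update(kept)
    return len(kept)


def get_scoped_credentials(run_id: str) -> dict[str, str]:
    with _lock:
        return dict(_store.get(run_id, {}))  # a copy: callers cannot mutate the store


def clear_scoped_credentials(run_id: str) -> None:
    with _lock:
        _store.pop(run_id, None)  # idempotent, safe to call from a finally block
```

Returning a copy from get_scoped_credentials matters because the harness wrapper merges the result into a subprocess env; handing out the live inner dict would let any caller alter another phase's view of the credentials.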

This is in-memory only. If the agent process restarts mid-build, the
credentials are lost, the same as any other in-memory state. We chose this
over app.memory(scope=run), which persists to the control-plane DB, and
over BuildConfig, which serializes through app.call.

Backwards compatibility

With HAX_API_KEY unset:

- plan() skips the scout call entirely (HAX guard in app.py).
- The app.harness monkey-patch passes os.environ through unchanged.

Net effect: identical to today's behavior.

Test plan

- ruff check on touched files: clean
- pytest tests/test_environment_scout.py: 17/17 pass
- Full pytest suite: net zero new failures vs. baseline (the 12
  pre-existing failures, caused by running Python 3.10 against the
  project's >=3.12 requirement, are unchanged)
- CI green on this PR
- Manual: with HAX_API_KEY set, trigger a build with a goal that
  requires Railway access, verify the scout pauses with a form, submit a
  token, and confirm the coder receives it as the RAILWAY_TOKEN env var
  (deferred; the same kind of stack-level verification we ran for #77)

Suggested future work (out of scope here)

- Encrypted form fields: hax-sdk's HaxClient(encryption_key=...) is
  not yet plumbed through. Today the credential travels Hax → control
  plane → agent via HTTPS but isn't end-to-end encrypted at the Hax
  payload level. Add this when secrets cross trust boundaries.
- Mid-execution credential top-up: today the scout runs once and any
  later auth failure is the user's problem. A future reasoner could
  emit a sentinel from inside the coder ("need GH_TOKEN to call X")
  and re-trigger a targeted ask.

🤖 Generated with Claude Code

AbirAbbas · 2026-05-28T18:09:24Z

Manual validation (local, end-to-end — one simulated hop)

Ran a full build against a local control plane + Hax with a repo containing service config (railway.toml + a pg dependency) and a goal that requires the production DB:

✅ Detection: the scout identified "Railway Postgres" and the DATABASE_URL env var from repo signals (railway.toml, the pg dependency in package.json, server.js using process.env.DATABASE_URL).
✅ Form: it built an ask_user_form with a single DATABASE_URL input field (mint URL + evidence in the description) and paused the workflow.
✅ Resume: on submit, scout pass 2 read the value from prior_user_responses and populated scoped_credentials.
✅ Injection: the submitted value was injected into the downstream architect's harness subprocess environment; verified that the exact DATABASE_URL=… was present in the running subprocess's env.
✅ No secret leak: scoped_credentials is excluded from the control-plane-logged reasoner return value; the secret value appears nowhere in the execution logs (only the env-var name appears in the summary line).
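Checks like the injection and no-leak items above can be reproduced with a small standard-library harness. This is a hypothetical sketch: child_env_value launches a Python subprocess with the merged env and reads the variable back, summary_line shows a names-and-counts-only log format, and assert_no_leak scans log text for secret values. The helper names and the summary wording are illustrative, not taken from the PR.

```python
from __future__ import annotations

import os
import subprocess
import sys
from collections.abc import Iterable


def child_env_value(var: str, extra_env: dict[str, str]) -> str:
    """Return what a child process actually sees for var."""
    env = {**os.environ, **extra_env}
    proc = subprocess.run(
        [sys.executable, "-c", f"import os; print(os.environ.get({var!r}, ''))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return proc.stdout.strip()


def summary_line(creds: dict[str, str], skipped: list[str]) -> str:
    """Log line listing env var NAMES and counts only, never values."""
    names = ", ".join(sorted(creds))
    return f"environment scout: negotiated {len(creds)} [{names}], skipped {len(skipped)}"


def assert_no_leak(secret_values: Iterable[str], log_text: str) -> None:
    leaked = sum(1 for v in secret_values if v and v in log_text)
    if leaked:
        # Report only a count; echoing the value would itself be a leak.
        raise AssertionError(f"{leaked} secret value(s) found in logs")
```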

Simulated hop: hosted Hax can't reach the local control plane, so the Hax→CP webhook was relayed by POSTing a correctly HMAC-signed payload to the CP. Real signature verification was exercised (a wrongly-shaped payload was rejected with 400; the correct one accepted with 200). A real human submit + full Hax→CP delivery should be confirmed on the shared/Railway stack.
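For reference, a relay like this amounts to signing the exact request bytes with the shared secret and having the receiver recompute and compare. The sketch below assumes hex-encoded HMAC-SHA256 and a two-key payload shape; the real Hax signing scheme, header name, payload fields, and the status code for a bad signature are not described in this PR, so treat all of them as placeholders. Only the 400 for a wrongly-shaped payload and the 200 for a valid one mirror what was observed.

```python
from __future__ import annotations

import hashlib
import hmac
import json

# Hypothetical payload shape; the real webhook schema is not shown in the PR.
REQUIRED_KEYS = frozenset({"execution_id", "responses"})


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 over the exact bytes that will be POSTed."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def handle_webhook(secret: bytes, body: bytes, signature: str) -> int:
    """Return the HTTP status a receiver might answer with."""
    # Constant-time compare so the check does not leak timing information.
    if not hmac.compare_digest(sign(secret, body), signature):
        return 401  # assumed status for a bad signature
    try:
        payload = json.loads(body)
    except ValueError:  # JSONDecodeError and UnicodeDecodeError both subclass ValueError
        return 400
    if not isinstance(payload, dict) or not REQUIRED_KEYS.issubset(payload):
        return 400  # correctly signed but wrongly shaped
    return 200
```

The signature must be computed over the raw body bytes the relay actually sends; re-serializing the JSON on either side (different key order or whitespace) would change the digest.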

Merge note: this PR is stacked on #77 (base = feat/hitl-ask-user-via-form). Its test CI hasn't triggered because the workflow only runs for PRs based on main. After #77 merges and this retargets to main, CI will run — confirm green before merging.

…re + schema)

Three new modules under swe_af/hitl/:

- services.py: knowledge base of 9 common third-party services (Railway, Fly.io, Vercel, Supabase, Sentry, Datadog, GitHub, OpenAI, Anthropic) with their env var conventions, mint URLs, permissions hints, and signal files. Plus detect_services_from_repo() for a deterministic static pre-pass the LLM scout can build on.
- credentials_store.py: process-local, execution-scoped dict for the credentials the scout negotiates. Keyed by run_id, thread-safe, isolates concurrent builds, NEVER persists. The full discussion of why this is in-memory (not BuildConfig, not app.memory, not the filesystem) lives in the module docstring.
- scout_schema.py: ScoutResult Pydantic model used as the harness schema. Includes an explicit "scoped_credentials must NEVER round-trip through model_dump unless excluded" comment for callers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ness env

Adds the new reasoner that runs once between PM and Architect when HAX is enabled. The scout reads the PRD + repo, identifies third-party services whose absence would block the work, and asks the user for scoped / temporary tokens via a single Hax mega-form. Submitted values are stashed in the in-memory credentials store keyed by run_id; the scout's return payload OMITS scoped_credentials so the secrets never reach the control-plane workflow_execution row.

- swe_af/prompts/environment_scout.py: system prompt + task-prompt builder. Strong guidance on when NOT to ask (purely local PRD, prior answers already cover the question, no genuine PRD-blocking requirement).
- swe_af/reasoners/pipeline.py: @router.reasoner async def run_environment_scout. Same wrapper shape as the three reasoners from PR #77; uses run_with_ask_user with budget=2.
- swe_af/app.py:
  * plan(): Phase 1.5 calls run_environment_scout via app.call BETWEEN PM and architect; guarded so it runs only when HAX_API_KEY is set.
  * build() body wrapped in try/finally so clear_scoped_credentials ALWAYS runs on exit (success or exception). Eliminates secret leakage across builds within the same agent process.
  * app.harness is monkey-patched once at module load to auto-inject stored credentials as env vars on EVERY harness call across the pipeline. Avoids touching the 25+ existing call sites.

Backwards-compatible: with HAX_API_KEY unset, plan() skips the scout and the monkey-patched harness passes os.environ through unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three pillars covered:

- services.py: KNOWN_SERVICES inventory bounds, missing-path safety, file + directory signal detection, prompt-summary rendering.
- credentials_store.py: round-trip, blank/None filtering, isolation between execution_ids, get-returns-copy, concurrent thread safety, inject-into-env layering rules.
- scout closure round-trip: pass 1 emits ask_user_form via the wrapper, pass 2 sees prior_user_responses and returns scoped_credentials; no-services-detected short-circuits the pause; model_dump(exclude={"scoped_credentials"}) actually strips the field.

All tests mock HaxClient + app.pause; no real network, no real harness. Pin a baseline of 8+ services so future trimming is visible in diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

AbirAbbas mentioned this pull request May 28, 2026

feat(hitl): generic ask_user_via_form capability for selected reasoners #77

Merged


AbirAbbas and others added 3 commits May 28, 2026 15:21

AbirAbbas force-pushed the feat/environment-scout branch from cb9f647 to bb5f197 May 28, 2026 19:32

AbirAbbas changed the base branch from feat/hitl-ask-user-via-form to main May 28, 2026 19:32

AbirAbbas merged commit 73251cb into main May 28, 2026
2 checks passed

AbirAbbas merged 3 commits into main from feat/environment-scout
