
feat(hitl): environment scout — negotiate scoped credentials before architecture#78

Merged
AbirAbbas merged 3 commits into main from feat/environment-scout
May 28, 2026

@AbirAbbas
Collaborator

Stacked on top of #77. Merge that one first (or rebase this onto main after it lands).

Summary

A new reasoner — run_environment_scout — runs once between PM and
Architect when HAX is enabled. It reads the PRD + repo, identifies third-
party services whose absence would block the work, and asks the user via
a single Hax mega-form for scoped, temporary tokens. Negotiated
credentials are injected into every downstream harness call (Architect,
SprintPlanner, Coder, QA, Reviewer, CI Fixer, …) as env vars — never
written to disk, never persisted through the control plane.

Example flow: the scout sees railway.toml, recognizes the PRD requires
adding a new endpoint that queries the DB, and asks:

"I detected Railway. The PRD requires DB queries. Mint a token at
https://railway.com/account/tokens (project token, read-only, 1-day expiry)
and paste below. Leave blank to skip."

The user mints a 1-day token and pastes it; the build resumes with
RAILWAY_TOKEN available to the coder.

Architecture

build()
 ├─ try:
 │   ├─ Phase 1: PM ──> PRD
 │   ├─ Phase 1.5: ENVIRONMENT SCOUT ←── new ──┐
 │   │   • reads PRD + repo                   │
 │   │   • LLM emits AskUserForm              │ negotiated values
 │   │   • Hax pause → user submits           │ stashed in
 │   │   • LLM extracts values + stores       │ process-local
 │   │   ├─> store_scoped_credentials(run_id) ┘ dict keyed by run_id
 │   ├─ Phase 2: Architect ┐
 │   ├─ Phase 3: TechLead  │  every harness call goes through
 │   ├─ Phase 4: Sprint    │  app.harness (monkey-patched once)
 │   ├─ Phase 5: Execute   │  which merges os.environ + stored creds
 │   ├─ Phase 6: PR        │  into the subprocess env
 │   ...                   ┘
 └─ finally:
     clear_scoped_credentials(run_id)

What's new

Substrate (commit 1)

  • swe_af/hitl/services.py — knowledge base of 9 common services
    (Railway, Fly.io, Vercel, Supabase, Sentry, Datadog, GitHub, OpenAI,
    Anthropic) with env var conventions, mint URLs, permissions hints, and
    signal files. Plus detect_services_from_repo() for a deterministic
    static pre-pass the LLM scout can build on.
  • swe_af/hitl/credentials_store.py — process-local, run-scoped dict.
    Thread-safe, isolates concurrent builds, never persists. Full design
    rationale in the module docstring.
  • swe_af/hitl/scout_schema.py: ScoutResult Pydantic model, with explicit
    guidance that scoped_credentials must be excluded from any
    serialization that crosses a logging boundary.
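A rough sketch of how a deterministic static pre-pass over signal files could work. ServiceSpec, its fields, the function signature, and the three entries are hypothetical (the real knowledge base covers 9 services and also carries mint URLs and permission hints). Path.exists() is used so that both file signals such as railway.toml and directory signals such as .github match.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ServiceSpec:
    """Hypothetical shape of one knowledge-base entry."""
    name: str
    env_var: str
    signal_files: tuple[str, ...]  # files OR directories whose presence hints at the service


# Illustrative subset only; not the real KNOWN_SERVICES inventory.
KNOWN_SERVICES = (
    ServiceSpec("Railway", "RAILWAY_TOKEN", ("railway.toml", "railway.json")),
    ServiceSpec("Fly.io", "FLY_API_TOKEN", ("fly.toml",)),
    ServiceSpec("GitHub", "GH_TOKEN", (".github",)),
)


def detect_services_from_repo(repo_path: str) -> list[ServiceSpec]:
    """Return services whose signal files/dirs exist at the repo root."""
    root = Path(repo_path)
    if not root.is_dir():  # missing path: detect nothing rather than raise
        return []
    return [
        spec
        for spec in KNOWN_SERVICES
        if any((root / signal).exists() for signal in spec.signal_files)
    ]
```

A pre-pass like this only narrows the candidates; deciding whether the PRD actually needs a detected service is still left to the LLM scout.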

Reasoner + integration (commit 2)

  • swe_af/prompts/environment_scout.py — system prompt + task-prompt
    builder. Strong guidance on when NOT to ask.
  • swe_af/reasoners/pipeline.py: @router.reasoner async def
    run_environment_scout. Same wrapper shape as the three reasoners in
    #77; uses run_with_ask_user with budget=2. Stores credentials
    directly in the in-memory dict and returns a dump that EXCLUDES
    scoped_credentials, so the control plane's workflow_execution row
    never sees secret values.
  • swe_af/app.py:
    • plan(): Phase 1.5 calls the scout between PM and Architect,
      guarded by a HAX_API_KEY check so it's a no-op when Hax is disabled.
    • build() body wrapped in try/finally so
      clear_scoped_credentials(run_id) ALWAYS runs on exit (success or
      exception). Eliminates leakage across builds in the same process.
    • app.harness monkey-patched once at module load to auto-merge
      stored credentials into the harness env on EVERY call. Avoids
      touching the 25+ existing call sites.

Tests (commit 3)

17 new unit tests covering services detection, credentials store
(round-trip, filtering, isolation, thread safety, get-returns-copy,
inject-into-env layering), and the scout closure pass-1 → pause →
pass-2 round-trip. All mocked; no real Hax, no real harness.
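The store behaviors those tests exercise (blank/None filtering, per-run isolation, get-returns-copy, thread safety, clearing) can be sketched with a lock-guarded dict. This is a hypothetical illustration rather than the contents of swe_af/hitl/credentials_store.py; treating whitespace-only values as "skipped" and returning a kept-count are assumptions.

```python
import threading
from typing import Dict, Mapping, Optional

# Hypothetical sketch of a run-scoped, process-local credential store.
_lock = threading.Lock()
_store: Dict[str, Dict[str, str]] = {}


def store_scoped_credentials(run_id: str, creds: Mapping[str, Optional[str]]) -> int:
    """Keep only non-blank values for run_id; return how many were kept."""
    kept = {k: v.strip() for k, v in creds.items() if v is not None and v.strip()}
    with _lock:
        _store.setdefault(run_id, {}).update(kept)
    return len(kept)


def get_scoped_credentials(run_id: str) -> Dict[str, str]:
    with _lock:
        return dict(_store.get(run_id, {}))  # a copy: mutating it can't touch the store


def clear_scoped_credentials(run_id: str) -> None:
    with _lock:
        _store.pop(run_id, None)
```

A single module-level lock is coarse, but every critical section is a dict lookup or update, so contention between concurrent builds should be negligible.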

Security boundary

The scout's negotiated credentials live in one place only:
a module-level dict in swe_af.hitl.credentials_store, keyed by the
build's run_id. They are:

  • Never logged: every notification message lists env var names only,
    never values. The scout's app.note lines report only the counts of
    negotiated and skipped credentials, never secrets.
  • Never serialized through app.call: the scout's return dict
    excludes scoped_credentials; downstream reasoners look up via
    get_scoped_credentials(run_id), not via kwarg.
  • Never written to disk: no artifacts file, no log file, no DB.
  • Cleared on every exit: try/finally in build() guarantees
    clear_scoped_credentials(run_id) runs even on uncaught exception.
  • Isolated per build: keyed by run_id; concurrent builds in the
    same agent process cannot see each other's credentials.
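A stdlib-only sketch of keeping the secret field out of anything that crosses app.call or a log line. A dataclass stands in for the Pydantic ScoutResult here (with Pydantic v2 the equivalent is result.model_dump(exclude={"scoped_credentials"}), the call the PR's tests check); the fields other than scoped_credentials and the summary-line format are hypothetical.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class ScoutResult:
    """Hypothetical stdlib stand-in for the Pydantic ScoutResult."""
    services_detected: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)
    scoped_credentials: Dict[str, str] = field(default_factory=dict)


def public_dump(result: ScoutResult) -> Dict[str, Any]:
    """Serialize for app.call or logs with the secret-bearing field removed."""
    data = asdict(result)  # a fresh dict; popping from it leaves `result` intact
    data.pop("scoped_credentials", None)
    return data


def summary_line(result: ScoutResult) -> str:
    """Log-safe note: env var names and counts only, never values."""
    names = ", ".join(sorted(result.scoped_credentials)) or "none"
    return f"scout: {len(result.scoped_credentials)} negotiated ({names}), {len(result.skipped)} skipped"
```

The credentials themselves go to the in-memory store; only public_dump() output and the summary line are allowed to reach the control plane.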

This is in-memory only. If the agent process restarts mid-build, the
credentials are lost — same as any other in-memory state. We chose this
over app.memory(scope=run) (persists to control-plane DB) and over
BuildConfig (serializes through app.call).

Backwards compatibility

With HAX_API_KEY unset:

  • plan() skips the scout call entirely (HAX guard in app.py).
  • app.harness monkey-patch passes os.environ through unchanged.
  • Net effect: identical to today's behavior.

Test plan

  • ruff check on touched files — clean
  • pytest tests/test_environment_scout.py — 17/17 pass
  • Full pytest suite: net zero new failures vs. baseline
    (the 12 pre-existing failures, caused by running Python 3.10 against
    the project's >=3.12 requirement, are unchanged)
  • CI green on this PR
  • Manual: with HAX_API_KEY set, trigger a build with a goal that
    requires Railway access, verify the scout pauses with a form, submit a
    token, and confirm the coder receives it as the RAILWAY_TOKEN env var
    (deferred; same kind of stack-level verification we ran for #77)

Suggested future work (out of scope here)

  • Encrypted form fields: hax-sdk's HaxClient(encryption_key=...) is
    not yet plumbed through. Today the credential travels Hax → control
    plane → agent via HTTPS but isn't end-to-end encrypted at the Hax
    payload level. Add this when secrets cross trust boundaries.
  • Mid-execution credential top-up: today the scout runs once and any
    later auth failure is the user's problem. A future reasoner could
    emit a sentinel from inside the coder ("need GH_TOKEN to call X")
    and re-trigger a targeted ask.

🤖 Generated with Claude Code

@AbirAbbas
Collaborator Author

Manual validation (local, end-to-end — one simulated hop)

Ran a full build against a local control plane + Hax with a repo containing service config (railway.toml + a pg dependency) and a goal that requires the production DB:

  • Detection — scout identified "Railway Postgres" and the DATABASE_URL env var from repo signals (railway.toml, package.json pg, server.js using process.env.DATABASE_URL).
  • Form — built an ask_user_form with a single DATABASE_URL input field (mint URL + evidence in the description) and paused the workflow.
  • Resume — on submit, scout pass-2 read the value from prior_user_responses and populated scoped_credentials.
  • Injection — the submitted value was injected into the downstream architect's harness subprocess environment; verified the exact DATABASE_URL=… present in the running subprocess's env.
  • No secret leak: scoped_credentials is excluded from the control-plane-logged reasoner return value; the secret value appears nowhere in the execution logs (only the env-var name in the summary line).

Simulated hop: hosted Hax can't reach the local control plane, so the Hax→CP webhook was relayed by POSTing a correctly HMAC-signed payload to the CP. Real signature verification was exercised (a wrongly-shaped payload was rejected with 400; the correct one accepted with 200). A real human submit + full Hax→CP delivery should be confirmed on the shared/Railway stack.
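For reference, signing and checking a relayed webhook body with HMAC looks roughly like this. The PR doesn't describe the exact scheme Hax and the control plane use, so HMAC-SHA256 over the raw JSON body with a hex digest, and the helper names sign_payload and verify_signature, are assumptions; the header that carries the signature is not shown.

```python
import hashlib
import hmac
import json


def sign_payload(secret: bytes, payload: dict) -> tuple[bytes, str]:
    """Serialize payload deterministically and sign the exact bytes that will be POSTed."""
    body = json.dumps(payload, separators=(",", ":"), sort_keys=True).encode()
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return body, signature


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Receiver side: recompute over the raw body and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

The receiver must verify against the raw request bytes, not a re-serialized copy, since any whitespace or key-order change alters the digest.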

Merge note: this PR is stacked on #77 (base = feat/hitl-ask-user-via-form). Its test CI hasn't triggered because the workflow only runs for PRs based on main. After #77 merges and this retargets to main, CI will run — confirm green before merging.

AbirAbbas and others added 3 commits May 28, 2026 15:21
…re + schema)

Three new modules under swe_af/hitl/:

  - services.py — knowledge base of 9 common third-party services (Railway,
    Fly.io, Vercel, Supabase, Sentry, Datadog, GitHub, OpenAI, Anthropic)
    with their env var conventions, mint URLs, permissions hints, and
    signal files. Plus detect_services_from_repo() for a deterministic
    static pre-pass the LLM scout can build on.
  - credentials_store.py — process-local, execution-scoped dict for the
    credentials the scout negotiates. Keyed by run_id, thread-safe,
    isolates concurrent builds, NEVER persists. The full discussion of
    why this is in-memory (not BuildConfig, not app.memory, not the
    filesystem) lives in the module docstring.
  - scout_schema.py — ScoutResult Pydantic model used as the harness
    schema. Includes an explicit "scoped_credentials must NEVER round-
    trip through model_dump unless excluded" comment for callers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ness env

Adds the new reasoner that runs once between PM and Architect when HAX is
enabled. The scout reads the PRD + repo, identifies third-party services
whose absence would block the work, and asks the user for scoped /
temporary tokens via a single Hax mega-form. Submitted values are stashed
in the in-memory credentials store keyed by run_id; the scout's return
payload OMITS scoped_credentials so the secrets never reach the control-
plane workflow_execution row.

  - swe_af/prompts/environment_scout.py — system prompt + task-prompt
    builder. Strong guidance on when NOT to ask (purely local PRD, prior
    answers already cover the question, no genuine PRD-blocking
    requirement).
  - swe_af/reasoners/pipeline.py — @router.reasoner async def
    run_environment_scout. Same wrapper shape as the three reasoners
    from PR #77; uses run_with_ask_user with budget=2.
  - swe_af/app.py:
    * plan() — Phase 1.5 calls run_environment_scout via app.call BETWEEN
      PM and architect; guarded so it runs only when HAX_API_KEY is set.
    * build() body wrapped in try/finally so clear_scoped_credentials
      ALWAYS runs on exit (success or exception). Eliminates secret
      leakage across builds within the same agent process.
    * app.harness is monkey-patched once at module load to auto-inject
      stored credentials as env vars on EVERY harness call across the
      pipeline. Avoids touching the 25+ existing call sites.

Backwards-compatible: with HAX_API_KEY unset, plan() skips the scout and
the monkey-patched harness passes os.environ through unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three pillars covered:

  - services.py — KNOWN_SERVICES inventory bounds, missing-path safety,
    file + directory signal detection, prompt-summary rendering.
  - credentials_store.py — round-trip, blank/None filtering, isolation
    between execution_ids, get-returns-copy, concurrent thread safety,
    inject-into-env layering rules.
  - scout closure round-trip — pass 1 emits ask_user_form via the
    wrapper, pass 2 sees prior_user_responses and returns
    scoped_credentials; no-services-detected short-circuits the pause;
    model_dump(exclude={"scoped_credentials"}) actually strips the field.

All tests mock HaxClient + app.pause; no real network, no real harness.
Pin a baseline of 8+ services so future trimming is visible in the diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AbirAbbas force-pushed the feat/environment-scout branch from cb9f647 to bb5f197 May 28, 2026 19:32
AbirAbbas changed the base branch from feat/hitl-ask-user-via-form to main May 28, 2026 19:32
AbirAbbas merged commit 73251cb into main May 28, 2026
2 checks passed