5CypZ747c3c87onpmy3sfbhVvCCdUkTNecGWECwAViNsjoGj#1592

Open
shallowtensr wants to merge 1 commit into unarbos:main from shallowtensr:submit-v9z

shallowtensr commented May 14, 2026

Sub-agent fanout architecture + supporting context-quality levers. Net +1412/-221 in agent.py, single file, no contract / sampling / hostname changes. Companion-test gate and other base infrastructure preserved unchanged.


Core mechanism: <parallel_edits> XML protocol fans out independent file edits to child _solve_attempt calls via ThreadPoolExecutor. Children run in lean mode (skip preload) with the brief carrying target-file content + siblings + issue. Per-batch wall cap 200s, size cap _RALPH_MAX_PER_BATCH=5, _RALPH_MAX_WORKERS=4, child max_steps=_RALPH_CHILD_MAX_STEPS=10. Suppressed for single-file batches.
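
The fan-out above could be sketched roughly as follows. This is a minimal illustration, not the code in agent.py: `fan_out`, `run_child`, `_BATCH_WALL_CAP_SECONDS`, and the `(path, brief)` tuple shape are hypothetical. Only the cap values (5 per batch, 4 workers, 200s wall cap) and the single-file suppression come from the description.

```python
from concurrent.futures import ThreadPoolExecutor, wait

# Cap values quoted in the PR description; everything else here is assumed.
_RALPH_MAX_PER_BATCH = 5
_RALPH_MAX_WORKERS = 4
_BATCH_WALL_CAP_SECONDS = 200  # hypothetical name for the per-batch wall cap


def fan_out(edits, run_child, wall_cap=_BATCH_WALL_CAP_SECONDS):
    """Run one child per (path, brief) pair; return {path: result} for children that finished."""
    batch = edits[:_RALPH_MAX_PER_BATCH]
    if len(batch) < 2:
        return {}  # single-file batches are not fanned out
    pool = ThreadPoolExecutor(max_workers=_RALPH_MAX_WORKERS)
    futures = {pool.submit(run_child, path, brief): path for path, brief in batch}
    done, _not_done = wait(futures, timeout=wall_cap)
    # Do not block on stragglers; cancel anything that has not started yet.
    pool.shutdown(wait=False, cancel_futures=True)
    results = {}
    for fut in done:
        if fut.exception() is None:
            results[futures[fut]] = fut.result()
    return results
```

Threads that are already running cannot be interrupted, so a wall cap of this kind only stops the parent from waiting; a real implementation would also need each child to respect its own step and time limits.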

Safety invariants:

  • Recursion-depth guard: children get _lean_mode=True + _recursion_depth=1. Triple-checked: (a) _solve_attempt entry asserts _recursion_depth < 2, (b) the <parallel_edits> extraction site sets _is_child and skips parsing, (c) _ralph_subagent dispatch site asserts child kwargs match the invariant before spawn.
  • Worst-case inference: _RALPH_MAX_WORKERS × _RALPH_CHILD_MAX_STEPS = 4 × 10 = 40 chat calls per round, bounded further by WALL_CLOCK_BUDGET_SECONDS=248.
  • Validator routing: every child + critic chat_completion threads validator-supplied (api_base, api_key, model). No new endpoints, no env reads.
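
As a rough illustration of the recursion-depth guard, the entry check and the dispatch-site check might look like the sketch below. The function names `child_kwargs` and `solve_attempt` and the constant `MAX_RECURSION_DEPTH` are stand-ins, not the names in agent.py; only `_lean_mode`, `_recursion_depth`, and the `< 2` bound come from the description.

```python
MAX_RECURSION_DEPTH = 2  # hypothetical constant; the description asserts depth < 2


def child_kwargs(parent_depth):
    """Build the keyword arguments a child attempt would receive."""
    kwargs = {"_lean_mode": True, "_recursion_depth": parent_depth + 1}
    # Dispatch-site check: refuse to spawn a child that would break the invariant.
    assert kwargs["_recursion_depth"] < MAX_RECURSION_DEPTH, "children may not fan out"
    return kwargs


def solve_attempt(_lean_mode=False, _recursion_depth=0):
    """Stand-in for the attempt loop; returns True if this call may fan out."""
    # Entry check: no call may run at depth 2 or deeper.
    assert _recursion_depth < MAX_RECURSION_DEPTH
    is_child = _recursion_depth > 0
    # Children skip <parallel_edits> parsing, so they can never spawn grandchildren.
    return not is_child


# The worst-case figure quoted in the description: workers x child steps.
worst_case_calls = 4 * 10
```

With these stand-ins, `solve_attempt()` returns `True` for the parent, `solve_attempt(**child_kwargs(0))` returns `False` for a child, and `child_kwargs(1)` raises `AssertionError`.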

Context-quality levers:

  • Distinctive-phrase preload — grep issue-quoted strings + multilingual tokens.
  • Read-cache short-circuit + pagination-loop detector — close pagination-evasion paths.
  • Path-segment vs substring token matching.
  • Canonical-file content grep — boost files containing issue-quoted snake_case keys; guides edits to where keys already live (avoids orphan-duplicate creation in parallel locations).
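
To illustrate why path-segment matching differs from substring matching, here is a small sketch. The description only names the lever, so the tokenization rule and the helper names below are assumptions about one plausible reading, not the behaviour of agent.py.

```python
import re


def path_segments(path):
    """Split a path into lowercase tokens on separators, dots, dashes and underscores."""
    return {tok for tok in re.split(r"[\\/._\-]+", path.lower()) if tok}


def substring_match(token, path):
    """Loose match: the token appears anywhere in the path."""
    return token.lower() in path.lower()


def segment_match(token, path):
    """Strict match: the token equals a whole path segment."""
    return token.lower() in path_segments(path)
```

Under these assumptions the token `auth` substring-matches `src/author/index.py` but does not segment-match it, while `src/auth/login.py` matches both ways. Ranking on whole segments avoids boosting files that only happen to contain the token inside a longer word.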

Defensive passes:

  • _revert_syntactically_broken_files — ast.parse + brace-balance for JS/TS/Rust/Go/Java/C/Kotlin/C#/PHP. Reverts to HEAD on partial-edit breakage.
  • _revert_docker_mount_mode_artifacts — restores HEAD mode on files whose diff is mode-only and content-empty (docker volume mounts emit spurious 100755 → 100644 on shell scripts the agent never touched). Validator generates the submitted diff from working tree, so the on-disk restore is required.
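
The two checks behind these passes could be approximated as below. This is a simplified sketch under stated assumptions: the helper names and the extension list are hypothetical, the brace check ignores strings and comments (so it can misfire), and the mode-only test relies only on the standard `git diff` text format, where a pure mode change prints `old mode` / `new mode` lines and no hunks.

```python
import ast


def python_is_broken(source):
    """True if the source fails to parse as Python."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        return True
    return False


def braces_unbalanced(source):
    """Naive bracket-balance check; does not skip string literals or comments."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in source:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return True
    return bool(stack)


def should_revert(path, source):
    """Decide whether an edited file looks syntactically broken."""
    if path.endswith(".py"):
        return python_is_broken(source)
    if path.endswith((".js", ".ts", ".rs", ".go", ".java", ".c", ".kt", ".cs", ".php")):
        return braces_unbalanced(source)
    return False


def is_mode_only_diff(file_diff):
    """True if a single-file git diff changes the mode but carries no content."""
    lines = file_diff.splitlines()
    has_mode = any(l.startswith(("old mode ", "new mode ")) for l in lines)
    has_content = any(l.startswith(("@@", "Binary files ", "GIT binary patch")) for l in lines)
    return has_mode and not has_content
```

A file flagged by `should_revert` would be restored from HEAD, and a file flagged by `is_mode_only_diff` would have its HEAD mode restored on disk; how agent.py performs either restore is not shown in the description.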

Prompt work: Style-task gate suppresses _strip_low_signal_hunks on formatting sweeps (those whitespace-only hunks are the work product).
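
One way to picture the style-task gate is the sketch below. The keyword list, the helper names, and the definition of a "low-signal" hunk as one that changes only whitespace are all assumptions for illustration; the description does not show how agent.py classifies issues or hunks.

```python
import re

# Hypothetical keyword list; the real gate's criteria are not given in the description.
_STYLE_HINTS = re.compile(
    r"\b(format|formatting|reformat|whitespace|indentation|lint|prettier|gofmt)\b",
    re.IGNORECASE,
)


def issue_looks_like_style_task(issue_text):
    """True if the issue text suggests a formatting sweep."""
    return bool(_STYLE_HINTS.search(issue_text))


def hunk_is_whitespace_only(removed_lines, added_lines):
    """True if the removed and added text are identical once all whitespace is dropped."""
    squash = lambda lines: "".join("".join(lines).split())
    return squash(removed_lines) == squash(added_lines)


def keep_hunk(removed_lines, added_lines, issue_text):
    """Keep every hunk on style tasks; otherwise drop whitespace-only hunks."""
    if issue_looks_like_style_task(issue_text):
        return True
    return not hunk_is_whitespace_only(removed_lines, added_lines)
```

The point of the gate is visible in `keep_hunk`: the same whitespace-only hunk is dropped for a bug-fix issue but kept when the issue itself asks for formatting.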


Base preserved unchanged: solve() signature + return shape; _resolve_inference_config; all DEFAULT_MODEL / API_BASE / API_KEY constants; every refinement gate including companion-test execution; multi-shot wrapper; DANGEROUS_PATTERNS; # MINER-EDITABLE / # VALIDATOR CONTRACT markers; _EDGECASE_GUARDRAIL; extract_command / _read_context_file helpers. Stdlib only (concurrent.futures). No sampling/hostname/env changes.

github-actions Bot commented May 14, 2026

OpenRouter PR Judge

Verdict: WARN
Model: anthropic/claude-opus-4.7
Threshold: 70

| Score | Value |
| --- | --- |
| Overall | 72 |
| Real edit | 80 |
| Safety | 85 |
| Scope | 75 |
| Contract | 90 |

Summary

Large (~1600 line) but mechanically substantive PR. Headline change is a new <parallel_edits> fanout that parses an <edit file=...> block from the model and dispatches per-file sub-agents via ThreadPoolExecutor as recursive _solve_attempt calls in 'lean mode' with a recursion-depth guard (depth>=2 disallowed, children cannot fan out further). Also adds a cat-cache short-circuit for repeated reads, a pagination-loop detector that forces a write, a style-task gate that suppresses low-signal-hunk stripping for formatting issues, a distinctive-phrase / snake_case-quoted-key grep for context ranking, parallelized git grep, a parallel-nudge refinement turn, a syntax-revert post-pass that restores HEAD on files whose patch is syntactically broken, and a docker-mount mode-only diff revert. solve() signature, return shape, validator contract, DANGEROUS_PATTERNS, and stdlib-only constraint are preserved. _EDGECASE_GUARDRAIL scrubbing is retained.
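
For readers unfamiliar with the protocol, a parser for such a block might look like the sketch below. The exact wire format is not shown in this thread, so the `<edit file="...">brief</edit>` shape, the quoting of the attribute, and the helper name `parse_parallel_edits` are assumptions. Acceptance of the `[parallel_edits]` bracket form follows the Risks note in this review.

```python
import re

# Accept both <parallel_edits> and [parallel_edits] wrappers (assumed closing forms).
_BLOCK = re.compile(r"[<\[]parallel_edits[>\]](.*?)[<\[]/parallel_edits[>\]]", re.DOTALL)
# Assumed per-file element: <edit file="path">brief</edit>
_EDIT = re.compile(r'<edit\s+file="([^"]+)"\s*>(.*?)</edit>', re.DOTALL)


def parse_parallel_edits(text):
    """Return a list of (path, brief) pairs from the first block, de-duplicated by path."""
    block = _BLOCK.search(text)
    if block is None:
        return []
    seen = set()
    edits = []
    for path, brief in _EDIT.findall(block.group(1)):
        if path not in seen:
            seen.add(path)
            edits.append((path, brief.strip()))
    return edits
```

De-duplicating by path keeps two children from editing the same file concurrently, which is one reason a fan-out of this kind is restricted to independent file edits.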

Static Checks

  • Large patch with 1630 changed lines; judge should inspect for churn.

Judge Reasons

  • Diff is genuinely mechanical: new fanout architecture, new caching, new detectors — not a rename/reorder/Goodhart sweep over the base file.
  • Only stdlib import added is concurrent.futures.ThreadPoolExecutor / as_completed.
  • Recursion is bounded: child _solve_attempt sets _lean_mode=True and _recursion_depth=1; fanout path is gated on _is_child to forbid grandchildren and asserts the invariant at dispatch and at function entry.
  • All child solves pass through the validator-supplied model/api_base/api_key — no new endpoints, no hardcoded secrets, no env-var additions.
  • solve() public signature and return-dict shape preserved; get_patch gains an optional issue= kwarg (internal helper, not contract).
  • MINER-EDITABLE / # VALIDATOR CONTRACT markers preserved; identifiers remain readable; docstrings retained on new helpers.

  • Style-task gate (_issue_is_style_task) is conservative and only suppresses an existing hygiene step when issue language strongly indicates formatting work — plausible behavior change, not scoring-game shaping.
  • _EDGECASE_GUARDRAIL list is retained and still applied; no weakening of DANGEROUS_PATTERNS.

Risks

  • scope-drift: parallel fanout can burst up to ~4 workers × 10 child steps = 40 inference calls per round, considerably more than the single-thread baseline; legitimate but high-impact on validator proxy load.
  • goodhart (minor): build_parallel_nudge_prompt and revised budget-pressure prompts heavily push the model toward writes regardless of confidence, which could increase 'just-edit-something' outputs on tasks where exploration was actually warranted.
  • obfuscation (minor): the patch parser accepts both <parallel_edits> and [parallel_edits] bracket forms — slightly odd surface area but justified in the comment; not used to hide anything.

Required Changes

  • No required changes returned.

@shallowtensr shallowtensr force-pushed the submit-v9z branch 13 times, most recently from 358143f to 48c912b Compare May 14, 2026 21:39
@shallowtensr shallowtensr changed the title from "5CypZ747c3c87onpmy3sfbhVvCCdUkTNecGWECwAViNsjoGj sub-agent fanout + lean leaves + 15-lever harness expansion" to "5CypZ747c3c87onpmy3sfbhVvCCdUkTNecGWECwAViNsjoGj" May 14, 2026
github-actions Bot commented May 14, 2026

Ninja PR Scope Guard

Verdict: PASS
Author: shallowtensr
External contributor file allowlist: agent.py

This PR satisfies the external contributor file-scope and agent.py contract rules.

Changed Files

  • agent.py

@shallowtensr shallowtensr force-pushed the submit-v9z branch 4 times, most recently from a2cc513 to 574cf83 Compare May 14, 2026 21:51