Add implementation failure summary to remote CI#9160

Merged
schrockn merged 8 commits into master from plnd/add-impl-failure-summary-03-10-0912
Mar 11, 2026

Conversation

@schrockn schrockn commented Mar 10, 2026

When claude --print exits with code 1 during remote implementation, surface a human-readable failure diagnosis in the PR as a comment and GitHub Actions job summary. Reads the session JSONL tail, sends it to Haiku for diagnosis, and posts the markdown summary.

Key Changes

  • New erk exec summarize-impl-failure command to analyze session logs and generate failure diagnostics
  • Added .github/prompts/impl-failure-summarize.md template for Haiku prompt
  • Integrated failure summarization into plan-implement.yml workflow to run on implementation failures
  • Pure functions for session extraction, prompt building, and comment formatting with comprehensive unit tests
Files Changed

Added (3 files)

  • src/erk/cli/commands/exec/scripts/summarize_impl_failure.py - Implementation failure diagnosis using Haiku with session JSONL parsing and PR commenting
  • tests/unit/cli/commands/exec/scripts/test_summarize_impl_failure.py - Unit tests for session extraction, prompt building, and comment body formatting
  • .github/prompts/impl-failure-summarize.md - Haiku prompt template for failure analysis

Modified (3 files)

  • src/erk/cli/commands/exec/group.py - Register new summarize-impl-failure command
  • .github/workflows/plan-implement.yml - Add workflow step to summarize failures and write to GITHUB_STEP_SUMMARY
  • .claude/skills/erk-exec/reference.md - Document new exec command and options
original-plan

Plan: Add Implementation Failure Summary to Remote CI

Context

When claude --print exits with code 1 during remote implementation in plan-implement.yml, the only error message is "Implementation failed with exit code: 1". This is useless for debugging. The session JSONL is already captured and pushed to planned-pr-context, but nobody analyzes it. We want to use Haiku to produce a human-readable failure summary and surface it in two places: as a PR comment on the plan PR, and as a GitHub Actions job summary (visible on the Actions run page).

The existing ci-generate-summaries pattern (fetch logs -> truncate -> prompt Haiku -> post PR comment) provides the exact template to follow.

Approach

Create a new exec command erk exec summarize-impl-failure that reads the raw session JSONL, extracts the tail, sends it to Haiku for diagnosis, posts the summary as a PR comment, and outputs it for use as a job summary. Add a workflow step that calls this on failure and writes the output to $GITHUB_STEP_SUMMARY.

Files to Create

1. .github/prompts/impl-failure-summarize.md

Haiku prompt template. Focused on: what was the agent doing when it stopped, did it encounter an error or just stop, what files/operations were involved.

2. src/erk/cli/commands/exec/scripts/summarize_impl_failure.py

New exec script following the ci_generate_summaries.py pattern.

CLI: erk exec summarize-impl-failure --session-file <path> --pr-number <N> [--exit-code <N>]

Key functions:

  • _extract_session_tail(session_file: Path, *, max_entries: int) -> SessionTail | None — Read JSONL, take last N entries (50), convert to compressed XML via generate_compressed_xml() from preprocess_session.py. Return dataclass with total_events, last_entries_xml, has_result_event.
  • _build_failure_prompt(*, session_tail: SessionTail, exit_code: int | None, prompts_dir: Path) -> str — Load template, substitute variables. Follow _build_summary_prompt() pattern from ci_generate_summaries.py using get_bundled_github_dir().
  • _post_failure_comment(*, pr_number: int, comment_body: str, cwd: Path) -> None — Post via run_subprocess_with_context + gh pr comment.
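
The tail extraction described above can be sketched roughly as follows. This is a minimal illustration, not the merged implementation: `render_entries` is a hypothetical stand-in for `generate_compressed_xml()` (whose real output format is not shown in this plan), and treating an entry with `"type": "result"` as the result event is an assumption about the session format. The sketch also assumes every non-blank line is valid JSON.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SessionTail:
    total_events: int
    last_entries_xml: str
    has_result_event: bool


def render_entries(entries: list[dict]) -> str:
    # Hypothetical stand-in for generate_compressed_xml(); emits one tag per entry.
    return "\n".join(f'<entry type="{entry.get("type", "unknown")}" />' for entry in entries)


def extract_session_tail(session_file: Path, *, max_entries: int) -> SessionTail | None:
    # Look before you leap: a missing file or a non-positive window yields None.
    if not session_file.exists() or max_entries <= 0:
        return None
    lines = [
        line
        for line in session_file.read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]
    if not lines:
        return None
    # Assumes each non-blank line is valid JSON; non-object values are dropped.
    parsed = [json.loads(line) for line in lines]
    entries = [entry for entry in parsed if isinstance(entry, dict)]
    if not entries:
        return None
    tail = entries[-max_entries:]
    return SessionTail(
        total_events=len(entries),
        last_entries_xml=render_entries(tail),
        # Assumption: a "result" entry in the tail marks a finished session.
        has_result_event=any(entry.get("type") == "result" for entry in tail),
    )
```

Returning `None` for missing or empty input lets the caller take the "Session too short to analyze" path without catching exceptions.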

Flow:

  1. require_cwd(ctx), require_prompt_executor(ctx)
  2. Extract session tail (last 50 entries -> compressed XML)
  3. If empty/None, post minimal "Session too short to analyze" comment
  4. Build prompt from template
  5. Call Haiku via executor.execute_prompt()
  6. Build markdown comment with ## Implementation Failure Summary header
  7. Post to PR as comment
  8. Print the same markdown to stdout (for workflow to capture and write to $GITHUB_STEP_SUMMARY)
  9. Always exit 0 (diagnostic, never blocks workflow)
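
The flow above can be sketched as one function with its collaborators injected. This is a sketch under assumptions: the callables stand in for the template builder, the Haiku call, and the PR comment poster, and the "Failure diagnosis unavailable" fallback text and the "Exit code" line are hypothetical, not taken from the merged code.

```python
from __future__ import annotations

from typing import Callable

HEADER = "## Implementation Failure Summary"


def build_comment_body(*, diagnosis: str, exit_code: int | None) -> str:
    # Header first, optional exit code line, then the diagnosis text.
    lines = [HEADER, ""]
    if exit_code is not None:
        lines.extend([f"Exit code: {exit_code}", ""])
    lines.append(diagnosis.strip())
    return "\n".join(lines)


def summarize_failure(
    *,
    tail_xml: str | None,
    exit_code: int | None,
    build_prompt: Callable[[str], str],
    ask_model: Callable[[str], str | None],
    post_comment: Callable[[str], object],
) -> tuple[str, int]:
    if tail_xml is None or not tail_xml.strip():
        # Step 3: nothing to analyze, so skip the model call entirely.
        diagnosis = "Session too short to analyze"
    else:
        # Steps 4-5: build the prompt and ask the model.
        answer = ask_model(build_prompt(tail_xml))
        # Hypothetical fallback when the model returns nothing.
        diagnosis = answer if answer else "Failure diagnosis unavailable"
    body = build_comment_body(diagnosis=diagnosis, exit_code=exit_code)
    post_comment(body)  # Step 7
    # Steps 8-9: the caller prints `body` to stdout and exits with this code.
    return body, 0
```

Because the return code is always 0, the diagnostic step cannot fail the workflow on its own. In this sketch a model failure is signalled by a `None` return rather than an exception.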

Reuse:

  • generate_compressed_xml() from preprocess_session.py — converts JSONL entries to compact XML
  • get_bundled_github_dir() from erk.artifacts.paths — locates prompt templates
  • require_prompt_executor() from erk_shared.context.helpers — Haiku access
  • run_subprocess_with_context() from erk_shared.subprocess_utils: runs gh pr comment
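
For illustration, posting the comment through the GitHub CLI might look like the sketch below. It uses plain `subprocess.run` as a stand-in for `run_subprocess_with_context()`, whose signature is not shown in this plan. `run_gh` and the injected `run` parameter are hypothetical names. The body is passed on stdin via `--body-file -` so long markdown does not have to fit in an argument.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def build_comment_command(*, pr_number: int) -> list[str]:
    # `gh pr comment <number> --body-file -` reads the comment body from stdin.
    return ["gh", "pr", "comment", str(pr_number), "--body-file", "-"]


def run_gh(cmd: Sequence[str], *, stdin_text: str, cwd: Path) -> int:
    # Stand-in runner; the real script uses run_subprocess_with_context().
    if shutil.which(cmd[0]) is None:
        return 127  # conventional "command not found" status
    completed = subprocess.run(
        list(cmd),
        input=stdin_text,
        text=True,
        cwd=cwd,
        capture_output=True,
        check=False,
    )
    return completed.returncode


def post_failure_comment(
    *,
    pr_number: int,
    comment_body: str,
    cwd: Path,
    run: Callable[..., int],
) -> bool:
    # Returns True on success instead of raising, so callers can still exit 0.
    cmd = build_comment_command(pr_number=pr_number)
    return run(cmd, stdin_text=comment_body, cwd=cwd) == 0
```

Injecting the runner keeps the function testable without touching the network: a test can pass a fake `run` that records its arguments.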

3. tests/unit/cli/commands/exec/scripts/test_summarize_impl_failure.py

Test pure functions: _extract_session_tail (empty, small, normal, large sessions), _build_failure_prompt (template substitution, fallback), comment body formatting.
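
A test for the prompt builder might look like the following sketch. The `{{ EXIT_CODE }}` and `{{ SESSION_TAIL }}` placeholders, the fallback template text, and this simplified `build_failure_prompt` signature are all assumptions made for illustration. The real template and function are not shown here. The tests use pytest's `tmp_path` fixture.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical fallback used when the template file is missing.
FALLBACK_TEMPLATE = (
    "Diagnose why this session stopped (exit code: {{ EXIT_CODE }}).\n\n"
    "{{ SESSION_TAIL }}\n"
)


def build_failure_prompt(*, tail_xml: str, exit_code: int | None, prompts_dir: Path) -> str:
    template_path = prompts_dir / "impl-failure-summarize.md"
    if template_path.exists():
        template = template_path.read_text(encoding="utf-8")
    else:
        template = FALLBACK_TEMPLATE
    exit_text = "unknown" if exit_code is None else str(exit_code)
    # Substitute the exit code first so placeholder-like text inside the
    # session tail is never expanded.
    return template.replace("{{ EXIT_CODE }}", exit_text).replace(
        "{{ SESSION_TAIL }}", tail_xml
    )


def test_substitutes_template_variables(tmp_path: Path) -> None:
    template = "Exit: {{ EXIT_CODE }}\nTail:\n{{ SESSION_TAIL }}"
    (tmp_path / "impl-failure-summarize.md").write_text(template, encoding="utf-8")
    prompt = build_failure_prompt(tail_xml="<entry />", exit_code=1, prompts_dir=tmp_path)
    assert prompt == "Exit: 1\nTail:\n<entry />"


def test_falls_back_when_template_missing(tmp_path: Path) -> None:
    prompt = build_failure_prompt(tail_xml="<entry />", exit_code=None, prompts_dir=tmp_path)
    assert "exit code: unknown" in prompt
    assert "<entry />" in prompt
```

Both tests exercise a pure function against a temporary directory, so they need no patching and no network access.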

Files to Modify

4. src/erk/cli/commands/exec/group.py

Add import and registration:

from erk.cli.commands.exec.scripts.summarize_impl_failure import summarize_impl_failure
# ...
exec_group.add_command(summarize_impl_failure, name="summarize-impl-failure")

5. .github/workflows/plan-implement.yml

Insert new step AFTER "Update plan header" (line ~313) and BEFORE "Handle implementation outcome" (line ~315):

- name: Summarize implementation failure
  if: steps.implement.outputs.implementation_success != 'true' && steps.session.outputs.session_file
  continue-on-error: true
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    GH_TOKEN: ${{ github.token }}
    SESSION_FILE: ${{ steps.session.outputs.session_file }}
    PR_NUMBER: ${{ inputs.pr_number }}
    EXIT_CODE: ${{ steps.implement.outputs.exit_code }}
  run: |
    SUMMARY=$(erk exec summarize-impl-failure \
      --session-file "$SESSION_FILE" \
      --pr-number "$PR_NUMBER" \
      --exit-code "$EXIT_CODE")
    echo "$SUMMARY" >> "$GITHUB_STEP_SUMMARY"

Key: continue-on-error: true so failures here don't block the workflow. The script posts a PR comment internally and prints the summary to stdout, which the workflow captures and writes to $GITHUB_STEP_SUMMARY for the Actions run page.

Verification

  1. Unit tests: Run pytest tests/unit/cli/commands/exec/scripts/test_summarize_impl_failure.py
  2. Local smoke test: Create a sample session JSONL, run erk exec summarize-impl-failure --session-file /tmp/test.jsonl --pr-number 9999 --exit-code 1 (will fail at Haiku call without API key but validates parsing)
  3. CI integration: Dispatch a plan implementation that will fail and verify the PR comment appears
plan-header
schema_version: '2'
created_at: '2026-03-10T09:12:37.612051+00:00'
created_by: schrockn
plan_comment_id: null
last_dispatched_run_id: '22904218520'
last_dispatched_node_id: WFR_kwLOPxC3hc8AAAAFVTKjmA
last_dispatched_at: '2026-03-10T13:14:00.482042+00:00'
last_local_impl_at: null
last_local_impl_event: null
last_local_impl_session: null
last_local_impl_user: null
last_remote_impl_at: '2026-03-10T13:23:50+00:00'
last_remote_impl_run_id: '22904218520'
last_remote_impl_session_id: be981a8d-5d51-4de4-bbd2-3be3cbb53e11
branch_name: plnd/add-impl-failure-summary-03-10-0912
created_from_session: 85824250-b215-4c75-9a7b-3396fbc5e2a9
lifecycle_stage: impl
last_session_branch: planned-pr-context/9160
last_session_id: be981a8d-5d51-4de4-bbd2-3be3cbb53e11
last_session_at: '2026-03-10T13:23:47.594214+00:00'
last_session_source: remote
---

To replicate this PR locally, run:

erk pr teleport 9160

@schrockn schrockn added the erk-pr (PR created by erk) label Mar 10, 2026
@schrockn

Plan Queued for Implementation

submission-queued
status: queued
queued_at: '2026-03-10T13:14:00.482042+00:00'
submitted_by: schrockn
validation_results:
  pr_is_open: true
  has_erk_pr_title: true
expected_workflow: plan-implement
trigger_mechanism: label-based-webhook
pr_number: 9160

Plan submitted by schrockn at 2026-03-10T13:14:00.482042+00:00.

The plan-implement workflow has been dispatched via direct dispatch.

Workflow run: https://github.com/dagster-io/erk/actions/runs/22904218520

@github-actions

⚙️ GitHub Action Started

workflow-started
status: started
started_at: '2026-03-10T13:14:39Z'
workflow_run_id: '22904218520'
workflow_run_url: https://github.com/dagster-io/erk/actions/runs/22904218520
branch_name: plnd/add-impl-failure-summary-03-10-0912
pr_number: 9160

Setup completed successfully.

Branch: plnd/add-impl-failure-summary-03-10-0912
PR: #9160
Status: Ready for implementation


schrockn added a commit that referenced this pull request Mar 10, 2026
@github-actions github-actions bot changed the title from "[erk-pr] Plan: Add Implementation Failure Summary to Remote CI" to "Add implementation failure summary to remote CI" Mar 10, 2026
When `claude --print` exits with code 1 during remote implementation, surface a human-readable failure diagnosis in the PR as a comment and GitHub Actions job summary. Reads the session JSONL tail, sends it to Haiku for diagnosis, and posts the markdown summary.

## Key Changes

- New `erk exec summarize-impl-failure` command to analyze session logs and generate failure diagnostics
- Added `.github/prompts/impl-failure-summarize.md` template for Haiku prompt
- Integrated failure summarization into `plan-implement.yml` workflow to run on implementation failures
- Pure functions for session extraction, prompt building, and comment formatting with comprehensive unit tests

<details>
<summary>Files Changed</summary>

### Added (3 files)
- `src/erk/cli/commands/exec/scripts/summarize_impl_failure.py` - Implementation failure diagnosis using Haiku with session JSONL parsing and PR commenting
- `tests/unit/cli/commands/exec/scripts/test_summarize_impl_failure.py` - Unit tests for session extraction, prompt building, and comment body formatting
- `.github/prompts/impl-failure-summarize.md` - Haiku prompt template for failure analysis

### Modified (4 files)
- `src/erk/cli/commands/exec/group.py` - Register new summarize-impl-failure command
- `.github/workflows/plan-implement.yml` - Add workflow step to summarize failures and write to GITHUB_STEP_SUMMARY
- `.claude/skills/erk-exec/reference.md` - Document new exec command and options
- `docs/learned/integrations/github-review-decision.md`, `docs/learned/tui/status-indicators.md` - Minor formatting fixes

</details>
@schrockn schrockn force-pushed the plnd/add-impl-failure-summary-03-10-0912 branch from 67009c0 to 924b233 Compare March 10, 2026 13:24
@github-actions github-actions bot marked this pull request as ready for review March 10, 2026 13:25

github-actions bot commented Mar 10, 2026

✅ Dignified Code Simplifier Review

Last updated: 2026-03-10 10:42:54 PT

Found 0 violations. All code follows dignified-python standards.

Details

Patterns Checked

✅ LBYL over EAFP - Compliant
✅ Pathlib with encoding - Compliant
✅ Absolute imports only - Compliant
✅ Frozen dataclasses - Compliant
✅ O(1) properties - Compliant
✅ No default parameters - Compliant
✅ Max 4 indentation levels - Compliant

Files Reviewed

  • src/erk/cli/commands/exec/scripts/summarize_impl_failure.py: 0 violations
  • src/erk/cli/commands/exec/group.py: 0 violations
  • tests/unit/cli/commands/exec/scripts/test_summarize_impl_failure.py: 0 violations
  • .github/workflows/plan-implement.yml: 0 violations
  • .github/prompts/impl-failure-summarize.md: 0 violations
  • .claude/skills/erk-exec/reference.md: 0 violations

Activity Log

  • 2026-03-10 10:42:54 PT: Follow-up review - 0 new violations detected. Previous violations resolved.
  • 2026-03-10 07:39:18 PT: All violations resolved - no new issues detected in current diff
  • 2026-03-10 06:28:13 PT: Found 2 violations (construct-and-append loop in summarize_impl_failure.py:62, default parameter in summarize_impl_failure.py:181)
  • 2026-03-10 06:26:27 PT: Initial review - no violations detected. Code is clean and follows dignified-python standards.

github-actions bot commented Mar 10, 2026

✅ Test Coverage Review

Last updated: 2026-03-10 10:43:04 PT

Found 0 violations. All new source files have test coverage.

Details

Patterns Checked

✅ All new source files have test coverage

Source Files

| File | Status | Tests |
| --- | --- | --- |
| src/erk/cli/commands/exec/scripts/summarize_impl_failure.py | Added | tests/unit/cli/commands/exec/scripts/test_summarize_impl_failure.py |
| src/erk/cli/commands/exec/group.py | Modified | ➖ Excluded (import-only) |

Files Reviewed

  • summarize_impl_failure.py: 0 violations
  • group.py: 0 violations

Activity Log

  • 2026-03-10 10:43:04 PT: All new source files have test coverage; 11 test cases covering 4 pure functions with comprehensive edge cases
  • 2026-03-10 07:39:27 PT: All new source files have test coverage; 11 test cases covering 4 pure functions with comprehensive edge cases

github-actions bot commented Mar 10, 2026

✅ Dignified Python Review

Last updated: 2026-03-10 10:42:54 PT

Found 0 violations across 3 Python files. All code complies with dignified-python standards.

Details

Patterns Checked

✅ LBYL over EAFP - None found
✅ Path operations (.exists before .resolve) - None found
✅ Frozen dataclasses - Compliant
✅ Keyword-only arguments (5+) - Compliant
✅ Default parameter values - None found (tests excluded by design)
✅ Import organization - Compliant
✅ No re-exports - None found
✅ Property performance - None found
✅ Exception handling - Compliant

Files Reviewed

  • src/erk/cli/commands/exec/group.py: 0 violations
  • src/erk/cli/commands/exec/scripts/summarize_impl_failure.py: 0 violations
  • tests/unit/cli/commands/exec/scripts/test_summarize_impl_failure.py: 0 violations

Activity Log

  • 2026-03-10 10:42:54 PT: All violations resolved - previously addressed issues verified compliant
  • 2026-03-10 07:39:09 PT: All violations resolved - code review passes all dignified-python standards
  • 2026-03-10 06:28:04 PT: No violations detected - code complies with all dignified-python standards

github-actions bot commented Mar 10, 2026

✅ Tripwires Review

Last updated: 2026-03-10 07:39:40 PT

No tripwire violations detected. Code follows all established patterns.

Details

Tripwires Triggered

Architecture:

CLI:

Testing:

CI:

Tier 1 Pattern Matches (Mechanical)

subprocess\.run\( - None found (uses run_subprocess_with_context wrapper)
Path\.home\(\) - None found (uses context injection)
import time\b|time\.sleep\(|datetime\.now\( - None found
monkeypatch\|@patch - None found in tests
@patch|monkeypatch\. - None found

Tier 2 Semantic Matches (LLM-derived)

✅ Using bare subprocess calls - None found
✅ Adding exec script without context injection - Not applicable (follows pattern)
✅ Creating tests without proper structure - Not applicable (follows pattern)
✅ CI steps without proper error handling - Not applicable (continue-on-error set correctly)

Violations Summary

No violations found.

Files Reviewed

  • : 0 violations
  • : 0 violations
  • : 0 violations
  • : 0 violations
  • : 0 violations

Activity Log

  • 2026-03-10 07:39:40 PT: No tripwire violations detected. Code follows all established patterns (subprocess wrappers, context injection, exec script structure, test coverage).
  • 2026-03-10 06:27:44 PT: No tripwire violations detected. Code follows all established patterns (subprocess wrappers, context injection, exec script structure, test coverage).

This stack of pull requests is managed by Graphite.

schrockn and others added 2 commits March 10, 2026 10:34
- Simplify construct-and-append loop to list comprehension
- Remove default=None from --exit-code Click option

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@schrockn schrockn merged commit 9a59f70 into master Mar 11, 2026
19 checks passed
@schrockn schrockn deleted the plnd/add-impl-failure-summary-03-10-0912 branch March 11, 2026 15:44