Add implementation failure summary to remote CI by schrockn · Pull Request #9160 · dagster-io/erk

schrockn · 2026-03-10T13:12:38Z

When claude --print exits with code 1 during remote implementation, surface a human-readable failure diagnosis in the PR as a comment and GitHub Actions job summary. Reads the session JSONL tail, sends it to Haiku for diagnosis, and posts the markdown summary.

Key Changes

New erk exec summarize-impl-failure command to analyze session logs and generate failure diagnostics
Added .github/prompts/impl-failure-summarize.md template for Haiku prompt
Integrated failure summarization into plan-implement.yml workflow to run on implementation failures
Pure functions for session extraction, prompt building, and comment formatting with comprehensive unit tests

Files Changed

Added (3 files)

src/erk/cli/commands/exec/scripts/summarize_impl_failure.py - Implementation failure diagnosis using Haiku with session JSONL parsing and PR commenting
tests/unit/cli/commands/exec/scripts/test_summarize_impl_failure.py - Unit tests for session extraction, prompt building, and comment body formatting
.github/prompts/impl-failure-summarize.md - Haiku prompt template for failure analysis

Modified (3 files)

src/erk/cli/commands/exec/group.py - Register new summarize-impl-failure command
.github/workflows/plan-implement.yml - Add workflow step to summarize failures and write to GITHUB_STEP_SUMMARY
.claude/skills/erk-exec/reference.md - Document new exec command and options

original-plan

Plan: Add Implementation Failure Summary to Remote CI

Context

When claude --print exits with code 1 during remote implementation in plan-implement.yml, the only error message is "Implementation failed with exit code: 1". This is useless for debugging. The session JSONL is already captured and pushed to planned-pr-context, but nobody analyzes it. We want to use Haiku to produce a human-readable failure summary and surface it in two places: as a PR comment on the plan PR, and as a GitHub Actions job summary (visible on the Actions run page).

The existing ci-generate-summaries pattern (fetch logs -> truncate -> prompt Haiku -> post PR comment) provides the exact template to follow.

Approach

Create a new exec command erk exec summarize-impl-failure that reads the raw session JSONL, extracts the tail, sends it to Haiku for diagnosis, posts the summary as a PR comment, and outputs it for use as a job summary. Add a workflow step that calls this on failure and writes the output to $GITHUB_STEP_SUMMARY.

Files to Create

1. `.github/prompts/impl-failure-summarize.md`

Haiku prompt template. Focused on: what was the agent doing when it stopped, did it encounter an error or just stop, what files/operations were involved.

2. `src/erk/cli/commands/exec/scripts/summarize_impl_failure.py`

New exec script following the ci_generate_summaries.py pattern.

CLI: erk exec summarize-impl-failure --session-file <path> --pr-number <N> [--exit-code <N>]

Key functions:

_extract_session_tail(session_file: Path, *, max_entries: int) -> SessionTail | None — Read JSONL, take last N entries (50), convert to compressed XML via generate_compressed_xml() from preprocess_session.py. Return dataclass with total_events, last_entries_xml, has_result_event.
_build_failure_prompt(*, session_tail: SessionTail, exit_code: int | None, prompts_dir: Path) -> str — Load template, substitute variables. Follow _build_summary_prompt() pattern from ci_generate_summaries.py using get_bundled_github_dir().
_post_failure_comment(*, pr_number: int, comment_body: str, cwd: Path) -> None — Post via run_subprocess_with_context + gh pr comment.

Flow:

require_cwd(ctx), require_prompt_executor(ctx)
Extract session tail (last 50 entries -> compressed XML)
If empty/None, post minimal "Session too short to analyze" comment
Build prompt from template
Call Haiku via executor.execute_prompt()
Build markdown comment with ## Implementation Failure Summary header
Post to PR as comment
Print the same markdown to stdout (for workflow to capture and write to $GITHUB_STEP_SUMMARY)
Always exit 0 (diagnostic, never blocks workflow)

Reuse:

generate_compressed_xml() from preprocess_session.py — converts JSONL entries to compact XML
get_bundled_github_dir() from erk.artifacts.paths — locates prompt templates
require_prompt_executor() from erk_shared.context.helpers — Haiku access
run_subprocess_with_context() from erk_shared.subprocess_utils — gh pr comment

3. `tests/unit/cli/commands/exec/scripts/test_summarize_impl_failure.py`

Test pure functions: _extract_session_tail (empty, small, normal, large sessions), _build_failure_prompt (template substitution, fallback), comment body formatting.

Files to Modify

4. `src/erk/cli/commands/exec/group.py`

Add import and registration:

from erk.cli.commands.exec.scripts.summarize_impl_failure import summarize_impl_failure
# ...
exec_group.add_command(summarize_impl_failure, name="summarize-impl-failure")

5. `.github/workflows/plan-implement.yml`

Insert new step AFTER "Update plan header" (line ~313) and BEFORE "Handle implementation outcome" (line ~315):

- name: Summarize implementation failure
  if: steps.implement.outputs.implementation_success != 'true' && steps.session.outputs.session_file
  continue-on-error: true
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    GH_TOKEN: ${{ github.token }}
    SESSION_FILE: ${{ steps.session.outputs.session_file }}
    PR_NUMBER: ${{ inputs.pr_number }}
    EXIT_CODE: ${{ steps.implement.outputs.exit_code }}
  run: |
    SUMMARY=$(erk exec summarize-impl-failure \
      --session-file "$SESSION_FILE" \
      --pr-number "$PR_NUMBER" \
      --exit-code "$EXIT_CODE")
    echo "$SUMMARY" >> "$GITHUB_STEP_SUMMARY"

Key: continue-on-error: true so failures here don't block the workflow. The script posts a PR comment internally and prints the summary to stdout, which the workflow captures and writes to $GITHUB_STEP_SUMMARY for the Actions run page.
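For readers unfamiliar with the job-summary mechanism: GitHub Actions exposes `GITHUB_STEP_SUMMARY` as the path of a file, and Markdown appended to it is rendered on the run page. The shell redirect in the step above does exactly that. The Python sketch below shows the same append for illustration only; `append_step_summary` is a hypothetical helper and not part of this PR.

```python
from pathlib import Path


def append_step_summary(markdown: str, *, env: dict[str, str]) -> bool:
    # GITHUB_STEP_SUMMARY is only set inside a GitHub Actions step.
    summary_path = env.get("GITHUB_STEP_SUMMARY")
    if not summary_path:
        return False
    # Append rather than overwrite so earlier summary output survives.
    with Path(summary_path).open("a", encoding="utf-8") as handle:
        handle.write(markdown.rstrip("\n") + "\n")
    return True


# Usage inside a workflow step (assumes `import os`):
#   append_step_summary(summary_text, env=dict(os.environ))
```

Passing the environment in explicitly keeps the helper testable without touching the real process environment.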

Verification

Unit tests: Run pytest tests/unit/cli/commands/exec/scripts/test_summarize_impl_failure.py
Local smoke test: Create a sample session JSONL, run erk exec summarize-impl-failure --session-file /tmp/test.jsonl --pr-number 9999 --exit-code 1 (will fail at Haiku call without API key but validates parsing)
CI integration: Dispatch a plan implementation that will fail and verify the PR comment appears
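For the local smoke test, a sample session file can be generated with a few lines of Python. The event shape below (`type` plus a `message` payload) is an assumption made for illustration and may not match the real session JSONL schema; `write_sample_session` is a hypothetical helper.

```python
import json
from pathlib import Path


def write_sample_session(path: Path, *, event_count: int, include_result: bool) -> None:
    # Assumed event shape; real session entries carry more fields.
    events: list[dict] = [
        {"type": "assistant", "message": {"content": f"step {i}"}}
        for i in range(event_count)
    ]
    if include_result:
        events.append({"type": "result", "is_error": True})
    # One JSON object per line, newline-terminated.
    path.write_text("".join(json.dumps(e) + "\n" for e in events), encoding="utf-8")


# Usage: more than 50 events, so the tail truncation path is exercised.
#   write_sample_session(Path("/tmp/test.jsonl"), event_count=60, include_result=False)
```

Omitting the result event mimics a session that stopped without finishing, which is the case the failure summary is meant to diagnose.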

plan-header

schema_version: '2'
created_at: '2026-03-10T09:12:37.612051+00:00'
created_by: schrockn
plan_comment_id: null
last_dispatched_run_id: '22904218520'
last_dispatched_node_id: WFR_kwLOPxC3hc8AAAAFVTKjmA
last_dispatched_at: '2026-03-10T13:14:00.482042+00:00'
last_local_impl_at: null
last_local_impl_event: null
last_local_impl_session: null
last_local_impl_user: null
last_remote_impl_at: '2026-03-10T13:23:50+00:00'
last_remote_impl_run_id: '22904218520'
last_remote_impl_session_id: be981a8d-5d51-4de4-bbd2-3be3cbb53e11
branch_name: plnd/add-impl-failure-summary-03-10-0912
created_from_session: 85824250-b215-4c75-9a7b-3396fbc5e2a9
lifecycle_stage: impl
last_session_branch: planned-pr-context/9160
last_session_id: be981a8d-5d51-4de4-bbd2-3be3cbb53e11
last_session_at: '2026-03-10T13:23:47.594214+00:00'
last_session_source: remote

---

To replicate this PR locally, run:

erk pr teleport 9160

schrockn · 2026-03-10T13:14:09Z

Plan Queued for Implementation

submission-queued

status: queued
queued_at: '2026-03-10T13:14:00.482042+00:00'
submitted_by: schrockn
validation_results:
  pr_is_open: true
  has_erk_pr_title: true
expected_workflow: plan-implement
trigger_mechanism: label-based-webhook
pr_number: 9160

Plan submitted by schrockn at 2026-03-10T13:14:00.482042+00:00.

The plan-implement workflow has been dispatched via direct dispatch.

Workflow run: https://github.com/dagster-io/erk/actions/runs/22904218520

github-actions · 2026-03-10T13:14:39Z

⚙️ GitHub Action Started

workflow-started

status: started
started_at: '2026-03-10T13:14:39Z'
workflow_run_id: '22904218520'
workflow_run_url: https://github.com/dagster-io/erk/actions/runs/22904218520
branch_name: plnd/add-impl-failure-summary-03-10-0912
pr_number: 9160

Setup completed successfully.

Branch: plnd/add-impl-failure-summary-03-10-0912
PR: #9160
Status: Ready for implementation

View workflow run

When `claude --print` exits with code 1 during remote implementation, surface a human-readable failure diagnosis in the PR as a comment and GitHub Actions job summary. Reads the session JSONL tail, sends it to Haiku for diagnosis, and posts the markdown summary.

## Key Changes

- New `erk exec summarize-impl-failure` command to analyze session logs and generate failure diagnostics
- Added `.github/prompts/impl-failure-summarize.md` template for Haiku prompt
- Integrated failure summarization into `plan-implement.yml` workflow to run on implementation failures
- Pure functions for session extraction, prompt building, and comment formatting with comprehensive unit tests

<details>
<summary>Files Changed</summary>

### Added (3 files)

- `src/erk/cli/commands/exec/scripts/summarize_impl_failure.py` - Implementation failure diagnosis using Haiku with session JSONL parsing and PR commenting
- `tests/unit/cli/commands/exec/scripts/test_summarize_impl_failure.py` - Unit tests for session extraction, prompt building, and comment body formatting
- `.github/prompts/impl-failure-summarize.md` - Haiku prompt template for failure analysis

### Modified (4 files)

- `src/erk/cli/commands/exec/group.py` - Register new summarize-impl-failure command
- `.github/workflows/plan-implement.yml` - Add workflow step to summarize failures and write to GITHUB_STEP_SUMMARY
- `.claude/skills/erk-exec/reference.md` - Document new exec command and options
- `docs/learned/integrations/github-review-decision.md`, `docs/learned/tui/status-indicators.md` - Minor formatting fixes

</details>

github-actions · 2026-03-10T13:26:35Z

✅ Dignified Code Simplifier Review

Last updated: 2026-03-10 10:42:54 PT

Found 0 violations. All code follows dignified-python standards.

Details

Patterns Checked

✅ LBYL over EAFP - Compliant
✅ Pathlib with encoding - Compliant
✅ Absolute imports only - Compliant
✅ Frozen dataclasses - Compliant
✅ O(1) properties - Compliant
✅ No default parameters - Compliant
✅ Max 4 indentation levels - Compliant

Files Reviewed

src/erk/cli/commands/exec/scripts/summarize_impl_failure.py: 0 violations
src/erk/cli/commands/exec/group.py: 0 violations
tests/unit/cli/commands/exec/scripts/test_summarize_impl_failure.py: 0 violations
.github/workflows/plan-implement.yml: 0 violations
.github/prompts/impl-failure-summarize.md: 0 violations
.claude/skills/erk-exec/reference.md: 0 violations

Activity Log

2026-03-10 10:42:54 PT: Follow-up review - 0 new violations detected. Previous violations resolved.
2026-03-10 07:39:18 PT: All violations resolved - no new issues detected in current diff
2026-03-10 06:28:13 PT: Found 2 violations (construct-and-append loop in summarize_impl_failure.py:62, default parameter in summarize_impl_failure.py:181)
2026-03-10 06:26:27 PT: Initial review - no violations detected. Code is clean and follows dignified-python standards.

github-actions · 2026-03-10T13:26:36Z

✅ Test Coverage Review

Last updated: 2026-03-10 10:43:04 PT

Found 0 violations. All new source files have test coverage.

Details

Patterns Checked

✅ All new source files have test coverage

Source Files

File	Status	Tests
`src/erk/cli/commands/exec/scripts/summarize_impl_failure.py`	Added	✅ `tests/unit/cli/commands/exec/scripts/test_summarize_impl_failure.py`
`src/erk/cli/commands/exec/group.py`	Modified	➖ Excluded (import-only)

Files Reviewed

summarize_impl_failure.py: 0 violations
group.py: 0 violations

Activity Log

2026-03-10 10:43:04 PT: All new source files have test coverage; 11 test cases covering 4 pure functions with comprehensive edge cases
2026-03-10 07:39:27 PT: All new source files have test coverage; 11 test cases covering 4 pure functions with comprehensive edge cases

src/erk/cli/commands/exec/scripts/summarize_impl_failure.py

github-actions · 2026-03-10T13:28:14Z

✅ Dignified Python Review

Last updated: 2026-03-10 10:42:54 PT

Found 0 violations across 3 Python files. All code complies with dignified-python standards.

Details

Patterns Checked

✅ LBYL over EAFP - None found
✅ Path operations (.exists before .resolve) - None found
✅ Frozen dataclasses - Compliant
✅ Keyword-only arguments (5+) - Compliant
✅ Default parameter values - None found (tests excluded by design)
✅ Import organization - Compliant
✅ No re-exports - None found
✅ Property performance - None found
✅ Exception handling - Compliant

Files Reviewed

src/erk/cli/commands/exec/group.py: 0 violations
src/erk/cli/commands/exec/scripts/summarize_impl_failure.py: 0 violations
tests/unit/cli/commands/exec/scripts/test_summarize_impl_failure.py: 0 violations

Activity Log

2026-03-10 10:42:54 PT: All violations resolved - previously addressed issues verified compliant
2026-03-10 07:39:09 PT: All violations resolved - code review passes all dignified-python standards
2026-03-10 06:28:04 PT: No violations detected - code complies with all dignified-python standards

github-actions · 2026-03-10T13:28:44Z

✅ Tripwires Review

Last updated: 2026-03-10 07:39:40 PT

No tripwire violations detected. Code follows all established patterns.

Details

Tripwires Triggered

Architecture:

Subprocess wrappers → loaded Subprocess Wrappers
Context injection → loaded Erk Architecture Patterns

CLI:

Exec script patterns → loaded Exec Script Patterns

Testing:

Test structure → loaded Erk Test Reference

CI:

CI workflow patterns → loaded GitHub Actions Workflow Patterns

Tier 1 Pattern Matches (Mechanical)

✅ subprocess\.run$ - None found (uses run_subprocess_with_context wrapper)
✅ Path\.home\($ - None found (uses context injection)
✅ import time\b|time\.sleep\(|datetime\.now\( - None found
✅ monkeypatch\|@patch - None found in tests
✅ @patch|monkeypatch\. - None found

Tier 2 Semantic Matches (LLM-derived)

✅ Using bare subprocess calls - None found
✅ Adding exec script without context injection - Not applicable (follows pattern)
✅ Creating tests without proper structure - Not applicable (follows pattern)
✅ CI steps without proper error handling - Not applicable (continue-on-error set correctly)

Violations Summary

No violations found.

Files Reviewed

: 0 violations
: 0 violations
: 0 violations
: 0 violations
: 0 violations

Activity Log

2026-03-10 07:39:40 PT: No tripwire violations detected. Code follows all established patterns (subprocess wrappers, context injection, exec script structure, test coverage).
2026-03-10 06:27:44 PT: No tripwire violations detected. Code follows all established patterns (subprocess wrappers, context injection, exec script structure, test coverage).

schrockn · 2026-03-10T14:32:32Z

Add implementation failure summary to remote CI #9160 👈 (View in Graphite)
master

This stack of pull requests is managed by Graphite. Learn more about stacking.


Add plan: Plan: Add Implementation Failure Summary to Remote CI

f9782f2

schrockn added the erk-pr label (PR created by erk) Mar 10, 2026

Add plan for PR #9160

e1965a4

Remove plan staging dirs before implementation

7e02c71

schrockn added a commit that referenced this pull request Mar 10, 2026

Push impl session for plan #9160

b0c18e4

github-actions bot changed the title ~~[erk-pr] Plan: Add Implementation Failure Summary to Remote CI~~ Add implementation failure summary to remote CI Mar 10, 2026

schrockn added a commit that referenced this pull request Mar 10, 2026

Push impl session be981a8d-5d51-4de4-bbd2-3be3cbb53e11 for plan #9160

e433fc2

schrockn force-pushed the plnd/add-impl-failure-summary-03-10-0912 branch from 67009c0 to 924b233 Compare March 10, 2026 13:24

github-actions bot marked this pull request as ready for review March 10, 2026 13:25

schrockn and others added 2 commits March 10, 2026 13:25

Trigger CI workflows

6d54dd7

Auto-fix formatting (docs-sync + ruff + Prettier)

315dc5d

github-actions bot reviewed Mar 10, 2026

src/erk/cli/commands/exec/scripts/summarize_impl_failure.py (outdated, resolved)

github-actions bot reviewed Mar 10, 2026

src/erk/cli/commands/exec/scripts/summarize_impl_failure.py (outdated, resolved)

schrockn and others added 2 commits March 10, 2026 10:34

Address PR review comments (batch 1/1)

007f5d0

- Simplify construct-and-append loop to list comprehension
- Remove default=None from --exit-code Click option

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

update

09c51f5

schrockn merged commit 9a59f70 into master Mar 11, 2026
19 checks passed

schrockn deleted the plnd/add-impl-failure-summary-03-10-0912 branch March 11, 2026 15:44

schrockn mentioned this pull request Mar 11, 2026

[erk-learn] Learn: Add implementation failure summary to remote CI #9205

Draft
