Add Entire blame and why commands #1305

Open
suhaanthayyil wants to merge 17 commits into entireio:main from suhaanthayyil:codex/entire-blame-why
Conversation

@suhaanthayyil

Summary

Adds first-class entire blame and entire why commands for AI-aware line attribution.

What changed

  • Adds entire blame <file> [--line N|N-M] [--json].
  • Adds entire why <file[:line]> [--json].
  • Resolves current file lines through git blame --line-porcelain.
  • Enriches blamed commits with Entire-Checkpoint trailers plus checkpoint/session metadata from entire/checkpoints/v1.
  • Labels lines as [AI], [HU], [MX], or [??] for uncommitted lines.
  • Shows prompt, session, checkpoint, agent, model, commit, and a checkpoint explain hint for a specific line.
  • Adds stable JSON output and unit coverage for parser/range/output behavior.
  • Documents the commands in the README command table.

Why

After an AI agent edits a repo, git blame can show the commit, but not why the line exists. These commands connect a line back to the Entire checkpoint and original agent prompt, so users can inspect AI-authored code without manually chasing commit trailers and checkpoint metadata.

Validation

  • go test ./cmd/entire/cli -run 'Test(ParseBlamePorcelain|ParseAttributionLineRange|Attribution)' -count=1
  • go test ./cmd/entire/cli -run TestAttribution -count=1
  • go test ./cmd/entire/cli/checkpoint ./cmd/entire/cli/trailers -count=1
  • go build -o /tmp/entire-blame-why-clean ./cmd/entire
  • go test ./...
  • go vet ./cmd/entire/cli/...
  • mise run lint
  • mise run test:ci:core
  • mise run test:e2e:canary

Canary result: 59/59 vogon tests passed and 4/4 roger-roger tests passed.

Manual validation on a real Entire-enabled repo:

  • Ran the built CLI against /Users/suhaan/Documents/Ultron.
  • entire blame references/entire-checkpointing.md --line 60-70 showed mixed [AI] and [HU] attribution.
  • entire why references/entire-checkpointing.md:68 resolved the line to Codex, model gpt-5.5, checkpoint 9a91ce5c55f2, session 019e6ba4, commit a77cd651, and the original prompt.

@suhaanthayyil suhaanthayyil requested a review from a team as a code owner May 30, 2026 14:00

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Contributor

Copilot AI left a comment

Pull request overview

Adds two new CLI commands—entire blame and entire why—to attribute current file lines to Entire checkpoints (via git blame --line-porcelain) and to explain the checkpoint/prompt context behind a specific file/line, with optional JSON output.

Changes:

  • Introduces entire blame <file> [--line N|N-M] [--json] and entire why <file[:line]> [--json].
  • Implements attribution resolution that maps blamed commits to Entire-Checkpoint trailers and checkpoint/session metadata.
  • Adds unit tests for blame porcelain parsing, line range parsing, and JSON/output behavior; updates README command table/docs.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| README.md | Documents the new blame and why commands and adds brief usage context. |
| cmd/entire/cli/root.go | Registers the new blame and why Cobra commands on the root command. |
| cmd/entire/cli/attribution.go | Implements the attribution pipeline, rendering, JSON output, and command handlers for blame/why. |
| cmd/entire/cli/attribution_test.go | Adds unit tests covering parsing and output behaviors for the new attribution functionality. |

Comment on lines +173 to +180

result, err := resolveFileAttribution(ctx, file, false)
if err != nil {
	return err
}
if lineRange != nil {
	result.Lines = filterAttributionLines(result.Lines, *lineRange)
	result.Summary = summarizeAttributionLines(result.Lines)
}

Comment on lines +580 to +591

func runGitBlame(ctx context.Context, repoRoot, file string) ([]rawBlameLine, error) {
	cmd := exec.CommandContext(ctx, "git", "-C", repoRoot, "blame", "--line-porcelain", "--", file)
	var stderr bytes.Buffer
	cmd.Stderr = &stderr
	out, err := cmd.Output()
	if err != nil {
		msg := strings.TrimSpace(stderr.String())
		if msg == "" {
			msg = err.Error()
		}
		return nil, fmt.Errorf("git blame failed for %s: %s", file, msg)
	}

Comment on lines +784 to +787

fmt.Fprintf(w, "\n %s %d in %s\n", sty.render(sty.bold, "Line"), line.LineNumber, sty.render(sty.bold, file))
if line.Content != "" {
	fmt.Fprintf(w, " %s\n\n", sty.render(sty.dim, strings.TrimSpace(line.Content)))
}

Comment thread cmd/entire/cli/attribution.go Outdated
Comment on lines +376 to +380

if len(candidates) > 1 {
	line.Candidates = candidates
} else if len(candidates) == 1 {
	line.Candidates = candidates
}
@suhaanthayyil suhaanthayyil force-pushed the codex/entire-blame-why branch from b479d9a to 0ef2a90 Compare May 31, 2026 13:55
@suhaanthayyil
Author

suhaanthayyil commented May 31, 2026

Updated this PR to address the attribution review feedback:

  • --line --json now prunes checkpoint context to only the returned lines.
  • Mixed attribution now follows the selected/file-matching checkpoint instead of unrelated trailers on the same commit.
  • git blame errors now wrap the underlying exec error.
  • entire why preserves leading whitespace in the displayed source line.
  • Simplified the redundant candidates assignment.

Validation rerun:

  • go test ./cmd/entire/cli -run 'TestAttribution|TestParseBlame|TestParseAttribution|TestRunGitBlame' -count=1
  • go test ./cmd/entire/cli -count=1
  • go test ./...
  • mise run lint

Comment thread cmd/entire/cli/attribution.go Outdated
	return parseBlamePorcelain(string(out))
}

var blameHeaderRe = regexp.MustCompile(`^([0-9a-f]{40})\s+\d+\s+(\d+)(?:\s+\d+)?$`)
Collaborator

This would not work for SHA-256 repos, whose object IDs are 64 hex characters. It would be good to support both.

Comment thread cmd/entire/cli/attribution.go Outdated
}
continue
}
if ctx, ok := resolver.checkpointCache[candidate.CheckpointID]; ok {
Collaborator

This shadows the function parameter ctx. It's not a bug, but we should probably use a different name here.

Collaborator

@Soph Soph left a comment

Thanks! I ran /simplify with Claude Code, which removed some duplication. I also flagged two other things as comments.

@suhaanthayyil suhaanthayyil force-pushed the codex/entire-blame-why branch from 8094537 to 2e14aab Compare June 2, 2026 00:57
@suhaanthayyil
Author

Updated again and kept Soph's /simplify changes in the branch.

Fixes included:

  • supports both SHA-1 and SHA-256 git blame object IDs
  • treats both 40-zero and 64-zero blame IDs as uncommitted lines
  • renames the shadowed ctx local to checkpointCtx

Validation rerun on the rebased branch:

  • go test ./cmd/entire/cli -run 'TestParseBlamePorcelain|TestIsZeroCommit|TestAttribution|TestRunGitBlame|TestParseAttribution' -count=1
  • go test ./cmd/entire/cli -count=1
  • go test ./... -count=1
  • go vet ./cmd/entire/cli/...

@suhaanthayyil suhaanthayyil force-pushed the codex/entire-blame-why branch from 5beab2d to 604941a Compare June 2, 2026 17:35
@suhaanthayyil suhaanthayyil force-pushed the codex/entire-blame-why branch from 604941a to 6ecc60c Compare June 2, 2026 18:30
@suhaanthayyil
Author

Pushed the compact blame alignment fix.

Root cause:

  • The compact header was using the new 6-char Agent width, but the row formatter was still hardcoded to a 10-char Agent field, so Author, Checkpoint, and Content drifted right in rows.

What changed:

  • Compact blame rows now use the same dynamic widths as the header.
  • Agent is 6 chars, matching Author.
  • The line column still uses max(len("Line"), len(maxLine)), so it expands from 4 chars to 5 chars at line 10000 and beyond.
  • Added regression checks for exact compact column starts and the 9999/10000 case.
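
The alignment rule can be sketched by computing each width once and passing it to both the header and the row formatter. Only the line-width formula and the 6-char Agent width come from the description above; the function names and the two-space column gap are assumptions, and the sketch omits the Author and Checkpoint columns.

```go
package main

import (
	"fmt"
	"strconv"
)

// agentWidth follows the 6-char Agent field mentioned in the description.
const agentWidth = 6

// lineColumnWidth is max(len("Line"), len(maxLine)) in characters.
func lineColumnWidth(maxLine int) int {
	w := len(strconv.Itoa(maxLine))
	if w < len("Line") {
		w = len("Line")
	}
	return w
}

// formatHeader and formatRow take the same width arguments, so a row cannot
// drift away from the header the way a hardcoded field width can.
func formatHeader(lineW int) string {
	return fmt.Sprintf("%*s  %-*s  %s", lineW, "Line", agentWidth, "Agent", "Content")
}

func formatRow(lineW, n int, agent, content string) string {
	return fmt.Sprintf("%*d  %-*s  %s", lineW, n, agentWidth, agent, content)
}

func main() {
	w := lineColumnWidth(10000)
	fmt.Println(formatHeader(w))
	fmt.Println(formatRow(w, 10000, "codex", "x := 1"))
}
```

Agent names longer than the field width are not truncated in this sketch, so a real formatter would need to handle that case.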

Validation rerun:

  • go test ./cmd/entire/cli -run 'TestAttributionBlame|TestParseBlamePorcelain|TestIsZeroCommit|TestRunGitBlame|TestParseAttribution|TestResolveCheckpointSummaryProvider_ConfiguredExternalProvider' -count=1
  • go test ./cmd/entire/cli -count=1
  • go test ./... -count=1
  • go vet ./cmd/entire/cli/...
  • mise run lint
  • local compact and long blame demo output

suhaanthayyil and others added 9 commits June 2, 2026 14:40

Mark `entire blame` and `entire why` as Hidden during maturation, matching
the pattern used by review/investigate/org/project/repo/grant. Register both
in the labs experimental-command registry so they're discoverable via
`entire labs`, and reword the README to point users there instead of listing
them in the main command table.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 0c07b873709c

… rule

Two related correctness bugs in line attribution:

1. `entire blame` and `entire why` disagreed on the tag. resolveLine marks a
   line [MX] only when its *preferred* candidate is mixed, but the why-time
   enrichment path (enrichAttributionLineWithFetch) marked it [MX] if *any*
   candidate was mixed. A line with a non-mixed preferred candidate and a mixed
   secondary one rendered [AI] in blame and [MX] in why. Extract a single
   authorshipForPreferred() rule used by both paths.

2. Mixed was checkpoint-scoped, not session-scoped. readCheckpointContext
   OR-combined the checkpoint-wide CombinedAttribution with every session's
   attribution, so a line from an agent-only file got tagged [MX] whenever the
   checkpoint *also* touched a hand-edited file. Scope Mixed to the session
   whose work actually touched this file; fall back to the checkpoint-wide
   attribution only when no session metadata resolves.

Adds a regression test where the file-touching session is pure-AI but the
checkpoint's combined attribution is mixed — the line must stay [AI].

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: ef513bf77157
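
The first fix in the commit message above, a single rule keyed on the preferred candidate, can be sketched as below. Everything here is hypothetical: the `candidate` type, the `tagForPreferred` name, the assumption that the preferred candidate is the first element, and the assumption that a committed line with no checkpoint candidate counts as human-authored.

```go
package main

import "fmt"

// candidate is a hypothetical checkpoint candidate for a blamed line.
type candidate struct {
	CheckpointID string
	Mixed        bool // attribution of the session that touched this file
}

// tagForPreferred derives the tag from the preferred candidate only
// (assumed here to be candidates[0]), so blame and why cannot disagree
// because of a mixed secondary candidate.
func tagForPreferred(uncommitted bool, candidates []candidate) string {
	switch {
	case uncommitted:
		return "??"
	case len(candidates) == 0:
		return "HU" // assumption: no checkpoint means human-authored
	case candidates[0].Mixed:
		return "MX"
	default:
		return "AI"
	}
}

func main() {
	cands := []candidate{
		{CheckpointID: "preferred", Mixed: false},
		{CheckpointID: "secondary", Mixed: true},
	}
	fmt.Println(tagForPreferred(false, cands))
}
```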
attributionCandidate and attributionCheckpointContext declared identical field
lists with identical JSON tags, kept in sync only by a candidateFromContext()
cast. Adding a field meant editing both. Alias attributionCandidate to
attributionCheckpointContext so there is one definition, and drop the now-noop
conversion helper.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 91b11a441843

Two silent-degradation paths in readCheckpointContext gave the user false
confidence:

- When a blamed file matched none of a checkpoint's sessions (e.g. it was
  renamed after the checkpoint), session selection silently fell back to the
  first session and presented its agent/prompt as exact. Track whether the
  match was by file, and when it was a multi-session fallback set a new
  SessionFallback flag; `entire why` now prints a "may have been renamed" hint.

- When a checkpoint's sessions all failed to read (summary present, per-session
  metadata unreadable), the line kept a confident [AI] tag with blank
  agent/model/prompt and no indication. Set MetadataMissing in that case so the
  existing "trailer-level attribution only" hint shows and the why path retries
  via remote fetch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 7f9d62c609b2

renderAttributionBlameCompact and renderAttributionBlameLong each repeated the
same header print, empty-file short-circuit, and trailing summary call around
their column layouts. Pull that scaffolding into renderAttributionBlameTable
and pass each variant's table body as a callback. Output is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 5fc3dec002a5

The summary computed AI/Human/Mixed percentages by independent integer
division, so they drifted from a coherent whole — e.g. one line each of AI,
Human, and Mixed rendered 33% / 33% / 33% = 99%. Apportion with the
largest-remainder (Hamilton) method across all four buckets (including
uncommitted, which shares the 100% but is shown only as a count) so the visible
figures stay coherent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: e2b9528e57db
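
The largest-remainder (Hamilton) apportionment named in the last commit message can be sketched as follows. The `apportionPercent` name and the tie-breaking rule (earlier bucket wins) are assumptions; the commit message does not say how the PR breaks ties.

```go
package main

import (
	"fmt"
	"sort"
)

// apportionPercent converts counts into whole percentages that sum to 100
// using the largest-remainder method: floor every share, then hand the
// leftover points to the buckets with the largest remainders.
func apportionPercent(counts []int) []int {
	total := 0
	for _, c := range counts {
		total += c
	}
	out := make([]int, len(counts))
	if total == 0 {
		return out // nothing to apportion
	}
	type remainder struct{ idx, rem int }
	rems := make([]remainder, len(counts))
	assigned := 0
	for i, c := range counts {
		out[i] = c * 100 / total
		rems[i] = remainder{idx: i, rem: c * 100 % total}
		assigned += out[i]
	}
	// Stable sort so ties go to the earlier bucket (an assumption).
	sort.SliceStable(rems, func(a, b int) bool { return rems[a].rem > rems[b].rem })
	for i := 0; i < 100-assigned; i++ {
		out[rems[i].idx]++
	}
	return out
}

func main() {
	// Buckets in the order AI, Human, Mixed, uncommitted.
	fmt.Println(apportionPercent([]int{1, 1, 1, 0}))
}
```

The leftover is always smaller than the number of buckets, so the final loop never indexes past the end of `rems`.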