
[Feature]: Separate bullet-point reordering from content changes in diff preview #711

@SandroPacella


Is your feature request related to a problem? Please describe.

When the LLM tailors a resume, it often reorders experience bullet points to put the most relevant ones first — which is great. But the diff preview doesn't account for this reordering, so the user sees misleading diffs where completely unrelated bullets are paired together.

Example of what the user sees today:

− Maintained and enhanced React-based lead acquisition form reliably handling
  hundreds of monthly submissions, implementing new features and managing
  version control through Git

+ Implemented comprehensive testing and monitoring strategy using BackstopJS
  for visual regression testing, ensuring consistent deployments across 600+ pages

These aren't the same bullet point — the original was about a React form and the "replacement" is about a testing strategy. But because the LLM moved them to different positions, the diff algorithm (SequenceMatcher in _append_list_changes at apps/backend/app/services/improver.py, lines 307-383) matches them positionally and reports them as a single "modified" change. The user has no way to tell that the first bullet was merely moved, not deleted and replaced.

This makes the diff preview untrustworthy for the exact scenario where users need it most — understanding what the AI actually changed before accepting.

Related: #710 (inline word-level diff would further improve readability, but is a separate concern from the matching problem described here).

Describe the solution you'd like

Separate reordering from content changes so the user can review them independently. Concretely:

  1. Match bullets by content similarity, not position. Before diffing, align original and improved bullet lists using fuzzy/semantic matching (e.g., cosine similarity on TF-IDF vectors, or difflib.SequenceMatcher at the word level on each pair, picking the best match above a threshold). This correctly identifies:

    • Moved bullets — same or very similar content at a different index
    • Modified bullets — similar content with wording changes (these are the real edits)
    • Added bullets — no good match in the original
    • Removed bullets — no good match in the improved version
  2. Show reordering as a distinct, low-noise change type. In the diff preview, moved-but-unchanged bullets could be shown with a subtle indicator (e.g., an arrow icon and "Moved from position 3 → 1") rather than the current alarming red-strikethrough / green-new treatment. This lets users quickly skim past reordering and focus on actual content edits.

  3. Optionally, apply changes in two phases:

    • Phase 1 — Content diff: Show the user what words changed in each bullet, with bullets matched by similarity (not position). The user reviews and accepts/rejects content changes.
    • Phase 2 — Reorder: After content changes are accepted, show the new ordering and let the user accept or reject it.

    This two-phase approach means the user never sees the confusing "these two unrelated bullets got swapped" artifact.
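A minimal sketch of the matching step from point 1, assuming word-level `difflib.SequenceMatcher` ratios and a greedy best-first assignment. The names `similarity` and `align_bullets`, the record shape, and the `0.5` threshold are all hypothetical and are not taken from `improver.py`; the threshold in particular would need tuning against real resumes.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1] between two bullets."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split(), autojunk=False).ratio()


def align_bullets(original, improved, threshold=0.5):
    """Pair bullets by content, then classify each one (hypothetical record shape)."""
    # Score every (original, improved) pair and visit the best matches first.
    scored = sorted(
        ((similarity(o, n), i, j)
         for i, o in enumerate(original)
         for j, n in enumerate(improved)),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    pairs = {}        # original index -> improved index
    used_new = set()  # improved indices already claimed
    for score, i, j in scored:
        if score < threshold:
            break  # everything after this point is too dissimilar to pair
        if i in pairs or j in used_new:
            continue
        pairs[i] = j
        used_new.add(j)

    changes = []
    for i, j in sorted(pairs.items(), key=lambda p: p[1]):
        if original[i] != improved[j]:
            kind = "modified"   # a real wording edit (possibly also moved)
        elif i != j:
            kind = "moved"      # identical text at a different index
        else:
            kind = "unchanged"
        changes.append({"type": kind, "from": i, "to": j})
    changes += [{"type": "removed", "from": i, "to": None}
                for i in range(len(original)) if i not in pairs]
    changes += [{"type": "added", "from": None, "to": j}
                for j in range(len(improved)) if j not in used_new]
    return changes
```

Greedy assignment is the simplest option and can be suboptimal when several bullets are near-duplicates of each other. Comparing raw indices also flags a bullet as "moved" when it only shifted because another bullet was added or removed above it, so a real implementation would probably compare relative order instead.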

Describe alternatives you've considered

  1. Instruct the LLM not to reorder bullets — Simplest fix, but sacrifices a genuinely valuable feature (relevance-based ordering). Not ideal.
  2. Always show side-by-side full lists instead of per-bullet diffs — Avoids false pairings but makes it harder to see what actually changed in each bullet. Doesn't scale well when there are many bullets.
  3. Use the LLM itself to report which bullets map to which — The LLM could return a mapping (e.g., {0: 2, 1: 0, 2: 1}) alongside the tailored content. This is accurate when it works but adds another failure mode (hallucinated mappings). Could work as a supplement to algorithmic matching.
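If alternative 3 were used as a supplement, the reported mapping could be sanity-checked before it is trusted. The sketch below is one possible check, assuming the mapping has already been parsed into `{original_index: improved_index}` with integer keys (JSON object keys arrive as strings, so a caller would need to convert them). The function name and the `0.3` similarity floor are hypothetical.

```python
from difflib import SequenceMatcher


def is_plausible_mapping(mapping, original, improved, min_similarity=0.3):
    """Return True if an LLM-reported index mapping looks usable."""
    targets = list(mapping.values())
    if len(set(targets)) != len(targets):
        return False  # two original bullets claim the same improved bullet
    for i, j in mapping.items():
        if not (isinstance(i, int) and isinstance(j, int)):
            return False
        if not (0 <= i < len(original) and 0 <= j < len(improved)):
            return False  # index points outside either list
        score = SequenceMatcher(
            None, original[i].split(), improved[j].split(), autojunk=False
        ).ratio()
        if score < min_similarity:
            return False  # the paired bullets share almost no words
    return True
```

When the check fails, the caller could fall back to purely algorithmic matching, which keeps a hallucinated mapping from reaching the diff preview.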

Additional context

The root cause is in _append_list_changes (apps/backend/app/services/improver.py, lines 307-383), which uses SequenceMatcher(a=original_items, b=improved_items). This treats each bullet as an atomic token compared by exact equality, so the two lists are aligned only on bullets that are identical; how similar the remaining bullets are to each other is never considered. When the LLM reorders and rewords bullets, few identical bullets remain in the same relative order, and SequenceMatcher emits "replace" ops over the leftover ranges. Pairing the items inside those ranges by position is what puts unrelated bullets side by side.
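The behaviour can be reproduced outside the app with `difflib` alone. The bullets below are shortened, illustrative stand-ins for the example earlier in this issue, and the index-by-index pairing at the end is an assumption about how a "replace" range gets turned into "modified" changes, not a copy of the actual code.

```python
from difflib import SequenceMatcher

# Illustrative bullets only; not real project data.
original_items = [
    "Maintained React-based lead acquisition form",   # A
    "Implemented testing strategy using BackstopJS",  # B
    "Managed version control through Git",            # C
]
# The LLM swaps A and B and rewords both; C is untouched.
improved_items = [
    "Implemented comprehensive testing strategy using BackstopJS",  # B'
    "Maintained and enhanced React-based lead acquisition form",    # A'
    "Managed version control through Git",                          # C
]

# Same call shape as quoted above: each whole bullet is one token.
matcher = SequenceMatcher(a=original_items, b=improved_items)
opcodes = matcher.get_opcodes()

for tag, i1, i2, j1, j2 in opcodes:
    print(tag, original_items[i1:i2], "->", improved_items[j1:j2])

# Assumed pairing: walking the "replace" range index by index
# puts A next to B' and B next to A'.
tag, i1, i2, j1, j2 = opcodes[0]
mispaired = list(zip(original_items[i1:i2], improved_items[j1:j2]))
```

Only C matches exactly, so everything before it collapses into a single "replace" range covering both reworded bullets.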

The same issue affects the RegenerateDiffPreview component (apps/frontend/components/builder/regenerate-diff-preview.tsx) which displays original vs. new content blocks positionally.

A content-similarity-based matching step before the existing SequenceMatcher call would fix the pairing problem without changing any frontend code: the frontend already handles added, removed, and modified change types correctly; it just needs to receive correctly paired data. Only the dedicated "moved" indicator proposed above would presumably need frontend work, since it introduces a new change type.
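One way such a pre-step could feed the existing call is to reorder the improved bullets into the original order before diffing. This is a rough sketch that assumes some matching step has already produced `pairs` (original index to improved index); `realign` and `pairs` are hypothetical names.

```python
from difflib import SequenceMatcher


def realign(improved, pairs):
    """Reorder improved bullets so matched ones follow the original order.

    `pairs` maps original index -> improved index. Improved bullets with
    no match are appended at the end, where they would surface as additions.
    """
    matched = [improved[j] for _, j in sorted(pairs.items())]
    used = set(pairs.values())
    unmatched = [b for j, b in enumerate(improved) if j not in used]
    return matched + unmatched


original = ["A", "B", "C"]
improved = ["B (reworded)", "A (reworded)", "C"]
pairs = {0: 1, 1: 0, 2: 2}  # assumed output of a similarity-matching step

realigned = realign(improved, pairs)
opcodes = SequenceMatcher(a=original, b=realigned).get_opcodes()
# The "replace" range now lines up A with its reworded version, and B with its own.
```

This discards the ordering information, so it fits the "no frontend change" path only. It also relies on every original bullet having a match: with removed bullets, the "replace" ranges can have unequal lengths and index-by-index pairing inside them may drift again.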
