Skip to content

Consolidate diff output filtering#311

Merged
rs545837 merged 1 commit into
Ataraxy-Labs:mainfrom
Iron-Ham:Iron-Ham/consolidate-diff-output-filtering
Jun 4, 2026
Merged

Consolidate diff output filtering#311
rs545837 merged 1 commit into
Ataraxy-Labs:mainfrom
Iron-Ham:Iron-Ham/consolidate-diff-output-filtering

Conversation

@Iron-Ham
Copy link
Copy Markdown
Contributor

@Iron-Ham Iron-Ham commented Jun 2, 2026

Summary

  • consolidate binary file diff reporting, diff output escaping, and cosmetic orphan filtering into one branch
  • keep diff formatter, JSON, CLI, and MCP reporting behavior aligned across the related changes

Supersedes #237, #290, #294.
Closes #181.
Closes #273.
Fixes #272.

Test plan

  • cargo test --manifest-path crates/Cargo.toml -p sem-cli --test diff_file_compare
  • cargo test --manifest-path crates/Cargo.toml -p sem-cli --test diff_patch
  • cargo test --manifest-path crates/Cargo.toml -p sem-cli --test diff_output_hardening
  • cargo test --manifest-path crates/Cargo.toml -p sem-cli --test diff_no_cosmetics
  • cargo test --manifest-path crates/Cargo.toml -p sem-cli commands::diff::tests
  • cargo test --manifest-path crates/Cargo.toml -p sem-core parser::differ::tests
  • cargo check --manifest-path crates/Cargo.toml -p sem-core -p sem-cli -p sem-mcp

Copy link
Copy Markdown

@inspect-review inspect-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inspect review

Triage: 106 entities analyzed | 0 critical, 0 high, 48 medium, 58 low
Verdict: standard_review

Findings (7)

  1. [low] In recalculate_diff_summary, compound changes (Moved/Renamed/Reordered with content changes) are double-counted in modified_count. The logic increments modified_count for both the change type bucket AND when has_content_change() is true, but these should be mutually exclusive counts.
  2. [low] In retain_non_cosmetic_changes, the filter logic allows Moved/Renamed/Reordered changes to pass through even when structural_change == Some(false), but the comment says 'Drop purely cosmetic changes'. This creates inconsistency where cosmetic compound changes are retained while cosmetic modifications are dropped.
  3. [low] In orphan_structural_change, the function returns Some(bool) but calls plugin.structural_hash_content() which also returns Option<String>. If the plugin returns None for either before or after content, the function returns None, meaning orphan changes will have structural_change: None instead of a definitive true/false, breaking the assumption that orphans always have structural_change set.
  4. [low] In diff_command for file comparison mode, when content_a or content_b is None (binary), the code creates a FileChange but sets before_content and after_content to the Option values directly. This means binary files will have None content, but the status is always FileStatus::Modified even if one side doesn't exist (should be Added/Deleted).
  5. [low] In parse_unified_diff, the code sets has_binary_content = true when read_file_compare_content returns Ok(None), but this flag is only used in the final check if !has_binary_content && before_content.is_none() && after_content.is_none(). If only the after_content is binary (before is text), the warning will still print because before_content.is_none() is false, even though we detected binary content.
  6. [low] In retain_non_cosmetic_changes, the filter logic allows cosmetic Moved/Renamed/Reordered changes to pass through, but the comment says 'Drop purely cosmetic changes'. This creates inconsistency where cosmetic modifications are dropped but cosmetic moves/renames/reorders are kept, which may not be the intended behavior.
  7. [low] In orphan_structural_change, the function calls plugin.structural_hash_content() with before_content.unwrap_or_default() and after_content.unwrap_or_default(). If both are None, this will compare empty strings and return Some(false), incorrectly marking a deletion or addition as non-structural when it should be structural.

Reviewed by inspect | Entity-level triage found 0 high-risk changes

@Iron-Ham Iron-Ham force-pushed the Iron-Ham/consolidate-diff-output-filtering branch from f16a0df to 507e0be Compare June 3, 2026 21:47
Copy link
Copy Markdown

@inspect-review inspect-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inspect review

Triage: 106 entities analyzed | 0 critical, 0 high, 48 medium, 58 low
Verdict: standard_review

Findings (7)

  1. [low] In recalculate_diff_summary, compound changes (Moved/Renamed/Reordered with content changes) are double-counted in modified_count. The logic adds 1 to both the change-type bucket (moved_count/renamed_count/reordered_count) AND modified_count, but the summary total calculation doesn't account for this double-counting, leading to inflated totals.
  2. [low] In retain_non_cosmetic_changes, the logic now preserves Moved/Renamed/Reordered changes even when structural_change == Some(false), but this contradicts the intent to drop purely cosmetic changes. A cosmetic move/rename/reorder should still be dropped if it has no structural impact.
  3. [low] In diff_command for file comparison mode, when content_a or content_b is None (binary), the code creates a FileChange with status: FileStatus::Modified unconditionally. However, if one side doesn't exist (e.g., comparing /dev/null to a binary file), this should be Added or Deleted, not Modified.
  4. [low] In orphan_structural_change, the function calls plugin.structural_hash_content() with before_content.unwrap_or_default() and after_content.unwrap_or_default(). If both are None (binary files), this will hash empty strings and return Some(false), incorrectly marking binary changes as non-structural.
  5. [low] In read_file_compare_content, the function returns Ok(None) for binary files but returns an Err for UTF-8 conversion errors. The caller in diff_command treats both cases differently: Ok(None) sets has_binary_content = true, but Err(_) is silently ignored with an empty match arm, potentially causing the file to be skipped entirely instead of being reported as binary.
  6. [low] In file_compare_changes call site, when content_a or content_b is None (binary), the code creates a FileChange with both before_content and after_content potentially None, but then passes (vec![change], false). However, the tuple's second element (language_mismatch boolean) is hardcoded to false even though no language comparison was performed. This could mask actual language mismatches in mixed text/binary scenarios.
  7. [low] In orphan_structural_change, the function returns Some(bool) but calls plugin.structural_hash_content() which also returns Option<String>. If the plugin returns None for either before or after hash, the function returns None, meaning structural_change will be None. However, the test orphan_code_change_is_structural expects Some(true), suggesting the code should handle None hashes differently (e.g., treat missing hash as structural change).

Reviewed by inspect | Entity-level triage found 0 high-risk changes

@rs545837 rs545837 merged commit 78c4891 into Ataraxy-Labs:main Jun 4, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants