chore(report): add PR diff report generator #3

nowNick · 2025-09-16T10:47:49Z

This commit adds a Python script that generates a much more verbose PR diff.

Example reports are available in Kong's repository:

[DO NOT MERGE] chore(*): side by side PR diffs Kong/kong#14756


samugi · 2025-09-16T13:11:43Z

pr_diff.py

+                # Fallback if exact match fails
+                file_path = lines[0].split()[2][2:]


Is this an expected scenario, or should we log something or panic?

Yeah, you're right - we should panic if it doesn't match.
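A minimal sketch of the stricter behaviour agreed on here, assuming the first line of each file section is a standard diff --git header. The regex, the function name parse_file_path, and the choice of ValueError are illustrative assumptions, not the code in pr_diff.py:

```python
import re

# Assumed header shape: "diff --git a/<path> b/<path>".
# Paths that themselves contain " b/" would be ambiguous; this sketch ignores that case.
_DIFF_HEADER = re.compile(r"^diff --git a/(?P<old>.+) b/(?P<new>.+)$")


def parse_file_path(header_line: str) -> str:
    """Return the b/ side path from a diff header, raising instead of guessing."""
    match = _DIFF_HEADER.match(header_line)
    if match is None:
        # Fail loudly rather than falling back to positional splitting.
        raise ValueError(f"unrecognised diff header: {header_line!r}")
    return match.group("new")
```

Raising here surfaces malformed input immediately, whereas the positional fallback could silently produce a wrong path.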

samugi · 2025-09-16T13:23:06Z

pr_diff.py

+
+class Hunk:
+    def __init__(self, text: str):
+        self.text = text.strip()


If I followed everything correctly, text here begins with a hunk header such as @@ -5,3 +5,6 @@. What happens if two diffs are identical but are applied at different lines of the file, for example because the PRs are based on different branches? Would they be mistakenly counted as different diffs, or have I missed something?

Other than that, using an MD5 checksum should work fine as a strategy for comparing hunks, as long as both diffs are generated with the same number of context lines around the changes, which appears to be 3 (non-configurable) for gh pr diff.
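To illustrate the concern, here is a hedged sketch of a fingerprint that ignores the line numbers in the hunk header. The function name hunk_fingerprint and the header regex are assumptions for illustration and may not match how Hunk is implemented in pr_diff.py:

```python
import hashlib
import re

# Matches a unified-diff hunk header such as "@@ -5,3 +5,6 @@".
_HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@")


def hunk_fingerprint(hunk_text: str) -> str:
    """MD5 of a hunk body with the leading "@@ ... @@" header line dropped."""
    lines = hunk_text.strip().splitlines()
    if lines and _HUNK_HEADER.match(lines[0]):
        lines = lines[1:]  # discard line numbers so relocated hunks hash the same
    return hashlib.md5("\n".join(lines).encode("utf-8")).hexdigest()
```

With this approach two hunks that differ only in their position hash identically, but context lines still take part in the hash, so both diffs must use the same amount of context.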

I'm wondering if we should limit the comparison to the actual changes, similarly to what was happening before:

gh-compr/gh-compr

Lines 79 to 80 in 3785a2d

pr1NoCtxDiff=$(echo "$pr1Diff" | grep -v '^[^+-]')

pr2NoCtxDiff=$(echo "$pr2Diff" | grep -v '^[^+-]')
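For comparison, a rough Python counterpart of the grep filter above might look like the following. The name changed_lines is hypothetical, and unlike grep -v '^[^+-]' this sketch also drops empty lines:

```python
def changed_lines(diff_text: str) -> list[str]:
    """Keep only lines that start with "+" or "-", dropping context and hunk headers.

    As with the grep filter, "+++"/"---" file header lines are kept, because they
    cannot be told apart from added or removed lines that begin with "++" or "--".
    """
    return [line for line in diff_text.splitlines() if line[:1] in ("+", "-")]
```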

You are correct. However, I'm not sure we can reliably tell whether a given diff is the same or not. There are certainly cases where the "left" hunk is the same as the "right" hunk and only the line numbers differ, but can we be sure that these should be aligned in that case? I think there are two scenarios:

1. Some lines were added above the "right" hunk, so the diff is the same but the line numbers differ.

2. The "right" hunk appears in a completely different place in the file and results in a different execution flow.

I'm not sure how to distinguish between these two 🤔

I'm not sure either at the moment. But this seems to happen quite often, especially when comparing backports, so we really need to find a solution. Otherwise, many diffs will be marked as different even when they (logically) aren't.

This change makes it easier to identify identical diffs (100% similarity cases). However, I'm worried that false positives would create confusion, since the similarity score would appear "wrong".

The current code might be vulnerable to scenario 2, if I remember correctly. If the purpose is to validate that nothing was missed or forgotten between two PRs, it may be good enough to check that there are no missing hunks and trust that the developers placed them in the right spot. WDYT?
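A minimal sketch of such a "no missing hunks" check, assuming each hunk is available as a string of unified-diff text. The names change_key and missing_hunks are hypothetical, and position in the file is deliberately ignored:

```python
def change_key(hunk: str) -> tuple[str, ...]:
    """Reduce a hunk to its added/removed lines, ignoring header and context."""
    return tuple(line for line in hunk.splitlines() if line[:1] in ("+", "-"))


def missing_hunks(left: list[str], right: list[str]) -> list[str]:
    """Return hunks from left whose changes do not appear anywhere in right."""
    right_keys = {change_key(hunk) for hunk in right}
    return [hunk for hunk in left if change_key(hunk) not in right_keys]
```

Running the check in both directions would report hunks that exist in only one of the two PRs, without judging whether matching hunks landed in the right place.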

I agree, we can handle scenario 2 a bit better. I'm experimenting with the algorithm and have added a hunk similarity comparison based on + / - lines. Here's an example:
Kong/kong#14756 (comment)

What do you think?

We might also add some sort of fuzzy matching, but I'm not sure whether that would hurt the fidelity of the score 🤔
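One possible shape for a similarity score based on + / - lines, sketched with the standard library's difflib.SequenceMatcher. This is an assumption about how such a score could be computed, not the algorithm used in the linked example, and hunk_similarity is a hypothetical name:

```python
from difflib import SequenceMatcher


def hunk_similarity(left_hunk: str, right_hunk: str) -> float:
    """Score two hunks from 0.0 to 1.0 using only their added/removed lines."""
    left = [l for l in left_hunk.splitlines() if l[:1] in ("+", "-")]
    right = [l for l in right_hunk.splitlines() if l[:1] in ("+", "-")]
    # ratio() is 2 * matches / total elements; whole lines are the unit of comparison.
    return SequenceMatcher(None, left, right, autojunk=False).ratio()
```

Because whole lines are compared, a line that differs by one character counts as a full mismatch. Fuzzy matching inside lines would soften that, at the cost of making a given score harder to interpret.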

chore(report): add PR diff report generator

1f1417f



nowNick commented Sep 16, 2025

samugi Sep 16, 2025

nowNick Sep 19, 2025

samugi Sep 16, 2025

nowNick Sep 19, 2025

samugi Sep 22, 2025

nowNick Sep 25, 2025
