conflicts: add conflict labels to `MergedTree` and store in backends #7850

scott2000 · 2025-10-26T02:34:27Z

I split this PR off of #7692 because I think these changes could be reviewed separately to figure out the storage of conflict labels before we go into the specifics of adding the labels for each type of conflict and rendering them in conflict markers. I added some justification for this approach in the description of the first commit, but I'd be open to a different approach if reviewers think it would work better.

Checklist

If applicable:

I have updated CHANGELOG.md
I have updated the documentation (README.md, docs/, demos/)
I have updated the config schema (cli/src/config-schema.json)
I have added/updated tests to cover my changes

scott2000 · 2025-10-31T23:27:53Z

Hey @martinvonz, could you take a look and let me know your thoughts on this approach when you get a chance?

lib/src/content_hash.rs

martinvonz · 2025-11-02T07:03:32Z

lib/src/backend.rs

+    /// The labels for the sides of the merged tree
+    labels: ConflictLabels,


It seems a bit strange to add labels to MergedTreeId. Would it make sense to add them to Commit instead?

scott2000 · Nov 2, 2025

I tested out adding them to Commit instead of MergedTreeId, but I think it makes the code more complicated since we need to separately keep track of the labels and pass them down everywhere. In many places, we currently assume that a MergedTreeId can be read as a MergedTree from the store without any additional information, so it's a lot easier if MergedTreeId contains the same labels as MergedTree.

For instance, MergedTreeBuilder returns a MergedTreeId, but it internally reads trees and calls MergedTree::resolve, so we would need MergedTreeBuilder::new to take ConflictLabels as an extra argument, and we'd need MergedTreeBuilder::write_tree to return BackendResult<(MergedTreeId, ConflictLabels)>, or otherwise we would need to convert it to work directly with MergedTree instead of MergedTreeId.

Is there a reason why MergedTree needs to contain Merge<Tree> instead of Merge<TreeId>? Most of the methods end up recursively reading subtrees anyway, so it might be nice to be able to get a MergedTree object without having to read the root trees first. If that were the case, maybe we could use MergedTree in most of the places we previously used MergedTreeId (e.g. MergedTreeBuilder, CommitBuilder, diff editors, etc.), and then replace MergedTreeId with a plain Merge<TreeId> everywhere else? This would make Commit::tree synchronous and infallible, since it wouldn't have to read the root trees from the store.

martinvonz · Nov 3, 2025

What do you think about at least creating a new type wrapping MergedTreeId and ConflictLabels? My problem with making ConflictLabels part of MergedTreeId is that they're not logically part of the ID.

> Is there a reason why MergedTree needs to contain Merge<Tree> instead of Merge<TreeId>?

I think we often have the trees already so maybe that's why. It may still be a good idea to make the change you suggest. The caching in Store means that it should be cheap to re-read a tree, while reading a tree we would otherwise not need is of course not as cheap (to be clear: this is an argument for your suggestion).

By the way, I was also wondering if it makes sense for MergedTree::sub_tree() and similar to return a Merge<Tree> instead of a MergedTree. Then the conflict labels would have to be passed around separately, but an advantage might be that they wouldn't be passed around by value as much, so we would probably not need the Arc. It might be much more work, though, so perhaps it's a bad idea.

Another possible abstraction is to replace ConflictLabels by something like LabeledMerge<T> { merge: Merge<T>, labels: Merge<String> }. It's probably not a good idea.

Sorry that I'm leaving you with more questions than answers :) No need to test out all my suggestions. The only thing I feel somewhat strongly about is that we shouldn't embed the conflict labels in MergedTreeId since that makes it more than an ID.

scott2000 · Nov 3, 2025

> What do you think about at least creating a new type wrapping MergedTreeId and ConflictLabels? My problem with making ConflictLabels part of MergedTreeId is that they're not logically part of the ID.

Yeah, that might be a good idea. I do agree that it's confusing that MergedTreeId contains conflict labels. Maybe if we had Merge<TreeId> as the basic ID type, then LabeledMergedTreeId adding labels, and then LabeledMergedTree with Merge<Tree> and labels it could work? But the naming is getting a bit unwieldy then. Let me play around with different options and see if I can find something that works well.
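To make the layering floated above concrete, here is a rough sketch of how the three types could relate. Only the names `Merge<TreeId>`, `LabeledMergedTreeId`, and `LabeledMergedTree` come from the comment; every field, the simplified `Merge`, `TreeId`, and `Tree` stand-ins, and the `id()` method are assumptions for illustration, not the shape the PR ended up with.

```rust
use std::sync::Arc;

/// Simplified stand-in for a tree ID (assumption: just raw bytes).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct TreeId(pub Vec<u8>);

/// Simplified stand-in for `Merge<T>`: the terms of a (possibly resolved) merge.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Merge<T>(pub Vec<T>);

/// Simplified stand-in for a tree that has already been read from the store.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Tree {
    pub id: TreeId,
    pub entries: Vec<String>,
}

/// Layer 2: the plain ID (`Merge<TreeId>`, layer 1) plus labels.
/// The labels sit next to the ID rather than inside it.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct LabeledMergedTreeId {
    pub tree_ids: Merge<TreeId>,
    pub labels: Option<Arc<[String]>>,
}

/// Layer 3: the trees themselves plus the same labels.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct LabeledMergedTree {
    pub trees: Merge<Tree>,
    pub labels: Option<Arc<[String]>>,
}

impl LabeledMergedTree {
    /// Hypothetical accessor: derives the labeled ID without copying the
    /// label strings, since cloning the `Arc` only bumps a reference count.
    pub fn id(&self) -> LabeledMergedTreeId {
        LabeledMergedTreeId {
            tree_ids: Merge(self.trees.0.iter().map(|tree| tree.id.clone()).collect()),
            labels: self.labels.clone(),
        }
    }
}
```

With this split, code that only cares about content can hash or compare the bare `Merge<TreeId>`, while code that renders conflicts carries the labeled wrapper.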

> By the way, I was also wondering if it makes sense for MergedTree::sub_tree() and similar to return a Merge<Tree> instead of a MergedTree.

Yeah that might be nice. In that case maybe we could also migrate some of the methods on MergedTree that rely on the trees already being read to be on Merge<Tree> instead, which would make it easier to convert MergedTree to contain Merge<TreeId>.

> Another possible abstraction is to replace ConflictLabels by something like LabeledMerge<T> { merge: Merge<T>, labels: Merge<String> }. It's probably not a good idea.

Yeah, that sounds similar to something I had tried before, but it could be good (I think I had tried adding labels as an optional extra type argument to Merge).

> Sorry that I'm leaving you with more questions than answers :) No need to test out all my suggestions.

No, thanks for all the suggestions! These are all really helpful actually.

Adding this as a separate type will help maintain the invariants that resolved merges cannot have labels, and labels cannot be the empty string. I also added `Arc` since the labels will often need to be cloned, such as in `MergedTree::sub_tree`, `MergedTree::id`, and when reading and writing root trees.

I think that storing separate conflict labels for each term of the conflict is the best approach for a couple of reasons. Mainly, I think it integrates well with the existing conflict algebra. For instance, a diff of (A - B) and a diff of (B - C) can be easily combined to create a new diff of (A - C), and if we associate a label with each term, then the labels will naturally be carried over as well. Also, I think it would be simpler to implement than other approaches (such as storing labels for diffs instead of terms), since conflict labels can re-use existing logic from `Merge<T>`.

For simplicity, I also think we shouldn't allow mixing labeled terms and unlabeled terms (i.e. if any term doesn't have a label, then we discard all labels and leave the entire merge unlabeled). I think it could be confusing to have conflicts where, for instance, one side says "rebase destination" and another side only says "side #2" with no further information. In cases like these, I think it's better to just fall back to the old labels. In the future, I expect that most conflicts should have labels (since we should eventually be adding labels everywhere conflicts can happen).
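As a rough illustration of the invariants described above, the sketch below shows one way a label container could enforce them at construction time. It is not the PR's actual `ConflictLabels`: the field layout, the constructor arguments, and the methods `new`, `unlabeled`, `get`, and `is_labeled` are all assumptions. It also assumes one label per term, stored in the same order as the terms of the merge.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `ConflictLabels`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ConflictLabels {
    // `None` means the merge is unlabeled. `Some` holds exactly one
    // non-empty label per term. `Arc` keeps clones cheap.
    labels: Option<Arc<[String]>>,
}

impl ConflictLabels {
    /// The fallback state: no labels at all.
    pub fn unlabeled() -> Self {
        ConflictLabels { labels: None }
    }

    /// Builds labels for a merge with `num_terms` terms, falling back to
    /// "unlabeled" whenever an invariant would be violated:
    /// - a resolved merge (a single term) never carries labels;
    /// - every term needs a label, and no label may be empty.
    pub fn new(num_terms: usize, labels: Vec<String>) -> Self {
        let resolved = num_terms == 1;
        let complete =
            labels.len() == num_terms && labels.iter().all(|label| !label.is_empty());
        if resolved || !complete {
            Self::unlabeled()
        } else {
            ConflictLabels {
                labels: Some(labels.into()),
            }
        }
    }

    /// Returns the label for the term at `index`, if the merge is labeled.
    pub fn get(&self, index: usize) -> Option<&str> {
        self.labels.as_ref()?.get(index).map(String::as_str)
    }

    pub fn is_labeled(&self) -> bool {
        self.labels.is_some()
    }
}
```

Because a partially labeled input collapses to the unlabeled state, callers never see a conflict where only some sides have descriptive labels.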

Since two merged trees can now have the same contents but different conflict labels, many places that previously compared `MergedTreeId` for equality now need to ignore the conflict labels. To solve this, I added a `MergedTreeId::has_changes` method to check whether the underlying `Merge<TreeId>` has any changes.
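The distinction between full equality and "has changes" can be sketched as follows. The types here are simplified stand-ins, the `conflicted` helper is hypothetical, and the signature of `has_changes` is a guess, since the commit message only names the method and says that it checks the underlying `Merge<TreeId>`.

```rust
/// Simplified stand-in for a tree ID (assumption: just raw bytes).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct TreeId(pub Vec<u8>);

/// Simplified stand-in for a `MergedTreeId` that also carries labels.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct MergedTreeId {
    /// Stand-in for the underlying `Merge<TreeId>`.
    pub tree_ids: Vec<TreeId>,
    /// One label per term, or `None` if the merge is unlabeled.
    pub labels: Option<Vec<String>>,
}

impl MergedTreeId {
    /// Assumed signature: true if the tree contents differ, ignoring labels.
    pub fn has_changes(&self, other: &MergedTreeId) -> bool {
        self.tree_ids != other.tree_ids
    }
}

/// Hypothetical helper that builds a three-term conflicted ID.
pub fn conflicted(ids: [u8; 3], labels: Option<[&str; 3]>) -> MergedTreeId {
    MergedTreeId {
        tree_ids: ids.iter().map(|&byte| TreeId(vec![byte])).collect(),
        labels: labels.map(|names| names.iter().map(|name| name.to_string()).collect()),
    }
}
```

Under these assumptions, `conflicted([1, 2, 3], Some(["ours", "base", "theirs"]))` and `conflicted([1, 2, 3], None)` compare as unequal with `==`, yet `has_changes` reports no changes between them.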

To implement simplification of conflict labels, I decided to add more functions such as `zip` and `unzip` to `Merge`. I think these functions could be useful in other situations so I thought this was a nice solution, but an alternative solution could be to make `get_simplified_mapping` and `apply_simplified_mapping` public and manually apply the same mapping to both merges.
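One way to picture the `zip`/`unzip` approach is with a toy `Merge` type: pair each term with its label, simplify the pairs by comparing only the term, then split the result back apart. This is a sketch under assumptions, not the PR's implementation. The signatures are guesses, `simplify_by` and `combined_diff_example` are hypothetical names, and the cancellation order here is simpler than the mapping-based logic in `get_simplified_mapping`.

```rust
/// Simplified stand-in for `Merge<T>`: values are interleaved as
/// add, remove, add, ..., so the length is always odd.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Merge<T> {
    values: Vec<T>,
}

impl<T> Merge<T> {
    pub fn from_vec(values: Vec<T>) -> Self {
        assert!(values.len() % 2 == 1, "a merge has one more add than removes");
        Merge { values }
    }

    /// Pairs up the terms of two merges of the same size.
    pub fn zip<U>(self, other: Merge<U>) -> Merge<(T, U)> {
        assert_eq!(self.values.len(), other.values.len());
        Merge::from_vec(self.values.into_iter().zip(other.values).collect())
    }

    /// Hypothetical helper: cancels each remove against the first add whose
    /// key is equal, then re-interleaves the terms that are left.
    pub fn simplify_by<K: PartialEq>(self, key: impl Fn(&T) -> K) -> Merge<T> {
        let mut adds = Vec::new();
        let mut removes = Vec::new();
        for (index, value) in self.values.into_iter().enumerate() {
            if index % 2 == 0 {
                adds.push(value);
            } else {
                removes.push(value);
            }
        }
        let mut kept_removes = Vec::new();
        for removed in removes {
            match adds.iter().position(|added| key(added) == key(&removed)) {
                Some(pos) => {
                    adds.remove(pos);
                }
                None => kept_removes.push(removed),
            }
        }
        // Each cancellation drops one add and one remove, so there is still
        // exactly one more add than removes.
        let mut values = Vec::new();
        let mut kept_removes = kept_removes.into_iter();
        for (index, added) in adds.into_iter().enumerate() {
            if index > 0 {
                values.push(kept_removes.next().expect("one fewer remove than adds"));
            }
            values.push(added);
        }
        Merge::from_vec(values)
    }
}

impl<T, U> Merge<(T, U)> {
    /// Splits a merge of pairs back into two merges.
    pub fn unzip(self) -> (Merge<T>, Merge<U>) {
        let (left, right): (Vec<T>, Vec<U>) = self.values.into_iter().unzip();
        (Merge::from_vec(left), Merge::from_vec(right))
    }
}

/// X + (A - B) + (B - C), written in interleaved form as X - B + A - C + B.
pub fn combined_diff_example() -> (Merge<char>, Merge<&'static str>) {
    let ids = Merge::from_vec(vec!['X', 'B', 'A', 'C', 'B']);
    let labels = Merge::from_vec(vec!["x", "b (old)", "a", "c", "b (new)"]);
    // Compare only the IDs so that labels never block a cancellation.
    ids.zip(labels).simplify_by(|(id, _)| *id).unzip()
}
```

In `combined_diff_example`, the two `B` terms cancel and the result is X - C + A, with the labels "x", "c", and "a" following their terms. That matches the (A - B) plus (B - C) gives (A - C) example from the first commit message.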

The old method is renamed to `MergedTree::merge_unlabeled` to make it easy to find unmigrated callers. The goal is that almost all callers will eventually use `MergedTree::merge` to add labels, unless the resulting tree is never visible to the user.

Conflict labels are stored in a separate header for backwards compatibility.

scott2000 requested a review from a team as a code owner October 26, 2025 02:34

scott2000 force-pushed the scott2000/conflict-labels-storage branch 3 times, most recently from f795177 to f04b616 Compare October 31, 2025 23:27

scott2000 force-pushed the scott2000/conflict-labels-storage branch 2 times, most recently from 6307b55 to 7398991 Compare November 2, 2025 03:12

martinvonz reviewed Nov 2, 2025

scott2000 added 6 commits November 2, 2025 07:38

git_backend: store conflict labels in commit header

83b2419

Conflict labels are stored in a separate header for backwards compatibility.

protos: add conflict labels to working copy and simple backend

0d3d525

scott2000 force-pushed the scott2000/conflict-labels-storage branch from 7398991 to 0d3d525 Compare November 2, 2025 14:26
