@scott2000
Contributor

#1176

I split this PR off of #7692 because I think these changes could be reviewed separately to figure out the storage of conflict labels before we go into the specifics of adding the labels for each type of conflict and rendering them in conflict markers. I added some justification for this approach in the description of the first commit, but I'd be open to a different approach if reviewers think it would work better.

Checklist

If applicable:

  • I have updated CHANGELOG.md
  • I have updated the documentation (README.md, docs/, demos/)
  • I have updated the config schema (cli/src/config-schema.json)
  • I have added/updated tests to cover my changes

@scott2000 requested a review from a team as a code owner on October 26, 2025 02:34
@scott2000 force-pushed the scott2000/conflict-labels-storage branch 3 times, most recently from f795177 to f04b616, on October 31, 2025 23:27
@scott2000
Contributor Author

Hey @martinvonz, could you take a look and let me know your thoughts on this approach when you get a chance?

@scott2000 force-pushed the scott2000/conflict-labels-storage branch 2 times, most recently from 6307b55 to 7398991, on November 2, 2025 03:12
Comment on lines +162 to +163
/// The labels for the sides of the merged tree
labels: ConflictLabels,
Member

It seems a bit strange to add labels to MergedTreeId. Would it make sense to add them to Commit instead?

Contributor Author

I tested out adding them to Commit instead of MergedTreeId, but I think it makes the code more complicated since we need to separately keep track of the labels and pass them down everywhere. In many places, we currently assume that a MergedTreeId can be read as a MergedTree from the store without any additional information, so it's a lot easier if MergedTreeId contains the same labels as MergedTree.

For instance, MergedTreeBuilder returns a MergedTreeId, but it internally reads trees and calls MergedTree::resolve, so we would need MergedTreeBuilder::new to take ConflictLabels as an extra argument, and we'd need MergedTreeBuilder::write_tree to return BackendResult<(MergedTreeId, ConflictLabels)>, or otherwise we would need to convert it to work directly with MergedTree instead of MergedTreeId.

Is there a reason why MergedTree needs to contain Merge<Tree> instead of Merge<TreeId>? Most of the methods end up recursively reading subtrees anyway, so it might be nice to be able to get a MergedTree object without having to read the root trees first. If that were the case, maybe we could use MergedTree in most of the places we previously used MergedTreeId (e.g. MergedTreeBuilder, CommitBuilder, diff editors, etc.), and then replace MergedTreeId with a plain Merge<TreeId> everywhere else? This would make Commit::tree synchronous and infallible, since it wouldn't have to read the root trees from the store.

Member

What do you think about at least creating a new type wrapping MergedTreeId and ConflictLabels? My problem with making ConflictLabels part of MergedTreeId is that they're not logically part of the ID.

> Is there a reason why MergedTree needs to contain Merge<Tree> instead of Merge<TreeId>?

I think we often have the trees already so maybe that's why. It may still be a good idea to make the change you suggest. The caching in Store means that it should be cheap to re-read a tree, while reading a tree we would otherwise not need is of course not as cheap (to be clear: this is an argument for your suggestion).

By the way, I was also wondering if it makes sense for MergedTree::sub_tree() and similar to return a Merge<Tree> instead of a MergedTree. Then the conflict labels would have to be passed around separately, but an advantage might be that they wouldn't be passed around by value as much, so we would probably not need the Arc. It might be much more work, though, so perhaps it's a bad idea.

Another possible abstraction is to replace ConflictLabels by something like LabeledMerge<T> { merge: Merge<T>, labels: Merge<String> }. It's probably not a good idea.

Sorry that I'm leaving you with more questions than answers :) No need to test out all my suggestions. The only thing I feel somewhat strongly about is that we shouldn't embed the conflict labels in MergedTreeId since that makes it more than an ID.

Contributor Author

> What do you think about at least creating a new type wrapping MergedTreeId and ConflictLabels? My problem with making ConflictLabels part of MergedTreeId is that they're not logically part of the ID.

Yeah, that might be a good idea. I do agree that it's confusing that MergedTreeId contains conflict labels. Maybe if we had Merge<TreeId> as the basic ID type, then LabeledMergedTreeId adding labels, and then LabeledMergedTree with Merge<Tree> and labels it could work? But the naming is getting a bit unwieldy then. Let me play around with different options and see if I can find something that works well.
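To make the layering concrete, here is a rough sketch of how the three types could relate. None of this is jj-lib code: `TreeId`, `Tree`, and `Merge<T>` are simplified stand-ins, and the field names and the `id()` method are assumptions made only for this illustration.

```rust
// Simplified stand-ins for the real jj-lib types (assumed for illustration).
#[derive(Clone, Debug, PartialEq, Eq)]
struct TreeId(String);

#[derive(Clone, Debug, PartialEq, Eq)]
struct Tree {
    id: TreeId,
    entries: Vec<String>,
}

// Stand-in for `Merge<T>`: just the list of terms.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Merge<T>(Vec<T>);

// Basic ID type plus labels (hypothetical layout).
#[derive(Clone, Debug, PartialEq, Eq)]
struct LabeledMergedTreeId {
    tree_ids: Merge<TreeId>,
    labels: Vec<String>,
}

// Trees that have already been read, plus the same labels.
#[derive(Clone, Debug, PartialEq, Eq)]
struct LabeledMergedTree {
    trees: Merge<Tree>,
    labels: Vec<String>,
}

impl LabeledMergedTree {
    // Converting back to the ID form keeps the labels and drops the contents.
    fn id(&self) -> LabeledMergedTreeId {
        LabeledMergedTreeId {
            tree_ids: Merge(self.trees.0.iter().map(|tree| tree.id.clone()).collect()),
            labels: self.labels.clone(),
        }
    }
}
```

With this shape, a plain `Merge<TreeId>` stays a pure ID, and the labels only appear in the two wrapper types.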

> By the way, I was also wondering if it makes sense for MergedTree::sub_tree() and similar to return a Merge<Tree> instead of a MergedTree.

Yeah that might be nice. In that case maybe we could also migrate some of the methods on MergedTree that rely on the trees already being read to be on Merge<Tree> instead, which would make it easier to convert MergedTree to contain Merge<TreeId>.

> Another possible abstraction is to replace ConflictLabels by something like LabeledMerge<T> { merge: Merge<T>, labels: Merge<String> }. It's probably not a good idea.

Yeah, that sounds similar to something I had tried before, but it could be good (I think I had tried adding labels as an optional extra type argument to Merge).

> Sorry that I'm leaving you with more questions than answers :) No need to test out all my suggestions.

No, thanks for all the suggestions! These are all really helpful actually.

Adding this as a separate type will help maintain the invariants that
resolved merges cannot have labels, and labels cannot be the empty
string. I also added `Arc` since the labels will often need to be
cloned, such as in `MergedTree::sub_tree`, `MergedTree::id`, and when
reading and writing root trees.
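
As a rough illustration of those invariants (not the code in this PR), a
separate type can enforce them in its constructor. The `ConflictLabels`
below is a simplified stand-in, and `unlabeled`, `from_terms`,
`is_labeled`, and `as_slice` are hypothetical names. Falling back to
"unlabeled" when a label is empty is also an assumption of this sketch.

```rust
use std::sync::Arc;

/// Simplified stand-in for the PR's `ConflictLabels`; the real definition
/// may differ. `None` means the merge is unlabeled.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ConflictLabels {
    // `Arc` makes `clone()` a reference-count bump instead of a string copy.
    labels: Option<Arc<[String]>>,
}

impl ConflictLabels {
    /// No labels at all (always the case for a resolved merge).
    pub fn unlabeled() -> Self {
        ConflictLabels { labels: None }
    }

    /// Hypothetical constructor taking one label per term of the merge.
    /// Returns an unlabeled value if the merge is resolved (at most one
    /// term) or if any label is the empty string.
    pub fn from_terms(terms: Vec<String>) -> Self {
        let resolved = terms.len() <= 1;
        if resolved || terms.iter().any(|label| label.is_empty()) {
            return Self::unlabeled();
        }
        ConflictLabels { labels: Some(terms.into()) }
    }

    pub fn is_labeled(&self) -> bool {
        self.labels.is_some()
    }

    /// The labels in term order, or an empty slice when unlabeled.
    pub fn as_slice(&self) -> &[String] {
        self.labels.as_deref().unwrap_or(&[])
    }
}
```

Because the fields are private, callers cannot build a value that breaks
either invariant, and cloning stays cheap however many terms there are.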

I think that storing separate conflict labels for each term of the
conflict is the best approach for a couple of reasons. Mainly, I think it
integrates well with the existing conflict algebra. For instance, a diff
of (A - B) and a diff of (B - C) can be easily combined to create a new
diff of (A - C), and if we associate a label with each term, then the
labels will naturally be carried over as well. Also, I think it
would be simpler to implement than other approaches (such as storing
labels for diffs instead of terms), since conflict labels can re-use
existing logic from `Merge<T>`.
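
The (A - B) plus (B - C) example can be sketched with signed terms that
each carry a label. This is a toy model, not how `Merge<T>` is
implemented: `Term`, `plus`, `minus`, and `simplify` are hypothetical
names, and trees are represented by plain strings.

```rust
/// One signed term of a diff or conflict, with its label attached.
/// Illustrative stand-in only.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Term {
    positive: bool,
    tree: &'static str,
    label: &'static str,
}

fn plus(tree: &'static str, label: &'static str) -> Term {
    Term { positive: true, tree, label }
}

fn minus(tree: &'static str, label: &'static str) -> Term {
    Term { positive: false, tree, label }
}

/// Cancels each term against an earlier term with the same tree and the
/// opposite sign. Labels are ignored when matching and simply travel with
/// the terms that survive.
fn simplify(terms: Vec<Term>) -> Vec<Term> {
    let mut kept: Vec<Term> = Vec::new();
    for term in terms {
        let opposite = kept
            .iter()
            .position(|k| k.tree == term.tree && k.positive != term.positive);
        match opposite {
            Some(index) => {
                kept.remove(index);
            }
            None => kept.push(term),
        }
    }
    kept
}
```

Concatenating the terms of (A - B) and (B - C) and simplifying leaves
`+A` and `-C`, each still paired with the label it started with.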

For simplicity, I also think we shouldn't allow mixing labeled terms and
unlabeled terms (i.e. if any term doesn't have a label, then we discard
all labels and leave the entire merge unlabeled). I think it could be
confusing to have conflicts where, for instance, one side says "rebase
destination" and another side only says "side #2" with no further
information. In cases like these, I think it's better to just fall back
to the old labels. In the future, I expect that most conflicts should
have labels (since we should eventually be adding labels everywhere
conflicts can happen).
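
A minimal sketch of that all-or-nothing rule, assuming each term's label
arrives as an `Option<String>` (the function name and signature are
hypothetical, not part of this PR):

```rust
/// Keeps the labels only if every term has a non-empty one; otherwise
/// returns `None`, meaning the whole merge is treated as unlabeled.
fn all_or_nothing(labels: Vec<Option<String>>) -> Option<Vec<String>> {
    labels
        .into_iter()
        // Treat an empty string the same as a missing label.
        .map(|label| label.filter(|text| !text.is_empty()))
        // Collecting `Option`s yields `None` as soon as one term is `None`.
        .collect()
}
```

Callers that get `None` back would render the old generic labels for
every side, so a conflict never mixes the two styles.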

Since two merged trees can now have the same contents but different
conflict labels, many places that previously compared `MergedTreeId` for
equality now need to ignore the conflict labels. To solve this, I added
a `MergedTreeId::has_changes` method to check whether the underlying
`Merge<TreeId>` has any changes.
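
The difference between full equality and a content-only comparison can be
sketched as follows. `LabeledTreeId` is a stand-in type, and the
two-argument shape of `has_changes` is an assumption. Only the method
name comes from this commit.

```rust
/// Stand-in for a tree ID that also carries conflict labels.
#[derive(Clone, Debug, PartialEq, Eq)]
struct LabeledTreeId {
    // Stand-in for the underlying `Merge<TreeId>`.
    tree_ids: Vec<String>,
    labels: Option<Vec<String>>,
}

impl LabeledTreeId {
    /// Assumed signature: reports whether the tree IDs differ, ignoring
    /// the labels entirely.
    fn has_changes(&self, other: &Self) -> bool {
        self.tree_ids != other.tree_ids
    }
}
```

Two values that differ only in their labels compare unequal with `==`,
but `has_changes` returns `false` for them.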

To implement simplification of conflict labels, I decided to add more
functions such as `zip` and `unzip` to `Merge`. I think these functions
could be useful in other situations so I thought this was a nice
solution, but an alternative solution could be to make
`get_simplified_mapping` and `apply_simplified_mapping` public and
manually apply the same mapping to both merges.
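
For readers unfamiliar with the idea, here is a small sketch of what
`zip` and `unzip` could look like on a merge type. The `Merge<T>` below
is a toy stand-in backed by a `Vec`, and the signatures are assumptions
rather than the ones added in this commit.

```rust
/// Toy stand-in for `Merge<T>`: alternating adds and removes
/// (add, remove, add, ...), so the length is always odd.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Merge<T> {
    values: Vec<T>,
}

impl<T> Merge<T> {
    fn from_vec(values: Vec<T>) -> Self {
        assert!(values.len() % 2 == 1, "a merge has an odd number of terms");
        Merge { values }
    }

    /// Pairs up the terms of two merges of the same shape.
    fn zip<U>(self, other: Merge<U>) -> Merge<(T, U)> {
        assert_eq!(self.values.len(), other.values.len());
        Merge {
            values: self.values.into_iter().zip(other.values).collect(),
        }
    }
}

impl<T, U> Merge<(T, U)> {
    /// Splits a merge of pairs back into two merges of the same shape.
    fn unzip(self) -> (Merge<T>, Merge<U>) {
        let (left, right): (Vec<T>, Vec<U>) = self.values.into_iter().unzip();
        (Merge { values: left }, Merge { values: right })
    }
}
```

Zipping tree IDs with labels means any operation that reorders or drops
terms acts on both at once, and `unzip` recovers the two merges afterward.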

The old method is renamed to `MergedTree::merge_unlabeled` to make it
easy to find unmigrated callers. The goal is that almost all callers
will eventually use `MergedTree::merge` to add labels, unless the
resulting tree is never visible to the user.

Conflict labels are stored in a separate header for backwards
compatibility.

@scott2000 force-pushed the scott2000/conflict-labels-storage branch from 7398991 to 0d3d525 on November 2, 2025 14:26

3 participants