conflicts: add conflict labels to MergedTree and store in backends
#7850
base: main
f795177 to
f04b616
Compare
Hey @martinvonz, could you take a look and let me know your thoughts on this approach when you get a chance?
6307b55 to
7398991
Compare
    /// The labels for the sides of the merged tree
    labels: ConflictLabels,
It seems a bit strange to add labels to MergedTreeId. Would it make sense to add them to Commit instead?
I tested out adding them to Commit instead of MergedTreeId, but I think it makes the code more complicated since we need to separately keep track of the labels and pass them down everywhere. In many places, we currently assume that a MergedTreeId can be read as a MergedTree from the store without any additional information, so it's a lot easier if MergedTreeId contains the same labels as MergedTree.
For instance, MergedTreeBuilder returns a MergedTreeId, but it internally reads trees and calls MergedTree::resolve, so we would need MergedTreeBuilder::new to take ConflictLabels as an extra argument, and we'd need MergedTreeBuilder::write_tree to return BackendResult<(MergedTreeId, ConflictLabels)>, or otherwise we would need to convert it to work directly with MergedTree instead of MergedTreeId.
Is there a reason why MergedTree needs to contain Merge<Tree> instead of Merge<TreeId>? Most of the methods end up recursively reading subtrees anyway, so it might be nice to be able to get a MergedTree object without having to read the root trees first. If that were the case, maybe we could use MergedTree in most of the places we previously used MergedTreeId (e.g. MergedTreeBuilder, CommitBuilder, diff editors, etc.), and then replace MergedTreeId with a plain Merge<TreeId> everywhere else? This would make Commit::tree synchronous and infallible, since it wouldn't have to read the root trees from the store.
What do you think about at least creating a new type wrapping MergedTreeId and ConflictLabels? My problem with making ConflictLabels part of MergedTreeId is that they're not logically part of the ID.
> Is there a reason why MergedTree needs to contain Merge<Tree> instead of Merge<TreeId>?
I think we often have the trees already so maybe that's why. It may still be a good idea to make the change you suggest. The caching in Store means that it should be cheap to re-read a tree, while reading a tree we would otherwise not need is of course not as cheap (to be clear: this is an argument for your suggestion).
By the way, I was also wondering if it makes sense for MergedTree::sub_tree() and similar to return a Merge<Tree> instead of a MergedTree. Then the conflict labels would have to be passed around separately, but an advantage might be that they wouldn't be passed around by value as much, so we would probably not need the Arc. It might be much more work, though, so perhaps it's a bad idea.
Another possible abstraction is to replace ConflictLabels by something like LabeledMerge<T> { merge: Merge<T>, labels: Merge<String> }. It's probably not a good idea.
Sorry that I'm leaving you with more questions than answers :) No need to test out all my suggestions. The only thing I feel somewhat strongly about is that we shouldn't embed the conflict labels in MergedTreeId since that makes it more than an ID.
> What do you think about at least creating a new type wrapping MergedTreeId and ConflictLabels? My problem with making ConflictLabels part of MergedTreeId is that they're not logically part of the ID.
Yeah, that might be a good idea. I do agree that it's confusing that MergedTreeId contains conflict labels. Maybe if we had Merge<TreeId> as the basic ID type, then LabeledMergedTreeId adding labels, and then LabeledMergedTree with Merge<Tree> and labels it could work? But the naming is getting a bit unwieldy then. Let me play around with different options and see if I can find something that works well.
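To make the layering above concrete, here is a rough sketch of how the three levels could fit together. Everything in it is a simplified stand-in written for illustration (`TreeId`, `Tree`, `Merge`, and `ConflictLabels` are not the real jj types, and `LabeledMergedTree::id` is a hypothetical method); it only shows that the plain `Merge<TreeId>` stays a pure ID while the labeled types carry the labels alongside it.

```rust
use std::sync::Arc;

// Simplified stand-ins, not the real jj types.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct TreeId(pub Vec<u8>);

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Tree {
    pub id: TreeId,
}

/// Stand-in merge: a flat list of terms (add, remove, add, ...).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Merge<T>(pub Vec<T>);

/// Stand-in labels: `None` means "unlabeled".
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ConflictLabels(pub Option<Arc<Merge<String>>>);

/// Hypothetical: the plain ID (`Merge<TreeId>`) plus labels.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct LabeledMergedTreeId {
    pub tree_ids: Merge<TreeId>,
    pub labels: ConflictLabels,
}

/// Hypothetical: the trees that have been read, plus the same labels.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct LabeledMergedTree {
    pub trees: Merge<Tree>,
    pub labels: ConflictLabels,
}

impl LabeledMergedTree {
    /// Drops the tree contents but keeps the labels.
    pub fn id(&self) -> LabeledMergedTreeId {
        LabeledMergedTreeId {
            tree_ids: Merge(self.trees.0.iter().map(|tree| tree.id.clone()).collect()),
            // Cloning the labels only bumps the Arc reference count.
            labels: self.labels.clone(),
        }
    }
}
```

With this split, code that only cares about tree contents can keep working with `Merge<TreeId>` and never see the labels.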
> By the way, I was also wondering if it makes sense for MergedTree::sub_tree() and similar to return a Merge<Tree> instead of a MergedTree.
Yeah that might be nice. In that case maybe we could also migrate some of the methods on MergedTree that rely on the trees already being read to be on Merge<Tree> instead, which would make it easier to convert MergedTree to contain Merge<TreeId>.
> Another possible abstraction is to replace ConflictLabels by something like LabeledMerge<T> { merge: Merge<T>, labels: Merge<String> }. It's probably not a good idea.
Yeah, that sounds similar to something I had tried before, but it could be good (I think I had tried adding labels as an optional extra type argument to Merge).
> Sorry that I'm leaving you with more questions than answers :) No need to test out all my suggestions.
No, thanks for all the suggestions! These are all really helpful actually.
Adding this as a separate type will help maintain the invariants that resolved merges cannot have labels, and labels cannot be the empty string. I also added `Arc` since the labels will often need to be cloned, such as in `MergedTree::sub_tree`, `MergedTree::id`, and when reading and writing root trees.
I think that storing separate conflict labels for each term of the conflict is the best approach for a couple of reasons. Mainly, I think it integrates well with the existing conflict algebra. For instance, a diff of (A - B) and a diff of (B - C) can be easily combined to create a new diff of (A - C), and if we associate a label with each term, then the labels will naturally be carried over as well. Also, I think it would be simpler to implement than other approaches (such as storing labels for diffs instead of terms), since conflict labels can re-use existing logic from `Merge<T>`.
For simplicity, I also think we shouldn't allow mixing labeled terms and unlabeled terms (i.e. if any term doesn't have a label, then we discard all labels and leave the entire merge unlabeled). I think it could be confusing to have conflicts where, for instance, one side says "rebase destination" and another side only says "side #2" with no further information. In cases like these, I think it's better to just fall back to the old labels. In the future, I expect that most conflicts should have labels (since we should eventually be adding labels everywhere conflicts can happen).
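A minimal sketch of the invariants described in this commit message, under stated assumptions: `Merge` here is a simplified stand-in (a flat list of terms), and the constructor and accessor names (`new`, `unlabeled`, `has_labels`, `get`) are hypothetical rather than taken from the actual change. It treats an empty string as a missing label, so a single empty label makes the whole merge unlabeled.

```rust
use std::sync::Arc;

/// Simplified stand-in for a merge: terms alternate add, remove, add, ...
/// so a valid merge always has an odd number of terms.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Merge<T> {
    terms: Vec<T>,
}

impl<T> Merge<T> {
    pub fn from_terms(terms: Vec<T>) -> Self {
        assert!(terms.len() % 2 == 1, "a merge has an odd number of terms");
        Merge { terms }
    }

    pub fn is_resolved(&self) -> bool {
        self.terms.len() == 1
    }
}

/// Either no labels at all, or one non-empty label per term of a conflict.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ConflictLabels {
    // Arc keeps clones cheap when labels are passed around by value.
    labels: Option<Arc<Merge<String>>>,
}

impl ConflictLabels {
    pub fn unlabeled() -> Self {
        ConflictLabels { labels: None }
    }

    /// Keeps the labels only if the merge is a real conflict and every
    /// term has a non-empty label; otherwise falls back to unlabeled.
    pub fn new(labels: Merge<String>) -> Self {
        let keep = !labels.is_resolved() && labels.terms.iter().all(|label| !label.is_empty());
        ConflictLabels {
            labels: keep.then(|| Arc::new(labels)),
        }
    }

    pub fn has_labels(&self) -> bool {
        self.labels.is_some()
    }

    /// Label of the term at `index`, if this merge is labeled.
    pub fn get(&self, index: usize) -> Option<&str> {
        self.labels.as_ref()?.terms.get(index).map(String::as_str)
    }
}
```

Because the only way to build a labeled value is through `new`, callers cannot end up with a half-labeled conflict or a labeled resolved tree.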
Since two merged trees can now have the same contents but different conflict labels, many places that previously compared `MergedTreeId` for equality now need to ignore the conflict labels. To solve this, I added a `MergedTreeId::has_changes` method to check whether the underlying `Merge<TreeId>` has any changes.
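As an illustration of why plain equality is no longer enough, here is a small sketch. The struct layout, the `new` constructor, and the exact signature of `has_changes` are assumptions for the example (the commit message only names the method), and `Vec` is used as a stand-in for both `Merge<TreeId>` and the labels.

```rust
/// Stand-in for a backend tree ID.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct TreeId(pub Vec<u8>);

/// Stand-in for an ID that also carries conflict labels.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct MergedTreeId {
    tree_ids: Vec<TreeId>,       // stand-in for Merge<TreeId>
    labels: Option<Vec<String>>, // stand-in for ConflictLabels
}

impl MergedTreeId {
    pub fn new(tree_ids: Vec<TreeId>, labels: Option<Vec<String>>) -> Self {
        MergedTreeId { tree_ids, labels }
    }

    /// True if the tree contents differ; labels are deliberately ignored.
    pub fn has_changes(&self, other: &Self) -> bool {
        self.tree_ids != other.tree_ids
    }
}
```

Two IDs with identical trees but different labels compare unequal with `==`, yet `has_changes` reports no changes, which is what callers such as "is this commit empty?" checks would want.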
To implement simplification of conflict labels, I decided to add more functions such as `zip` and `unzip` to `Merge`. I think these functions could be useful in other situations so I thought this was a nice solution, but an alternative solution could be to make `get_simplified_mapping` and `apply_simplified_mapping` public and manually apply the same mapping to both merges.
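The idea can be sketched as follows. This is a simplified stand-in, not the jj implementation: `Merge` is a flat list with adds at even indices and removes at odd indices, and `simplify_by` is a hypothetical, naive cancellation of equal add/remove pairs that does not try to reproduce the ordering rules of the real simplification. The point is only that zipping values with labels lets one simplification pass keep both in sync.

```rust
/// Stand-in merge: adds at even indices, removes at odd indices.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Merge<T> {
    values: Vec<T>,
}

impl<T> Merge<T> {
    pub fn from_vec(values: Vec<T>) -> Self {
        assert!(values.len() % 2 == 1, "a merge has an odd number of terms");
        Merge { values }
    }

    /// Pairs each term with the term at the same position in `other`.
    pub fn zip<U>(self, other: Merge<U>) -> Merge<(T, U)> {
        assert_eq!(self.values.len(), other.values.len());
        Merge {
            values: self.values.into_iter().zip(other.values).collect(),
        }
    }

    /// Naive simplification: drops each remove together with one add that
    /// has the same key. Not the algorithm jj uses.
    pub fn simplify_by<K: PartialEq, F: Fn(&T) -> K>(self, key: F) -> Merge<T> {
        let mut adds = Vec::new();
        let mut removes = Vec::new();
        for (i, value) in self.values.into_iter().enumerate() {
            if i % 2 == 0 {
                adds.push(value);
            } else {
                removes.push(value);
            }
        }
        let mut kept_removes = Vec::new();
        for remove in removes {
            match adds.iter().position(|add| key(add) == key(&remove)) {
                Some(pos) => {
                    adds.remove(pos);
                }
                None => kept_removes.push(remove),
            }
        }
        // Re-interleave: there is always exactly one more add than removes.
        let mut values = Vec::new();
        let mut kept_removes = kept_removes.into_iter();
        for (i, add) in adds.into_iter().enumerate() {
            if i > 0 {
                values.push(kept_removes.next().expect("one fewer remove than adds"));
            }
            values.push(add);
        }
        Merge { values }
    }
}

impl<T, U> Merge<(T, U)> {
    /// Splits a merge of pairs back into two merges of the same shape.
    pub fn unzip(self) -> (Merge<T>, Merge<U>) {
        let (left, right): (Vec<T>, Vec<U>) = self.values.into_iter().unzip();
        (Merge { values: left }, Merge { values: right })
    }
}
```

A caller would zip the tree IDs with their labels, simplify by comparing only the ID half of each pair, and then unzip to get a simplified merge and matching labels.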
The old method is renamed to `MergedTree::merge_unlabeled` to make it easy to find unmigrated callers. The goal is that almost all callers will eventually use `MergedTree::merge` to add labels, unless the resulting tree is never visible to the user.
Conflict labels are stored in a separate header for backwards compatibility.
7398991 to
0d3d525
Compare
#1176
I split this PR off of #7692 because I think these changes could be reviewed separately to figure out the storage of conflict labels before we go into the specifics of adding the labels for each type of conflict and rendering them in conflict markers. I added some justification for this approach in the description of the first commit, but I'd be open to a different approach if reviewers think it would work better.
Checklist
If applicable:
- CHANGELOG.md
- Documentation (README.md, docs/, demos/)
- Config schema (cli/src/config-schema.json)