Maps have a union_with_key method, like union but with a callback to decide what to do when a key exists on both sides:
    pub fn union_with_key<F>(self, other: Self, mut f: F) -> Self
    where
        F: FnMut(&K, V, V) -> V
Both HashMap and OrdMap are internally immutable trees with sharable copy-on-write nodes. Rc::make_mut is used to mutate a node, cloning it if it was previously shared. When many possibly-large maps exist that are created by cloning a previous map and making a possibly-small number of changes, this sharing can be very significant and minimizing the number of make_mut calls can be impactful.
Would you be open to adding a new API for this?
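To make the cost model concrete, here is a minimal sketch of the `Rc::make_mut` behavior described above, using only the standard library. The helper name `push_cow` and the `Vec<u32>` "node" are made up for illustration and are not how `im` lays out its nodes.

```rust
use std::rc::Rc;

/// Hypothetical helper: append to a reference-counted "node".
/// `Rc::make_mut` clones the inner value only if the Rc is shared.
pub fn push_cow(node: &mut Rc<Vec<u32>>, value: u32) {
    Rc::make_mut(node).push(value);
}

/// Hypothetical helper: report whether two handles point at the same node.
pub fn is_shared(a: &Rc<Vec<u32>>, b: &Rc<Vec<u32>>) -> bool {
    Rc::ptr_eq(a, b)
}
```

Writing through a shared handle leaves the other handle untouched and ends the sharing, while writing through a unique handle mutates in place. Every avoided `make_mut` on a shared node is one node that stays shared.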
Stage 1
The implementation of union_with_key consumes and iterates over one of the input maps, while it mutates and then returns the other one. However, because the callback's signature takes owned V values, it always needs to call remove and then insert. This leads to more make_mut calls than necessary when the value that was already in the map being mutated ends up being used.
Adding a new method where the callback takes borrowed values would help in that case:
    pub fn union_with_merge<F>(mut self, other: Self, mut merge: F) -> Self
    where
        F: FnMut(&K, /* left: */ &V, /* right: */ &V) -> MergeResult<V>,
    {
        for (key, right_value) in other {
            match self.get(&key) {
                // get() does not use make_mut() where remove() does
                None => {
                    self.insert(key, right_value);
                }
                Some(left_value) => {
                    match merge(&key, left_value, &right_value) {
                        MergeResult::UseLeftValue => {} // No insert() here
                        MergeResult::UseRightValue => {
                            self.insert(key, right_value);
                        }
                        MergeResult::UseNewValue(new_value) => {
                            self.insert(key, new_value);
                        }
                    }
                }
            }
        }
        self
    }
(Omitted: swapping the two maps to mutate the larger one: #163. That swap is where MergeResult::UseRightValue becomes useful.)
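For experimenting with the callback shape outside of `im`, here is a self-contained sketch of the same idea written as a free function over `std::collections::BTreeMap`. The `MergeResult` definition is an assumption (the proposal names its variants but does not define the type), and a `BTreeMap` has no shared nodes, so this only illustrates the semantics, not the `make_mut` savings.

```rust
use std::collections::BTreeMap;

/// Assumed shape of the proposed result type.
pub enum MergeResult<V> {
    UseLeftValue,
    UseRightValue,
    UseNewValue(V),
}

/// Stand-in for the proposed method, as a free function over BTreeMap.
pub fn union_with_merge<K, V, F>(
    mut left: BTreeMap<K, V>,
    right: BTreeMap<K, V>,
    mut merge: F,
) -> BTreeMap<K, V>
where
    K: Ord,
    F: FnMut(&K, &V, &V) -> MergeResult<V>,
{
    for (key, right_value) in right {
        // Read-only lookup first; only write when the left value is not kept.
        let decision = match left.get(&key) {
            None => MergeResult::UseRightValue,
            Some(left_value) => merge(&key, left_value, &right_value),
        };
        match decision {
            MergeResult::UseLeftValue => {} // left map is not touched
            MergeResult::UseRightValue => {
                left.insert(key, right_value);
            }
            MergeResult::UseNewValue(new_value) => {
                left.insert(key, new_value);
            }
        }
    }
    left
}
```

A callback such as `|_k, l, r| if l >= r { MergeResult::UseLeftValue } else { MergeResult::UseRightValue }` keeps the larger value and writes to the left map only when the right side wins.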
Stage 2
This is a bit fuzzy. More of an intuition that something might be possible, than a fully-formed plan.
Writing this issue is motivated by an optimization effort of an algorithm in Mercurial (copy tracing). Currently that algorithm does the equivalent of Stage 1 above when one of the two maps to merge is much (2×) larger than the other. When they are of similar size it goes even further to maximize re-use: call OrdMap::diff and collect all respective key-value pairs that we’d need to insert to transform each of the two maps into a merged map. Only then do we pick the side with the smaller set of changes.
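A rough sketch of that "collect both change sets, then apply the smaller one" strategy, again over `std::collections::BTreeMap` so it is self-contained. The function name `merge_smaller_side` is made up, and plain lookups stand in for `OrdMap::diff`; this is not the Mercurial code.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: compute the inserts needed to turn each side into
/// the merged map, then mutate whichever side needs fewer of them.
pub fn merge_smaller_side<K, V, F>(
    left: BTreeMap<K, V>,
    right: BTreeMap<K, V>,
    mut pick: F,
) -> BTreeMap<K, V>
where
    K: Ord + Clone,
    V: Clone + PartialEq,
    F: FnMut(&K, &V, &V) -> V,
{
    let mut into_left = Vec::new(); // inserts that turn `left` into the result
    let mut into_right = Vec::new(); // inserts that turn `right` into the result
    for (key, left_value) in &left {
        match right.get(key) {
            None => into_right.push((key.clone(), left_value.clone())),
            Some(right_value) => {
                let merged = pick(key, left_value, right_value);
                if merged != *left_value {
                    into_left.push((key.clone(), merged.clone()));
                }
                if merged != *right_value {
                    into_right.push((key.clone(), merged));
                }
            }
        }
    }
    for (key, right_value) in &right {
        if !left.contains_key(key) {
            into_left.push((key.clone(), right_value.clone()));
        }
    }
    // Apply the shorter list of inserts to its own side.
    let (mut base, changes) = if into_left.len() <= into_right.len() {
        (left, into_left)
    } else {
        (right, into_right)
    };
    base.extend(changes);
    base
}
```

With a persistent map, the side that receives fewer inserts keeps more of its nodes shared with its ancestors, which is the point of choosing it.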
Since #113, diff short-circuits shared internal subtrees based on ptr_eq. So more node sharing makes diff faster and avoids redundant merge computations, which in turn increases sharing in a virtuous cycle.
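To illustrate the kind of short-circuit meant here, a toy sketch on a plain binary tree of `Rc` nodes. The `Tree` type and `changed_values` are invented for this example and assume both trees have the same shape; `im`'s real nodes and its `diff` are more involved.

```rust
use std::rc::Rc;

/// Toy persistent tree; not im's node layout.
pub enum Tree {
    Leaf,
    Node(Rc<Tree>, u32, Rc<Tree>),
}

/// Collect (old, new) pairs for values that differ between two trees of
/// the same shape. `visited` counts the node pairs actually inspected.
pub fn changed_values(
    a: &Rc<Tree>,
    b: &Rc<Tree>,
    out: &mut Vec<(u32, u32)>,
    visited: &mut usize,
) {
    if Rc::ptr_eq(a, b) {
        return; // same allocation: the whole subtree is skipped
    }
    *visited += 1;
    // Mismatched shapes are ignored here; a real diff would have to handle them.
    if let (Tree::Node(al, av, ar), Tree::Node(bl, bv, br)) = (&**a, &**b) {
        changed_values(al, bl, out, visited);
        if av != bv {
            out.push((*av, *bv));
        }
        changed_values(ar, br, out, visited);
    }
}
```

When two versions share their left subtree and differ only on the right, the walk never descends into the left side at all.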
Still, collecting two sets of changes before applying one of them feels like more work than necessary.
A union_with_merge method in the im crate could potentially walk internal trees directly instead of using ConsumingIter, merge one sub-tree at a time, short-circuit shared sub-trees based on ptr_eq, and… handwave… have some better algorithm than the one based on OrdMap::diff.
Walking internal trees while mutating them to insert merged values and maintaining all invariants would likely be tricky (diff is comparatively easier since it is read-only). But does this sound possible at all?