Add set operations to @immut/hash{map, set} and @internal/sparse_array Summary #2145

Open: wants to merge 26 commits into base: main

Asterless

Related Issue

Fixes #1830: Efficient set operations for @immut/hash{map, set}

Changes

@immut/hashmap

  • union_with(f: (K, V, V) -> V)
    Merges two hashmaps, resolving key conflicts with a custom function f.
  • intersection()
    Returns a new hashmap containing keys present in both input maps, with values from the first map.
  • intersection_with(f: (K, V, V) -> V)
    Computes intersection, resolving overlapping keys' values with function f.
  • difference()
    Returns entries present in the first map but not in the second.
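
The semantics listed above can be illustrated with a minimal Python sketch that uses plain dicts as a stand-in for the immutable hashmap. This is not the PR's MoonBit code. The function names mirror the PR's methods, but the argument order passed to f (key, value from the first map, value from the second) is an assumption that the description does not confirm.

```python
# Illustrative stand-in only: plain dicts instead of @immut/hashmap.
# Assumption: f receives (key, value_from_first, value_from_second).

def union_with(a, b, f):
    """Merge two maps; on a shared key, keep f(key, a_value, b_value)."""
    out = dict(a)
    for k, v in b.items():
        out[k] = f(k, a[k], v) if k in a else v
    return out

def intersection(a, b):
    """Keys present in both maps, with values taken from the first map."""
    return {k: v for k, v in a.items() if k in b}

def intersection_with(a, b, f):
    """Keys present in both maps, with values combined by f."""
    return {k: f(k, v, b[k]) for k, v in a.items() if k in b}

def difference(a, b):
    """Entries of the first map whose keys are absent from the second."""
    return {k: v for k, v in a.items() if k not in b}

m1 = {"a": 1, "b": 2}
m2 = {"b": 10, "c": 20}
add = lambda k, x, y: x + y

print(union_with(m1, m2, add))         # {'a': 1, 'b': 12, 'c': 20}
print(intersection(m1, m2))            # {'b': 2}
print(intersection_with(m1, m2, add))  # {'b': 12}
print(difference(m1, m2))              # {'a': 1}
```

None of the functions mutates its inputs, which matches the persistent-collection behavior the PR targets.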

@immut/hashset

  • intersection()
    Returns a new set containing elements common to both input sets.
  • difference()
    Returns elements present in the first set but not in the second.
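
As a quick illustration of the two set operations described above, here is a sketch that uses Python's built-in frozenset as a stand-in for the immutable hashset. The variable names are hypothetical and the code is not the PR's implementation.

```python
# Illustrative stand-in only: frozenset instead of @immut/hashset.
s1 = frozenset({1, 2, 3})
s2 = frozenset({2, 3, 4})

common = s1.intersection(s2)     # elements present in both sets
only_first = s1.difference(s2)   # elements of s1 that are not in s2

print(sorted(common))      # [2, 3]
print(sorted(only_first))  # [1]
```

Both calls return new sets and leave s1 and s2 unchanged.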

@internal/sparse_array

  • intersection()
    Computes index-wise intersection of two sparse arrays.
  • difference()
    Computes index-wise difference between two sparse arrays.
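
One way to picture the index-wise operations is a bitmap-compressed sparse array, the layout commonly used inside HAMT nodes. The Python sketch below is written under that assumption: the SparseArray class, its field names, and the rule that a combining function returning None drops the index are all hypothetical and are not taken from the PR.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SparseArray:
    bitmap: int   # bit i is set when logical index i is occupied
    elems: tuple  # occupied values, ordered by ascending index

    def get(self, idx):
        bit = 1 << idx
        if not self.bitmap & bit:
            return None
        # Position in elems = number of occupied indices below idx.
        return self.elems[bin(self.bitmap & (bit - 1)).count("1")]

def from_dict(d):
    """Build a sparse array from {index: value}."""
    bitmap = 0
    for i in d:
        bitmap |= 1 << i
    return SparseArray(bitmap, tuple(d[i] for i in sorted(d)))

def intersection(a, b, f):
    """Keep indices occupied in both arrays; f combines the two values.
    Assumption: f may return None to drop the index from the result."""
    common = a.bitmap & b.bitmap
    bitmap, elems = 0, []
    for idx in range(common.bit_length()):
        if (common >> idx) & 1:
            v = f(a.get(idx), b.get(idx))
            if v is not None:
                bitmap |= 1 << idx
                elems.append(v)
    return SparseArray(bitmap, tuple(elems))

def difference(a, b):
    """Keep indices occupied in a but not in b."""
    only_a = a.bitmap & ~b.bitmap
    elems = tuple(a.get(i) for i in range(only_a.bit_length()) if (only_a >> i) & 1)
    return SparseArray(only_a, elems)

a = from_dict({0: "x", 3: "y", 5: "z"})
b = from_dict({3: "q", 5: "r", 7: "s"})
both = intersection(a, b, lambda x, y: x + y)
rest = difference(a, b)
print(bin(both.bitmap), both.elems)  # 0b101000 ('yq', 'zr')
print(bin(rest.bitmap), rest.elems)  # 0b1 ('x',)
```

Because membership is a bitmap, the set of surviving indices comes from a single bitwise AND (or AND-NOT) before any element is touched.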

Motivation

These changes provide a more complete and consistent set of set operations for immutable collections, making it easier to perform common set algebra tasks and improving API parity across collection types.

Tests

Added and updated unit tests for all new and modified methods to ensure correctness and expected behavior.

Checklist

All new and existing tests pass
Code is formatted and documented where appropriate


peter-jerry-ye-code-review bot commented May 21, 2025

Potential memory inefficiency in intersection implementation

Category
Correctness
Code Snippet
pub fn[K : Eq + Hash] intersection(self : T[K], other : T[K]) -> T[K] {
  match (self, other) {
    (Branch(sa1), Branch(sa2)) => {
      let res = sa1.intersection(sa2, fn(m1, m2) { m1.intersection(m2) })
      if res.size() == 0 {
        Empty
      } else {
        Branch(res)
      }
    }
  }
}
Recommendation
Consider creating the Branch variant only after verifying the intersection has elements:

(Branch(sa1), Branch(sa2)) => {
  let res = sa1.intersection(sa2, fn(m1, m2) { m1.intersection(m2) })
  match res.size() {
    0 => Empty
    _ => Branch(res)
  }
}

Reasoning
The current implementation creates a Branch variant unnecessarily when the intersection is empty. While functionally correct, this creates an intermediate allocation that is immediately discarded.
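
The idea of collapsing an empty result to Empty instead of wrapping it in a branch can be sketched on a toy trie. The Python code below is an illustration under simplifying assumptions, not the PR's HAMT: nodes are tuples, both tries are assumed to have the same shape, and the helper names (leaf, branch, intersect) are hypothetical.

```python
EMPTY = ("empty",)

def leaf(*items):
    return ("leaf", frozenset(items))

def branch(children):
    # children maps a slot number to a child node
    return ("branch", dict(children))

def intersect(n1, n2):
    """Intersect two same-shaped toy tries, never returning an empty leaf or branch."""
    if n1 == EMPTY or n2 == EMPTY:
        return EMPTY
    if n1[0] == "leaf" and n2[0] == "leaf":
        common = n1[1] & n2[1]
        return ("leaf", common) if common else EMPTY
    if n1[0] == "branch" and n2[0] == "branch":
        children = {}
        for slot in n1[1].keys() & n2[1].keys():
            child = intersect(n1[1][slot], n2[1][slot])
            if child != EMPTY:
                children[slot] = child
        # Build the branch only when at least one child survived.
        return ("branch", children) if children else EMPTY
    # Simplification: mixed leaf/branch shapes are out of scope for this sketch.
    raise NotImplementedError("mixed node shapes")

t1 = branch({0: leaf(1, 2), 1: leaf(3)})
t2 = branch({0: leaf(2), 1: leaf(4)})
t3 = branch({0: leaf(9)})

partial = intersect(t1, t2)   # slot 0 survives with {2}; slot 1 is dropped
nothing = intersect(t1, t3)   # no slot survives, so the result is EMPTY
print(partial)
print(nothing)
```

Normalizing at every level keeps the invariant that a branch always has at least one child, so emptiness checks stay constant-time at the root.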

Missing documentation for type parameters in function signatures

Category
Maintainability
Code Snippet
pub fn[K : Eq + Hash, V] intersection_with(
  self : T[K, V],
  other : T[K, V],
  f : (K, V, V) -> V
) -> T[K, V]
Recommendation
Add documentation explaining the meaning of type parameters and function parameters:

///| Computes the intersection of two hashmaps
/// K: Key type that must implement Eq + Hash
/// V: Value type
/// f: Function to resolve conflicts between values
pub fn[K : Eq + Hash, V] intersection_with(...)

Reasoning
While the function names are descriptive, documenting type parameters helps users understand the requirements and constraints of generic functions, improving API usability.

Multiple traversals in union_with implementation

Category
Performance
Code Snippet
pub fn[K : Eq + Hash, V] union_with(...) {
  match (self, other) {
    (_, _) =>
      self
      .iter()
      .fold(init=other, fn(m, kv) {
        match m.get(kv.0) {...}
      })
  }
}
Recommendation
Consider implementing a single-pass merge algorithm that processes both maps simultaneously, rather than iterating over one map and looking up each key in the other.
Reasoning
The current fallback case iterates through one map while looking up each key in the other, which costs O(n log n) overall, whereas a single-pass merge could achieve O(n).
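
The contrast between the two strategies can be sketched in Python. The first function follows the iterate-and-lookup shape of the quoted fallback, using a dict as a stand-in. The second is a single-pass merge over key-sorted lists of pairs, which is only an analogy: a HAMT version would instead recurse over matching slots of both tries. Both function names and the argument order of f are assumptions.

```python
def union_with_fold(a, b, f):
    """Iterate over a and look each key up in the accumulated copy of b."""
    out = dict(b)
    for k, v in a.items():
        out[k] = f(k, v, out[k]) if k in out else v
    return out

def union_with_merge(xs, ys, f):
    """Single pass over two key-sorted lists of (key, value) pairs."""
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        (kx, vx), (ky, vy) = xs[i], ys[j]
        if kx == ky:
            out.append((kx, f(kx, vx, vy)))
            i += 1
            j += 1
        elif kx < ky:
            out.append(xs[i])
            i += 1
        else:
            out.append(ys[j])
            j += 1
    # At most one of the two tails is non-empty here.
    out.extend(xs[i:])
    out.extend(ys[j:])
    return out

a = {1: "a", 3: "c"}
b = {2: "B", 3: "C"}
join = lambda k, x, y: x + y

folded = union_with_fold(a, b, join)
merged = union_with_merge(sorted(a.items()), sorted(b.items()), join)
print(folded)  # {2: 'B', 3: 'cC', 1: 'a'}
print(merged)  # [(1, 'a'), (2, 'B'), (3, 'cC')]
```

The merge visits each entry of both inputs exactly once and performs no lookups, which is where the linear bound comes from.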

coveralls (Collaborator) commented May 21, 2025

Pull Request Test Coverage Report for Build 6935

Details

  • 52 of 103 (50.49%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.4%) to 92.018%

Changes Missing Coverage                        Covered Lines   Changed/Added Lines   %
immut/internal/sparse_array/sparse_array.mbt    15              16                    93.75%
immut/hashset/HAMT.mbt                          7               26                    26.92%
immut/hashmap/HAMT.mbt                          30              61                    49.18%

Totals
Change from base Build 6934: -0.4%
Covered Lines: 8738
Relevant Lines: 9496
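
The percentages in this report follow directly from the line counts. A short Python check, using only the numbers quoted above (the variable names are hypothetical):

```python
# (covered, changed_or_added) per file, as reported by Coveralls above.
files = {
    "immut/internal/sparse_array/sparse_array.mbt": (15, 16),
    "immut/hashset/HAMT.mbt": (7, 26),
    "immut/hashmap/HAMT.mbt": (30, 61),
}

per_file = {path: round(100 * c / n, 2) for path, (c, n) in files.items()}
covered = sum(c for c, _ in files.values())
changed = sum(n for _, n in files.values())
patch_pct = round(100 * covered / changed, 2)
overall_pct = round(100 * 8738 / 9496, 3)

print(per_file)                     # 93.75, 26.92, 49.18
print(covered, changed, patch_pct)  # 52 103 50.49
print(overall_pct)                  # 92.018
```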

💛 - Coveralls

@Asterless Asterless marked this pull request as ready for review May 21, 2025 21:39
@bobzhang bobzhang force-pushed the feature/20250522_HAMT branch from 44c0a79 to eb6ef82 Compare May 22, 2025 01:12
@peter-jerry-ye peter-jerry-ye requested a review from Lampese May 22, 2025 02:46
Asterless and others added 2 commits May 22, 2025 11:10
Add set operations to @immut/hash{map, set} and @internal/sparse_array Summary and re-fmt
@bobzhang bobzhang requested a review from Guest0x0 May 25, 2025 01:00
  other : T[K, V],
  f : (K, V, V) -> V
) -> T[K, V] {
  match (self, other) {
Contributor

Can we just make it a method: pub fn[K : Eq + Hash, V] T::union_with(

Author

Sure! 😊 I just don't quite understand the reason for the change: when I added this method, I noticed that the other methods weren't written this way.

Successfully merging this pull request may close these issues.

Efficient set operations for @immut/hash{map, set}
4 participants