Add robin_map/set::erase_fast() method (fixes #75) #76

Merged
merged 3 commits into from
Apr 19, 2024
40 changes: 39 additions & 1 deletion README.md
Expand Up @@ -275,7 +275,6 @@ int main() {
}
```


#### Serialization

The library provides an efficient way to serialize and deserialize a map or a set so that it can be saved to a file or sent over the network.
Expand Down Expand Up @@ -478,6 +477,45 @@ int main() {
}
```

#### Performance pitfalls

Two potential performance pitfalls involving `tsl::robin_map` and
`tsl::robin_set` are noteworthy:

1. *Bad hashes*. Hash functions that produce many collisions can lead to the
following surprising behavior: when the number of collisions exceeds a
certain threshold, the hash table will automatically expand to fix the
problem. However, in degenerate cases, this expansion might have _no effect_
on the collision count, causing a failure mode where a linear sequence of
insertions leads to exponential storage growth.

This case has mainly been observed when using the default power-of-two
growth strategy with the default STL `std::hash<T>` for arithmetic types
`T`, which is often an identity! See issue
[#39](https://github.com/Tessil/robin-map/issues/39) for an example. The
solution is simple: use a better hash function and/or `tsl::robin_pg_set` /
`tsl::robin_pg_map`.

2. *Element erasure and low load factors*. `tsl::robin_map` and
`tsl::robin_set` mirror the STL map/set API, which exposes an `iterator
erase(iterator)` method that removes an element at a certain position,
returning a valid iterator that points to the next element.

Constructing this new iterator object requires walking to the next nonempty
bucket in the table, which can be an expensive operation when the hash table
has a low *load factor* (i.e., when `capacity()` is much larger than
`size()`).

The `erase()` method furthermore never shrinks or rehashes the table, as
this is not permitted by the specification of this function. A linear
sequence of random removals without intermediate insertions can then lead to
a degenerate case with quadratic runtime cost.

In such cases, an iterator return value is often not even needed, so the
cost is entirely unnecessary. Both `tsl::robin_set` and `tsl::robin_map`
therefore provide an alternative erasure method `void erase_fast(iterator)`
that does not return an iterator to avoid having to find the next element.
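
The first pitfall can be reproduced without the library. The following is a minimal sketch, assuming an identity hash and a `hash % bucket_count` reduction (equivalent to masking the low bits when the bucket count is a power of two); `distinct_buckets` is a hypothetical helper written for this illustration and is not part of `tsl::robin_map`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Number of distinct buckets hit when `count` keys of the form i * stride are
// hashed with the identity function and reduced with `hash % bucket_count`.
// Hypothetical helper for illustration only.
std::size_t distinct_buckets(std::size_t bucket_count, std::uint64_t stride,
                             std::size_t count) {
    assert(bucket_count > 0);
    std::set<std::size_t> used;
    for (std::uint64_t i = 0; i < count; ++i) {
        const std::uint64_t hash = i * stride;  // identity hash: hash == key
        used.insert(static_cast<std::size_t>(hash % bucket_count));
    }
    return used.size();
}
```

With 100 keys that are multiples of 4096, `distinct_buckets(1024, 4096, 100)`, `distinct_buckets(2048, 4096, 100)` and `distinct_buckets(4096, 4096, 100)` all return 1: doubling the table does nothing for the collisions. A prime bucket count such as 1031 spreads the same keys over 100 distinct buckets, which is presumably the idea behind the prime-growth `tsl::robin_pg_map` / `tsl::robin_pg_set` variants.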
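
The cost difference between the two erasure methods can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `ToyTable` and `probes_for_reverse_erase` are hypothetical names, the table is a flat array of `std::optional` slots (C++17), and none of the robin hood probing or backward shifting of the real implementation is modeled. It counts how many empty buckets are skipped while looking for a successor.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-in for an open-addressing table: a flat array of optional slots.
// It models only the bucket scan, not tsl::robin_map's internals.
struct ToyTable {
    std::vector<std::optional<int>> buckets;
    std::size_t probes = 0;  // empty buckets skipped while looking for a successor

    explicit ToyTable(std::size_t capacity) : buckets(capacity) {}

    // Like `iterator erase(iterator)`: clear the slot, then walk forward to
    // the next occupied bucket so that a valid position can be returned.
    std::size_t erase(std::size_t pos) {
        buckets[pos].reset();
        std::size_t next = pos + 1;
        while (next < buckets.size() && !buckets[next].has_value()) {
            ++probes;
            ++next;
        }
        return next;  // buckets.size() plays the role of end()
    }

    // Like `void erase_fast(iterator)`: clear the slot and stop.
    void erase_fast(std::size_t pos) { buckets[pos].reset(); }
};

// Place `count` elements evenly in a table of `capacity` buckets, erase them
// from last to first, and report how many empty buckets were skipped.
std::size_t probes_for_reverse_erase(std::size_t capacity, std::size_t count,
                                     bool fast) {
    assert(count > 0 && count <= capacity);
    const std::size_t stride = capacity / count;
    ToyTable table(capacity);
    for (std::size_t i = 0; i < count; ++i) {
        table.buckets[i * stride] = static_cast<int>(i);
    }
    for (std::size_t i = count; i-- > 0;) {
        if (fast) {
            table.erase_fast(i * stride);
        } else {
            table.erase(i * stride);
        }
    }
    return table.probes;
}
```

In this model, `probes_for_reverse_erase(1024, 4, false)` skips 2556 empty buckets to remove just four elements, while the same call with `fast` set to `true` skips none. The scan grows with the table size rather than with the number of elements, which is why repeated erasures in a sparse table become expensive.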

### License

The code is licensed under the MIT license, see the [LICENSE file](LICENSE) for details.
9 changes: 5 additions & 4 deletions include/tsl/robin_hash.h
Expand Up @@ -820,6 +820,10 @@ class robin_hash : private Hash, private KeyEqual, private GrowthPolicy {
return try_emplace(std::forward<K>(key), std::forward<Args>(args)...).first;
}

void erase_fast(iterator pos) {
erase_from_bucket(pos);
}

/**
* Here to avoid `template<class K> size_type erase(const K& key)` being used
* when we use an `iterator` instead of a `const_iterator`.
Expand All @@ -836,8 +840,6 @@ class robin_hash : private Hash, private KeyEqual, private GrowthPolicy {
++pos;
}

m_try_shrink_on_next_insert = true;

return pos;
}

Expand Down Expand Up @@ -916,8 +918,6 @@ class robin_hash : private Hash, private KeyEqual, private GrowthPolicy {
auto it = find(key, hash);
if (it != end()) {
erase_from_bucket(it);
m_try_shrink_on_next_insert = true;

return 1;
} else {
return 0;
Expand Down Expand Up @@ -1211,6 +1211,7 @@ class robin_hash : private Hash, private KeyEqual, private GrowthPolicy {
previous_ibucket = ibucket;
ibucket = next_bucket(ibucket);
}
m_try_shrink_on_next_insert = true;
}

template <class K, class... Args>
Expand Down
8 changes: 8 additions & 0 deletions include/tsl/robin_map.h
Expand Up @@ -339,6 +339,14 @@ class robin_map {
}
size_type erase(const key_type& key) { return m_ht.erase(key); }

/**
* Erase the element at position 'pos'. In contrast to the regular erase()
* function, erase_fast() does not return an iterator. This allows it to be
* faster especially in hash tables with a low load factor, where finding the
* next nonempty bucket would be costly.
*/
void erase_fast(iterator pos) { return m_ht.erase_fast(pos); }

/**
* Use the hash value 'precalculated_hash' instead of hashing the key. The
* hash value should be the same as hash_function()(key). Useful to speed-up
Expand Down
8 changes: 8 additions & 0 deletions include/tsl/robin_set.h
Expand Up @@ -263,6 +263,14 @@ class robin_set {
}
size_type erase(const key_type& key) { return m_ht.erase(key); }

/**
* Erase the element at position 'pos'. In contrast to the regular erase()
* function, erase_fast() does not return an iterator. This allows it to be
* faster especially in hash sets with a low load factor, where finding the
* next nonempty bucket would be costly.
*/
void erase_fast(iterator pos) { return m_ht.erase_fast(pos); }

/**
* Use the hash value 'precalculated_hash' instead of hashing the key. The
* hash value should be the same as hash_function()(key). Useful to speed-up
Expand Down
11 changes: 11 additions & 0 deletions tests/robin_map_tests.cpp
Expand Up @@ -1448,4 +1448,15 @@ BOOST_AUTO_TEST_CASE(test_precalculated_hash) {
BOOST_CHECK_EQUAL(map.erase(4, map.hash_function()(2)), 0);
}

BOOST_AUTO_TEST_CASE(test_erase_fast) {
using Map = tsl::robin_map<int, int>;
Map map;
map.emplace(4, 5);
auto it = map.find(4);
BOOST_CHECK(it != map.end());
map.erase_fast(it);
BOOST_CHECK(map.size() == 0);
}


BOOST_AUTO_TEST_SUITE_END()
10 changes: 10 additions & 0 deletions tests/robin_set_tests.cpp
Expand Up @@ -161,4 +161,14 @@ BOOST_AUTO_TEST_CASE(test_serialize_deserialize) {
BOOST_CHECK(set_deserialized == set);
}

BOOST_AUTO_TEST_CASE(test_erase_fast) {
using Set = tsl::robin_set<int>;
Set set;
set.emplace(4);
auto it = set.find(4);
BOOST_CHECK(it != set.end());
set.erase_fast(it);
BOOST_CHECK(set.size() == 0);
}

BOOST_AUTO_TEST_SUITE_END()