Skip to content

Conversation

@copybara-service
Copy link

Reimplement weakref_lru_cache on top of a custom hash map type.

The existing weakref_lru_cache has a number of concurrency problems that are unearthed by tsan and the tests added in this PR, but the key problem that made me decide that the existing approach does not work is the following:

  • if we're using a std::unordered_map<>, then we must protect it by a mutex mu_ that is not the GIL because the GIL can be released during Python equality functions, and std::unordered_map<> would not be happy with this. This implies that the lock order must be mu_ then GIL, otherwise it would not be ok to release and reacquire the GIL.
  • but the tp_traverse GC traversal handler must hold the GIL (or, at least, the Python threads are all in a stopped state because Python GC is stop-the-world, which is much the same as holding the GIL), but would need to acquire mu_ to traverse the std::unordered_map. And since we are in the middle of GC, we cannot release the GIL. So there is no valid lock ordering possible.

This PR takes a different approach:

  • we add a new hash map ReentrantHashMap which is heavily inspired by absl::flat_hash_map.
  • the key features of ReentrantHashMap is that it allows the map to be mutated from the equality function. We use a simple optimistic versioning scheme wherein we detect mutations during equality tests and retry the lookup.
  • this scheme is not robust to equality tests that intentionally mutate the map (e.g., we might loop endlessly) but the goal is to catch incidental mutation because of, e.g., releasing the GIL, and we assume that is improbable.

The existing weakref_lru_cache has a number of concurrency problems that are unearthed by tsan and the tests added in this PR, but the key problem that made me decide that the existing approach does not work is the following:
* if we're using a std::unordered_map<>, then we must protect it by a mutex mu_ that is not the GIL because the GIL can be released during Python equality functions, and std::unordered_map<> would not be happy with this. This implies that the lock order must be mu_ then GIL, otherwise it would not be ok to release and reacquire the GIL.
* but the `tp_traverse` GC traversal handler must hold the GIL (or, at least, the Python threads are all in a stopped state because Python GC is stop-the-world, which is much the same as holding the GIL), but would need to acquire mu_ to traverse the std::unordered_map. And since we are in the middle of GC, we cannot release the GIL. So there is no valid lock ordering possible.

This PR takes a different approach:
* we add a new hash map ReentrantHashMap which is heavily inspired by absl::flat_hash_map.
* the key features of ReentrantHashMap is that it allows the map to be mutated from the equality function. We use a simple optimistic versioning scheme wherein we detect mutations during equality tests and retry the lookup.
* this scheme is not robust to equality tests that intentionally mutate the map (e.g., we might loop endlessly) but the goal is to catch incidental mutation because of, e.g., releasing the GIL, and we assume that is improbable.

PiperOrigin-RevId: 868228444
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant