
Feature request: auto flush active memtable when there is many tombstones #13308

Open
zaidoon1 opened this issue Jan 17, 2025 · 0 comments
Contributor

zaidoon1 commented Jan 17, 2025

If many deletes are issued for recently written data that is still in the memtable, RocksDB can max out CPU usage when many prefix iterators have to iterate over all the tombstones:

Image

Notice that we don't have tombstones in the SST files:
Image

BUT we do have many tombstones in the active memtable:

Image

This was the root cause of #13191 (comment)

Ideally we want something like CompactOnDeletionCollector but for memtables.

There is a very basic version of this for range tombstones: https://github.com/facebook/rocksdb/blob/v9.10.0/db/memtable.h#L835-L837. It triggers an automatic flush based on the number of range tombstones. We could replicate that for the number of "regular" tombstones, but ideally we would implement semantics similar to CompactOnDeletionCollector and look at the overall ratio of deleted to live keys as well as consecutive tombstones.
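To make the "basic version" concrete, here is a rough sketch of a count-based trigger for regular tombstones. It is not RocksDB code: the class `PointTombstoneTrigger`, its methods, and the `max_point_deletions` knob are all hypothetical names I made up for illustration, and the real change would presumably live inside the memtable next to the existing range tombstone check.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper, NOT part of RocksDB: counts point tombstones added to
// a memtable and reports when a configured limit has been reached.
class PointTombstoneTrigger {
 public:
  // max_point_deletions == 0 disables the trigger (assumed convention).
  explicit PointTombstoneTrigger(uint64_t max_point_deletions)
      : max_point_deletions_(max_point_deletions) {}

  // Would be called once for every point tombstone inserted.
  void OnPointDelete() {
    num_point_deletes_.fetch_add(1, std::memory_order_relaxed);
  }

  // True once the number of tombstones reaches the configured limit.
  bool ShouldFlush() const {
    return max_point_deletions_ > 0 &&
           num_point_deletes_.load(std::memory_order_relaxed) >=
               max_point_deletions_;
  }

 private:
  const uint64_t max_point_deletions_;
  std::atomic<uint64_t> num_point_deletes_{0};
};
```

A plain count like this says nothing about the ratio of deleted to live keys or about runs of consecutive tombstones, which is why the CompactOnDeletionCollector-style semantics would be the better end goal.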

@cbi42 what do you think? This seems like a very useful feature and I'm happy to at least implement the basic version of this similar to https://github.com/facebook/rocksdb/blob/v9.10.0/db/memtable.h#L835-L837 if you think my analysis is valid and there is no other feature in rocksdb that can do what I want.

The alternative approach:

If we don't want to bother with this, then in a background thread on an interval (every minute or so), I can read rocksdb.num-deletes-active-mem-table and compare it against rocksdb.num-entries-active-mem-table to calculate the overall deletion/total ratio, similar to what CompactOnDeletionCollector does for SST files, and trigger a manual flush if the ratio is past the threshold. I believe this should work, but it feels odd to do something like this based on RocksDB metrics. Thoughts?
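A minimal sketch of the decision the background thread would make, assuming the two counters have already been read from the properties above. The struct `TombstoneFlushPolicy`, the function `ShouldFlushActiveMemtable`, and the default values are hypothetical and only illustrate the ratio check; the `min_entries` guard is my own addition so that a tiny memtable is not flushed over a handful of deletes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical policy knobs; names and defaults are illustrative only.
struct TombstoneFlushPolicy {
  double deletion_ratio = 0.5;   // flush when deletes / entries >= this
  uint64_t min_entries = 10000;  // skip memtables smaller than this
};

// num_deletes: value read from rocksdb.num-deletes-active-mem-table
// num_entries: value read from rocksdb.num-entries-active-mem-table
// Returns true when the caller should trigger a manual flush.
bool ShouldFlushActiveMemtable(uint64_t num_deletes, uint64_t num_entries,
                               const TombstoneFlushPolicy& policy) {
  assert(policy.deletion_ratio > 0.0 && policy.deletion_ratio <= 1.0);
  if (num_entries == 0 || num_entries < policy.min_entries) {
    return false;  // empty or too small to be worth flushing
  }
  return static_cast<double>(num_deletes) >=
         policy.deletion_ratio * static_cast<double>(num_entries);
}
```

In the application, the two inputs would come from `DB::GetIntProperty` with the property names above, and a `true` result would be followed by a call to `DB::Flush`. This only captures the overall ratio, not consecutive tombstones, and it can lag by up to one polling interval.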
