If there are lots of deletes for recently written data that is still in the memtable, RocksDB can max out CPU usage when there are many prefix iterators that each have to iterate over all the tombstones.
Note that we don't have tombstones in the SST files.
BUT we do have many tombstones in the active memtable.
This was the root cause of #13191 (comment)
Ideally we want something like CompactOnDeletionCollector but for memtables.
Something very basic is already implemented for range tombstones: https://github.com/facebook/rocksdb/blob/v9.10.0/db/memtable.h#L835-L837 triggers an automatic flush based on the number of range tombstones. We could replicate that for the number of "regular" tombstones, but ideally we would implement semantics similar to CompactOnDeletionCollector and look at the overall ratio of deleted/live keys as well as consecutive tombstones.
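To make the idea concrete, here is a rough sketch of what such a heuristic could look like. It is not RocksDB code: `TombstoneTracker`, its parameters, and the choice to use the window size as a minimum entry count are all hypothetical. It loosely mirrors the sliding-window trigger and overall deletion ratio of CompactOnDeletionCollector, with one caveat: it sees entries in insertion order, whereas an SST collector sees them in key order, so it only approximates "consecutive tombstones" as an iterator would encounter them.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not part of RocksDB: tracks tombstone density for
// entries added to a memtable, in insertion order.
class TombstoneTracker {
 public:
  // window:  number of most recent entries to inspect (0 disables the window)
  // trigger: flush once this many deletes fall inside the window (0 disables)
  // ratio:   flush once deletes/total reaches this value (0 disables)
  TombstoneTracker(size_t window, size_t trigger, double ratio)
      : ring_(window, 0), trigger_(trigger), ratio_(ratio) {}

  // Record one entry added to the memtable.
  void Add(bool is_delete) {
    ++total_;
    if (is_delete) ++deletes_;
    if (ring_.empty()) return;
    // Evict the oldest entry from the window, then record the new one.
    if (ring_[pos_]) --in_window_;
    ring_[pos_] = is_delete ? 1 : 0;
    if (is_delete) ++in_window_;
    pos_ = (pos_ + 1) % ring_.size();
    // The window trigger latches: once hit, it stays set.
    if (trigger_ > 0 && in_window_ >= trigger_) window_hit_ = true;
  }

  // True if either the window trigger fired or the overall ratio is reached.
  // The ratio is only checked once at least `window` entries were seen
  // (an assumed floor so tiny memtables are ignored).
  bool ShouldFlush() const {
    if (window_hit_) return true;
    if (ratio_ <= 0.0 || total_ < ring_.size()) return false;
    return static_cast<double>(deletes_) >=
           ratio_ * static_cast<double>(total_);
  }

 private:
  std::vector<uint8_t> ring_;  // 1 = delete, 0 = other entry
  size_t trigger_;
  double ratio_;
  size_t pos_ = 0;
  size_t in_window_ = 0;
  uint64_t total_ = 0;
  uint64_t deletes_ = 0;
  bool window_hit_ = false;
};
```

In a real implementation the bookkeeping would presumably live next to the existing range tombstone counter in the memtable, and the thresholds would be exposed as options.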
@cbi42 what do you think? This seems like a very useful feature, and I'm happy to implement at least the basic version, similar to https://github.com/facebook/rocksdb/blob/v9.10.0/db/memtable.h#L835-L837, if you think my analysis is valid and no other feature in RocksDB already does what I want.
The alternative approach:
If we don't want to bother with this, then in a background thread on an interval (every minute or so), I can get the value of rocksdb.num-deletes-active-mem-table and compare it against rocksdb.num-entries-active-mem-table to calculate the overall deletion/total ratio, similar to what CompactOnDeletionCollector does for SST files, and trigger a manual flush if it is past the threshold. I believe this should work, but it feels odd to do something like this based on RocksDB metrics. Thoughts?
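For reference, a minimal sketch of one polling step of that workaround. The policy struct, the thresholds, and `PollOnce` are hypothetical names for illustration. The property reader and the flush action are passed in as callables so the logic stays independent of the DB handle; in practice they would most likely wrap `DB::GetIntProperty` and `DB::Flush`.

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical helper, not part of RocksDB: decides whether the active
// memtable should be flushed based on its share of delete entries.
struct MemtableTombstonePolicy {
  double max_delete_ratio = 0.5;  // assumed threshold, tune per workload
  uint64_t min_entries = 1000;    // assumed floor to ignore tiny memtables

  bool ShouldFlush(uint64_t num_deletes, uint64_t num_entries) const {
    if (num_entries < min_entries) return false;
    return static_cast<double>(num_deletes) >=
           max_delete_ratio * static_cast<double>(num_entries);
  }
};

// One polling step, meant to be called from a background thread on an
// interval. `read_property` returns false if the property is unavailable.
// Returns true if a flush was requested.
inline bool PollOnce(
    const MemtableTombstonePolicy& policy,
    const std::function<bool(const std::string&, uint64_t*)>& read_property,
    const std::function<void()>& flush) {
  uint64_t deletes = 0;
  uint64_t entries = 0;
  if (!read_property("rocksdb.num-deletes-active-mem-table", &deletes) ||
      !read_property("rocksdb.num-entries-active-mem-table", &entries)) {
    return false;  // skip this round rather than act on missing data
  }
  if (!policy.ShouldFlush(deletes, entries)) return false;
  flush();
  return true;
}
```

Because the check only runs once per interval, a burst of deletes can still hurt iterators until the next poll, which is one reason a trigger inside the memtable itself seems preferable.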