Force snapshot under memory pressure #25685
@pauldix, a few thoughts/queries:
At a high level, we have two options, as you suggested.
I'm wondering if we could use process memory usage, because it's less intrusive. I think this probably masks the real buffer usage (the caches could be using most of the memory), but it's fairly straightforward to trigger the snapshot based on whole-process memory usage. Would you still like to proceed with tracking memory for …?
So you could get a size by walking the databases and tables and summing it all together. That might be a good place to start. The tricky bit is that it won't be exact, and ultimately it's the process memory that matters, since that's what will trigger an OOM kill. So I'm honestly not sure what the best approach is here. The other thing about process memory is that it can spike if there's an expensive query, and we wouldn't want that to trigger premature persistence, so you'd need to track some sort of moving average if using that. I think we should use the …
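The moving-average idea can be sketched as an exponentially weighted moving average (EWMA) over periodic process-memory samples, so that a single spike (for example, from an expensive query) moves the average only by a fraction of its size and does not trip the threshold on its own. This is a minimal sketch: `MemorySmoother`, `alpha`, and the sampling source are hypothetical, and EWMA is just one possible choice of moving average.

```rust
/// Hypothetical smoother for process-memory samples (in bytes).
/// An EWMA weights the newest sample by `alpha` and the previous
/// average by `1 - alpha`, damping short-lived spikes.
pub struct MemorySmoother {
    alpha: f64,
    average: Option<f64>,
}

impl MemorySmoother {
    /// `alpha` must be in (0, 1]; higher values react faster to new samples.
    pub fn new(alpha: f64) -> Self {
        assert!(alpha > 0.0 && alpha <= 1.0, "alpha must be in (0, 1]");
        Self { alpha, average: None }
    }

    /// Feed one sample and return the updated average.
    pub fn observe(&mut self, sample_bytes: u64) -> f64 {
        let sample = sample_bytes as f64;
        let next = match self.average {
            // The first sample seeds the average.
            None => sample,
            Some(prev) => self.alpha * sample + (1.0 - self.alpha) * prev,
        };
        self.average = Some(next);
        next
    }

    /// True when the smoothed value (not the latest raw sample) exceeds the threshold.
    pub fn over_threshold(&self, threshold_bytes: u64) -> bool {
        self.average.map_or(false, |avg| avg > threshold_bytes as f64)
    }
}
```

With `alpha = 0.2`, two samples of 100 followed by a spike of 1000 give an average of about 280, so a threshold of 500 is not crossed by the spike alone.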
OK, thanks. Do we need to check the size of the last cache/meta cache as well as …?
We don't need to take the size of those caches into account for this.
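Walking the databases and tables and summing their buffered sizes, with the last cache and meta cache left out, could look roughly like this. The nested `HashMap` layout and the `TableBuffer` type with its `size_bytes` field are hypothetical stand-ins, not the actual `QueryableBuffer` internals, and the result is only as accurate as each table's size estimate.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a per-table buffer; only its approximate size matters here.
pub struct TableBuffer {
    pub size_bytes: usize,
}

/// Hypothetical layout: database name -> table name -> buffered data.
pub type BufferedData = HashMap<String, HashMap<String, TableBuffer>>;

/// Approximate the buffer size by summing every table in every database.
/// Last cache and meta cache are intentionally not counted.
pub fn total_buffer_size(data: &BufferedData) -> usize {
    data.values()
        .flat_map(|tables| tables.values())
        .map(|table| table.size_bytes)
        .sum()
}
```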
Is it OK to flush the WAL buffer when we force the snapshot? I can then follow the same path as …
I think forcing a WAL flush is fine.
- The core of the change is to introduce another method, `force_flush_buffer`, in the `Wal` trait. This gives a handle to choose when to kick off a snapshot.
- A higher-level background loop is introduced that checks the overall table buffer size every `N` seconds and, if it is greater than a threshold (`X`), calls `force_flush_buffer`. Both `N` and `X` are configurable through the CLI. `N` defaults to 10s and `X` defaults to 70% (of system memory).
- Some refactoring ensures that the calls made via the `Wal` trait to flush the buffer and clean up any snapshot are reused across both branches (forcing a snapshot and the normal WAL buffer flush).
closes: #25685
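A minimal, synchronous sketch of the first approach: the `Wal` trait gains a `force_flush_buffer` method that a background loop can call, and both the normal and the forced path drain the buffer through one shared helper. The method signatures, return values, and the `MockWal` type are hypothetical; the real trait in the influxdb3 codebase is likely async and shaped differently.

```rust
use std::sync::Mutex;

/// Hypothetical, simplified version of the `Wal` trait described in the commit.
pub trait Wal {
    /// Normal path: flush buffered writes to a new WAL file; returns ops flushed.
    fn flush_buffer(&self) -> usize;
    /// Forced path: flush and snapshot regardless of the WAL file count.
    fn force_flush_buffer(&self) -> usize;
}

/// Hypothetical in-memory WAL used only to illustrate the shared code path.
#[derive(Default)]
pub struct MockWal {
    buffered_ops: Mutex<Vec<String>>,
    snapshots_taken: Mutex<usize>,
}

impl MockWal {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn write(&self, op: &str) {
        self.buffered_ops.lock().unwrap().push(op.to_string());
    }

    pub fn snapshots_taken(&self) -> usize {
        *self.snapshots_taken.lock().unwrap()
    }

    /// Shared helper so both branches drain the buffer the same way.
    fn drain_buffer(&self) -> usize {
        let mut ops = self.buffered_ops.lock().unwrap();
        let flushed = ops.len();
        ops.clear();
        flushed
    }
}

impl Wal for MockWal {
    fn flush_buffer(&self) -> usize {
        self.drain_buffer()
    }

    fn force_flush_buffer(&self) -> usize {
        let flushed = self.drain_buffer();
        // Stand-in for kicking off a snapshot after the flush.
        *self.snapshots_taken.lock().unwrap() += 1;
        flushed
    }
}
```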
- The main change is to detach the WAL and the snapshot so that all three of the following can happen:
  - flush the WAL buffer only (already handled before this commit)
  - flush the WAL buffer and snapshot (already handled before this commit)
  - snapshot without flushing the WAL buffer (introduced in this commit)
  This is achieved by introducing another method, `snapshot`, in the `WalFileNotifier` trait. The main dependency between the WAL and the snapshot is the `wal_file_number`; since this is tracked separately in `SnapshotTracker`, we can switch to using `SnapshotTracker`'s `last_wal_file_number` instead of the one that comes through the `WalContents`.
- A higher-level background loop is introduced that checks the overall table buffer size every `N` seconds and, if it is greater than a threshold (`X`), calls the `snapshot` method. Both `N` and `X` are configurable through the CLI. `N` defaults to 10s and `X` defaults to 70%.
- Some refactoring so that existing methods can be reused when only snapshotting.
closes: #25685
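The three paths in this commit can be sketched as an enum plus a dispatcher, where both snapshot paths read the WAL file number from a tracker rather than from incoming WAL contents. `PersistAction`, `persist`, and this `SnapshotTracker` are hypothetical simplifications for illustration only; the real `SnapshotTracker` tracks more state.

```rust
/// Hypothetical enumeration of the three paths described in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PersistAction {
    FlushOnly,
    FlushAndSnapshot,
    SnapshotOnly,
}

/// Hypothetical stand-in for `SnapshotTracker`: it records the last WAL file
/// number itself, so a snapshot does not need one delivered via `WalContents`.
#[derive(Default)]
pub struct SnapshotTracker {
    last_wal_file_number: u64,
}

impl SnapshotTracker {
    pub fn record_wal_file(&mut self, number: u64) {
        self.last_wal_file_number = number;
    }

    pub fn last_wal_file_number(&self) -> u64 {
        self.last_wal_file_number
    }
}

/// Returns (WAL file number written, if any; WAL file number the snapshot covers, if any).
pub fn persist(action: PersistAction, tracker: &mut SnapshotTracker) -> (Option<u64>, Option<u64>) {
    let flushed = match action {
        PersistAction::FlushOnly | PersistAction::FlushAndSnapshot => {
            // Flushing writes a new WAL file and records its number.
            let next = tracker.last_wal_file_number() + 1;
            tracker.record_wal_file(next);
            Some(next)
        }
        PersistAction::SnapshotOnly => None,
    };
    let snapshot_upto = match action {
        PersistAction::FlushOnly => None,
        // Both snapshot paths take the file number from the tracker.
        PersistAction::FlushAndSnapshot | PersistAction::SnapshotOnly => {
            Some(tracker.last_wal_file_number())
        }
    };
    (flushed, snapshot_upto)
}
```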
- The main change is to detach the WAL and the snapshot so that all three of the following can happen:
  - flush the WAL buffer only (already handled before this commit)
  - flush the WAL buffer and snapshot (already handled before this commit)
  - snapshot without flushing the WAL buffer (introduced in this commit)
  This is achieved by introducing another method, `snapshot`, in the `WalFileNotifier` trait. The main dependency between the WAL and the snapshot is the `wal_file_number`; since this is tracked separately in `SnapshotTracker`, we can switch to using `SnapshotTracker`'s `last_wal_sequence_number` instead of the one that comes through the `WalContents`.
- A higher-level background loop is introduced that checks the overall table buffer size every `N` seconds and, if it is greater than a threshold (`X`), calls the `snapshot` method. Both `N` and `X` are configurable through the CLI. `N` defaults to 10s and `X` defaults to 70%.
- Some refactoring so that existing methods can be reused when only snapshotting.
closes: #25685
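To illustrate a `WalFileNotifier` with a separate `snapshot` method, here is a sketch in which the notifier keeps its own `last_wal_sequence_number`, so a snapshot can run without being handed new WAL contents. The trait signatures and `BufferNotifier` are hypothetical and synchronous; the real trait is likely async and takes richer types than `Vec<String>`.

```rust
/// Hypothetical simplified shape of `WalFileNotifier` after the change.
pub trait WalFileNotifier {
    /// Called after a WAL file is written; buffers its ops.
    fn notify(&mut self, wal_sequence_number: u64, ops: Vec<String>);
    /// Persist everything buffered without receiving new WAL contents.
    /// Returns the last WAL sequence number covered, or `None` if nothing was buffered.
    fn snapshot(&mut self) -> Option<u64>;
}

#[derive(Default)]
pub struct BufferNotifier {
    buffered: Vec<String>,
    persisted: Vec<String>,
    // Tracked here, so `snapshot` does not depend on a value carried in `WalContents`.
    last_wal_sequence_number: Option<u64>,
}

impl BufferNotifier {
    pub fn buffered_len(&self) -> usize {
        self.buffered.len()
    }

    pub fn persisted_len(&self) -> usize {
        self.persisted.len()
    }
}

impl WalFileNotifier for BufferNotifier {
    fn notify(&mut self, wal_sequence_number: u64, ops: Vec<String>) {
        self.buffered.extend(ops);
        self.last_wal_sequence_number = Some(wal_sequence_number);
    }

    fn snapshot(&mut self) -> Option<u64> {
        if self.buffered.is_empty() {
            return None;
        }
        // Stand-in for writing Parquet: move all buffered ops to the persisted list.
        self.persisted.append(&mut self.buffered);
        self.last_wal_sequence_number
    }
}
```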
By default, the server snapshots only after it has received 900 WAL files, and even then it snapshots only 600 of them. Under high load, this can lead to runaway memory use in the buffer.
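To make the default concrete: with a snapshot triggered at 900 WAL files and each snapshot covering 600 of them, at least 300 files' worth of data stays buffered after every snapshot, so the buffer never fully drains while writes keep arriving. Only the 900 and 600 values come from the issue; the constant names and the simplified trigger rule below are hypothetical.

```rust
/// Values quoted in the issue; the constant names are hypothetical.
pub const SNAPSHOT_TRIGGER_WAL_FILES: u64 = 900;
pub const WAL_FILES_PER_SNAPSHOT: u64 = 600;

/// Given the number of un-snapshotted WAL files, return how many remain
/// buffered after the default, count-based snapshot check runs once.
pub fn files_after_snapshot(wal_files: u64) -> u64 {
    if wal_files >= SNAPSHOT_TRIGGER_WAL_FILES {
        wal_files - WAL_FILES_PER_SNAPSHOT
    } else {
        wal_files
    }
}
```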
We'll need a configuration option to set a memory threshold at which we force a snapshot. Ideally, the threshold would be based on the size of the `QueryableBuffer`, but that might be inaccurate and expensive to measure. We could also use the process memory size. Let's try the `QueryableBuffer` option and see how it works. Set the default value to 70% of the detected system memory.
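The default threshold is simple arithmetic: 70% of total system memory. A sketch, assuming the caller supplies the detected total (a crate such as `sysinfo` could provide it, but detection is left out here). The function and constant names are hypothetical; the multiplication is done in `u128` so very large inputs cannot overflow.

```rust
/// Default from the issue: 70% of detected system memory. Name is hypothetical.
pub const DEFAULT_THRESHOLD_PERCENT: u64 = 70;

/// Compute a force-snapshot threshold in bytes from total system memory.
/// Widening to u128 before multiplying avoids overflow near u64::MAX.
pub fn default_threshold_bytes(total_system_memory_bytes: u64) -> u64 {
    (total_system_memory_bytes as u128 * DEFAULT_THRESHOLD_PERCENT as u128 / 100) as u64
}
```

For example, a host with 16 GiB (17,179,869,184 bytes) would get a threshold of 12,025,908,428 bytes.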
We should have a background task that checks every 10s to see if the queryable buffer has hit this threshold. If it has, we should force a snapshot in which everything in the queryable buffer will be persisted to Parquet and all WAL files will be cleared out.
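The background task could be sketched with a plain thread that wakes every `interval`, compares the buffer size with the threshold, and forces a snapshot when it is exceeded. Everything here is hypothetical: `BufferState`, `spawn_memory_check`, and the "snapshot" (which only bumps a counter and resets the size). Standard-library threads keep the sketch dependency-free, whereas an async server would more naturally use a runtime timer.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical shared state: approximate buffer size plus a count of forced
/// snapshots (the count exists only to make the behaviour observable).
#[derive(Default)]
pub struct BufferState {
    pub size_bytes: AtomicU64,
    pub forced_snapshots: AtomicUsize,
}

/// Hypothetical background check. Every `interval`, compare the buffer size
/// with `threshold_bytes`; if exceeded, "force a snapshot", which here just
/// increments a counter and resets the size to zero.
pub fn spawn_memory_check(
    state: Arc<BufferState>,
    interval: Duration,
    threshold_bytes: u64,
    stop: Arc<AtomicBool>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || loop {
        if state.size_bytes.load(Ordering::Relaxed) > threshold_bytes {
            // Stand-in for persisting the whole buffer to Parquet and clearing WAL files.
            state.forced_snapshots.fetch_add(1, Ordering::Relaxed);
            state.size_bytes.store(0, Ordering::Relaxed);
        }
        // Check the stop flag after each pass, so at least one check always runs.
        if stop.load(Ordering::Relaxed) {
            break;
        }
        thread::sleep(interval);
    })
}
```

In a server, the interval would be `Duration::from_secs(10)` and the threshold would come from configuration; short intervals are only useful for testing.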
We can develop something a little more precise and less like a sledgehammer later.