
Use page tracking for snapshot and restore #683


Draft
wants to merge 19 commits into base: main

Conversation

Contributor

@simongdavies commented Jul 1, 2025

This pull request introduces snapshotting and restoring of sandbox state using dirty page tracking rather than copying the entire memory state each time. In many scenarios where sandboxes have larger than default amounts of memory, this results in better performance and reduced memory usage.

However, due to inefficiencies in the way dirty page tracking works on mshv, and the fact that tracking is not yet implemented for Windows, this is not universally true.

Included below are screenshots showing the difference in time taken for some benchmarks with and without dirty page tracking enabled, along with some explanations of the differences seen.

The changes comprise:

Host Shared Memory Dirty Page tracking

  • Added a custom signal handler for SIGSEGV to support dirty page tracking for host memory mapped into a VM. Updated documentation to reflect this change and added debugging instructions for handling SIGSEGV in GDB and LLDB.
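As an illustration of the bookkeeping such a handler needs, here is a minimal sketch (not Hyperlight's actual code) of an atomic dirty-page bitmap. The idea is that pages are write-protected, the SIGSEGV handler marks the faulting page dirty with a single `fetch_or` (which is async-signal-safe), and then re-enables write access; the names `DirtyBitmap` and `mark_dirty` are illustrative assumptions.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// One bit per page; a word holds the dirty state of 64 pages.
pub struct DirtyBitmap {
    bits: Vec<AtomicU64>,
}

impl DirtyBitmap {
    pub fn new(num_pages: usize) -> Self {
        let words = (num_pages + 63) / 64;
        Self {
            bits: (0..words).map(|_| AtomicU64::new(0)).collect(),
        }
    }

    /// Atomically mark `page` dirty. A single fetch_or is safe to call
    /// from a signal handler, unlike allocating or locking.
    pub fn mark_dirty(&self, page: usize) {
        self.bits[page / 64].fetch_or(1 << (page % 64), Ordering::Relaxed);
    }

    pub fn is_dirty(&self, page: usize) -> bool {
        self.bits[page / 64].load(Ordering::Relaxed) & (1 << (page % 64)) != 0
    }
}

fn main() {
    let bm = DirtyBitmap::new(128);
    bm.mark_dirty(0);
    bm.mark_dirty(70);
    assert!(bm.is_dirty(70) && !bm.is_dirty(1));
    println!("dirty pages tracked");
}
```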

VM Dirty page tracking

  • Enabled dirty page tracking for mshv and KVM drivers to track changes made to memory in the guest

Snapshot Management

  • Added a new snapshot manager module to create, manage and restore memory snapshots.
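The core idea of a page-based snapshot can be sketched as follows; this is a simplified stand-in, not the PR's actual snapshot manager, and `PAGE_SIZE`, `take_snapshot`, and `restore_snapshot` are hypothetical names. Only the dirty pages are copied out, and restoring writes just those pages back.

```rust
const PAGE_SIZE: usize = 4096;

/// A snapshot holds only the dirty pages, not the whole memory image.
struct Snapshot {
    pages: Vec<(usize, Vec<u8>)>, // (page index, page contents)
}

fn take_snapshot(mem: &[u8], dirty_pages: &[usize]) -> Snapshot {
    Snapshot {
        pages: dirty_pages
            .iter()
            .map(|&p| (p, mem[p * PAGE_SIZE..(p + 1) * PAGE_SIZE].to_vec()))
            .collect(),
    }
}

fn restore_snapshot(mem: &mut [u8], snap: &Snapshot) {
    for (p, data) in &snap.pages {
        mem[p * PAGE_SIZE..(p + 1) * PAGE_SIZE].copy_from_slice(data);
    }
}

fn main() {
    let mut mem = vec![0u8; 4 * PAGE_SIZE];
    mem[PAGE_SIZE] = 7; // a write lands in page 1
    let snap = take_snapshot(&mem, &[1]); // only page 1 is dirty
    mem[PAGE_SIZE] = 99; // further modification after the snapshot
    restore_snapshot(&mut mem, &snap);
    assert_eq!(mem[PAGE_SIZE], 7);
    println!("restored");
}
```

The saving over a full copy grows with total memory size: the cost scales with the number of dirty pages rather than the size of the sandbox.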

Benchmarking Enhancements

  • Refactored the guest_call_benchmark_large_param function to support benchmarks for multiple parameter sizes and added a new sandbox_heap_size_benchmark function to measure sandbox creation performance with varying heap sizes.
  • Introduced guest_call_heap_size_benchmark to evaluate guest function call performance with different heap sizes.
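The shape of a heap-size benchmark can be illustrated with a minimal timing loop; this is a sketch only (the real benchmarks presumably use a proper benchmark harness), and `create_sandbox` here is a hypothetical stand-in for sandbox creation.

```rust
use std::time::Instant;

/// Stand-in for creating a sandbox with a given heap size; the real
/// operation maps guest memory and loads the guest binary.
fn create_sandbox(heap_size: usize) -> Vec<u8> {
    vec![0u8; heap_size]
}

fn main() {
    // Measure creation time across a range of heap sizes, as the
    // sandbox_heap_size_benchmark does.
    for heap_size in [1 << 20, 16 << 20, 64 << 20] {
        let start = Instant::now();
        let sb = create_sandbox(heap_size);
        let elapsed = start.elapsed();
        println!("heap {:>9} bytes: {:?} (len {})", heap_size, elapsed, sb.len());
    }
}
```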

Performance Changes

KVM Before

kvm-before

KVM After

kvm-after

KVM shows large performance improvements when creating large sandboxes and when calling functions in sandboxes with large memory configurations; although it was not measured, the amount of memory consumed should also have been reduced considerably. In the scenario where large parameters are passed to the sandbox there is a performance regression. This has not been investigated yet; it may be that the page-based mechanism for saving/restoring data is more expensive than copying and restoring all the data. Other work to make large parameter passing more efficient will likely have a large positive impact here as well.

There is some regression in performance for small/default sandbox sizes, which is most likely caused by the overhead of tracking and building/restoring page-based snapshots for small memory configurations versus simply copying and restoring the entire memory.

mshv2 Before

mshv2-before

mshv2 After

mshv2-after

mshv3 Before

mshv3-before

mshv3 After

mshv-after

Both mshv2 and mshv3 show similar patterns to KVM: larger sandbox sizes show large improvements, while default/small sandboxes and large parameter sizes show regressions, possibly for the same reasons as on KVM. Again, no investigation has been done here yet.

The biggest difference between KVM and mshv has two causes. First, when enabling dirty page tracking on mshv, the first call to get dirty pages returns a bitmap showing all pages dirty. Since this would cause us to snapshot all memory as a baseline after enabling dirty page tracking, this PR gets the dirty pages immediately and then discards the result. This means we make two calls to get dirty pages when we really only need one. Second, the call to get dirty pages appears to be O(n), where n is the number of pages in the memory configuration (regardless of whether the pages are dirty or not), so the impact is quite large with bigger memory configurations: for a 950MB VM I observed ~1.4ms per call. Even so, with larger sandboxes this approach is still much quicker than copying all the memory. Fixing these issues in mshv will probably bring the performance much closer to KVM.
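The fetch-and-discard baseline pattern described above can be demonstrated with a mock; `MockTracker` and its methods are hypothetical and only imitate the mshv behaviour of reporting every page dirty on the first query after tracking is enabled.

```rust
/// Mock of a tracker whose first get-and-clear reports all pages dirty,
/// as observed on mshv after enabling dirty page tracking.
struct MockTracker {
    num_pages: usize,
    first_call: bool,
    dirty: Vec<usize>,
}

impl MockTracker {
    fn new(num_pages: usize) -> Self {
        Self { num_pages, first_call: true, dirty: Vec::new() }
    }

    fn write(&mut self, page: usize) {
        self.dirty.push(page);
    }

    /// mshv-like behaviour: everything is reported dirty on the first call.
    fn get_and_clear_dirty_pages(&mut self) -> Vec<usize> {
        if self.first_call {
            self.first_call = false;
            return (0..self.num_pages).collect();
        }
        std::mem::take(&mut self.dirty)
    }
}

fn main() {
    let mut t = MockTracker::new(1024);
    // Workaround: fetch once and discard the all-dirty baseline.
    let _ = t.get_and_clear_dirty_pages();
    t.write(3);
    // Subsequent calls now reflect only real guest writes.
    assert_eq!(t.get_and_clear_dirty_pages(), vec![3]);
    println!("baseline discarded");
}
```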

Windows 2025 Before

windows-before

Windows 2025 After

windows-after

Windows performance is largely either unchanged or has regressed. This is because the Windows implementation has not been done yet: at the moment, each time dirty pages are requested, Windows reports that all pages are dirty, and snapshots/restores are done on that basis. There is some overhead to this approach, especially when restoring.

@simongdavies simongdavies added the kind/enhancement For PRs adding features, improving functionality, docs, tests, etc. label Jul 1, 2025
@simongdavies simongdavies linked an issue Jul 1, 2025 that may be closed by this pull request
@simongdavies simongdavies force-pushed the update-snapshot-and-restore branch 5 times, most recently from 7e85120 to 014eab3 Compare July 3, 2025 15:48
@simongdavies simongdavies changed the title WIP: Use page tracking for snapshot and restore Use page tracking for snapshot and restore Jul 3, 2025
@simongdavies simongdavies force-pushed the update-snapshot-and-restore branch from 014eab3 to 56fd983 Compare July 3, 2025 16:43
@ludfjig (Contributor) left a comment


Looks mostly good to me. The signal handling of SIGSEGV makes me a little nervous, and there seems to be an assumption that Vec<MemoryRegion> is always consecutive in memory, which I think we might break in the future. I also think a refactor of our memory management code would be nice 😅.

I also remember I had a bunch of tests for evolve/devolve in my previous dirty-pages PR that could maybe be useful for testing this logic. Those tests were the ones that allowed me to catch the bug in mshv's implementation :P

Also, these regressions seem pretty big (compared to the main branch).


for page_idx in bit_index_iterator(&bitmap) {
page_indices.push(current_page + page_idx);
}
current_page += num_pages;
I think we are making an assumption here that all memory regions are consecutive with no gaps. I think Vec<MemoryRegion> has this invariant, but I'm not sure it's documented anywhere.
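For reference, a minimal `bit_index_iterator` consistent with the snippet above might look like this: it yields the index of every set bit in a slice of `u64` words (one bit per page). This is a sketch only; the PR's actual implementation may differ.

```rust
/// Yield the indices of all set bits in `bitmap`, in increasing order.
/// Word i covers pages [i * 64, i * 64 + 63].
fn bit_index_iterator(bitmap: &[u64]) -> impl Iterator<Item = usize> + '_ {
    bitmap.iter().enumerate().flat_map(|(word, &bits)| {
        (0..64usize)
            .filter(move |bit| bits & (1u64 << bit) != 0)
            .map(move |bit| word * 64 + bit)
    })
}

fn main() {
    // Bit 1 of word 0 and bit 0 of word 1 are set -> pages 1 and 64.
    let bitmap = [0b10u64, 0b1u64];
    let pages: Vec<usize> = bit_index_iterator(&bitmap).collect();
    assert_eq!(pages, vec![1, 64]);
    println!("{:?}", pages);
}
```

Note that an iterator of this shape produces indices in increasing order by construction, which is relevant to the review comments below about redundant sorts.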

shared_mem.with_exclusivity(|e| e.copy_to_slice(&mut buffer, 0))??;
} else {
// Sort pages for deterministic ordering and to enable consecutive page optimization
dirty_pages.sort_unstable();
I believe this sort is redundant
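The "consecutive page optimization" the snippet's comment refers to can be sketched as grouping sorted page indices into `(start, len)` runs, so each run is copied with one slice operation instead of one copy per page. This is an illustrative sketch; `consecutive_runs` is a hypothetical name, not the PR's code.

```rust
/// Group sorted page indices into (start, len) runs of consecutive pages.
fn consecutive_runs(sorted_pages: &[usize]) -> Vec<(usize, usize)> {
    let mut runs: Vec<(usize, usize)> = Vec::new();
    for &p in sorted_pages {
        match runs.last_mut() {
            // Extend the current run if this page follows it directly.
            Some((start, len)) if *start + *len == p => *len += 1,
            // Otherwise start a new run.
            _ => runs.push((p, 1)),
        }
    }
    runs
}

fn main() {
    let runs = consecutive_runs(&[0, 1, 2, 5, 6, 9]);
    assert_eq!(runs, vec![(0, 3), (5, 2), (9, 1)]);
    println!("{:?}", runs);
}
```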


// Collect dirty pages and sort them for consecutive page optimization
let mut dirty_pages: Vec<usize> = bit_index_iterator(dirty_bitmap).collect();
dirty_pages.sort_unstable();
I believe this sort is redundant

pub(super) fn create_new_snapshot<S: SharedMemory>(
&mut self,
shared_mem: &mut S,
dirty_page_map: Option<&Vec<u64>>,
I think(?) this option can also be removed

@@ -606,6 +607,12 @@ impl Hypervisor for HypervWindowsDriver {
Ok(())
}

fn get_and_clear_dirty_pages(&mut self) -> Result<Vec<u64>> {
// For now we just mark all pages dirty which is the equivalent of taking a full snapshot
let total_size = self.mem_regions.iter().map(|r| r.guest_region.len()).sum();
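A sketch of what the placeholder above implies: build a bitmap with every page marked dirty, which is equivalent to taking a full snapshot. The 4KiB `PAGE_SIZE` and the exact bitmap layout (64 pages per `u64` word) are assumptions for illustration.

```rust
const PAGE_SIZE: usize = 4096;

/// Return a dirty-page bitmap covering `total_size` bytes with every
/// page marked dirty, i.e. the "full snapshot" placeholder behaviour.
fn all_dirty_bitmap(total_size: usize) -> Vec<u64> {
    let num_pages = (total_size + PAGE_SIZE - 1) / PAGE_SIZE;
    let num_words = (num_pages + 63) / 64;
    let mut bitmap = vec![u64::MAX; num_words];
    // Clear any bits past the last page in the final word.
    let rem = num_pages % 64;
    if rem != 0 {
        bitmap[num_words - 1] = (1u64 << rem) - 1;
    }
    bitmap
}

fn main() {
    let bm = all_dirty_bitmap(70 * PAGE_SIZE); // 70 pages -> 2 words
    assert_eq!(bm.len(), 2);
    assert_eq!(bm[0], u64::MAX);
    assert_eq!(bm[1], (1u64 << 6) - 1); // only 6 bits set in the last word
    println!("ok");
}
```

This explains the overhead noted in the PR description: every snapshot and restore on Windows touches all pages until real tracking is implemented.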

@ludfjig ludfjig force-pushed the update-snapshot-and-restore branch 3 times, most recently from 3838d0a to 411a668 Compare July 8, 2025 18:36
Signed-off-by: Simon Davies <[email protected]>
- Implemented `LinuxDirtyPageTracker` for tracking dirty pages in Linux.
  - Utilizes SIGSEGV to detect writes and marks pages as dirty.
  - Supports concurrent access and ensures memory protection.
  - Includes comprehensive tests for various scenarios including overlap detection and concurrent writes.

- Added `WindowsDirtyPageTracker` as a placeholder for Windows dirty page tracking.
  - Currently marks all pages as dirty until further implementation is completed.
  - Includes basic structure and initialization logic.

Signed-off-by: Simon Davies <[email protected]>
…ge tracking after memory is allocated and stopping it once the uninitialized sandbox is evolved

Signed-off-by: Simon Davies <[email protected]>
… offset methods to be public

Signed-off-by: Simon Davies <[email protected]>
…managing memory snapshots and update related methods for improved state management

Signed-off-by: Simon Davies <[email protected]>
… shared_memory_snapshot_manager for improved snapshot management

Signed-off-by: Simon Davies <[email protected]>
@ludfjig ludfjig force-pushed the update-snapshot-and-restore branch from 411a668 to 1231cc6 Compare July 8, 2025 20:59
@ludfjig ludfjig force-pushed the update-snapshot-and-restore branch from 1231cc6 to 9d4d90a Compare July 8, 2025 21:08
Labels
kind/enhancement For PRs adding features, improving functionality, docs, tests, etc.
Development

Successfully merging this pull request may close these issues.

Improve performance when creating "larger" Sandboxes
2 participants