Conversation
Use a runtime-mapped base for side metadata and make offsets relative. Replace OnceLock with a fast static base and shared init, update address math, and adjust tests/spec layout and docs accordingly. Also add mmap-noreserve-anywhere support and guard MSRV pin in CI scripts.
Avoid 32-bit overflow in mmapper range limit, adjust side metadata sanity expectations, and use small 32-bit test addresses for contiguous conversion tests. Also constrain mmap annotation handling on 32-bit.
Verify the runtime side metadata base is initialized, aligned, and that global metadata addresses fall within the reserved range. Check 64-bit local base offset consistency.
I used
I will run dacapo benchmarks.
into consideration when quarantining side metadata
The following is the performance for Immix using 3x G1 min heap: Immix only uses side metadata during GC, and the mean STW time slowdown is 3.2%. The worst case is around 11% STW time slowdown. I probably should measure a generational plan which uses log bits side metadata during mutator time.
Move verify_side_metadata_sanity() from Plan::new() to MMTK::new(), after mmapping side metadata.
This is the result for GenImmix (also using 3x G1 min heap): generally it showed no slowdown for mutator time. The reason is that there is little change to the JIT'd code. The only difference is that at JIT time, we need to load the side metadata address from a variable instead of a constant -- but this only affects JIT time, not run time. See https://github.com/mmtk/mmtk-openjdk/pull/343/changes#diff-27141e7f6636a2ef36ac24f81d7025e231f1b091660a28026edba98de5c1156a. It also showed no slowdown for STW time. I think the reason is that most GCs are nursery GCs.
I will further look into the microbenchmarks and the Immix performance.
```rust
}

/// Performs the translation of data address (`data_addr`) to metadata address for the specified metadata (`metadata_spec`).
#[inline(always)]
```
This #[inline(always)] directive is necessary. With it inlined, the microbenchmark shows no slowdown on loading side metadata:

```
side_metadata_load time:   [2.6933 µs 2.6940 µs 2.6948 µs]
                   change: [−68.453% −68.423% −68.397%] (p = 0.00 < 0.05)
                   Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low severe
  1 (1.00%) high mild
```

I am running Immix again to see if the inline fixes the slowdown for Immix.
Note that I use PGO when building OpenJDK. If the manual inline directive fixes the slowdown for Immix, that means somehow PGO does not inline the key function for side metadata address calculation.
> ... With this inlined, the microbenchmark shows no slowdown...
That's right. The compiler can usually figure out what to inline in real-world VM bindings, but for microbenchmarks (`cargo bench`), manual inlining matters. I have observed this before, and added some annotations in `src/util/test_private/mod.rs`. I think you can try adding wrappers with `#[inline(always)]` in the `test_private` mod and keep `side_metadata/helpers.rs` free of inlining annotations. If manual inlining is still necessary for the OpenJDK binding, we can keep `#[inline(always)]` in `helpers.rs`.
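A minimal sketch of the wrapper approach suggested above. The module layout, function names, and the address formula are all illustrative, not the actual MMTk code:

```rust
// Hypothetical sketch: the helper stays free of inline annotations, and a
// forced-inline wrapper is exposed only for microbenchmarks.
mod helpers {
    /// Translate a data address to its side metadata address.
    /// (Toy formula; the real calculation depends on the metadata spec.)
    pub fn address_to_meta_address(base: usize, data_addr: usize, log_ratio: usize) -> usize {
        base + (data_addr >> log_ratio)
    }
}

pub mod test_private {
    /// Forced-inline wrapper so `cargo bench` measures the address
    /// calculation itself rather than a function call.
    #[inline(always)]
    pub fn address_to_meta_address(base: usize, data_addr: usize, log_ratio: usize) -> usize {
        super::helpers::address_to_meta_address(base, data_addr, log_ratio)
    }
}
```

This keeps production code free of manual inlining hints while still letting the microbenchmark exercise an inlined path.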
The manual inline does not fix the performance for Immix on dacapo: https://squirrel.anu.edu.au/plotty/yilin/mmtk/#0|shrew-2026-02-24-Tue-031617&benchmark^build^invocation^iteration&time^time.other^time.stw&|10&iteration^1^4|20&1^invocation|30&1&benchmark&build;jdk-21-constant-side-metadata|41&Histogram%20(with%20CI)^build^benchmark&
cdd59e9 to b4c4539
```rust
// (starting from zero) and add the runtime base address when computing actual addresses.
pub(crate) const GLOBAL_SIDE_METADATA_BASE_OFFSET: usize = 0;
```

```rust
static mut SIDE_METADATA_BASE_ADDRESS: Address = Address::ZERO;
```
I replaced this static mut with a OnceLock, and got the following results on biojava which showed no slowdown for this PR.
https://squirrel.anu.edu.au/plotty/yilin/mmtk/#0|shrew-2026-02-26-Thu-071955&benchmark^build^invocation^iteration^stickyix&time^time.other^time.stw&|10&iteration^1^4&stickyix^1^null|20&1^invocation|30&1&benchmark&build;jdk-21-constant-side-metadata
If we use static mut for the base address, Rust needs to repetitively load from the variable every time we access side metadata. I inspected the assembly code of ImmixSpace::trace_object, and in around 100 instructions, there are 3 extra loads for this PR to get the side metadata address.
Intuitively, OnceLock should be slower, as get() always checks whether the value is initialized before loading it. However, if we use once_lock.get().unwrap_unchecked(), it seems the compiler can assume that there is no need to check or branch, and that the value will not change, so there is no need to repetitively load it.
Using OnceLock seems to fix the slowdown on biojava, and I am running the experiments again for all the benchmarks.
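A minimal sketch of the OnceLock pattern described above, using a plain `usize` in place of MMTk's `Address` (the names `initialize_base`/`base_address` are hypothetical):

```rust
use std::sync::OnceLock;

// Hypothetical sketch of the OnceLock-based side metadata base address.
static SIDE_METADATA_BASE: OnceLock<usize> = OnceLock::new();

/// Called once during initialization, after mmapping side metadata.
fn initialize_base(addr: usize) {
    SIDE_METADATA_BASE
        .set(addr)
        .expect("side metadata base initialized twice");
}

/// Hot-path accessor. With `unwrap_unchecked`, the compiler can assume the
/// value is initialized and never changes, so it can skip the "is it set?"
/// branch and hoist the load out of loops, unlike a `static mut` which must
/// be reloaded on every access.
#[inline(always)]
fn base_address() -> usize {
    // SAFETY: initialize_base() must have been called before any access.
    unsafe { *SIDE_METADATA_BASE.get().unwrap_unchecked() }
}
```

Calling `base_address()` before `initialize_base()` is undefined behavior, which mirrors the invariant that side metadata must be mmapped before any space touches it.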
This is the results for all the benchmarks. With OnceLock, the slowdown for STW time is 3.5% (compared to 4.3% with static mut).
https://squirrel.anu.edu.au/plotty/yilin/mmtk/#0|shrew-2026-02-26-Thu-225410&benchmark^build^invocation^iteration^stickyix&time^time.other^time.stw&|10&iteration^1^4|20&1^invocation|30&1&benchmark&build;jdk-21-constant-side-metadata|41&Histogram%20(with%20CI)^build^benchmark&
Some benchmarks showed large error bars. The following is a rerun (which hasn't finished yet):
https://squirrel.anu.edu.au/plotty/yilin/mmtk/#0|shrew-2026-03-01-Sun-070849&benchmark^build^invocation^iteration^stickyix&time^time.other^time.stw&|10&iteration^1^4|20&1^invocation|30&1&benchmark&build;jdk-21-constant-side-metadata|41&Histogram%20(with%20CI)^build^benchmark&
Both seem to suggest that with OnceLock we see a slowdown for batik, but for the other benchmarks, OnceLock is a performance improvement.
I will still need to investigate the slowdown.
```rust
// Initialize side metadata sanity first
plan.verify_side_metadata_sanity();
// Then initialize SFT because it may use side metadata
plan.initialize_sft();
```
These two lines use side metadata, so they have to happen after side metadata is initialized. They used to happen in Plan::new(). I extracted them because, at some point, I called initialize_side_metadata() after Plan::new(), so I had to move these two lines after both Plan::new() and initialize_side_metadata().
Now initialize_side_metadata() is called before Plan::new(), so it is no longer necessary to have these two lines here. But I think this is still clearer.
wks left a comment:
I think when quarantining memory, the mmap strategy is always the same. As I suggested in the comments, we may remove the strategy arguments and use a fixed strategy for quarantining.
```rust
let addr = unix_common::mmap_anywhere(
    size,
    align,
    strategy.prot(MmapProtection::NoAccess).reserve(false),
```
I think .prot(MmapProtection::NoAccess).reserve(false) should be orthogonal to dzmmap_anywhere. It is ChunkStateMmapper::quarantine_address_range_anywhere that requires the map to be "no access" and "no reserve" because it is "quarantine".
```rust
);
let base = if specified_base.is_zero() {
    MMAPPER
        .quarantine_address_range_anywhere(pages, MmapStrategy::SIDE_METADATA, &anno)
```
We probably don't need to pass MmapStrategy::SIDE_METADATA to quarantine_address_range_anywhere, because when quarantining, it must have MmapProtection::NoAccess and reserve: false.
```rust
fn quarantine_address_range_anywhere(
    &self,
    pages: usize,
    strategy: MmapStrategy,
```
We don't need the strategy to be passed in. Like the following excerpt from quarantine_address_range, we can construct the strategy locally:

```rust
let mmap_strategy = MmapStrategy::default()
    .huge_page(huge_page_option)
    .prot(MmapProtection::NoAccess)
    .reserve(false)
    .replace(false);
OS::dzmmap(group_start, group_bytes, mmap_strategy, anno)?;
```

We can probably add a constant in impl MmapStrategy:

```rust
pub const QUARANTINE: MmapStrategy = MmapStrategy::default()
    .prot(MmapProtection::NoAccess)
    .reserve(false)
    .replace(false);
```

so that we can directly use MmapStrategy::QUARANTINE.huge_page(huge_page_option).
```rust
}

let start = Address::from_mut_ptr(ptr);
Ok(start.align_up(align))
```
If we align up the start, the returned value is no longer the starting address of the mmap, and subsequent munmap may have problems. For example, ChunkStateMmapper::quarantine_address_range_anywhere will try to call munmap with the returned address from mmap, and leave memory still mapped at the beginning and the end.
If the mmap is successful, we should either

- immediately munmap the extraneous memory ranges at the beginning and at the end, so that the resulting mmap is exactly start...(start+aligned_size), or
- return the actual starting address and the alloc_size, and let the caller do the munmap.
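The first option above can be sketched as address arithmetic. The actual mmap/munmap calls are elided; `trim_ranges` is a hypothetical helper, not MMTk code:

```rust
/// Given the raw start of an over-allocated mapping of `alloc_size` bytes,
/// compute the aligned start plus the head and tail (addr, len) ranges that
/// should be munmap'ed so that exactly `size` bytes starting at an
/// `align`-aligned address remain mapped. `align` must be a power of two,
/// and `alloc_size` must be at least `size + align - 1`.
fn trim_ranges(
    raw_start: usize,
    alloc_size: usize,
    size: usize,
    align: usize,
) -> (usize, (usize, usize), (usize, usize)) {
    // Round the raw start up to the requested alignment.
    let aligned_start = (raw_start + align - 1) & !(align - 1);
    // Extraneous range before the aligned start.
    let head = (raw_start, aligned_start - raw_start);
    // Extraneous range after the kept `size` bytes.
    let tail_start = aligned_start + size;
    let tail = (tail_start, raw_start + alloc_size - tail_start);
    (aligned_start, head, tail)
}
```

Unmapping both ranges keeps the invariant that the returned address is also the exact start of the live mapping, so a later munmap of `aligned_start..aligned_start+size` releases everything.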
A high-level comment: this addresses part of the issues with #1351. This PR allows side metadata to be mmapped dynamically, and also allows side metadata to use a fixed address specified in the option side_metadata_base_address.