Dynamic side metadata base address#1449

Open
qinsoon wants to merge 28 commits into mmtk:master from qinsoon:dynamic-side-metadata-address
Conversation

@qinsoon
Member

@qinsoon qinsoon commented Feb 19, 2026

This addresses part of the issues with #1351. This PR allows side metadata to be mmapped dynamically, and also allows side metadata to use a fixed address specified in the option side_metadata_base_address.

Use a runtime-mapped base for side metadata and make offsets relative.
Replace OnceLock with a fast static base and shared init, update address math,
and adjust tests/spec layout and docs accordingly.

Also add mmap-noreserve-anywhere support and guard MSRV pin in CI scripts.
Avoid 32-bit overflow in mmapper range limit, adjust side metadata
sanity expectations, and use small 32-bit test addresses for contiguous
conversion tests. Also constrain mmap annotation handling on 32-bit.
Verify the runtime side metadata base is initialized, aligned, and
that global metadata addresses fall within the reserved range. Check
64-bit local base offset consistency.
@qinsoon
Member Author

qinsoon commented Feb 19, 2026

I used cargo bench to quickly test the performance.

side_metadata_address_to_meta_address() is 10x slower. With a constant side metadata base address, this function can be completely optimized away by constant folding, but with a dynamically mapped base address, it has to be computed at run time.

SideMetadata::load() is roughly 3x slower.
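To illustrate why constant folding matters here, the two schemes can be sketched as follows. The base address and the 1-metadata-bit-per-64-data-bytes ratio are made-up values for illustration, not the actual mmtk-core layout:

```rust
// Sketch contrasting the two address-translation schemes.
// CONST_BASE and the shift amount are hypothetical values.
const CONST_BASE: usize = 0x0000_0600_0000_0000;
const LOG_DATA_BYTES_PER_META_BYTE: usize = 6;

// With a constant base, this folds into `(data >> 6) + immediate`.
fn meta_addr_const(data: usize) -> usize {
    CONST_BASE + (data >> LOG_DATA_BYTES_PER_META_BYTE)
}

// With a dynamically mapped base, the base must be loaded at run time
// before the same arithmetic can happen.
fn meta_addr_dynamic(base: usize, data: usize) -> usize {
    base + (data >> LOG_DATA_BYTES_PER_META_BYTE)
}

fn main() {
    let runtime_base = 0x0000_0600_0000_0000_usize; // pretend this was mmapped
    let data = 0x0000_0200_0000_1000_usize;
    assert_eq!(meta_addr_const(data), meta_addr_dynamic(runtime_base, data));
}
```

Both functions compute the same address; the difference is only whether the base is an immediate in the generated code or a load from memory.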

=== This PR ===
side_metadata_address_translation       time:   [4.7127 µs 4.7142 µs 4.7163 µs]
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) low severe
  3 (3.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe
side_metadata_load      time:   [8.8293 µs 8.8310 µs 8.8328 µs]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild

=== master ===
  side_metadata_address_translation
                        time:   [505.57 ns 505.73 ns 505.88 ns]
                        change: [−89.293% −89.285% −89.276%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  6 (6.00%) low mild
  2 (2.00%) high mild

side_metadata_load      time:   [2.6913 µs 2.7012 µs 2.7121 µs]
                        change: [−69.579% −69.518% −69.453%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 21 outliers among 100 measurements (21.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  19 (19.00%) high severe

I will run dacapo benchmarks.

@qinsoon
Member Author

qinsoon commented Feb 22, 2026

The following is the performance for Immix using 3x G1 min heap:
https://squirrel.anu.edu.au/plotty/yilin/mmtk/#0|shrew-2026-02-20-Fri-004537&benchmark^build^invocation^iteration&time^time.other^time.stw&|10&iteration^1^4|20&1^invocation|30&1&benchmark&build;jdk-21-constant-side-metadata|41&Histogram%20(with%20CI)^build^benchmark&

Immix only uses side metadata during GC. The mean STW time shows a 3.2% slowdown, and the worst case is around an 11% STW time slowdown (biojava).

I probably should measure a generational plan which uses log bits side metadata during mutator time.

@qinsoon
Member Author

qinsoon commented Feb 23, 2026

This is the result for GenImmix (also using 3x G1 min heap):
https://squirrel.anu.edu.au/plotty/yilin/mmtk/#0|shrew-2026-02-20-Fri-004537&benchmark^build^invocation^iteration&time^time.other^time.stw&|10&iteration^1^4|20&1^invocation|30&1&benchmark&build;jdk-21-constant-side-metadata|41&Histogram%20(with%20CI)^build^benchmark&

Generally it showed no slowdown for mutator time. The reason is that there is little change to the JIT'd code. The only difference is that at JIT time, we need to load the side metadata address from a variable instead of a constant -- but this only affects JIT time, not run time. See https://github.com/mmtk/mmtk-openjdk/pull/343/changes#diff-27141e7f6636a2ef36ac24f81d7025e231f1b091660a28026edba98de5c1156a.

It also showed no slowdown for STW time. I think the reason is that most GCs are nursery GCs (CopySpace), and CopySpace only uses side log bits (forwarding bits/pointers are header metadata). So compared to full heap Immix (which uses a lot of side metadata), generational GCs are less affected by this PR.

@qinsoon
Member Author

qinsoon commented Feb 23, 2026

I will further look into the micro benchmarks and the Immix performance.

}

/// Performs the translation of data address (`data_addr`) to metadata address for the specified metadata (`metadata_spec`).
#[inline(always)]
Member Author

This #[inline(always)] directive is necessary. With this inlined, the microbenchmark shows no slowdown on loading side metadata.

side_metadata_load      time:   [2.6933 µs 2.6940 µs 2.6948 µs]
                        change: [−68.453% −68.423% −68.397%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low severe
  1 (1.00%) high mild

I am running Immix again to see if the inlining fixes the slowdown.

Note that I use PGO when building OpenJDK. If the manual inline directive fixes the slowdown for Immix, that means somehow PGO does not inline the key function for side metadata address calculation.

Collaborator

... With this inlined, the microbenchmark shows no slowdown...

That's right. The compiler can usually figure out what to inline in real-world VM bindings, but for microbenchmarks (cargo bench), manual inlining matters. I have observed this before, and added some annotations in src/util/test_private/mod.rs. I think you can try adding wrappers in the test_private mod that have #[inline(always)] and keep side_metadata/helpers.rs free of inlining annotations. If manual inlining is still necessary for the OpenJDK binding, we can keep #[inline(always)] in helpers.rs.
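The suggested wrapper pattern could look roughly like this (a sketch; the helper names and the shift amount are illustrative, only the test_private module name comes from the suggestion above):

```rust
// Sketch of the suggested pattern: keep the real helper free of inline
// annotations, and expose an #[inline(always)] wrapper for benchmarks.
// Function names and the arithmetic are illustrative, not mmtk-core APIs.
mod helpers {
    // No inlining annotation; real-world callers let the compiler
    // (or PGO) decide whether to inline.
    pub fn address_to_meta_address(base: usize, data: usize) -> usize {
        base + (data >> 6)
    }
}

mod test_private {
    /// Benchmark-only wrapper that forces inlining so `cargo bench`
    /// measures the address arithmetic rather than a function call.
    #[inline(always)]
    pub fn address_to_meta_address(base: usize, data: usize) -> usize {
        super::helpers::address_to_meta_address(base, data)
    }
}

fn main() {
    // Benchmarks would call the wrapper; production code calls helpers.
    assert_eq!(test_private::address_to_meta_address(0x6000, 0x40), 0x6001);
}
```

This keeps the inlining pressure out of helpers.rs while still letting microbenchmarks measure the inlined fast path.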

@qinsoon
Member Author

qinsoon commented Feb 24, 2026

binding-refs
OPENJDK21_BINDING_REPO=qinsoon/mmtk-openjdk
OPENJDK21_BINDING_REF=dynamic-side-metadata-address
OPENJDK11_BINDING_REPO=qinsoon/mmtk-openjdk
OPENJDK11_BINDING_REF=dynamic-side-metadata-address-11
JULIA_BINDING_REPO=qinsoon/mmtk-julia
JULIA_BINDING_REF=dynamic-side-metadata-address
JULIA_VM_REPO=qinsoon/julia
JULIA_VM_REF=no-const-side-metadata-address
RUBY_BINDING_REPO=qinsoon/mmtk-ruby
RUBY_BINDING_REF=dynamic-side-metadata-address
JIKESRVM_BINDING_REPO=qinsoon/mmtk-jikesrvm
JIKESRVM_BINDING_REF=dynamic-side-metadata-address

@qinsoon qinsoon force-pushed the dynamic-side-metadata-address branch from cdd59e9 to b4c4539 Compare February 24, 2026 04:29
// (starting from zero) and add the runtime base address when computing actual addresses.
pub(crate) const GLOBAL_SIDE_METADATA_BASE_OFFSET: usize = 0;

static mut SIDE_METADATA_BASE_ADDRESS: Address = Address::ZERO;
Member Author


I replaced this static mut with a OnceLock, and got the following results on biojava which showed no slowdown for this PR.
https://squirrel.anu.edu.au/plotty/yilin/mmtk/#0|shrew-2026-02-26-Thu-071955&benchmark^build^invocation^iteration^stickyix&time^time.other^time.stw&|10&iteration^1^4&stickyix^1^null|20&1^invocation|30&1&benchmark&build;jdk-21-constant-side-metadata

If we use static mut for the base address, the compiled code has to reload the variable every time we access side metadata. I inspected the assembly code of ImmixSpace::trace_object, and in around 100 instructions, there are 3 extra loads for this PR to get the side metadata address.

Intuitively, OnceLock should be slower, as get() always checks whether the value is initialized before loading it. However, with oncelock.get().unwrap_unchecked(), the compiler can apparently elide the check and the branch, and since the value cannot change after initialization, it does not need to be repeatedly reloaded.

Using OnceLock seems to fix the slowdown on biojava, and I am running the experiments again for all the benchmarks.
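The pattern described above can be sketched as follows (illustrative names; a static OnceLock set once at startup and read on the hot path through unwrap_unchecked):

```rust
use std::sync::OnceLock;

// Sketch of the pattern: a OnceLock holding the side-metadata base,
// initialized once and read via unwrap_unchecked() on the hot path.
// The names here are illustrative, not the actual mmtk-core code.
static SIDE_METADATA_BASE: OnceLock<usize> = OnceLock::new();

fn init_base(base: usize) {
    // Ignore the error if the base was already set; the first write wins.
    let _ = SIDE_METADATA_BASE.set(base);
}

#[inline(always)]
fn base() -> usize {
    // SAFETY: callers must guarantee init_base() has run first. With the
    // initialization check elided, the compiler can treat the value as
    // immutable and hoist the load out of loops.
    unsafe { *SIDE_METADATA_BASE.get().unwrap_unchecked() }
}

fn main() {
    init_base(0x0000_0600_0000_0000);
    assert_eq!(base(), 0x0000_0600_0000_0000);
}
```

The key difference from `static mut` is that the value is provably write-once, which is what lets the optimizer stop reloading it.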

Member Author


I will still need to investigate the slowdown.

@qinsoon qinsoon added the PR-extended-testing Run extended tests for the pull request label Mar 6, 2026
Comment on lines +217 to +220
// Initialize side metadata sanity first
plan.verify_side_metadata_sanity();
// Then initialize SFT because it may use side metadata
plan.initialize_sft();
Member Author


These two lines use side metadata, so they have to happen after side metadata is initialized. They used to happen in Plan::new(). I extracted them because, at some point, initialize_side_metadata() was called after Plan::new(), so these two lines had to be moved after both Plan::new() and initialize_side_metadata().

Now that initialize_side_metadata() is called before Plan::new(), it is no longer strictly necessary to keep these two lines here, but I think it is still clearer this way.

@qinsoon qinsoon marked this pull request as ready for review March 9, 2026 02:26
Collaborator

@wks wks left a comment


I think when quarantining memory, the mmap strategy is always the same. As I suggested in the comments, we may remove the strategy arguments and use a fixed strategy for quarantining.

let addr = unix_common::mmap_anywhere(
size,
align,
strategy.prot(MmapProtection::NoAccess).reserve(false),
Collaborator


I think .prot(MmapProtection::NoAccess).reserve(false) should be orthogonal to dzmmap_anywhere. It is ChunkStateMmapper::quarantine_address_range_anywhere that requires the mapping to be "no access" and "no reserve", because it is "quarantine".

);
let base = if specified_base.is_zero() {
MMAPPER
.quarantine_address_range_anywhere(pages, MmapStrategy::SIDE_METADATA, &anno)
Collaborator


We probably don't need to pass MmapStrategy::SIDE_METADATA to quarantine_address_range_anywhere because when quarantining, it must have MmapProtection::NoAccess and reserve: false.

fn quarantine_address_range_anywhere(
&self,
pages: usize,
strategy: MmapStrategy,
Collaborator


We don't need the strategy to be passed in. Like the following excerpt from quarantine_address_range, we can construct the strategy locally:

                        let mmap_strategy = MmapStrategy::default()
                            .huge_page(huge_page_option)
                            .prot(MmapProtection::NoAccess)
                            .reserve(false)
                            .replace(false);
                        OS::dzmmap(group_start, group_bytes, mmap_strategy, anno)?;

We can probably add a constant in impl MmapStrategy:

                        pub const QUARANTINE: MmapStrategy = MmapStrategy::default()
                            .prot(MmapProtection::NoAccess)
                            .reserve(false)
                            .replace(false);

so that we can directly use MmapStrategy::QUARANTINE.huge_page(huge_page_option).

}

let start = Address::from_mut_ptr(ptr);
Ok(start.align_up(align))
Collaborator


If we align up the start, the returned value is no longer the starting address of the mmap, and subsequent munmap may have problems. For example, ChunkStateMmapper::quarantine_address_range_anywhere will try to call munmap with the returned address from mmap, and leave memory still mapped at the beginning and the end.

If the mmap is successful, we should either

  • immediately munmap the extraneous memory range in the beginning and in the end to make the resulting mmap exactly start...(start+aligned_size), or
  • return the actual starting address and the alloc_size, and let the caller do the munmap.
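The first option amounts to over-mapping by align bytes and then trimming both ends; the bookkeeping can be sketched with plain arithmetic (hypothetical helper, no actual mmap/munmap calls):

```rust
// Sketch of trimming an over-aligned mmap. In the real code, `head` and
// `tail` would each be passed to munmap; here we only do the arithmetic.
fn align_up(addr: usize, align: usize) -> usize {
    (addr + align - 1) & !(align - 1)
}

/// Given the raw mmap result `start` for an allocation of `size + align`
/// bytes, return the aligned start plus the leading/trailing slack to unmap.
fn trim(start: usize, size: usize, align: usize) -> (usize, usize, usize) {
    let aligned = align_up(start, align);
    let head = aligned - start; // bytes to munmap before `aligned`
    let tail = (start + size + align) - (aligned + size); // bytes after the end
    (aligned, head, tail)
}

fn main() {
    let (aligned, head, tail) = trim(0x1000_0800, 0x4000, 0x1000);
    assert_eq!(aligned, 0x1000_1000);
    assert_eq!(head, 0x800);
    assert_eq!(tail, 0x800);
    // head + size + tail always covers the whole over-mapped region.
    assert_eq!(head + 0x4000 + tail, 0x4000 + 0x1000);
}
```

Either way, the invariant is that every byte returned by mmap is eventually either part of the aligned range or explicitly unmapped, so a later munmap of `aligned..aligned+size` leaves nothing behind.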

@wks
Collaborator

wks commented Mar 9, 2026

A high-level comment: OnceLock has a method OnceLock::get_unchecked() that can bypass the intermediate Option<&T> from OnceLock::get(). I haven't tried whether it is faster than l.get().unwrap_unchecked(), but it is worth trying.
