Skip to content

Conversation

@zebreus
Copy link
Contributor

@zebreus zebreus commented Nov 20, 2025

This PR ensures that multiple threads don't access the functions for registering and deregistering frames at the same time, which can lead to crashes, as the functions are not guaranteed to be thread-safe (at least I couldn't find any documentation stating they were).

It also no longer deregisters frames during Drop, which caused issues when exiting the wasmer. Instead, deregistrations are now queued in a GLOBAL_UNREGISTRY and processed before the next batch of registrations.

Fixes #5877

@zebreus
Copy link
Contributor Author

zebreus commented Nov 20, 2025

Testing this is a bit dumb. I ran while target/debug/wasmer --disable-cache cowsay aa --registry=wasmer.io ; do : ; done for 10 minutes, and it didn't crash. Before this PR it would usually abort after 5-10 runs.

Copilot finished reviewing on behalf of zebreus November 20, 2025 16:23
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses thread safety issues during frame registration/deregistration by introducing a global mutex-protected registry. The key change is deferring frame deregistration from the Drop implementation to avoid crashes during program shutdown and concurrent access.

Key Changes:

  • Introduces GLOBAL_UNREGISTRY mutex to serialize frame registration/deregistration operations
  • Defers actual deregistration by queuing frames in Drop instead of immediately calling deregister functions
  • Processes queued deregistrations before each new registration batch

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 162 to 163
self.compact_unwind_mgr.deregister();
}
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The self.compact_unwind_mgr.deregister() call is inside the loop but appears to be called on the same manager object for each item in the unregistry. This means if there are N registrations in the unregistry, deregister() will be called N times on the same manager. This seems incorrect - it should likely be called once after the loop, similar to how self.compact_unwind_mgr.register() is called once in the publish() method (line 143). Consider moving this call outside the loop.

Suggested change
self.compact_unwind_mgr.deregister();
}
}
self.compact_unwind_mgr.deregister();

Copilot uses AI. Check for mistakes.
// TODO: I am not 100% sure if it's correct to never deregister the `.eh_frame` section. It was this way before
// I started working on this, so I kept it that way.
unsafe {
compact_unwind::__unw_add_dynamic_eh_frame_section(eh_frame.as_ptr() as usize);
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On macOS aarch64, the code registers frames via __unw_add_dynamic_eh_frame_section at line 170, but unlike the non-macOS path (line 245), it doesn't add the registration to self.registrations. This means when Drop is called (line 294), self.registrations will be empty on macOS, so nothing gets queued in the unregistry for later deregistration. This could lead to the registered frames never being deregistered. Consider adding the registration to self.registrations after line 170, similar to how it's done at line 245 for non-macOS platforms.

Suggested change
compact_unwind::__unw_add_dynamic_eh_frame_section(eh_frame.as_ptr() as usize);
let registration = eh_frame.as_ptr() as usize;
compact_unwind::__unw_add_dynamic_eh_frame_section(registration);
self.registrations.push(registration);

Copilot uses AI. Check for mistakes.
@zebreus
Copy link
Contributor Author

zebreus commented Nov 21, 2025

As the code stands before this PR, we never unregister frames on macOS aarch64. This is caused by us only calling deregister on the CompactUnwindManager if there are entries in registrations, but the macOS aarch64 path never adds entries to that.

Copy link
Contributor

@marxin marxin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a smart idea @zebreus, haven't thought about it!

Testing this is a bit dumb. I ran while target/debug/wasmer --disable-cache cowsay aa --registry=wasmer.io ; do : ; done for 10 minutes, and it didn't crash. Before this PR it would usually abort after 5-10 runs.

Yeah, I was basically doing the very same, don't have anything better. I've got an idea how to improvement the implementation, let me do it in the inline comments.

use crate::types::unwind::CompiledFunctionUnwindInfoReference;

/// Represents a registry of function unwind information for System V ABI.
pub struct UnwindRegistry {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about splitting the type into 2: UnpublishedUnwindRegistry and PublishedUnwindRegistry?

#[cfg(not(all(target_os = "macos", target_arch = "aarch64")))]
static GLOBAL_UNREGISTRY: Mutex<Vec<usize>> = Mutex::new(Vec::new());
#[cfg(all(target_os = "macos", target_arch = "aarch64"))]
static GLOBAL_UNREGISTRY: Mutex<(Vec<usize>, Vec<compact_unwind::CompactUnwindManager>)> =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be then unconditional on target: Mutex<Vec<PublishedUnwindRegistry>>.

#[cfg(all(target_os = "macos", target_arch = "aarch64"))]
{
// Acquire the unregistry lock to avoid interleaved registrations/deregistrations.
let mut unregistry = GLOBAL_UNREGISTRY.lock().unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we would iterate the PublishedUnwindRegistry, and call PublishedUnwindRegistry::unregister?

fn drop(&mut self) {
if self.published {
unsafe {
let mut unregistry = GLOBAL_UNREGISTRY.lock().unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this will simply push to the GLOBAL_UNREGISTRY vector.

@zebreus
Copy link
Contributor Author

zebreus commented Nov 21, 2025

However, the code in this PR causes segfaults in the tests. I can't reproduce it outside of running the whole test suite, but it happens at https://github.com/gcc-mirror/gcc/blob/master/libgcc/unwind-dw2-fde.c#L310, so I assume the pointer is invalid by the time we unregister it.

Which makes sense, as the pointer effectively points into the &[u8] passed to the register frame. Technically, the previous implementation was also unsafe, since that buffer is not guaranteed to remain alive while the drop is executing. That may even be the original issue.

@zebreus
Copy link
Contributor Author

zebreus commented Nov 21, 2025

Copying the eh_frame into a leaked heap allocation in register_frame solves the segfault introduced by my fix.

@zebreus
Copy link
Contributor Author

zebreus commented Nov 26, 2025

I dont really know a good way forward. My best guess would be to restore the previous behaviour of deregistering in drop and add a handler into atexit on linux to set a global atomic flag. And check that global before deregistering. Not sure if the order of that works out, we need to test this well.

@marxin
Copy link
Contributor

marxin commented Nov 26, 2025

Alternative approach #5912.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cargo r --features=llvm -p wasmer-cli run --disable-cache cowsay aa - crashes occasionally

3 participants