-
Notifications
You must be signed in to change notification settings - Fork 922
Attempt to prevent crashes during exit due to deregistering frames on exit #5893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Testing this is a bit dumb. I ran |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR addresses thread safety issues during frame registration/deregistration by introducing a global mutex-protected registry. The key change is deferring frame deregistration from the Drop implementation to avoid crashes during program shutdown and concurrent access.
Key Changes:
- Introduces
GLOBAL_UNREGISTRYmutex to serialize frame registration/deregistration operations - Defers actual deregistration by queuing frames in
Dropinstead of immediately calling deregister functions - Processes queued deregistrations before each new registration batch
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| self.compact_unwind_mgr.deregister(); | ||
| } |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The self.compact_unwind_mgr.deregister() call is inside the loop but appears to be called on the same manager object for each item in the unregistry. This means if there are N registrations in the unregistry, deregister() will be called N times on the same manager. This seems incorrect - it should likely be called once after the loop, similar to how self.compact_unwind_mgr.register() is called once in the publish() method (line 143). Consider moving this call outside the loop.
| self.compact_unwind_mgr.deregister(); | |
| } | |
| } | |
| self.compact_unwind_mgr.deregister(); |
| // TODO: I am not 100% sure if it's correct to never deregister the `.eh_frame` section. It was this way before | ||
| // I started working on this, so I kept it that way. | ||
| unsafe { | ||
| compact_unwind::__unw_add_dynamic_eh_frame_section(eh_frame.as_ptr() as usize); |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On macOS aarch64, the code registers frames via __unw_add_dynamic_eh_frame_section at line 170, but unlike the non-macOS path (line 245), it doesn't add the registration to self.registrations. This means when Drop is called (line 294), self.registrations will be empty on macOS, so nothing gets queued in the unregistry for later deregistration. This could lead to the registered frames never being deregistered. Consider adding the registration to self.registrations after line 170, similar to how it's done at line 245 for non-macOS platforms.
| compact_unwind::__unw_add_dynamic_eh_frame_section(eh_frame.as_ptr() as usize); | |
| let registration = eh_frame.as_ptr() as usize; | |
| compact_unwind::__unw_add_dynamic_eh_frame_section(registration); | |
| self.registrations.push(registration); |
|
As the code stands before this PR, we never unregister frames on macOS aarch64. This is caused by us only calling deregister on the CompactUnwindManager if there are entries in |
marxin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What a smart idea @zebreus, haven't thought about it!
Testing this is a bit dumb. I ran while target/debug/wasmer --disable-cache cowsay aa --registry=wasmer.io ; do : ; done for 10 minutes, and it didn't crash. Before this PR it would usually abort after 5-10 runs.
Yeah, I was basically doing the very same, don't have anything better. I've got an idea how to improvement the implementation, let me do it in the inline comments.
| use crate::types::unwind::CompiledFunctionUnwindInfoReference; | ||
|
|
||
| /// Represents a registry of function unwind information for System V ABI. | ||
| pub struct UnwindRegistry { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about splitting the type into 2: UnpublishedUnwindRegistry and PublishedUnwindRegistry?
| #[cfg(not(all(target_os = "macos", target_arch = "aarch64")))] | ||
| static GLOBAL_UNREGISTRY: Mutex<Vec<usize>> = Mutex::new(Vec::new()); | ||
| #[cfg(all(target_os = "macos", target_arch = "aarch64"))] | ||
| static GLOBAL_UNREGISTRY: Mutex<(Vec<usize>, Vec<compact_unwind::CompactUnwindManager>)> = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be then unconditional on target: Mutex<Vec<PublishedUnwindRegistry>>.
| #[cfg(all(target_os = "macos", target_arch = "aarch64"))] | ||
| { | ||
| // Acquire the unregistry lock to avoid interleaved registrations/deregistrations. | ||
| let mut unregistry = GLOBAL_UNREGISTRY.lock().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we would iterate the PublishedUnwindRegistry, and call PublishedUnwindRegistry::unregister?
| fn drop(&mut self) { | ||
| if self.published { | ||
| unsafe { | ||
| let mut unregistry = GLOBAL_UNREGISTRY.lock().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this will simply push to the GLOBAL_UNREGISTRY vector.
|
However, the code in this PR causes segfaults in the tests. I can't reproduce it outside of running the whole test suite, but it happens at https://github.com/gcc-mirror/gcc/blob/master/libgcc/unwind-dw2-fde.c#L310, so I assume the pointer is invalid by the time we unregister it. Which makes sense, as the pointer effectively points into the &[u8] passed to the register frame. Technically, the previous implementation was also unsafe, since that buffer is not guaranteed to remain alive while the drop is executing. That may even be the original issue. |
|
Copying the |
|
I dont really know a good way forward. My best guess would be to restore the previous behaviour of deregistering in |
|
Alternative approach #5912. |
This PR ensures that multiple threads don't access the functions for registering and deregistering frames at the same time, which can lead to crashes, as the functions are not guaranteed to be thread-safe (at least I couldn't find any documentation stating they were).
It also no longer deregisters frames during
Drop, which caused issues when exiting the wasmer. Instead, deregistrations are now queued in aGLOBAL_UNREGISTRYand processed before the next batch of registrations.Fixes #5877