CRASH aarch64 under QEMU at proc_init mrs x0, ID_AA64MMFR2_EL1
#7315
Under QEMU, proc_init()'s MRS of ID_AA64MMFR2_EL1 causes a fatal SIGILL. We avoid it when under QEMU. We'd prefer to use TRY_EXCEPT_ALLOW_NO_DCONTEXT, but proc_init() is called before init-time signal handling is set up. We would also need to add SIGILL to the signals caught at init time, which complicates later uses of SIGILL for NUDGESIG_SIGNUM and suspend_signum (and, on x86, XSTATE_QUERY_SIG), so we'd want SIGILL to work for try-except only at init time. That is all a little too involved to implement right now. Tested on a local Ubuntu22 x86 machine with aarch64 cross-compilation, where every test failed before this fix and nearly all pass now (the others fail for other reasons previously masked by this SIGILL). Fixes #7315
Emulation of ID_AA64MMFR2_EL1 in qemu-aarch64 should be supported from QEMU version 8.0.0 onward (QEMU commit bc6bd20ee353834). Unfortunately, the QEMU in Ubuntu 22.04 is only QEMU 6.2, so it doesn't have that. PS: in your pull request, the log message quotes the wrong register name: "Skipping MRS of ID_AA64ISAR2_EL1 under QEMU".
What's the QEMU version in Ubuntu20? Since this issue was created for Ubuntu22, I assume the mrs worked fine under QEMU on Ubuntu20 (the current OS in our test CI workflows). Or was the code path skipped for some reason?
Very old versions of QEMU don't implement "emulate the kernel's trap-and-emulate of ID register accesses" at all. It looks like dynamorio has support for "check whether ID_AA64ISAR0_EL1 SIGILLs on access" in mrs_id_reg_supported(), so a really old QEMU will be OK: it will take the same "handle a SIGILL there" path as it would under an old host kernel that doesn't expose HWCAP_CPUID.

Could we use the same mechanism for ID_AA64MMFR2_EL1 that we use for handling "ID_AA64ISAR0_EL1 might SIGILL" in mrs_id_reg_supported(), by the way? That would have the advantage that you would automatically do the right thing on QEMU 8.0 and later, where ID_AA64MMFR2_EL1 is emulated and you might want to look at its field values. Edit: oh, FIXME i#5474 says "we don't actually catch the SIGILL", so in fact on old host kernels this would just crash. If dynamorio is in a position to look at the ELF hwcaps, you could alternatively check for HWCAP_CPUID before reading the registers.

Ubuntu 20.04 had QEMU 4.2, though, which is after QEMU introduced support for ID register trap emulation (QEMU commit 37020ff15398, in version 4.0.0). So if we didn't see this problem with Ubuntu 20.04's QEMU, then something else must be going on.
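A minimal C sketch of the HWCAP_CPUID check suggested above, assuming a Linux userspace program where glibc's getauxval() is available; DR's own init code may obtain the auxiliary vector differently, which is not shown here. The helper names hwcap_allows_id_reg_read() and read_mmfr2_if_advertised() are hypothetical, not DynamoRIO functions. As the comment frames it, this check targets old host kernels that don't expose HWCAP_CPUID; it would not by itself avoid a SIGILL from an emulator that sets HWCAP_CPUID but doesn't emulate ID_AA64MMFR2_EL1, which is presumably the QEMU 6.2 situation described earlier in the thread.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__linux__)
#  include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#endif

/* Bit 11 of AT_HWCAP on arm64 Linux (normally from <asm/hwcap.h> or
 * <bits/hwcap.h>). Defined here so the sketch also builds on other hosts. */
#ifndef HWCAP_CPUID
#  define HWCAP_CPUID (1UL << 11)
#endif

/* Hypothetical helper: does this AT_HWCAP value say the kernel traps and
 * emulates EL0 reads of the ID registers? Pure, so it is testable anywhere. */
static bool hwcap_allows_id_reg_read(unsigned long hwcap)
{
    return (hwcap & HWCAP_CPUID) != 0;
}

/* Hypothetical helper: read ID_AA64MMFR2_EL1 only when HWCAP_CPUID is set.
 * Returns false and leaves *out untouched when the read is skipped. */
static bool read_mmfr2_if_advertised(uint64_t *out)
{
#if defined(__aarch64__) && defined(__linux__)
    if (!hwcap_allows_id_reg_read(getauxval(AT_HWCAP)))
        return false;
    uint64_t val;
    /* S3_0_C0_C7_2 is the generic encoding of ID_AA64MMFR2_EL1, accepted
     * even by assemblers that do not know the register by name. */
    __asm__ volatile("mrs %0, S3_0_C0_C7_2" : "=r"(val));
    *out = val;
    return true;
#else
    (void)out;
    return false;   /* Not an AArch64 Linux target: nothing to read. */
#endif
}
```

Keeping the bit test in a separate pure function lets it be checked on any host, while the mrs itself is only compiled for AArch64 Linux.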
Had mrs_id_reg_supported() returned false, so that read_feature_regs() was skipped (dynamorio/core/arch/aarch64/proc.c, line 129 at commit d4a6206)?
Also, we know that mrs_id_reg_supported() did not crash in the above example, so presumably the mrs of ID_AA64MMFR2_EL1 worked under Ubuntu20's QEMU. But we don't know why yet.
I think we should leave this open for a better solution than just disabling this MRS for xarch_root. Adding a try-except for SIGILL in proc_init() seems like the best solution, since the MRS would then still run on any QEMU that supports it.
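A minimal sketch of an init-time-only try-except for SIGILL, written as plain POSIX C with sigsetjmp()/siglongjmp() rather than with DynamoRIO's TRY_EXCEPT machinery, whose internals aren't shown in this thread. DR's core likely cannot rely on libc like this, so treat it only as an illustration of the control flow. It assumes it runs before any other SIGILL user (such as the nudge or suspend signals) has installed a handler: the temporary handler lives only for the duration of one probe, and the previous disposition is restored afterward. probe_sigill(), probe_noop(), and probe_raise_sigill() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Jump target for the temporary SIGILL handler; only valid inside probe_sigill(). */
static sigjmp_buf probe_env;

static void probe_sigill_handler(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);   /* Abandon the faulting code path. */
}

/* Hypothetical helper: run fn() with a SIGILL handler installed only for the
 * duration of the call. Returns true if fn() completed normally, false if it
 * raised SIGILL (or the handler could not be installed). The previous SIGILL
 * disposition is restored either way, leaving SIGILL free for later users. */
static bool probe_sigill(void (*fn)(void))
{
    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = probe_sigill_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old_sa) != 0)
        return false;

    volatile bool ok = false;
    /* savemask=1: the SIGILL blocked while the handler ran is unblocked
     * again when siglongjmp() lands here. */
    if (sigsetjmp(probe_env, 1) == 0) {
        fn();
        ok = true;
    }
    sigaction(SIGILL, &old_sa, NULL);
    return ok;
}

/* Demo targets: one that returns normally and one that raises SIGILL. */
static void probe_noop(void) {}
static void probe_raise_sigill(void) { raise(SIGILL); }

#if defined(__aarch64__)
/* The instruction from this issue, as a probe target (ID_AA64MMFR2_EL1). */
__attribute__((unused)) static void read_mmfr2(void)
{
    unsigned long val;
    __asm__ volatile("mrs %0, S3_0_C0_C7_2" : "=r"(val));
    (void)val;
}
#endif
```

On AArch64, probe_sigill(read_mmfr2) would report whether the MRS is usable on the current kernel or emulator, so the register would still be read on any QEMU that emulates it.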
Under QEMU, proc_init()'s MRS of ID_AA64MMFR2_EL1 causes a fatal SIGILL. We avoid it when under QEMU. (Note that this did not fail on Ubuntu20's QEMU, only on Ubuntu22, but we avoid it under any QEMU for now to unblock progress.) We'd prefer to use TRY_EXCEPT_ALLOW_NO_DCONTEXT, but proc_init() is called before init-time signal handling is set up. We would also need to add SIGILL to the signals caught at init time, which complicates later uses of SIGILL for NUDGESIG_SIGNUM and suspend_signum (and, on x86, XSTATE_QUERY_SIG), so we'd want SIGILL to work for try-except only at init time. That is all a little too involved to implement right now, so we're putting in this disabling to unblock Ubuntu22 progress for #7270. Long-term we probably want to put in the try-except, so we'll leave #7315 open. Tested on a local Ubuntu22 x86 machine with aarch64 cross-compilation, where every test failed before this fix and nearly all pass now (the others fail for other reasons previously masked by this SIGILL). Issue: #7270, #7315
Disables the AArch64 proc_init() mrs of ID_AA64MMFR2_EL1 under STANDALONE_UNIT_TEST to avoid a fatal SIGILL in unit_tests under QEMU. Tested on an Ubuntu 22 machine in an AArch64 cross-compile build where unit_tests dies with SIGILL without this change and passes with it. Issue: #7315
On Ubuntu22 (where we have to migrate: #7270), every aarch64 test run under QEMU hits a SIGILL from the final MRS instruction in proc_init:
mrs x0, ID_AA64MMFR2_EL1
This shows up as:
DR only intercepts SIGSEGV and SIGBUS during init time. I'd like to intercept SIGILL and put these MRS instructions in a try-except, but that's tricky since we use SIGILL for nudges and suspending later on. I may try to add init-time-only try-except handling.
This is blocking us from moving to Ubuntu22 on GitHub Actions as mandated: #7270.