CRASH: SIGSEGV on sigreturn under QEMU for AArch64-on-x86 #7371

Open
derekbruening opened this issue Mar 18, 2025 · 3 comments

@derekbruening

This is another QEMU failure, hit for #7270, that happens on Ubuntu 22 but did not happen on Ubuntu 20.
This happens in a target-AArch64 build on an x86 machine.

Among the tests labeled RUNS_ON_QEMU, these hit it:

$ ctest -j 10 -L RUNS_ON_QEMU
96% tests passed, 4 tests failed out of 94
The following tests FAILED:
	 79 - code_api|linux.signal_racesys (Failed)
	124 - code_api|client.exception (Failed)
	163 - code_api|client.drx_buf-test (Failed)
	172 - code_api|client.drreg-test (Failed)

client.exception is #7297.

Actually, on GA two of those are ignored, so only signal_racesys shows up as a failure:

	(ignore: i#6260) 	code_api|client.drx_buf-test 
	(ignore: i#6260) 	code_api|client.drreg-test 

#6260 was filed after failures showed up after the SVE patch.

These all crash the same way:

<Application /home/derek/dr/git/build_a64_dbg_tests/suite/tests/bin/linux.signal_racesys (144495).  DynamoRIO internal crash at PC 0x0000005504873d80.  Please report this at http://dynamorio.org/issues/.  Program aborted.
Received SIGSEGV at generated pc 0x0000005504873d80 in thread 144495

<Application /home/derek/dr/git/build_a64_dbg_tests/suite/tests/bin/client.drx_buf-test (144725).  Tool internal crash at PC 0x0000005504889d80.  Please report this at your tool's issue tracker.  Program aborted.
Received SIGSEGV at generated pc 0x0000005504889d80 in thread 144725

<Application /home/derek/dr/git/build_a64_dbg_tests/suite/tests/bin/client.drreg-test (144753).  Tool internal crash at PC 0x0000005504895d80.  Please report this at your tool's issue tracker.  Program aborted.
Received SIGSEGV at generated pc 0x0000005504895d80 in thread 144753

From studying the DR logs: the app deliberately raises a SIGSEGV and has a handler that should recover from it. Instead, when DR tries to transfer control from its own handler to the app handler, the sigreturn itself raises a SIGSEGV from the kernel:

fcache_return:
  0x0000005504889d80  f9001785   str    %x5 -> +0x28(%x28)[8byte]
  0x0000005504889d84  f9401f85   ldr    +0x38(%x28)[8byte] -> %x5
main_signal_handler: thread=268657, sig=11, xsp=0x0000005544943da0, retaddr=0x000000000000000b
siginfo: sig = 11, pid = 0, status = 0, errno = 0, si_code = 128
computing memory target for 0x0000005504889d80 causing SIGSEGV, kernel claims it is 0x0000000000000000
opnd_compute_address for: +0x28(%x28)
        base => 0x00000055448f3000
        index,scale => 0x00000055448f3000
        disp => 0x00000055448f3028
For SIGSEGV at cache pc 0x0000005504889d80, computed target write 0x0000000000000000
        faulting instr: str    %x5 -> +0x28(%x28)[8byte]

That's si_code==128==0x80==SI_KERNEL.

So the kernel is raising SIGSEGV on SYS_sigreturn: something is wrong with the frame.
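
To make the si_code reading above easy to re-check against other logs, here is a minimal Python sketch of the lookup. The numeric values are the Linux si_code constants as I understand them; the function name `describe_segv_code` is hypothetical and not part of DR.

```python
# Linux si_code values (assumed from the kernel's siginfo definitions).
SI_USER = 0        # sent via kill() or raise()
SI_KERNEL = 0x80   # sent by the kernel itself
SI_TKILL = -6      # sent via tkill() or tgkill()
SEGV_MAPERR = 1    # address not mapped to an object
SEGV_ACCERR = 2    # invalid permissions for mapped object


def describe_segv_code(si_code: int) -> str:
    """Return a short label for the si_code reported with a SIGSEGV."""
    names = {
        SI_USER: "SI_USER",
        SI_KERNEL: "SI_KERNEL",
        SI_TKILL: "SI_TKILL",
        SEGV_MAPERR: "SEGV_MAPERR",
        SEGV_ACCERR: "SEGV_ACCERR",
    }
    return names.get(si_code, "unknown (%d)" % si_code)


# The value from the siginfo line in the DR log.
print(describe_segv_code(128))
```

An ordinary bad-address fault would normally show up as SEGV_MAPERR or SEGV_ACCERR, so seeing SI_KERNEL here is what points at the sigreturn rather than at the str instruction.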

Could it be that all DR sigreturns crash this way on this version of QEMU, and most of the RUNS_ON_QEMU tests simply don't have signals?

I tried signal1000: it works under QEMU without DR but hits the same 0x*d80 crash with DR:

$ /usr/bin/qemu-aarch64 "-L" "/usr/aarch64-linux-gnu" suite/tests/bin/linux.signal1000
Sending SIGUSR2
Sending SIGUSR1
in signal handler
Got SIGUSR1
Sending SIGRTMAX
in signal handler
Got SIGRTMAX
Generating SIGSEGV
in signal handler
Got SIGSEGV
250006.902505

$ /usr/bin/qemu-aarch64 "-L" "/usr/aarch64-linux-gnu" bin64/drrun "-xarch_root" "/usr/aarch64-linux-gnu" -- suite/tests/bin/linux.signal1000
<Starting application /home/derek/dr/git/build_a64_dbg_tests/suite/tests/bin/linux.signal1000 (305658)>
...
Sending SIGUSR2
Sending SIGUSR1
in signal handler
Got SIGUSR1
<Application /home/derek/dr/git/build_a64_dbg_tests/suite/tests/bin/linux.signal1000 (305658).  DynamoRIO internal crash at PC 0x0000005504a5cd80.  Please report this at http://dynamorio.org/issues/.  Program aborted.
Received SIGSEGV at generated pc 0x0000005504a5cd80 in thread 305658
Base: 0x0000000071000000
Registers:	eflags=0x0000000060000000
version 11.90.20165, custom build
-no_dynamic_options -xarch_root '/usr/aarch64-linux-gnu' -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct -unsafe_ignore_takeover_timeout -takeover_timeout
0x0000005500800070 0x00000057450d73fc
0x0000005500800240 0x00000057450d74cc
0x0000005500800350 0x0000005500806270>

I tried raising the sigcontext alignment from 16 bytes to 32 and to 64: same failure.

@derekbruening

The above is QEMU 6.2. On 8.2 the failure is different:

Sending SIGUSR2
Sending SIGUSR1
in signal handler
Got SIGUSR1
<Application tests/bin/linux.signal1000 (90746).  Internal Error: DynamoRIO debug check failure: core/unix/signal_linux_aarch64.c:235 next_head->magic == ESR_MAGIC || next_head->magic == SVE_MAGIC || next_head->magic == EXTRA_MAGIC

@derekbruening

I built the latest QEMU, 10.0.0-rc0, and it has the same failure as 8.2:

$ ~/extsw/qemu/qemu-10.0.0-rc0/build/qemu-bundle/usr/local/bin/qemu-aarch64 "-L" "/usr/aarch64-linux-gnu" bin64/drrun "-xarch_root" "/usr/aarch64-linux-gnu" -- suite/tests/bin/linux.signal1000
Sending SIGUSR2
Sending SIGUSR1
in signal handler
Got SIGUSR1
<Application suite/tests/bin/linux.signal1000 (231927).  Internal Error: DynamoRIO debug check failure: core/unix/signal_linux_aarch64.c:235 next_head->magic == ESR_MAGIC || next_head->magic == SVE_MAGIC || next_head->magic == EXTRA_MAGIC

@derekbruening

Tried passing -cpu max,sve=off to QEMU 8.2: same SVE assert.
However, https://gitlab.com/qemu-project/qemu/-/issues/2304 says sve2 is still enabled in that case; that was fixed 6 months ago and the fix is in 10.0.0.

Tried in 10.0: same! DR still thinks SVE is there:

ID_AA64PFR0_EL1 = 0x0001000100110011
   Processor has FEATURE_FP16
   Processor has FEATURE_SVE
   Processor has FEATURE_DIT
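
As a cross-check of the feature list above, the ID_AA64PFR0_EL1 value can be decoded by hand. This is a minimal sketch, assuming the 4-bit field layout from the Arm architecture reference (SVE at bits [35:32], FP at [19:16], DIT at [51:48]); `decode_pfr0` is a hypothetical helper, not a DR function.

```python
# Low bit of each 4-bit field in ID_AA64PFR0_EL1 (assumed from the Arm ARM).
FIELDS = {
    "EL0": 0, "EL1": 4, "EL2": 8, "EL3": 12,
    "FP": 16, "AdvSIMD": 20, "GIC": 24, "RAS": 28,
    "SVE": 32, "SEL2": 36, "MPAM": 40, "AMU": 44, "DIT": 48,
}


def decode_pfr0(value: int) -> dict:
    """Split the register value into its named 4-bit fields."""
    return {name: (value >> shift) & 0xF for name, shift in FIELDS.items()}


# The value printed by DR under QEMU 10.0 with -cpu max,sve=off.
fields = decode_pfr0(0x0001000100110011)
for name, val in fields.items():
    print(f"{name:8s} = {val:#x}")
```

With that layout the SVE field comes out nonzero, which agrees with DR reporting FEATURE_SVE even though sve=off was passed.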

If I ignore the assert, we hit the same SIGSEGV as in older QEMU: so newer QEMU is not helping.

$ extsw/qemu/qemu-10.0.0-rc0/build/qemu-bundle/usr/local/bin/qemu-aarch64 -cpu max,sve=off "-L" "/usr/aarch64-linux-gnu" bin64/drrun "-xarch_root" "/usr/aarch64-linux-gnu" -ignore_assert_list '*' -- suite/tests/bin/linux.signal1000
...
Sending SIGUSR2
Sending SIGUSR1
in signal handler
Got SIGUSR1
<Ignoring assert dr/git/src/core/unix/signal_linux_aarch64.c:235 next_head->magic == ESR_MAGIC || next_head->magic == SVE_MAGIC || next_head->magic == EXTRA_MAGIC>
<sigcontext_to_mcontext_simd 271 Unhandled section with magic number 0x54366345>
<Application dr/git/build_a64_dbg_tests/suite/tests/bin/linux.signal1000 (318569).  DynamoRIO internal crash at PC 0x00003fffc0012d80.  Please report this at http://dynamorio.org/issues/.  Program aborted.
Received SIGSEGV at generated pc 0x00003fffc0012d80 in thread 318569
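
For reference, the assert at signal_linux_aarch64.c:235 fires while walking the records in the sigcontext reserved area, each of which starts with a (magic, size) header. Below is a minimal sketch of such a walk over a synthetic buffer. The magic values are as I recall them from Linux's arm64 uapi asm/sigcontext.h and should be treated as assumptions to verify against the header; if they are right, the unhandled 0x54366345 would be ZA_MAGIC (the SME ZA context), not an SVE record. `walk_records` and `make_record` are hypothetical names, and the frame is test data, not captured from QEMU.

```python
import struct

# Assumed magic values; verify against Linux's arm64 asm/sigcontext.h.
MAGIC_NAMES = {
    0x46508001: "FPSIMD_MAGIC",
    0x45535201: "ESR_MAGIC",
    0x45585401: "EXTRA_MAGIC",
    0x53564501: "SVE_MAGIC",
    0x54504902: "TPIDR2_MAGIC",
    0x54366345: "ZA_MAGIC",
}

HEAD = struct.Struct("<II")  # record header: u32 magic, u32 size


def walk_records(reserved: bytes):
    """Yield (offset, magic, size) until the all-zero terminator record."""
    off = 0
    while off + HEAD.size <= len(reserved):
        magic, size = HEAD.unpack_from(reserved, off)
        if magic == 0 and size == 0:
            return
        if size < HEAD.size or off + size > len(reserved):
            raise ValueError(f"malformed record at offset {off}")
        yield off, magic, size
        off += size  # size covers the header plus the payload
    raise ValueError("missing terminator record")


def make_record(magic: int, size: int) -> bytes:
    """Build a synthetic record: header plus zero padding (test data only)."""
    return HEAD.pack(magic, size) + bytes(size - HEAD.size)


# Synthetic frame: an FPSIMD record, the unhandled magic, then the terminator.
frame = (make_record(0x46508001, 528) + make_record(0x54366345, 16)
         + HEAD.pack(0, 0))
for off, magic, size in walk_records(frame):
    name = MAGIC_NAMES.get(magic, "unknown")
    print(f"offset {off}: {name} ({magic:#010x}), size {size}")
```

A walker that skips unknown records by their size field, instead of asserting on the magic, would at least get past sections it does not understand; whether that is enough to build a frame QEMU accepts on sigreturn is not something this sketch can show.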

edeiana added a commit that referenced this issue Mar 28, 2025
Add signal_racesys test to ignore list when executing on
QEMU when DynamoRIO's target is AARCH64 on an x86 host.

Issue #7371
edeiana added a commit that referenced this issue Mar 29, 2025
Adds `code_api|linux.signal_racesys` test to ignore list when executing
under QEMU on an x86 host with DynamoRIO's target set to AARCH64.
Upgrades from Ubuntu 20.04 to 22.04 for aarch64-cross-compile jobs.

Issue #7371