
DRM backend crashes on resume from S3 sleep when amdgpu performs MODE1 GPU reset (RDNA4 / gfx1201) #271

@Reason7322

Hello, I've swapped my GPU (6700 XT) for a 9070 XT, and I can no longer wake my system after suspending it with systemctl suspend. All I get is a black screen with my mouse cursor. I have no suspend/wake issues on KDE.

gpu-info.txt
hyprctl-version.txt
amdgpu-resume.txt
kernel-resume-boot.txt
hypr-sigsegv-trace.txt

The bug report below was written by Claude. I tried to troubleshoot the issue myself with its help, but I was unable to solve it.

System Information

OS: CachyOS (Arch-based, rolling)
Kernel: 6.19.3-2-cachyos
aquamarine 0.10.0-4.1
hyprgraphics 0.5.1-1.1
hyprland 0.54.3-2.1
GPU: Sapphire Radeon RX 9070 XT Pulse 16GB (RDNA4, gfx1201, PCI ID 1002:xxxx)
CPU: AMD Ryzen 7 5700X
Display: Dell Alienware AW2724DM, connected via DisplayPort
Session: TTY autologin → Hyprland (no display manager)
Sleep state: S3 deep sleep (cat /sys/power/mem_sleep → s2idle [deep])

Description

After upgrading from an RX 6700 XT (RDNA2) to an RX 9070 XT (RDNA4), resuming from S3 sleep
results in a black screen with only the mouse cursor visible. The system cannot be recovered
without manually switching to a TTY and relaunching Hyprland.

KDE Plasma (KWin) on the same hardware, same kernel, same GPU, recovers from S3 sleep without
any issues. The problem is specific to Hyprland/aquamarine.

Root Cause (Diagnosed)

On resume from S3, amdgpu performs a full MODE1 GPU reset instead of a clean save/restore
cycle. This is confirmed in the kernel journal:
amdgpu 0000:0a:00.0: amdgpu: MODE1 reset
amdgpu 0000:0a:00.0: amdgpu: GPU mode1 reset
amdgpu 0000:0a:00.0: amdgpu: GPU smu mode1 reset

Additionally, an SMU interface version mismatch is logged:
smu driver if version = 0x0000002e, smu fw if version = 0x00000033
amdgpu: SMU driver if version not matched
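To check other boot logs for the same mismatch, the two interface versions can be pulled out of the line quoted above and compared. This is a minimal sketch: the function name parseSmuIfVersions is hypothetical, and it assumes the exact wording shown in this journal excerpt, which may differ between kernel versions.

```cpp
#include <cassert>
#include <cstdio>
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper. Parses an amdgpu line such as
//   "smu driver if version = 0x0000002e, smu fw if version = 0x00000033"
// and returns {driver, firmware} interface versions, or nullopt if the line
// does not match. The format is assumed from the log excerpt in this report.
std::optional<std::pair<unsigned, unsigned>> parseSmuIfVersions(const std::string& line) {
    const auto pos = line.find("smu driver if version");
    if (pos == std::string::npos)
        return std::nullopt;

    unsigned drv = 0, fw = 0;
    // %x accepts an optional "0x" prefix, so "0x0000002e" parses as 0x2e.
    if (std::sscanf(line.c_str() + pos,
                    "smu driver if version = %x, smu fw if version = %x", &drv, &fw) != 2)
        return std::nullopt;
    return std::make_pair(drv, fw);
}
```

A result whose two values differ corresponds to the "SMU driver if version not matched" warning.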

Aquamarine does not handle GPU reset events in its DRM backend. When the reset occurs:

  1. Hyprland crashes immediately with SIGSEGV inside aquamarine's DRM cleanup path:
    SDRMConnector::disconnect calls CLogger::log on an already-invalid backend object
    (use-after-free during the CDRMBackend destructor).
  2. Autologin restarts Hyprland. The new instance tries to initialize, but the GPU is still
    mid-reset. CAsyncResourceGatherer::asyncAssetSpinLock in libhyprgraphics deadlocks
    waiting on a GPU context that no longer exists, and Hyprland aborts with SIGABRT.
  3. Step 2 repeats every ~5 seconds for approximately 60-90 seconds until the GPU reset
    completes and Hyprland finally starts successfully.

During this entire loop the DRM cursor plane remains active (owned independently of the
compositor), which is why the mouse cursor stays visible on an otherwise black screen.

Crash Traces

SIGSEGV — aquamarine DRM cleanup crash (PID 191376)

Signal: 11 (SEGV)
Stack trace of thread 191376:
#0 CLogger::log (libaquamarine.so.9 + 0x7cc91)
#1 SDRMConnector::disconnect (libaquamarine.so.9 + 0xb3aaf)
#2 SDRMConnector::~SDRMConnector (libaquamarine.so.9 + 0xb3bd2)
#3 CSharedPointer::_delete (libaquamarine.so.9 + 0xc4408)
#4 CDRMBackend::~CDRMBackend (libaquamarine.so.9 + 0xa729f)
#5 CSharedPointer::_delete (libaquamarine.so.9 + 0xbd44b)
#6 CBackend::~CBackend (libaquamarine.so.9 + 0x7b111)
#7 CSharedPointer::_delete (libaquamarine.so.9 + 0x7bb18)
#8 (libaquamarine.so.9 + 0x7aa77)
#9 __cxa_finalize (libc.so.6 + 0x46b5e)

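The crash is in CLogger::log, called from SDRMConnector::disconnect, which is triggered
during CDRMBackend's destructor. Frame #9 (__cxa_finalize) shows that this destructor chain
runs during static-object teardown at process exit. By then the backend object is already in
an invalid state due to the GPU reset, resulting in a use-after-free.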
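One common way to make a destructor-time log call safe is for the object to hold only a non-owning weak reference to the logger and check that it is still alive before using it. The sketch below uses hypothetical Logger and Connector types, not aquamarine's CLogger or SDRMConnector, and assumes ownership through std::shared_ptr; aquamarine uses its own CSharedPointer type (visible in the trace), so a real fix would look different.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a logger owned elsewhere (e.g. by a backend).
struct Logger {
    std::vector<std::string> lines;
    void log(const std::string& msg) { lines.push_back(msg); }
};

// Hypothetical connector: keeps only a weak reference to the logger, so its
// destructor never dereferences a logger that has already been destroyed.
class Connector {
  public:
    explicit Connector(std::weak_ptr<Logger> logger) : logger_(std::move(logger)) {}
    ~Connector() { disconnect(); }

    // Returns true if the message reached a live logger.
    bool disconnect() {
        if (auto l = logger_.lock()) {  // promote to shared_ptr only if still alive
            l->log("connector disconnected");
            return true;
        }
        return false;  // logger is gone: skip logging instead of touching freed memory
    }

  private:
    std::weak_ptr<Logger> logger_;
};
```

If the logger is destroyed first, disconnect() returns false and the destructor simply skips the log call, regardless of the order in which the objects are torn down.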

SIGABRT — hyprgraphics spinlock deadlock on restart (PID 205442)

Signal: 6 (ABRT)
Stack trace of thread 205442:
#0 abort (libc.so.6)
#1 (/usr/bin/Hyprland + 0x29d65c) ← Hyprland watchdog
#2 (libc.so.6)
#3 pthread_kill (libc.so.6)
#4 raise (libc.so.6)
#5 abort (libc.so.6)
#6 (libstdc++.so.6)
#7 (libstdc++.so.6)
#8 std::terminate (libstdc++.so.6)
#9 __cxa_throw (libstdc++.so.6)
#10 (/usr/bin/Hyprland + 0x16384a)
#11 CCompositor::initServer (Hyprland + 0x2c479e)
#12 main (Hyprland + 0x220444)
Stack trace of thread 205448 (deadlocked):
#0 syscall (libc.so.6)
#1 pthread_cond_clockwait (libc.so.6)
#4 CAsyncResourceGatherer::asyncAssetSpinLock (libhyprgraphics.so.4 + 0x2357d)

On restart during the GPU reset, CAsyncResourceGatherer::asyncAssetSpinLock is stuck waiting
on a condition variable that is never signaled because the GPU context is gone. The main
thread then aborts: frames #8-#11 show an exception thrown during CCompositor::initServer
going uncaught, so std::terminate raises SIGABRT.
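A wait that can never be satisfied is safer when it is bounded, so the caller can report failure or retry instead of hanging. The sketch below is a hypothetical AssetGate class, not hyprgraphics' CAsyncResourceGatherer; it only illustrates replacing an unbounded condition-variable wait with std::condition_variable::wait_for and a predicate.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical gate that an async loader signals once its assets are ready.
class AssetGate {
  public:
    void markReady() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ready_ = true;
        }
        cv_.notify_all();
    }

    // Waits at most `timeout`. Returns false instead of blocking forever if the
    // producer (e.g. a GPU-backed loader whose context was lost) never signals.
    bool waitReady(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate guards against spurious wakeups; wait_for returns its value.
        return cv_.wait_for(lock, timeout, [this] { return ready_; });
    }

  private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool ready_ = false;
};
```

With a bounded wait, startup can fail with a clear error (or retry later) while the GPU is still mid-reset, rather than wedging a thread indefinitely.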

Reproduction Steps

  1. Install Hyprland on an RDNA4 GPU (tested: RX 9070 XT, gfx1201)
  2. Suspend system to S3 sleep (systemctl suspend or equivalent)
  3. Wake the system
  4. Observe: black screen with mouse cursor visible, Hyprland unresponsive
  5. Ctrl+Alt+F3 → TTY login → journalctl --user -b | grep -i hyprland shows repeated
    SIGABRT crashes during the ~60-90 second window after resume

What Works / What Doesn't

  • KDE Plasma (KWin): Recovers from S3 sleep cleanly on identical hardware/kernel —
    confirms the GPU reset itself is not fatal, only aquamarine's response to it is.
  • DPMS off/on via hyprctl: Returns ok but has no visible effect — aquamarine accepts
    the command but cannot re-drive the display after the DRM state was invalidated by the reset.
  • hyprctl reload: Same — accepted but ineffective.
  • Switching TTY and back: Reliably recovers the session, as it forces a full KMS
    re-acquisition independent of the compositor's internal state.

Expected Behavior

Aquamarine's DRM backend should handle GPU reset events gracefully — either by:

  • Detecting the reset, rebuilding its DRM state, and signaling Hyprland to re-acquire outputs, or
  • At minimum, not crashing with a use-after-free in SDRMConnector::disconnect when the
    backend is destroyed in a post-reset invalid state.
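The first option can be sketched as a backend that, on a reset notification, discards its cached output state, re-enumerates, and notifies the compositor. Everything here is hypothetical (ResettableBackend, the enumerate callback, the onChanged signal) and is not aquamarine's API. How the reset is actually detected, for example through a graphics robustness extension's reset status or a kernel event, is assumed and left out.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical backend skeleton. On a GPU reset it drops every cached output,
// re-enumerates from scratch, and tells the compositor to re-acquire outputs
// instead of continuing to drive stale DRM state.
class ResettableBackend {
  public:
    using Enumerate = std::function<std::vector<std::string>()>;
    using OutputsChanged = std::function<void(const std::vector<std::string>&)>;

    ResettableBackend(Enumerate enumerate, OutputsChanged onChanged)
        : enumerate_(std::move(enumerate)), onChanged_(std::move(onChanged)) {
        outputs_ = enumerate_();
    }

    // Called by whatever (assumed) mechanism detects the reset.
    void onGpuReset() {
        outputs_.clear();           // cached connector/CRTC state is no longer trusted
        outputs_ = enumerate_();    // rebuild after the reset
        ++generation_;              // lets holders of old handles detect staleness
        if (onChanged_)
            onChanged_(outputs_);   // compositor re-acquires outputs
    }

    const std::vector<std::string>& outputs() const { return outputs_; }
    unsigned generation() const { return generation_; }

  private:
    Enumerate enumerate_;
    OutputsChanged onChanged_;
    std::vector<std::string> outputs_;
    unsigned generation_ = 0;
};
```

The generation counter is one simple way for other objects to notice that a handle they hold predates the last reset.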

Additional Notes

  • The RX 6700 XT (RDNA2) never triggered this on the same system with the same config. The
    MODE1 reset on resume appears to be specific to RDNA4 with the current kernel driver.
  • Kernel parameters tried that had no effect: amdgpu.sg_display=0,
    amdgpu.ppfeaturemask=0xfffd7fff
  • Current workaround: delay Hyprland restarts by 90 seconds after a crash, to let the GPU
    reset complete before retrying. This is obviously not a real fix.
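A less blunt variant of this workaround is to back off progressively rather than always waiting the full 90 seconds, so a short reset recovers quickly while a long one is not retried every few seconds. This is a hypothetical sketch: restartDelay is not part of Hyprland or any session tool. The 5 s base mirrors the ~5 s crash loop described above, and the 90 s cap mirrors this workaround.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical restart policy: delay doubles after each crash, capped at `cap`.
std::chrono::seconds restartDelay(unsigned attempt,
                                  std::chrono::seconds base = std::chrono::seconds(5),
                                  std::chrono::seconds cap = std::chrono::seconds(90)) {
    using Rep = std::chrono::seconds::rep;
    // Clamp the shift so the multiplier cannot overflow for large attempt counts.
    const unsigned shift = std::min(attempt, 16u);
    const Rep factor = Rep{1} << shift;
    const std::chrono::seconds delay = base * factor;
    return std::min(delay, cap);
}
```

Attempts 0 through 5 give delays of 5, 10, 20, 40, 80 and 90 seconds; every later attempt stays at 90 seconds.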
