DRM backend crashes on resume from S3 sleep when amdgpu performs MODE1 GPU reset (RDNA4 / gfx1201)

Hello, I've swapped my GPU (RX 6700 XT) for an RX 9070 XT, and I can no longer wake my system after suspending it with `systemctl suspend`. All I'm getting is a black screen with my mouse cursor. I have no wake-up/suspend issues on KDE.

[gpu-info.txt](https://github.com/user-attachments/files/26594903/gpu-info.txt)
[hyprctl-version.txt](https://github.com/user-attachments/files/26594905/hyprctl-version.txt)
[amdgpu-resume.txt](https://github.com/user-attachments/files/26594907/amdgpu-resume.txt)
[kernel-resume-boot.txt](https://github.com/user-attachments/files/26594904/kernel-resume-boot.txt)
[hypr-sigsegv-trace.txt](https://github.com/user-attachments/files/26594906/hypr-sigsegv-trace.txt)

The bug report below was written by Claude. I tried to troubleshoot the issue myself with its help, but I was unable to solve it.
## System Information

**OS:** CachyOS (Arch-based, rolling)  
**Kernel:** 6.19.3-2-cachyos  
**Packages:** aquamarine 0.10.0-4.1, hyprgraphics 0.5.1-1.1, hyprland 0.54.3-2.1  
**GPU:** Sapphire Radeon RX 9070 XT Pulse 16GB (RDNA4, gfx1201, PCI ID 1002:xxxx)  
**CPU:** AMD Ryzen 7 5700X  
**Display:** Dell Alienware AW2724DM, connected via DisplayPort  
**Session:** TTY autologin → Hyprland (no display manager)  
**Sleep state:** S3 deep sleep (`cat /sys/power/mem_sleep` → `s2idle [deep]`)  
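
The bracketed entry in `/sys/power/mem_sleep` is the suspend mode the kernel currently selects. For anyone comparing setups, a minimal Python sketch for reading it follows; the function name `active_sleep_mode` is made up for this example.

```python
import re


def active_sleep_mode(mem_sleep_text):
    """Return the bracketed (currently selected) mode, e.g. 'deep'.

    `mem_sleep_text` is the content of /sys/power/mem_sleep, for
    example "s2idle [deep]". Returns None if no entry is bracketed.
    """
    match = re.search(r"\[(\w+)\]", mem_sleep_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Sample value taken from this report; on a real system, read the file instead.
    print(active_sleep_mode("s2idle [deep]"))
```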

## Description
After upgrading from an RX 6700 XT (RDNA2) to an RX 9070 XT (RDNA4), resuming from S3 sleep
results in a black screen with only the mouse cursor visible. The system cannot be recovered
without manually switching to a TTY and relaunching Hyprland.

KDE Plasma (KWin) on the same hardware, same kernel, same GPU, recovers from S3 sleep without
any issues. The problem is specific to Hyprland/aquamarine.

## Root Cause (Diagnosed)
On resume from S3, amdgpu performs a full MODE1 GPU reset instead of a clean save/restore
cycle. This is confirmed in the kernel journal:

    amdgpu 0000:0a:00.0: amdgpu: MODE1 reset
    amdgpu 0000:0a:00.0: amdgpu: GPU mode1 reset
    amdgpu 0000:0a:00.0: amdgpu: GPU smu mode1 reset

Additionally, an SMU interface version mismatch is logged:

    smu driver if version = 0x0000002e, smu fw if version = 0x00000033
    amdgpu: SMU driver if version not matched
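
To check whether another machine shows the same kernel messages, the log can be scanned for them. This is a rough Python sketch that matches only the exact message shapes quoted in this report; the function name and the returned dictionary layout are assumptions for illustration, and other kernel versions may word these lines differently.

```python
import re

# Patterns mirror the three reset lines and the SMU version line quoted above.
MODE1_RESET = re.compile(r"amdgpu: (?:GPU (?:smu )?mode1 reset|MODE1 reset)")
SMU_VERSIONS = re.compile(
    r"smu driver if version = (0x[0-9a-fA-F]+), "
    r"smu fw if version = (0x[0-9a-fA-F]+)"
)


def summarize_resume_log(text):
    """Collect MODE1 reset lines and the SMU interface versions from log text."""
    reset_lines = [
        line.strip() for line in text.splitlines() if MODE1_RESET.search(line)
    ]
    smu = None
    match = SMU_VERSIONS.search(text)
    if match:
        driver, firmware = (int(value, 16) for value in match.groups())
        smu = {"driver": driver, "firmware": firmware, "mismatch": driver != firmware}
    return {"mode1_reset_lines": reset_lines, "smu": smu}
```

The text can come from the attached `amdgpu-resume.txt` or from `journalctl -k -b` captured after a resume.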

Aquamarine does not handle GPU reset events in its DRM backend. When the reset occurs:

1. Hyprland crashes immediately with **SIGSEGV** inside aquamarine's DRM cleanup path —
   `SDRMConnector::disconnect` calls `CLogger::log` on an already-invalid backend object
   (use-after-free during `CDRMBackend` destructor).
2. Autologin restarts Hyprland. The new instance tries to initialize, but the GPU is still
   mid-reset. `CAsyncResourceGatherer::asyncAssetSpinLock` in libhyprgraphics deadlocks
   waiting on a GPU context that no longer exists. Hyprland's watchdog fires **SIGABRT**.
3. Step 2 repeats every ~5 seconds for approximately 60-90 seconds until the GPU reset
   completes and Hyprland finally starts successfully.

During this entire loop the DRM cursor plane remains active (owned independently of the
compositor), which is why the mouse cursor stays visible on an otherwise black screen.

## Crash Traces

### SIGSEGV — aquamarine DRM cleanup crash (PID 191376)
    Signal: 11 (SEGV)
    Stack trace of thread 191376:
    #0  CLogger::log (libaquamarine.so.9 + 0x7cc91)
    #1  SDRMConnector::disconnect (libaquamarine.so.9 + 0xb3aaf)
    #2  SDRMConnector::~SDRMConnector (libaquamarine.so.9 + 0xb3bd2)
    #3  CSharedPointer<SDRMConnector>::_delete (libaquamarine.so.9 + 0xc4408)
    #4  CDRMBackend::~CDRMBackend (libaquamarine.so.9 + 0xa729f)
    #5  CSharedPointer<CDRMBackend>::_delete (libaquamarine.so.9 + 0xbd44b)
    #6  CBackend::~CBackend (libaquamarine.so.9 + 0x7b111)
    #7  CSharedPointer<CBackend>::_delete (libaquamarine.so.9 + 0x7bb18)
    #8  (libaquamarine.so.9 + 0x7aa77)
    #9  __cxa_finalize (libc.so.6 + 0x46b5e)

The crash is in `CLogger::log` called from `SDRMConnector::disconnect`, which is triggered
during `CDRMBackend`'s destructor. The backend object is already in an invalid state due to
the GPU reset, resulting in a use-after-free.

### SIGABRT — hyprgraphics spinlock deadlock on restart (PID 205442)

    Signal: 6 (ABRT)
    Stack trace of thread 205442:
    #0  abort (libc.so.6)
    #1  (/usr/bin/Hyprland + 0x29d65c)        ← Hyprland watchdog
    #2  (libc.so.6)
    #3  pthread_kill (libc.so.6)
    #4  raise (libc.so.6)
    #5  abort (libc.so.6)
    #6  (libstdc++.so.6)
    #7  (libstdc++.so.6)
    #8  std::terminate (libstdc++.so.6)
    #9  __cxa_throw (libstdc++.so.6)
    #10 (/usr/bin/Hyprland + 0x16384a)
    #11 CCompositor::initServer (Hyprland + 0x2c479e)
    #12 main (Hyprland + 0x220444)
    Stack trace of thread 205448 (deadlocked):
    #0  syscall (libc.so.6)
    #1  pthread_cond_clockwait (libc.so.6)
    #4  CAsyncResourceGatherer::asyncAssetSpinLock (libhyprgraphics.so.4 + 0x2357d)

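On restart during the GPU reset, thread 205448 is blocked in
`CAsyncResourceGatherer::asyncAssetSpinLock`, waiting on a condition variable that is
presumably never signaled because the GPU context is gone. Hyprland's watchdog is assumed to
detect the hang and abort. Note that frames #8-#11 of the main thread show `std::terminate`
reached through `__cxa_throw` under `CCompositor::initServer`, so an uncaught exception
during initialization may be the direct cause of the SIGABRT; the trace alone does not settle
which.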
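
If an unbounded wait of this kind is involved, one general mitigation is to bound it with a timeout so the caller can fail cleanly instead of hanging. The sketch below shows that pattern with Python's `threading.Condition`, purely as an illustration. hyprgraphics is C++, its actual wait logic is not shown in this report, and `ResourceGate` is a hypothetical name.

```python
import threading


class ResourceGate:
    """Hypothetical gate that a worker waits on until a resource is ready."""

    def __init__(self):
        self._cond = threading.Condition()
        self._ready = False

    def signal_ready(self):
        # Called by whoever produces the resource.
        with self._cond:
            self._ready = True
            self._cond.notify_all()

    def wait_ready(self, timeout_seconds):
        # Returns True once ready, or False if the timeout expires first,
        # so the caller can report an error instead of blocking forever.
        with self._cond:
            return self._cond.wait_for(lambda: self._ready, timeout=timeout_seconds)
```

A caller would treat a `False` result as "resource unavailable" and back off or retry later, rather than waiting until something external kills the process.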

## Reproduction Steps

1. Install Hyprland on an RDNA4 GPU (tested: RX 9070 XT, gfx1201)
2. Suspend system to S3 sleep (`systemctl suspend` or equivalent)
3. Wake the system
4. Observe: black screen with mouse cursor visible, Hyprland unresponsive
5. `Ctrl+Alt+F3` → TTY login → `journalctl --user -b | grep -i hyprland` shows repeated
   SIGABRT crashes during the ~60-90 second window after resume
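
To tally the crashes from step 5 by signal, the crash records can be counted mechanically. This is a small Python sketch that assumes the input contains `Signal: N (NAME)` lines in the same shape as the traces in this report (for example, output of `coredumpctl info`); the function name is made up for this example.

```python
import re
from collections import Counter

# Matches lines such as "Signal: 11 (SEGV)", with optional leading whitespace.
SIGNAL_LINE = re.compile(r"^\s*Signal:\s*(\d+)\s*\((\w+)\)", re.MULTILINE)


def count_crash_signals(text):
    """Count crash records per signal name found in the given text."""
    return Counter(name for _number, name in SIGNAL_LINE.findall(text))
```

On an affected system, the result would be expected to show one `SEGV` entry followed by several `ABRT` entries for a single resume, matching the sequence described under Root Cause.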

## What Works / What Doesn't

- **KDE Plasma (KWin):** Recovers from S3 sleep cleanly on identical hardware/kernel —
  confirms the GPU reset itself is not fatal, only aquamarine's response to it is.
- **DPMS off/on via hyprctl:** Returns `ok` but has no visible effect — aquamarine accepts
  the command but cannot re-drive the display after the DRM state was invalidated by the reset.
- **hyprctl reload:** Same — accepted but ineffective.
- **Switching TTY and back:** Reliably recovers the session, as it forces a full KMS
  re-acquisition independent of the compositor's internal state.

## Expected Behavior

Aquamarine's DRM backend should handle GPU reset events gracefully — either by:
- Detecting the reset, rebuilding its DRM state, and signaling Hyprland to re-acquire outputs, or
- At minimum, not crashing with a use-after-free in `SDRMConnector::disconnect` when the
  backend is destroyed in a post-reset invalid state.

## Additional Notes
- The RX 6700 XT (RDNA2) on the same system, same config, never triggered this. The MODE1
  reset on resume appears to be specific to RDNA4's current kernel driver maturity.
- Kernel parameters tried that had no effect: `amdgpu.sg_display=0`,
  `amdgpu.ppfeaturemask=0xfffd7fff`
- Current workaround: delay Hyprland restarts by 90 seconds after a crash, to let the GPU
  reset complete before retrying. This is obviously not a real fix.
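
For reference, the workaround can be sketched as a small supervisor loop. This is a minimal Python illustration and not the exact script in use here: the function name, the `max_restarts` limit, and the injectable `run`/`sleep` parameters are assumptions added for the example.

```python
import subprocess
import time


def run_with_restart_delay(command, delay_seconds=90.0, max_restarts=5,
                           run=subprocess.call, sleep=time.sleep):
    """Run `command`; after an abnormal exit, wait before restarting it.

    Returns the last exit code. subprocess.call reports death by signal
    as a negative number (e.g. -11 for SIGSEGV), which counts as abnormal.
    """
    restarts = 0
    while True:
        code = run(list(command))
        if code == 0 or restarts >= max_restarts:
            return code
        # Give the GPU reset time to finish before the next attempt.
        sleep(delay_seconds)
        restarts += 1
```

From the TTY autologin script this would be called as `run_with_restart_delay(["Hyprland"])`. The fixed delay only hides the crash loop; it does nothing about the underlying crash.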
