drm: CRTC starvation recovery + clear stale page-flip state after suspend#254

Open
UncleJ4ck wants to merge 2 commits into hyprwm:main from UncleJ4ck:main

@UncleJ4ck UncleJ4ck commented Mar 4, 2026

Setup

I run a laptop (Intel HD 630 iGPU, Quadro M620 dGPU unused) with 3 external monitors (HDMI-A-1, DP-3, DP-5) and the built-in lid display (eDP-1). The HD 630 has exactly 3 CRTCs but 4 connectors. Since I force AQ_DRM_DEVICES=/dev/dri/intel-igpu to avoid the dGPU, all 4 connectors compete for those 3 CRTCs.

A lid-manager.sh daemon disables eDP-1 when the lid is closed or when all 3 externals are connected (hyprctl keyword monitor "eDP-1,disable").


Commit 1: CRTC starvation recovery

The bug

When I boot with lid closed and all 3 externals connected, everything works. But if I unplug one external and replug it (or open the lid while only 2 externals are connected), the formerly-starved connector never gets a CRTC. The output stays dark. Only fix: restart Hyprland entirely.

What happens: all 3 CRTCs get claimed at boot. When a connector disconnects, its CRTC should be freed, but the cleanup in recheckCRTCs() only logged "removing old crtc" without calling crtc.reset(), so the stale reference stayed. Separately, when the compositor disables an output, nothing triggers CRTC reassignment: applyCommit() updates enabledState but never tells the backend to re-evaluate which connectors should get which CRTCs.

This affects anyone with more connectors than CRTCs. Common on Intel iGPU laptops with docking stations. Related issues: aquamarine#36, #40, #59, Hyprland#7572, #7694.

The fix

Three changes in DRM.cpp:

  1. Two-pass CRTC assignment in recheckCRTCs(). Pass 1 releases CRTCs from disabled-but-connected outputs and assigns CRTCs to enabled connectors. Pass 2 gives remaining free CRTCs to disabled connectors as backup slots for quick re-enable. The disconnected-connector cleanup now actually calls crtc.reset().

  2. No-CRTC guard in recheckOutputs(). If a connector has no CRTC after recheckCRTCs(), connect() is skipped. It gets connected later when a CRTC frees up via the deferred recheck.

  3. Deferred recheck on enabledState change in applyCommit(). When enabledState transitions, addIdleEvent() schedules recheckOutputs() on the next event loop iteration. Same mechanism already used for frame scheduling. Avoids reentrancy since applyCommit runs inside the DRM commit path.
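To make the two-pass idea concrete, here is a minimal standalone sketch of the assignment order. It is not the code from DRM.cpp: the Connector struct, the assignCrtcs() helper, and the use of plain integer CRTC indices are all hypothetical simplifications chosen for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a DRM connector (not aquamarine's type).
struct Connector {
    std::string        name;
    bool               connected = false;
    bool               enabled   = false; // compositor-side enabledState
    std::optional<int> crtc;              // index of the CRTC held, if any
};

// Sketch of a two-pass CRTC assignment over `crtcCount` CRTCs.
inline void assignCrtcs(std::vector<Connector>& conns, int crtcCount) {
    std::vector<bool> used(crtcCount, false);

    // Release step: disconnected or disabled connectors give up their CRTC.
    // Everything still held afterwards is marked as in use.
    for (auto& c : conns) {
        if (!c.connected || !c.enabled)
            c.crtc.reset();
        if (c.crtc) {
            assert(*c.crtc >= 0 && *c.crtc < crtcCount);
            used[*c.crtc] = true;
        }
    }

    // Returns the lowest free CRTC index, or nothing if all are taken.
    auto takeFree = [&]() -> std::optional<int> {
        for (int i = 0; i < crtcCount; ++i) {
            if (!used[i]) {
                used[i] = true;
                return i;
            }
        }
        return std::nullopt;
    };

    // Pass 1: enabled, connected connectors without a CRTC get priority.
    for (auto& c : conns)
        if (c.connected && c.enabled && !c.crtc)
            c.crtc = takeFree();

    // Pass 2: leftover CRTCs become backup slots for disabled connectors.
    for (auto& c : conns)
        if (c.connected && !c.enabled && !c.crtc)
            c.crtc = takeFree();
}
```

With 3 CRTCs, three enabled externals and a disabled eDP-1, Pass 1 fills every CRTC and Pass 2 finds nothing left for eDP-1. Once one external disconnects, the next call hands the freed CRTC to eDP-1 as a backup slot.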

Safety

If all outputs have CRTCs (the common case), Pass 1 releases nothing, Pass 2 has no leftovers, and the idle-event never fires. Tiled display connectors (tilingRedundant) are skipped at the top of recheckCRTCs(). Multi-GPU setups run each backend independently. The idle event uses a weak pointer so it's safe if the backend is destroyed between scheduling and firing.
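The weak-pointer guard on the idle event can be sketched as follows. This is an illustration under assumptions, using std::shared_ptr and std::weak_ptr and a toy queue; Backend, IdleQueue, and scheduleRecheck() are hypothetical names, and the real addIdleEvent() signature and pointer types may differ.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the DRM backend.
struct Backend {
    int  rechecks = 0;
    void recheckOutputs() { ++rechecks; }
};

// Toy event loop: callbacks queued here run on the next dispatch().
struct IdleQueue {
    std::vector<std::function<void()>> pending;

    void addIdleEvent(std::function<void()> fn) { pending.push_back(std::move(fn)); }

    void dispatch() {
        auto batch = std::move(pending);
        pending.clear(); // leave the queue in a known-empty state
        for (auto& fn : batch)
            fn();
    }
};

// Defer the recheck instead of running it inside the commit path.
// The lambda captures only a weak reference, so it does not keep the
// backend alive and does nothing if the backend is gone when it fires.
inline void scheduleRecheck(IdleQueue& queue, const std::shared_ptr<Backend>& backend) {
    assert(backend && "caller must pass a live backend");
    std::weak_ptr<Backend> weak = backend;
    queue.addIdleEvent([weak] {
        if (auto alive = weak.lock())
            alive->recheckOutputs();
    });
}
```

If the backend is destroyed between scheduleRecheck() and dispatch(), lock() returns an empty pointer and the callback is a no-op.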


Commit 2: Clear stale page-flip state after suspend/resume

The bug

After S3 suspend or suspend-then-hibernate, all monitors go permanently black. The only recovery is a hard restart. I ran into this on the same laptop after suspending overnight.

The page-flip lifecycle works like this: compositor submits a buffer, atomic commit sets DRM_MODE_PAGE_FLIP_EVENT, isPageFlipPending becomes true. Kernel completes the flip, delivers an event on the DRM fd, handlePF() fires and sets isPageFlipPending = false. Next frame gets scheduled.

During suspend, the display hardware powers off. Any in-flight page-flip completion event is lost because no vblank interrupt fires when the hardware is off. On resume, libseat fires changeActive, restoreAfterVT() runs, but isPageFlipPending is still true from before suspend. handlePF() never fires because there's no event to process. scheduleFrame() checks isPageFlipPending, sees true, returns early. No frames ever get scheduled. Display stays black.
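The deadlock can be reproduced with a toy model of the flag. This is only a sketch: ConnectorState and commitWithFlip() are hypothetical, and handlePF() and scheduleFrame() are reduced to the single flag check described above.

```cpp
#include <cassert>

// Hypothetical, reduced per-connector state.
struct ConnectorState {
    bool isPageFlipPending = false;
    int  framesScheduled   = 0;
};

// A commit that requests a page-flip event marks the flip as pending.
inline void commitWithFlip(ConnectorState& c) {
    assert(!c.isPageFlipPending && "cannot commit while a flip is pending");
    c.isPageFlipPending = true;
}

// Runs only when the kernel delivers the flip-completion event.
inline void handlePF(ConnectorState& c) {
    c.isPageFlipPending = false;
}

// Returns false (nothing scheduled) while a flip is believed to be in flight.
inline bool scheduleFrame(ConnectorState& c) {
    if (c.isPageFlipPending)
        return false;
    ++c.framesScheduled;
    return true;
}
```

In normal operation every commitWithFlip() is followed by handlePF(). If suspend swallows the event, handlePF() is never called and every later scheduleFrame() returns false, which matches the permanent black screen.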

VT switch doesn't hit this because the kernel preserves DRM state and queues pending page-flip events in the fd buffer. They get delivered when the session reactivates. During suspend the hardware is off, so there's nothing to deliver.

The regression trace: commit d83c97f (Dec 2025) removed impl->reset() from restoreAfterVT(). That reset used to deactivate all CRTCs at the kernel level, implicitly clearing stale page-flip state. It was removed for valid reasons (caused flicker during VT switch), but left no replacement for the suspend path. Commit 603f5cd (Feb 2026) added isFrameRunning as a third guard in scheduleFrame(), making the stale-state problem worse.

wlroots had the same bug class: issues #2290, #2325, #2373, #2395, fixed in commit 324eeaa0cd ("disable all CRTCs after VT switch").

Related: Hyprland#8312 ("freezes after suspension"), Hyprland#6289 ("black screen after resuming from suspend").

The fix

Changes across DRM.cpp, DRM.hpp, Atomic.cpp, Legacy.cpp:

  1. In restoreAfterVT(), clear isPageFlipPending, isFrameRunning, and frameEventScheduled for all connectors before recheckOutputs(). For VT switch this is a no-op since events still arrive and handlePF() would set them false anyway. For suspend it removes the blocker so frames can be scheduled again.

  2. In commitState(), detect stale page-flips using a CLOCK_BOOTTIME timestamp recorded when isPageFlipPending is set. If a modeset commit finds a flip pending for >500ms (well past any vblank interval even at low refresh rates), treat it as stale and clear the flags. CLOCK_BOOTTIME is used instead of CLOCK_MONOTONIC because CLOCK_MONOTONIC does not advance during suspend on Linux, making elapsed time appear near-zero after resume.

  3. Added pageFlipPendingAtMs field to SDRMConnector. Timestamp recorded in both atomic (Atomic.cpp) and legacy (Legacy.cpp) commit paths when isPageFlipPending is set true.
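A rough sketch of the timestamp check, assuming Linux (CLOCK_BOOTTIME is Linux-specific). The field name pageFlipPendingAtMs comes from the change; FlipState, bootTimeMs(), markFlipPending(), clearIfStale(), and STALE_FLIP_MS are hypothetical names for illustration, and the real check lives in commitState().

```cpp
#include <cassert>
#include <cstdint>
#include <time.h> // clock_gettime (POSIX), CLOCK_BOOTTIME (Linux)

// Threshold from the description above: a flip pending longer than this is stale.
constexpr uint64_t STALE_FLIP_MS = 500;

// Milliseconds since boot, including time spent suspended.
inline uint64_t bootTimeMs() {
    timespec  ts{};
    const int rc = clock_gettime(CLOCK_BOOTTIME, &ts);
    assert(rc == 0);
    (void)rc; // silence unused warning when asserts are compiled out
    return static_cast<uint64_t>(ts.tv_sec) * 1000 + static_cast<uint64_t>(ts.tv_nsec) / 1000000;
}

// Hypothetical reduced connector state.
struct FlipState {
    bool     isPageFlipPending   = false;
    uint64_t pageFlipPendingAtMs = 0;
};

// Called wherever the commit path sets the pending flag.
inline void markFlipPending(FlipState& s, uint64_t nowMs) {
    s.isPageFlipPending   = true;
    s.pageFlipPendingAtMs = nowMs;
}

// Called from the modeset path; returns true if a stale flip was cleared.
inline bool clearIfStale(FlipState& s, uint64_t nowMs) {
    if (!s.isPageFlipPending || nowMs < s.pageFlipPendingAtMs)
        return false;
    if (nowMs - s.pageFlipPendingAtMs <= STALE_FLIP_MS)
        return false;
    s.isPageFlipPending = false;
    return true;
}
```

Passing the timestamp in as a parameter keeps the check testable: production code would call markFlipPending(s, bootTimeMs()) and clearIfStale(s, bootTimeMs()).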

Safety

VT switch: Pending page-flip events survive in the DRM fd buffer. After restoreAfterVT() returns, dispatchEvents() processes them. handlePF() fires and finds isPageFlipPending already false (cleared in restoreAfterVT()), so its own assignment to false is a no-op. onPresent() rotates buffer refs normally. No visible difference.

S3/S4 resume: No pending events in fd buffer. The flag clearing removes the blocker. The blocking modeset in restoreAfterVT() displays the restored frame. Next input or DPMS-on triggers scheduleFrame(), which succeeds, and frames render.

Edge case — flip completes just before suspend: handlePF() already fired before suspend. isPageFlipPending is already false. The clearing loop skips that connector. No change.

Tested

On Intel HD 630 (3 CRTCs, 4 connectors) with:

  • Short S3 suspend/resume: screens recover, log shows "Clearing stale page-flip state for HDMI-A-1 during modeset (pending for 503ms)"
  • Long suspend-then-hibernate (>10h): screens recover on resume
  • VT switch (Ctrl+Alt+F2 and back): no regression, screens restore cleanly
  • Hotplug cycles after resume: CRTC starvation recovery still works
  • Fresh boot: no false stale detection, flags are zero

vaxerski commented Mar 4, 2026

@gulafaran can you test this? I don't have the hw

j4kuuu added 2 commits March 12, 2026 09:28
When more displays are connected than CRTCs available, connectors that
arrive after all CRTCs are claimed become starved.  The compositor can
free a CRTC by disabling an output, but nothing reclaimed that CRTC for
the starved connector.

Restructure recheckCRTCs() into two passes: first, disabled outputs
release their CRTCs and enabled connectors get priority assignment;
second, any remaining free CRTCs are given to disabled connectors as
backup slots for quick re-enable.

When applyCommit() detects an enabledState transition, schedule
recheckOutputs() via addIdleEvent so starved connectors pick up the
freed CRTC on the next event loop iteration, without reentrancy or
blocking the event loop.

My laptop (i915, 3 external monitors) would go permanently black after
S3 or suspend-then-hibernate. The only recovery was a hard restart.

The root cause: when display hardware powers off during suspend, any
in-flight page-flip completion event is lost. handlePF() never fires,
so isPageFlipPending stays true from the last frame before suspend. On
resume, scheduleFrame() sees the stale flag and bails, commitState()
rejects every frame with "Cannot commit when a page-flip is awaiting",
and nothing ever clears it. Screens stay dark forever.

VT switch doesn't hit this because the kernel preserves DRM state and
queues pending events in the fd buffer. Suspend kills the hardware, so
there's nothing to deliver.

Fix in two places:

1. restoreAfterVT(): clear isPageFlipPending, isFrameRunning, and
   frameEventScheduled for all connectors before recheckOutputs(). For
   VT switch this is harmless (the events arrive anyway and handlePF
   would set them false). For suspend it unblocks frame scheduling.

2. commitState(): record a CLOCK_BOOTTIME timestamp when
   isPageFlipPending is set. If a modeset finds a flip pending for
   >500ms (well past any vblank), treat it as stale and clear the
   flags. CLOCK_BOOTTIME instead of CLOCK_MONOTONIC because MONOTONIC
   freezes during suspend on Linux, making elapsed time look like zero
   after resume.

Timestamp recorded in both atomic and legacy commit paths.

Relates to Hyprland#8312, Hyprland#6289.
@UncleJ4ck UncleJ4ck changed the title drm: handle CRTC starvation recovery when outputs are disabled drm: CRTC starvation recovery + clear stale page-flip state after suspend Mar 12, 2026