drm: CRTC starvation recovery + clear stale page-flip state after suspend #254
Open
UncleJ4ck wants to merge 2 commits into hyprwm:main
Member
@gulafaran can you test this? I don't have the hardware.
added 2 commits
March 12, 2026 09:28
When more displays are connected than CRTCs available, connectors that arrive after all CRTCs are claimed become starved. The compositor can free a CRTC by disabling an output, but nothing reclaimed that CRTC for the starved connector. Restructure recheckCRTCs() into two passes: first, disabled outputs release their CRTCs and enabled connectors get priority assignment; second, any remaining free CRTCs are given to disabled connectors as backup slots for quick re-enable. When applyCommit() detects an enabledState transition, schedule recheckOutputs() via addIdleEvent so starved connectors pick up the freed CRTC on the next event loop iteration, without reentrancy or blocking the event loop.
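The two-pass assignment described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `Crtc`, `Connector`, `isClaimed`, `assignFree`, and `recheckCrtcsSketch` are hypothetical stand-ins, and holding the CRTC through a `std::shared_ptr` is an assumption made for the example.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not aquamarine's real types.
struct Crtc {
    int id = 0;
};

struct Connector {
    std::string           name;
    bool                  connected = false;
    bool                  enabled   = false; // compositor wants this output on
    std::shared_ptr<Crtc> crtc;              // null when the connector is starved
};

// Returns true if any connector currently holds `crtc`.
static bool isClaimed(const std::vector<Connector>& conns, const std::shared_ptr<Crtc>& crtc) {
    for (const auto& c : conns) {
        if (c.crtc == crtc)
            return true;
    }
    return false;
}

// Hands the first unclaimed CRTC to `target`; leaves it null if none is free.
static void assignFree(const std::vector<Connector>& conns, const std::vector<std::shared_ptr<Crtc>>& crtcs,
                       Connector& target) {
    for (const auto& crtc : crtcs) {
        if (!isClaimed(conns, crtc)) {
            target.crtc = crtc;
            return;
        }
    }
}

void recheckCrtcsSketch(std::vector<Connector>& conns, const std::vector<std::shared_ptr<Crtc>>& crtcs) {
    // Pass 1, release step: disconnected and disabled connectors give up their CRTC.
    for (auto& c : conns) {
        if (!c.connected || !c.enabled)
            c.crtc.reset();
    }
    // Pass 1, assign step: enabled, connected connectors get first pick.
    for (auto& c : conns) {
        if (c.connected && c.enabled && !c.crtc)
            assignFree(conns, crtcs, c);
    }
    // Pass 2: leftovers go to connected-but-disabled connectors as backup slots.
    for (auto& c : conns) {
        if (c.connected && !c.enabled && !c.crtc)
            assignFree(conns, crtcs, c);
    }
}
```

With three CRTCs and four enabled connectors, the fourth stays starved until one of the others is disabled or unplugged and the function runs again; at that point the freed CRTC goes to the starved connector before any disabled connector is considered for a backup slot.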
My laptop (i915, 3 external monitors) would go permanently black after S3 or suspend-then-hibernate. The only recovery was a hard restart.
The root cause: when display hardware powers off during suspend, any in-flight page-flip completion event is lost. handlePF() never fires, so isPageFlipPending stays true from the last frame before suspend. On resume, scheduleFrame() sees the stale flag and bails, commitState() rejects every frame with "Cannot commit when a page-flip is awaiting", and nothing ever clears it. Screens stay dark forever.
VT switch doesn't hit this because the kernel preserves DRM state and queues pending events in the fd buffer. Suspend kills the hardware, so there's nothing to deliver.
Fix in two places:
1. restoreAfterVT(): clear isPageFlipPending, isFrameRunning, and frameEventScheduled for all connectors before recheckOutputs(). For VT switch this is harmless (the events arrive anyway and handlePF would set them false). For suspend it unblocks frame scheduling.
2. commitState(): record a CLOCK_BOOTTIME timestamp when isPageFlipPending is set. If a modeset finds a flip pending for >500ms (well past any vblank), treat it as stale and clear the flags. CLOCK_BOOTTIME instead of CLOCK_MONOTONIC because MONOTONIC freezes during suspend on Linux, making elapsed time look like zero after resume. Timestamp recorded in both atomic and legacy commit paths.
Relates to Hyprland#8312, Hyprland#6289.
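The stale-flip check described for commitState() above might look something like the sketch below. It is an illustration under assumptions rather than the PR's code: `ConnectorFlipState`, `bootTimeMs`, `markFlipPending`, and `clearIfStale` are hypothetical names, and only the two fields and the 500 ms threshold come from the description. `CLOCK_BOOTTIME` is Linux-specific.

```cpp
#include <cstdint>
#include <time.h>

// Hypothetical stand-in for the per-connector state named in the commit message.
struct ConnectorFlipState {
    bool          isPageFlipPending   = false;
    std::uint64_t pageFlipPendingAtMs = 0;
};

// Threshold taken from the description: a flip pending longer than this is stale.
constexpr std::uint64_t STALE_FLIP_MS = 500;

// Milliseconds on CLOCK_BOOTTIME, which keeps counting while the machine is suspended.
std::uint64_t bootTimeMs() {
    timespec ts{};
    clock_gettime(CLOCK_BOOTTIME, &ts);
    return static_cast<std::uint64_t>(ts.tv_sec) * 1000 + static_cast<std::uint64_t>(ts.tv_nsec) / 1000000;
}

// Called where a commit requests a page-flip event.
void markFlipPending(ConnectorFlipState& c, std::uint64_t nowMs) {
    c.isPageFlipPending   = true;
    c.pageFlipPendingAtMs = nowMs;
}

// Called before a modeset commit. Returns true if a stale flag was cleared.
bool clearIfStale(ConnectorFlipState& c, std::uint64_t nowMs) {
    if (!c.isPageFlipPending)
        return false;
    if (nowMs < c.pageFlipPendingAtMs || nowMs - c.pageFlipPendingAtMs <= STALE_FLIP_MS)
        return false; // still within a plausible vblank wait
    c.isPageFlipPending = false;
    return true;
}
```

Passing the current time in as a parameter, instead of reading the clock inside `clearIfStale`, keeps the threshold logic testable without waiting or suspending a machine; a real caller would pass `bootTimeMs()`.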
Setup
I run a laptop (Intel HD 630 iGPU, Quadro M620 dGPU unused) with 3 external monitors (HDMI-A-1, DP-3, DP-5) and the built-in lid display (eDP-1). The HD 630 has exactly 3 CRTCs but 4 connectors. Since I force AQ_DRM_DEVICES=/dev/dri/intel-igpu to avoid the dGPU, all 4 connectors compete for those 3 CRTCs.
A lid-manager.sh daemon disables eDP-1 when the lid is closed or when all 3 externals are connected (hyprctl keyword monitor "eDP-1,disable").
Commit 1: CRTC starvation recovery
The bug
When I boot with lid closed and all 3 externals connected, everything works. But if I unplug one external and replug it (or open the lid while only 2 externals are connected), the formerly-starved connector never gets a CRTC. The output stays dark. Only fix: restart Hyprland entirely.
What happens: all 3 CRTCs get claimed at boot. When a connector disconnects, its CRTC should be freed, but the cleanup in recheckCRTCs() only logged "removing old crtc" without calling crtc.reset(), so the stale reference stayed. When the compositor disables an output, nothing triggers CRTC reassignment. applyCommit() updates enabledState but never tells the backend to re-evaluate which connectors should get which CRTCs.
This affects anyone with more connectors than CRTCs. Common on Intel iGPU laptops with docking stations. Related issues: aquamarine#36, #40, #59, Hyprland#7572, #7694.
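To make the stale-reference half of the bug concrete, here is a small sketch contrasting a cleanup that only logs with one that actually drops the reference. The types and function names are hypothetical, and modelling the connector's CRTC as a `std::shared_ptr` is an assumption for illustration only.

```cpp
#include <memory>

// Hypothetical stand-ins; not aquamarine's real types.
struct Crtc {
    int id = 0;
};

struct Connector {
    bool                  connected = true;
    std::shared_ptr<Crtc> crtc;
};

// Behavior as described for the old cleanup: the disconnect is noticed
// (a "removing old crtc" message would be logged here) but the pointer stays.
void cleanupLogOnly(Connector& c) {
    if (!c.connected && c.crtc) {
        // log only; c.crtc is left untouched
    }
}

// Behavior as described for the fixed cleanup: the reference is dropped.
void cleanupWithReset(Connector& c) {
    if (!c.connected && c.crtc)
        c.crtc.reset();
}

// A CRTC held by any connector looks claimed to the assignment logic.
bool holdsCrtc(const Connector& c) {
    return c.crtc != nullptr;
}
```

After `cleanupLogOnly`, a disconnected connector still reports `holdsCrtc(...) == true`, so that CRTC never becomes available to a starved connector; after `cleanupWithReset` it reports false and the CRTC can be handed out again.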
The fix
Three changes in DRM.cpp:
1. Two-pass CRTC assignment in recheckCRTCs(). Pass 1 releases CRTCs from disabled-but-connected outputs and assigns CRTCs to enabled connectors. Pass 2 gives remaining free CRTCs to disabled connectors as backup slots for quick re-enable. The disconnected-connector cleanup now actually calls crtc.reset().
2. No-CRTC guard in recheckOutputs(). If a connector has no CRTC after recheckCRTCs(), connect() is skipped. It gets connected later when a CRTC frees up via the deferred recheck.
3. Deferred recheck on enabledState change in applyCommit(). When enabledState transitions, addIdleEvent() schedules recheckOutputs() on the next event loop iteration. Same mechanism already used for frame scheduling. Avoids reentrancy since applyCommit runs inside the DRM commit path.
Safety
If all outputs have CRTCs (the common case), Pass 1 releases nothing, Pass 2 has no leftovers, and the idle-event never fires. Tiled display connectors (tilingRedundant) are skipped at the top of recheckCRTCs(). Multi-GPU setups run each backend independently. The idle event uses a weak pointer so it's safe if the backend is destroyed between scheduling and firing.
Commit 2: Clear stale page-flip state after suspend/resume
The bug
After S3 suspend or suspend-then-hibernate, all monitors go permanently black. The only recovery is a hard restart. I ran into this on the same laptop after suspending overnight.
The page-flip lifecycle works like this: compositor submits a buffer, atomic commit sets DRM_MODE_PAGE_FLIP_EVENT, isPageFlipPending becomes true. Kernel completes the flip, delivers an event on the DRM fd, handlePF() fires and sets isPageFlipPending = false. Next frame gets scheduled.
During suspend, the display hardware powers off. Any in-flight page-flip completion event is lost because no vblank interrupt fires when the hardware is off. On resume, libseat fires changeActive, restoreAfterVT() runs, but isPageFlipPending is still true from before suspend. handlePF() never fires because there's no event to process. scheduleFrame() checks isPageFlipPending, sees true, returns early. No frames ever get scheduled. Display stays black.
VT switch doesn't hit this because the kernel preserves DRM state and queues pending page-flip events in the fd buffer. They get delivered when the session reactivates. During suspend the hardware is off, so there's nothing to deliver.
The regression trace: commit d83c97f (Dec 2025) removed impl->reset() from restoreAfterVT(). That reset used to deactivate all CRTCs at the kernel level, implicitly clearing stale page-flip state. It was removed for valid reasons (caused flicker during VT switch), but left no replacement for the suspend path. Commit 603f5cd (Feb 2026) added isFrameRunning as a third guard in scheduleFrame(), making the stale-state problem worse.
wlroots had the same bug class: issues #2290, #2325, #2373, #2395, fixed in commit 324eeaa0cd ("disable all CRTCs after VT switch").
Related: Hyprland#8312 ("freezes after suspension"), Hyprland#6289 ("black screen after resuming from suspend").
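The lifecycle and the stuck state described in this section can be modelled with a toy state machine. This is only a sketch: the three flag names come from the description, while `FlipState`, `tryScheduleFrame`, `commitFrame`, `onPageFlipEvent`, and `clearOnRestore` are hypothetical functions, and the exact conditions the real scheduleFrame() checks are assumed.

```cpp
// Toy model of the per-connector flags named in the description.
struct FlipState {
    bool isPageFlipPending   = false;
    bool isFrameRunning      = false;
    bool frameEventScheduled = false;
};

// Assumed guard: refuse to schedule while any of the three flags is set.
bool tryScheduleFrame(FlipState& s) {
    if (s.isPageFlipPending || s.isFrameRunning || s.frameEventScheduled)
        return false;
    s.frameEventScheduled = true;
    return true;
}

// A commit that requests a page-flip event marks the flip as outstanding.
void commitFrame(FlipState& s) {
    s.frameEventScheduled = false;
    s.isPageFlipPending   = true;
}

// Stand-in for handlePF(): runs only if the kernel actually delivers the event.
void onPageFlipEvent(FlipState& s) {
    s.isPageFlipPending = false;
}

// Stand-in for the clearing step the fix adds to restoreAfterVT().
void clearOnRestore(FlipState& s) {
    s = FlipState{};
}
```

In the normal path `commitFrame` is followed by `onPageFlipEvent`, and `tryScheduleFrame` succeeds again. If the event is lost across suspend, `tryScheduleFrame` keeps returning false until something like `clearOnRestore` resets the flags.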
The fix
Changes across DRM.cpp, DRM.hpp, Atomic.cpp, Legacy.cpp:
1. In restoreAfterVT(), clear isPageFlipPending, isFrameRunning, and frameEventScheduled for all connectors before recheckOutputs(). For VT switch this is a no-op since events still arrive and handlePF() would set them false anyway. For suspend it removes the blocker so frames can be scheduled again.
2. In commitState(), detect stale page-flips using a CLOCK_BOOTTIME timestamp recorded when isPageFlipPending is set. If a modeset commit finds a flip pending for >500ms (well past any vblank interval even at low refresh rates), treat it as stale and clear the flags. CLOCK_BOOTTIME is used instead of CLOCK_MONOTONIC because CLOCK_MONOTONIC does not advance during suspend on Linux, making elapsed time appear near-zero after resume.
3. Added pageFlipPendingAtMs field to SDRMConnector. Timestamp recorded in both atomic (Atomic.cpp) and legacy (Legacy.cpp) commit paths when isPageFlipPending is set true.
Safety
VT switch: Pending page-flip events survive in the DRM fd buffer. After restoreAfterVT() returns, dispatchEvents() processes them. handlePF() fires and sees isPageFlipPending = false (already cleared). The = false is a no-op. onPresent() rotates buffer refs normally. No visible difference.
S3/S4 resume: No pending events in fd buffer. The flag clearing removes the blocker. The blocking modeset in restoreAfterVT() displays the restored frame. Next input or DPMS-on triggers scheduleFrame(), which succeeds, and frames render.
Edge case, flip completes just before suspend: handlePF() already fired before suspend. isPageFlipPending is already false. The clearing loop skips that connector. No change.
Tested
On Intel HD 630 (3 CRTCs, 4 connectors) with: