Improve Windows gpu support #70

Merged: illuzen merged 13 commits into main from illuzen/windows-gpu on Jun 30, 2026

@illuzen (Contributor) commented Jun 29, 2026

Add comprehensive GPU detection support for AMD, NVIDIA, Intel, and Qualcomm

Overview

This PR significantly expands GPU detection and tier configuration to support a much wider range of graphics hardware. It addresses user reports of unrecognized GPUs (specifically Vega 8 APU and RX 560X) and proactively adds support for many other common GPUs that were previously falling back to conservative default settings.

What Changed

Bug Fixes

  • Fixed critical pattern-matching bug: The "mi" pattern for AMD Instinct matched any GPU name containing the substring "mi" (e.g., "Family"). Changed to specific patterns: "instinct mi", "mi100", "mi200", "mi250", "mi300"
  • Fixed RX 560X misdetection: RX 560X was incorrectly matching the "5600" pattern (RDNA 1) instead of Polaris. Reordered detection to check Polaris patterns before RDNA 1
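
The "mi" problem can be reproduced with plain substring checks. The sketch below is illustrative only: the function names and the sample adapter names are hypothetical and not taken from the engine-gpu source; only the pattern strings come from the list above.

```rust
/// Loose check resembling the old behavior: any name containing "mi" counts as Instinct.
/// (Hypothetical helper, not the actual engine-gpu code.)
fn loose_is_instinct(adapter_name: &str) -> bool {
    adapter_name.to_lowercase().contains("mi")
}

/// Stricter check using the specific patterns listed in the PR description.
fn strict_is_instinct(adapter_name: &str) -> bool {
    const PATTERNS: [&str; 5] = ["instinct mi", "mi100", "mi200", "mi250", "mi300"];
    let name = adapter_name.to_lowercase();
    PATTERNS.iter().any(|p| name.contains(p))
}
```

With these helpers, a made-up name such as "Radeon Family Adapter" passes the loose check because "family" contains "mi", while the strict check accepts only names such as "AMD Instinct MI300X".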

NVIDIA Additions

| Tier | Models | Workgroup Formula |
| --- | --- | --- |
| GTX 900 (Maxwell) | 980, 970, 960, 950 | max/18, min 768 |
| GTX 700 (Kepler/Maxwell) | 780, 770, 760, 750 | max/20, min 512 |
| GTX Legacy (Fermi/Kepler) | GTX 600/500/400 series | max/24, min 384 |
| MX (Mobile) | MX550, MX450, MX350, etc. | max/24, min 384 |
| GT (Entry-Level) | GeForce GT series | max/28, min 256 |
| Professional | Added A100, H100, L4 datacenter GPUs | max/10, min 2560 |

Also added missing patterns: RTX 3060, 3050, GTX 1050, 1030
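
The Workgroup Formula column in these tables appears to follow the `(max/N).max(M)` shape mentioned in the review further down: divide an upper limit by a per-tier divisor, then clamp to a per-tier floor. A minimal sketch, assuming `max` is whatever workgroup limit the engine reads from the adapter (the function and parameter names here are hypothetical):

```rust
/// Hypothetical helper: `divisor` and `floor` are the two numbers in a table row,
/// e.g. "max/18, min 768" becomes divisor = 18, floor = 768.
fn optimal_workgroups(max_workgroups: u32, divisor: u32, floor: u32) -> u32 {
    (max_workgroups / divisor).max(floor)
}
```

For the GTX 900 row, an assumed limit of 65,535 gives 65,535 / 18 = 3,640 (integer division), which is above the floor; an assumed limit of 8,192 gives 455, so the floor of 768 applies instead.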

AMD Additions

| Tier | Models | Workgroup Formula |
| --- | --- | --- |
| Radeon 780M (RDNA 3 APU) | 780M | max/12, min 2048 |
| Radeon 7x0M (RDNA 3 APU) | 760M, 740M | max/16, min 1024 |
| Radeon 680M (RDNA 2 APU) | 680M | max/16, min 1536 |
| Radeon 6x0M (RDNA 2 APU) | 660M, 610M | max/22, min 768 |
| RX 6500/6400 (Entry RDNA 2) | 6500 XT, 6400 | max/22, min 512 |
| Radeon VII (Vega 20) | Radeon VII | max/12, min 2048 |
| Vega 64 (Discrete) | Vega 64 | max/14, min 1536 |
| Vega 56 (Discrete) | Vega 56 | max/16, min 1280 |
| Vega (APU) | Vega 8, Vega 11, etc. | max/28, min 384 |
| R9 Fury/Nano (Fiji) | Fury X, Fury, Nano | max/16, min 1280 |
| R9 (GCN) | 390, 380, 290, 280 | max/20, min 768 |
| R7 (GCN) | 370, 360, 270, 260 | max/22, min 512 |
| Radeon OEM | Radeon 600/700 (rebadged Polaris) | max/24, min 512 |
| Radeon Graphics (APU) | Generic APU fallback | max/26, min 384 |

Also added: RX 590, RX 480/470/460, RX 6950/6750/6650 patterns

Intel Additions

| Tier | Models | Workgroup Formula |
| --- | --- | --- |
| Arc A5 (Desktop) | A580 | max/14, min 1536 |
| Arc A3 (Desktop) | A380, A310 | max/18, min 768 |
| Arc A7 Mobile | A770M, A730M | max/14, min 1536 |
| Arc A5 Mobile | A550M, A570M | max/16, min 1024 |
| Arc A3 Mobile | A370M, A350M | max/20, min 512 |
| Iris Xe Max (Discrete) | DG1-based | max/20, min 512 |
| Iris Pro | Haswell/Broadwell | max/26, min 256 |
| Iris Plus | Ice Lake, etc. | max/26, min 320 |
| UHD 700 | Alder Lake+ | max/26, min 320 |
| UHD 600 | Coffee Lake, Comet Lake | max/28, min 256 |
| HD Graphics | Older generations | max/30, min 192 |

Qualcomm (New Vendor)

| Tier | Models | Workgroup Formula |
| --- | --- | --- |
| Adreno X1 (Snapdragon X) | X Elite, X Plus | max/14, min 1536 |
| Adreno 700 | Snapdragon 8 Gen 1/2/3 | max/16, min 1024 |
| Adreno 600 | Snapdragon 800 series | max/20, min 512 |
| Adreno 500 | Older Snapdragon | max/24, min 384 |

Validation

  • cargo check -p engine-gpu - passes
  • cargo clippy -p engine-gpu -- -D warnings - passes, no warnings

Risks and Mitigations

  • Risk: New patterns could potentially match unintended GPU names
    • Mitigation: Patterns are ordered from most specific to least specific; more conservative fallbacks are used for ambiguous cases
  • Risk: Workgroup formulas for new tiers may not be optimal
    • Mitigation: Values are based on relative GPU compute capabilities; users can still override via CLI flags if needed; fallback detection still triggers a warning asking users to report unrecognized GPUs

Testing Notes

The originally reported GPUs should now be detected as:

  • Radeon(TM) Vega 8 Graphics → AMD Vega (APU) tier
  • Radeon RX 560X → AMD RX 500/400 (Polaris) tier (previously misdetected as RDNA 1)

Note

Medium Risk
Large heuristic-only change: mis-matched adapter name substrings could pick wrong workgroup limits (performance/stability), though users can still override via CLI and unknown GPUs still log fallback warnings.

Overview
Expands get_vendor_specific_dispatch in engine-gpu so more adapters get a named tier and tuned optimal_workgroups instead of generic fallbacks.

NVIDIA gains extra string matches (e.g. RTX 3060/3050, GTX 1050/1030) and new branches for GTX 900/700/legacy, MX mobile, GT entry-level, and datacenter names (A100/H100/L4).

AMD detection is reordered and broadened: Polaris is checked before RDNA 1 so names like RX 560X no longer hit 5600-style patterns; Instinct matching drops the loose "mi" substring in favor of explicit mi100/instinct mi-style patterns. New tiers cover RDNA APUs, RX 6500/6400, Vega (discrete and APU), GCN R9/R7, OEM rebadges, and generic Radeon Graphics APUs.

Intel splits Arc into finer desktop/mobile buckets and adds more integrated paths (Iris variants, UHD 700/600, HD). A new Qualcomm Adreno block (vendor ID + name patterns) covers Snapdragon X and Adreno 5/6/7 series on Windows ARM.

Reviewed by Cursor Bugbot for commit 911463c.

@cursor (Bot) left a comment

Cursor Bugbot has reviewed your changes using default effort and found 2 potential issues.

Two comment threads on crates/engine-gpu/src/lib.rs (both outdated)

@n13 (Contributor) left a comment

Review: Improve Windows GPU support

Reviewed at head 3656655. Pulled the branch and ran the repo's validation checklist locally — all green:

  • cargo test -p engine-gpu → 5 passed (the new gpu_tiers tests)
  • cargo clippy -p engine-gpu --all-targets -- -D warnings → clean
  • cargo fmt --all -- --check → clean
  • cargo check --workspace --locked → clean

Verdict: Approve (with non-blocking follow-ups)

Solid net improvement that meets its stated goal. The big win is replacing the brittle String::contains heuristics with a table-driven, word-boundary regex approach that is much more maintainable and DRY (per-tier workgroup math is now one centralized expression instead of duplicated (max/N).max(M) everywhere).

The two Bugbot findings are resolved

Both inline comments were filed against the older contains() commit 911463c. The later refactor into gpu_tiers.rs fixes both, and there are now unit tests proving it:

  • Polaris vs RDNA1: \b…\b boundaries + RDNA checked before Polaris → RX 5500/5600/5700 map to RDNA1 and RX 560X/580/550 map to Polaris. Verified by test_amd_rdna_vs_polaris.
  • Arc mobile vs desktop: mobile tiers are checked first and \ba7[57]0\b won't match a770m. Verified by test_intel_arc_mobile_vs_desktop.

I spot-checked extra cases not covered by the tests: RTX A6000 → Professional (not swallowed by consumer tiers), the GT tier does not match GTX names, and RX 6500 XT → RDNA2 Entry. All correct.
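
The table-driven, first-match-wins idea can be sketched without the regex crate by comparing whole words, which approximates a `\b…\b` boundary for simple alphanumeric tokens. Everything below is an assumption made for illustration: the `Tier` struct, its field names, and the three rows (whose tier names and numbers are borrowed from the PR description) are not the contents of gpu_tiers.rs.

```rust
/// One row of a hypothetical tier table. Field names are illustrative only.
struct Tier {
    name: &'static str,
    /// Lowercase tokens; a tier matches when any token equals a whole word of the name.
    tokens: &'static [&'static str],
    divisor: u32,
    floor: u32,
}

impl Tier {
    /// Centralized workgroup math: one expression shared by every tier.
    fn workgroups(&self, max_workgroups: u32) -> u32 {
        (max_workgroups / self.divisor).max(self.floor)
    }
}

/// Checked top to bottom; the first matching tier wins.
const TIERS: &[Tier] = &[
    Tier { name: "NVIDIA GTX 900 (Maxwell)", tokens: &["980", "970", "960", "950"], divisor: 18, floor: 768 },
    Tier { name: "NVIDIA GT (Entry-Level)", tokens: &["gt"], divisor: 28, floor: 256 },
    Tier { name: "Intel Arc A7 Mobile", tokens: &["a770m", "a730m"], divisor: 14, floor: 1536 },
];

/// Splits the adapter name into lowercase alphanumeric words and returns the first
/// tier with a token equal to one of those words.
fn classify(adapter_name: &str) -> Option<&'static Tier> {
    let lower = adapter_name.to_lowercase();
    let words: Vec<&str> = lower
        .split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|w| !w.is_empty())
        .collect();
    TIERS
        .iter()
        .find(|tier| tier.tokens.iter().any(|t| words.contains(t)))
}
```

Because whole words are compared, "gt" does not match inside "gtx" and "a770m" does not match "a770", which mirrors the two boundary cases the tests above are said to cover. A real implementation with regex can express richer patterns than this word comparison.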

Strong positive: fixes a real silent hang

The previous buffer-map path only set its flag on success and looped forever on a map error with no logging — a silent infinite hang, which violates the project's fail-early / always-log rule. The new AtomicU8 state (pending/success/error) + 30s poll timeout logs and bails out. Good catch.
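
A three-state flag with a bounded poll loop might look roughly like the following. This is a std-only sketch, not the code from the PR: the constants, `MapOutcome`, and `simulate_callback` are hypothetical names, and a spawned thread stands in for the GPU map callback.

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

const PENDING: u8 = 0;
const SUCCESS: u8 = 1;
const ERROR: u8 = 2;

#[derive(Debug, PartialEq)]
enum MapOutcome {
    Mapped,
    Failed,
    TimedOut,
}

/// Polls `state` until it leaves PENDING or `timeout` elapses, so an error or a
/// callback that never fires ends the wait instead of looping forever.
fn wait_for_map(state: &AtomicU8, timeout: Duration) -> MapOutcome {
    let start = Instant::now();
    loop {
        match state.load(Ordering::Acquire) {
            SUCCESS => return MapOutcome::Mapped,
            ERROR => return MapOutcome::Failed,
            _ => {}
        }
        if start.elapsed() >= timeout {
            return MapOutcome::TimedOut;
        }
        thread::sleep(Duration::from_millis(1));
    }
}

/// Stand-in for a map callback that reports its result from another thread.
fn simulate_callback(result_ok: bool) -> Arc<AtomicU8> {
    let state = Arc::new(AtomicU8::new(PENDING));
    let writer = Arc::clone(&state);
    thread::spawn(move || {
        writer.store(if result_ok { SUCCESS } else { ERROR }, Ordering::Release);
    });
    state
}
```

The key difference from a success-only boolean is that the error state is observable, so the caller can log and bail out on `Failed` or `TimedOut` rather than spin.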

Non-blocking follow-ups

  1. unmap() on a non-mapped buffer (crates/engine-gpu/src/lib.rs ~L664 and ~L675): on the error path the buffer was never mapped, and on the timeout path the map is still pending. wgpu 27 flags this as a validation error (noisy, not a hard panic). Suggest guarding unmap() so it only runs when the buffer is actually mapped.
  2. Persistent device-loss spins: returning NotFound { hash_count: 0 } is the right move for a transient failure, but on a genuinely lost device every batch will log-error and burn cycles at 0 H/s indefinitely. Consider tearing down / removing the dead context so the engine fails loudly, consistent with the repo's fail-early guidance.
  3. Init "timeout" is post-hoc: init_start.elapsed() is checked after request_device().await returns (the comment acknowledges this), so a driver that truly hangs inside the await still blocks. It only guards slow-but-completed init. Fine as-is; if true hang-protection is the goal it needs to race the future against a timer / off-thread init.
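
For the third follow-up, one way to get real hang protection is the off-thread variant: run initialization on a helper thread and wait on a channel with a deadline. The sketch below uses only std; `init_with_deadline` is a hypothetical name, and the closure stands in for whatever performs device initialization.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `init` on a helper thread and waits at most `timeout` for its result.
/// Returns `None` if the deadline passes or the helper thread panics. On timeout
/// the helper thread keeps running, since std offers no way to cancel it.
fn init_with_deadline<T, F>(init: F, timeout: Duration) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore the send error: the receiver is gone if the caller already timed out.
        let _ = tx.send(init());
    });
    rx.recv_timeout(timeout).ok()
}
```

Unlike an elapsed-time check after the call returns, the caller regains control at the deadline even when the initialization never completes. The trade-off is a leaked thread in the hung case.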

Nits

  • Qualcomm tiers use greedy adreno.*7 / adreno.*6, so e.g. "Adreno 627" buckets as 700-series. Tuning-only and Qualcomm-scoped — low impact.
  • Duplicate "AMD RX 5000 (RDNA 1)" tier (\b5[56]00\b and rx 5\d{3}, identical params) could be merged into a single pattern.
  • regex + once_cell are added as direct deps; once_cell::sync::Lazy could be std::sync::LazyLock (Rust ≥ 1.80) if you want to keep the dependency surface minimal.
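
The first nit can be illustrated without the regex crate. An unanchored `adreno.*7` matches whenever a 7 appears anywhere after "adreno", so the emulation below reproduces the "Adreno 627" case, and a stricter variant reads only the leading digit of the model number. Both helper names are hypothetical, and Adreno X1 names would still need their own check before the numeric one.

```rust
/// Emulates an unanchored `adreno.*<digit>` check: "adreno" followed anywhere by `digit`.
fn greedy_series_match(adapter_name: &str, digit: char) -> bool {
    let name = adapter_name.to_lowercase();
    match name.find("adreno") {
        Some(pos) => name[pos + "adreno".len()..].contains(digit),
        None => false,
    }
}

/// Stricter alternative: returns the first digit that follows "adreno", which is the
/// leading digit of a numeric model such as 627 or 740.
fn leading_series_digit(adapter_name: &str) -> Option<char> {
    let name = adapter_name.to_lowercase();
    let pos = name.find("adreno")?;
    name[pos + "adreno".len()..]
        .chars()
        .find(|c| c.is_ascii_digit())
}
```

With the greedy form, "Adreno 627" satisfies both the 7 and the 6 check, so whichever tier is tested first wins. The leading-digit form yields '6' for that name.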

None of the above blocks merge.

@illuzen merged commit 7296c00 into main on Jun 30, 2026
4 checks passed

2 participants