Improve Windows gpu support#70
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using default effort and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 911463c. Configure here.
n13
left a comment
There was a problem hiding this comment.
Review: Improve Windows GPU support
Reviewed at head 3656655. Pulled the branch and ran the repo's validation checklist locally — all green:
cargo test -p engine-gpu→ 5 passed (the newgpu_tierstests)cargo clippy -p engine-gpu --all-targets -- -D warnings→ cleancargo fmt --all -- --check→ cleancargo check --workspace --locked→ clean
Verdict: Approve (with non-blocking follow-ups)
Solid net improvement that meets its stated goal. The big win is replacing the brittle String::contains heuristics with a table-driven, word-boundary regex approach that is much more maintainable and DRY (per-tier workgroup math is now one centralized expression instead of duplicated (max/N).max(M) everywhere).
The two Bugbot findings are resolved
Both inline comments were filed against the older contains() commit 911463c. The later refactor into gpu_tiers.rs fixes both, and there are now unit tests proving it:
- Polaris vs RDNA1:
\b…\bboundaries + RDNA checked before Polaris → RX 5500/5600/5700 map to RDNA1 and RX 560X/580/550 map to Polaris. Verified bytest_amd_rdna_vs_polaris. - Arc mobile vs desktop: mobile tiers are checked first and
\ba7[57]0\bwon't matcha770m. Verified bytest_intel_arc_mobile_vs_desktop.
I spot-checked extra cases not covered by the tests: RTX A6000 → Professional (not swallowed by consumer tiers), the GT tier does not match GTX names, and RX 6500 XT → RDNA2 Entry. All correct.
Strong positive: fixes a real silent hang
The previous buffer-map path only set its flag on success and looped forever on a map error with no logging — a silent infinite hang, which violates the project's fail-early / always-log rule. The new AtomicU8 state (pending/success/error) + 30s poll timeout logs and bails out. Good catch.
Non-blocking follow-ups
unmap()on a non-mapped buffer (crates/engine-gpu/src/lib.rs~L664 and ~L675): on the error path the buffer was never mapped, and on the timeout path the map is still pending. wgpu 27 flags this as a validation error (noisy, not a hard panic). Suggest guardingunmap()so it only runs when the buffer is actually mapped.- Persistent device-loss spins: returning
NotFound { hash_count: 0 }is the right move for a transient failure, but on a genuinely lost device every batch will log-error and burn cycles at 0 H/s indefinitely. Consider tearing down / removing the dead context so the engine fails loudly, consistent with the repo's fail-early guidance. - Init "timeout" is post-hoc:
init_start.elapsed()is checked afterrequest_device().awaitreturns (the comment acknowledges this), so a driver that truly hangs inside the await still blocks. It only guards slow-but-completed init. Fine as-is; if true hang-protection is the goal it needs to race the future against a timer / off-thread init.
Nits
- Qualcomm tiers use greedy
adreno.*7/adreno.*6, so e.g. "Adreno 627" buckets as 700-series. Tuning-only and Qualcomm-scoped — low impact. - Duplicate
"AMD RX 5000 (RDNA 1)"tier (\b5[56]00\bandrx 5\d{3}, identical params) could be merged into a single pattern. regex+once_cellare added as direct deps;once_cell::sync::Lazycould bestd::sync::LazyLock(Rust ≥ 1.80) if you want to keep the dependency surface minimal.
None of the above blocks merge.

Add comprehensive GPU detection support for AMD, NVIDIA, Intel, and Qualcomm
Overview
This PR significantly expands GPU detection and tier configuration to support a much wider range of graphics hardware. It addresses user reports of unrecognized GPUs (specifically Vega 8 APU and RX 560X) and proactively adds support for many other common GPUs that were previously falling back to conservative default settings.
What Changed
Bug Fixes
"mi"pattern for AMD Instinct was matching any GPU name containing "mi" (e.g., "Family", "Graphics"). Changed to specific patterns:"instinct mi","mi100","mi200","mi250","mi300""5600"pattern (RDNA 1) instead of Polaris. Reordered detection to check Polaris patterns before RDNA 1NVIDIA Additions
max/18, min 768max/20, min 512max/24, min 384max/24, min 384max/28, min 256max/10, min 2560Also added missing patterns: RTX 3060, 3050, GTX 1050, 1030
AMD Additions
max/12, min 2048max/16, min 1024max/16, min 1536max/22, min 768max/22, min 512max/12, min 2048max/14, min 1536max/16, min 1280max/28, min 384max/16, min 1280max/20, min 768max/22, min 512max/24, min 512max/26, min 384Also added: RX 590, RX 480/470/460, RX 6950/6750/6650 patterns
Intel Additions
max/14, min 1536max/18, min 768max/14, min 1536max/16, min 1024max/20, min 512max/20, min 512max/26, min 256max/26, min 320max/26, min 320max/28, min 256max/30, min 192Qualcomm (New Vendor)
max/14, min 1536max/16, min 1024max/20, min 512max/24, min 384Validation
cargo check -p engine-gpu- passescargo clippy -p engine-gpu -- -D warnings- passes, no warningsRisks and Mitigations
Risk: New patterns could potentially match unintended GPU names
Risk: Workgroup formulas for new tiers may not be optimal
Testing Notes
The originally reported GPUs should now be detected as:
Radeon(TM) Vega 8 Graphics→ AMD Vega (APU) tierRadeon RX 560X→ AMD RX 500/400 (Polaris) tier (previously misdetected as RDNA 1)Note
Medium Risk
Large heuristic-only change: mis-matched adapter name substrings could pick wrong workgroup limits (performance/stability), though users can still override via CLI and unknown GPUs still log fallback warnings.
Overview
Expands
get_vendor_specific_dispatchinengine-gpuso more adapters get a named tier and tunedoptimal_workgroupsinstead of generic fallbacks.NVIDIA gains extra string matches (e.g. RTX 3060/3050, GTX 1050/1030) and new branches for GTX 900/700/legacy, MX mobile, GT entry-level, and datacenter names (A100/H100/L4).
AMD detection is reordered and broadened: Polaris is checked before RDNA 1 so names like RX 560X no longer hit 5600-style patterns; Instinct matching drops the loose
"mi"substring in favor of explicitmi100/instinct mi-style patterns. New tiers cover RDNA APUs, RX 6500/6400, Vega (discrete and APU), GCN R9/R7, OEM rebadges, and generic Radeon Graphics APUs.Intel splits Arc into finer desktop/mobile buckets and adds more integrated paths (Iris variants, UHD 700/600, HD). A new Qualcomm Adreno block (vendor ID + name patterns) covers Snapdragon X and Adreno 5/6/7 series on Windows ARM.
Reviewed by Cursor Bugbot for commit 911463c. Configure here.