- GTX 9xx
- Titan X
- Quadro Mxxxx
- Tegra X1
- Nintendo Switch
- Shield TV
- MX110, MX130
- CUDA Tuning Guide, [backup]
- Architecture Whitepaper, [backup]
- Dissecting GPU Memory Hierarchy through Microbenchmarking, [backup]
- Examining the Nintendo Switch (Tegra X1) Video Engine
- Maxwell: Nvidia’s Silver 28nm Hammer
- Nintendo Switch’s iGPU: Maxwell Nerfed Edition
- Compute Capability 5.x
- Tegra X1 Whitepaper, [backup]
- Vulkan features for GTX 980
-
FP16 data storage.
-
Unified L1/Texture cache.
-
Native shared memory atomic operations for 32-bit integer arithmetic, along with native 32 or 64-bit compare-and-swap (CAS).
-
Async transfer queue.
-
instructions IMAD and IMUL have a long latency because they are emulated. ref
- ops/clock per SM: [7]
- 128 fp32 FMA
- 4 fp64 FMA
- 32 fp32 rcp, rsqrt, log2, exp2, sin, cos
- 32 i32 add/sub
- 64 i32 shift
- 64 cmp, min, max
- 64 i32 bit reverse
- 64 i32 bit extract, insert
- 128 i32 and, or, xor
- 32 MSB
- 32 pop count
- 32 i8/i16 to i32
- 4 to/from i64/fp64
- 32 type conversions