v0.5
New in CUTLASS SYCL 0.5
Major Architecture Changes
- Xe Rearchitecture (#477): Complete redesign of Xe CuTe atoms with new architecture
- New MMA atoms for improved performance
- Enhanced 2D copy atoms (loads, stores, prefetch with VNNI/transpose support)
- New 2D copy helpers (low-level make_block_2d_copy and high-level make_block_2d_copy_{A,B,C})
- Generic and optimized reorder atoms for {int4, uint4, int8, uint8, e2m1, e4m3, e5m2} -> {half, bfloat16}
- Requires IGC version v2.18.5 or later
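The reorder atoms listed above widen narrow types such as int4 to half or bfloat16. As a rough host-side illustration of what such a conversion involves, the sketch below unpacks two signed 4-bit values per byte and widens them to float, which stands in for half. It is not the CuTe atom itself: the names sign_extend_int4 and unpack_int4_to_float are hypothetical, and the low-nibble-first packing order is an assumption made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: interpret a 4-bit two's-complement value (0..15) as a signed int.
inline int sign_extend_int4(std::uint8_t nibble) {
  assert(nibble <= 0x0F);
  return nibble >= 8 ? static_cast<int>(nibble) - 16 : static_cast<int>(nibble);
}

// Hypothetical host-side reference: unpack two int4 values per byte and widen to float.
// Assumes the first element of each pair lives in the low nibble.
std::vector<float> unpack_int4_to_float(const std::vector<std::uint8_t>& packed) {
  std::vector<float> out;
  out.reserve(packed.size() * 2);
  for (std::uint8_t byte : packed) {
    out.push_back(static_cast<float>(sign_extend_int4(byte & 0x0F)));
    out.push_back(static_cast<float>(sign_extend_int4(byte >> 4)));
  }
  return out;
}
```

A reference like this can be useful for checking device-side conversions element by element, since every int4 value is exactly representable in half and bfloat16.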
New Features
- G++ Host Compiler Support (#490): Support for G++ 13 as host compiler
- Migrated syclcompat to this repository as cutlasscompat for better compatibility
- Fixed compilation issues when using G++ instead of clang++
- Added new CI workflow for testing G++ host compiler builds
- Enhanced build system to support the -DDPCPP_HOST_COMPILER=g++ option
- Grouped GEMM for Mixed Dtype (#457): Extended grouped GEMM support to mixed precision operations
- Added support for BF16 + S8 mixed dtype grouped GEMM
- Added support for FP16 + U4 mixed dtype grouped GEMM
- New examples: 10_bmg_grouped_gemm_bf16_f16_s8.cpp and 10_bmg_grouped_gemm_f16_u4.cpp
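To make the idea of mixed-dtype grouped GEMM concrete, here is a minimal host-side reference sketch. It is not the code from the examples named above: the GemmProblem struct, the function names, the row-major layout, and the single per-problem scale are all assumptions chosen for illustration, and float stands in for BF16/FP16. Each problem in the group may have its own shape, and the narrow integer operand is widened before the multiply-accumulate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one problem in the group (row-major storage).
struct GemmProblem {
  int m, n, k;
  std::vector<float> a;        // m x k, stands in for BF16/FP16 data
  std::vector<std::int8_t> b;  // k x n, narrow integer operand (S8)
  float b_scale;               // assumed per-problem dequantization scale
};

// Reference C = A * (scale * B) for a single problem, accumulating in float.
std::vector<float> reference_mixed_gemm(const GemmProblem& p) {
  assert(p.a.size() == static_cast<std::size_t>(p.m) * p.k);
  assert(p.b.size() == static_cast<std::size_t>(p.k) * p.n);
  std::vector<float> c(static_cast<std::size_t>(p.m) * p.n, 0.0f);
  for (int i = 0; i < p.m; ++i) {
    for (int j = 0; j < p.n; ++j) {
      float acc = 0.0f;
      for (int kk = 0; kk < p.k; ++kk) {
        // Widen the int8 element to float before multiplying.
        float b_val = static_cast<float>(p.b[kk * p.n + j]) * p.b_scale;
        acc += p.a[i * p.k + kk] * b_val;
      }
      c[i * p.n + j] = acc;
    }
  }
  return c;
}

// A grouped GEMM runs independent problems, each with its own shape.
std::vector<std::vector<float>> reference_grouped_gemm(
    const std::vector<GemmProblem>& group) {
  std::vector<std::vector<float>> results;
  results.reserve(group.size());
  for (const GemmProblem& p : group) {
    results.push_back(reference_mixed_gemm(p));
  }
  return results;
}
```

A device implementation would typically fuse the widening step into the kernel rather than materialize a converted copy of B; this reference only fixes the expected numerical result for comparison.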
See the CHANGELOG-SYCL for details of all past releases and updates.