v0.5

@rolandschulz rolandschulz released this 26 Sep 22:42
· 103 commits to main since this release
b0cb10e

New in CUTLASS SYCL 0.5

Major Architecture Changes

  • Xe Rearchitecture (#477): Complete redesign of Xe CuTe atoms with new architecture
    • New MMA atoms for improved performance
    • Enhanced 2D copy atoms (loads, stores, prefetch with VNNI/transpose support)
    • New 2D copy helpers (low-level make_block_2d_copy and high-level make_block_2d_copy_{A,B,C})
    • Generic and optimized reorder atoms for {int4, uint4, int8, uint8, e2m1, e4m3, e5m2} -> {half, bfloat16}
    • Requires IGC version v2.18.5 or later
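As a rough illustration of the element-wise conversion a reorder atom performs for 4-bit integer inputs, here is a scalar C++ sketch. It is not the CuTe API: the helper names are hypothetical, the low-nibble-first packing order is an assumption, and `float` stands in for the half/bfloat16 destination types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helpers, not part of CUTLASS SYCL.
// Assumption: two 4-bit values per byte, low nibble stored first.

// Unpack unsigned 4-bit values (range 0..15) into float.
std::vector<float> unpack_u4_to_float(const std::vector<std::uint8_t>& packed) {
    std::vector<float> out;
    out.reserve(packed.size() * 2);
    for (std::uint8_t byte : packed) {
        out.push_back(static_cast<float>(byte & 0x0F));         // low nibble
        out.push_back(static_cast<float>((byte >> 4) & 0x0F));  // high nibble
    }
    assert(out.size() == packed.size() * 2);
    return out;
}

// Unpack signed 4-bit values (two's complement, range -8..7) into float.
std::vector<float> unpack_s4_to_float(const std::vector<std::uint8_t>& packed) {
    // Sign-extend a 4-bit value: flip the sign bit, then subtract its weight.
    auto sign_extend = [](int nibble) { return (nibble ^ 0x8) - 0x8; };
    std::vector<float> out;
    out.reserve(packed.size() * 2);
    for (std::uint8_t byte : packed) {
        out.push_back(static_cast<float>(sign_extend(byte & 0x0F)));
        out.push_back(static_cast<float>(sign_extend((byte >> 4) & 0x0F)));
    }
    assert(out.size() == packed.size() * 2);
    return out;
}
```

The real atoms operate on data held in subgroup registers and are tuned per type pair; this sketch only pins down the expected values for one assumed packing.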

New Features

  • G++ Host Compiler Support (#490): Support for G++ 13 as host compiler

    • Migrated syclcompat to this repository as cutlasscompat for better compatibility
    • Fixed compilation issues when using G++ instead of clang++
    • Added new CI workflow for testing G++ host compiler builds
    • Enhanced build system to support -DDPCPP_HOST_COMPILER=g++ option
  • Grouped GEMM for Mixed Dtype (#457): Extended grouped GEMM support to mixed precision operations

    • Added support for BF16 + S8 mixed dtype grouped GEMM
    • Added support for FP16 + U4 mixed dtype grouped GEMM
    • New examples: 10_bmg_grouped_gemm_bf16_f16_s8.cpp and 10_bmg_grouped_gemm_f16_u4.cpp
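To make "mixed dtype grouped GEMM" concrete, the following scalar C++ sketch computes a group of independent GEMMs in which A is floating point and B is 8-bit signed integer. It is a host-side reference under stated assumptions, not the CUTLASS kernel or its API: `GemmProblem` and `grouped_gemm_reference` are hypothetical names, `float` stands in for BF16, matrices are row-major, and no scale or zero-point is applied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical description of one problem in the group (not a CUTLASS type).
struct GemmProblem {
    int M, N, K;
    std::vector<float> A;        // M x K, row-major; float stands in for BF16
    std::vector<std::int8_t> B;  // K x N, row-major; narrow integer operand
};

// Reference grouped GEMM: each group may have its own M, N, K.
// B is upconverted to float before the multiply-accumulate.
std::vector<std::vector<float>> grouped_gemm_reference(
        const std::vector<GemmProblem>& groups) {
    std::vector<std::vector<float>> results;
    results.reserve(groups.size());
    for (const GemmProblem& g : groups) {
        assert(g.A.size() == static_cast<std::size_t>(g.M) * g.K);
        assert(g.B.size() == static_cast<std::size_t>(g.K) * g.N);
        std::vector<float> C(static_cast<std::size_t>(g.M) * g.N, 0.0f);
        for (int m = 0; m < g.M; ++m) {
            for (int n = 0; n < g.N; ++n) {
                float acc = 0.0f;
                for (int k = 0; k < g.K; ++k) {
                    acc += g.A[m * g.K + k] * static_cast<float>(g.B[k * g.N + n]);
                }
                C[m * g.N + n] = acc;
            }
        }
        results.push_back(std::move(C));
    }
    return results;
}
```

A reference like this is mainly useful for checking device results on small shapes; the shipped examples listed above are the place to look for the actual kernel configuration.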

See the CHANGELOG-SYCL for details of all past releases and updates.