
graph: backend: dnnl: executables: call internal sdpa iface #4839

Open
TaoLv wants to merge 3 commits into main from lvtao/main/sdpa-iface

@TaoLv TaoLv commented Mar 17, 2026

  • Call the internal sdpa iface in the graph backend.
  • This simplifies the integration code a bit and, importantly, enables the sdpa verbose log under the graph API.
  • Add sdpa_test_iface.hpp to common so that both the graph backend and the internal gtests can include it.

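The verbose output after the change is shown below; the sdpa primitive line (gpu:0,sdpa) now appears just before the graph sdp partition line: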

$ ONEDNN_VERBOSE=1 ./tests/benchdnn/benchdnn --graph --engine=gpu --case=complex_fusion/mha/sdpa-plain-simplified-f16-f32.json
onednn_verbose,v1,info,oneDNN v3.12.0 (commit d23923d)
onednn_verbose,v1,info,cpu,runtime:OpenMP,nthr:224
onednn_verbose,v1,info,cpu,isa:Intel AVX-512 with float16, Intel DL Boost and bfloat16 support and Intel AMX with bfloat16 and 8-bit integer support
onednn_verbose,v1,info,gpu,runtime:OpenCL
onednn_verbose,v1,info,gpu,engine,opencl device count:4
onednn_verbose,v1,info,gpu,engine,0,name:Intel(R) Data Center GPU Max 1100,driver_version:25.18.33578,binary_kernels:enabled
onednn_verbose,v1,info,gpu,engine,1,name:Intel(R) Data Center GPU Max 1100,driver_version:25.18.33578,binary_kernels:enabled
onednn_verbose,v1,info,gpu,engine,2,name:Intel(R) Data Center GPU Max 1100,driver_version:25.18.33578,binary_kernels:enabled
onednn_verbose,v1,info,gpu,engine,3,name:Intel(R) Data Center GPU Max 1100,driver_version:25.18.33578,binary_kernels:enabled
onednn_verbose,v1,info,graph,backend,0:dnnl_backend
onednn_verbose,v1,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,v1,graph,info,template:operation,engine,partition_id,partition_kind,op_names,data_formats,logical_tensors,fpmath_mode,implementation,backend,exec_time
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abdc::f0 dst:f32::blocked:abdc::f0,,,1x16x64x384,7.03687
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x16x384x64,5.90308
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x1x1x1,0.00292969
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x16x384x384,7.69385
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x1x384x384,7.81201
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x16x384x384,7.85107
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x16x384x384,7.81689
onednn_verbose,v1,primitive,exec,cpu,reorder,jit:uni,undef,src:f32::blocked:abdc::f0 dst:f32::blocked:abcd::f0,,,1x16x384x64,4.12207
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x16x384x384,7.68701
onednn_verbose,v1,primitive,exec,gpu:0,matmul,jit:gemm:any,undef,src:f32::blocked:abcd::f0 wei:f32::blocked:abdc::f0 dst:f32::blocked:abcd::f0,,,1x16x384x64:1x16x64x384,2.3479
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x1x1x1,0.000976562
onednn_verbose,v1,primitive,exec,gpu:0,binary,ocl:xe,undef,src:f32::blocked:abcd::f0 src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,alg:binary_div,1x16x384x384:1x1x1x1,0.13501
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x1x384x384,7.95605
onednn_verbose,v1,primitive,exec,gpu:0,binary,ocl:xe,undef,src:f32::blocked:abcd::f0 src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,alg:binary_add,1x16x384x384:1x1x384x384,0.10498
onednn_verbose,v1,primitive,exec,gpu:0,softmax,ocl:reusable,forward_training,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,alg:softmax_accurate_inf_as_zero axis:3,1x16x384x384,0.0480957
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f16::blocked:abcd::f0,,,1x16x384x384,7.1499
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f16::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x16x384x384,3.37402
onednn_verbose,v1,primitive,exec,gpu:0,matmul,jit:gemm:any,undef,src:f32::blocked:abcd::f0 wei:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x16x384x384:1x16x384x64,0.15918
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f16::blocked:abcd::f0,,,1x16x384x64,5.99219
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abdc::f0 dst:f16::blocked:abdc::f0,,,1x16x64x384,8.19507
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f16::blocked:abcd::f0,,,1x1x1x1,0.00390625
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f16::blocked:abcd::f0,,,1x1x384x384,7.16113
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f16::blocked:abcd::f0,,,1x16x384x64,8.4751
onednn_verbose,v1,primitive,exec,gpu:0,sdpa,ocl:micro:reusable,undef,query:f16::blocked:abcd::f0 key:f16::blocked:abdc::f0 val:f16::blocked:abcd::f0 msk:f16::blocked:abcd::f0 dst:f16::blocked:abcd::f0,attr-scratchpad:user,alg:softmax_accurate_inf_as_zero msk:2d scl:div:f16:device,1x16x384x64:1x16x64x384:1x16x384x64,0.169922
onednn_verbose,v1,graph,exec,gpu,100002,sdp,matmul_qk;scale_div;mask_add;softmax;matmul_v,,in0_f16:1:strided:undef:1x16x384x64:393216s24576s64s1 in1_f16:2:strided:undef:1x16x384x64:393216s24576s64s1 in2_f16:4:strided:constant:1:1 in3_f16:5:strided:undef:1x1x384x384:147456s147456s384s1 in4_f16:3:strided:undef:1x16x384x64:393216s24576s64s1 out0_f16:6:strided:undef:1x16x384x64:393216s24576s64s1,fpm:strict,sdp_primitive_kernel_t,dnnl_backend,0.24292
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x16x384x64,6.11304
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f16::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x16x384x64,8.68799
0:PASSED (2264 ms) __REPRO: --graph --engine=gpu --case=complex_fusion/mha/sdpa-plain-simplified-f16-f32.json
tests:1 passed:1 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total: 2.28s; create_pd: 0.12s (5%); create_prim: 0.84s (37%); fill: 0.00s (0%); execute: 0.01s (0%); compute_ref: 0.00s (0%); compare: 0.00s (0%);
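
The two template header lines in the log above name the comma-separated fields of each primitive and graph exec line. As a minimal sketch (not part of this PR; the helper name `parse_verbose_line` is hypothetical, and it assumes no field contains an embedded comma), such lines can be split into named fields to pick out the new sdpa primitive entry and the partition that wraps it. The two sample lines are copied from the log above.

```python
# Field names copied from the "template:" header lines in the verbose log.
PRIMITIVE_FIELDS = (
    "operation,engine,primitive,implementation,prop_kind,"
    "memory_descriptors,attributes,auxiliary,problem_desc,exec_time"
).split(",")
GRAPH_FIELDS = (
    "operation,engine,partition_id,partition_kind,op_names,data_formats,"
    "logical_tensors,fpmath_mode,implementation,backend,exec_time"
).split(",")


def parse_verbose_line(line):
    """Return a dict for a primitive/graph exec line, or None for other lines.

    Hypothetical helper: assumes fields never contain an embedded comma.
    """
    parts = line.strip().split(",")
    if len(parts) < 4 or parts[0] != "onednn_verbose" or parts[3] != "exec":
        return None  # info lines, template headers, benchdnn summary lines
    kind = parts[2]
    names = {"primitive": PRIMITIVE_FIELDS, "graph": GRAPH_FIELDS}.get(kind)
    values = parts[3:]
    if names is None or len(values) != len(names):
        return None
    record = dict(zip(names, values))
    record["kind"] = kind
    record["exec_time"] = float(record["exec_time"])  # milliseconds
    return record


# The sdpa primitive line and the graph partition line from the log above.
sdpa_line = (
    "onednn_verbose,v1,primitive,exec,gpu:0,sdpa,ocl:micro:reusable,undef,"
    "query:f16::blocked:abcd::f0 key:f16::blocked:abdc::f0 "
    "val:f16::blocked:abcd::f0 msk:f16::blocked:abcd::f0 "
    "dst:f16::blocked:abcd::f0,attr-scratchpad:user,"
    "alg:softmax_accurate_inf_as_zero msk:2d scl:div:f16:device,"
    "1x16x384x64:1x16x64x384:1x16x384x64,0.169922"
)
graph_line = (
    "onednn_verbose,v1,graph,exec,gpu,100002,sdp,"
    "matmul_qk;scale_div;mask_add;softmax;matmul_v,,"
    "in0_f16:1:strided:undef:1x16x384x64:393216s24576s64s1 "
    "in1_f16:2:strided:undef:1x16x384x64:393216s24576s64s1 "
    "in2_f16:4:strided:constant:1:1 "
    "in3_f16:5:strided:undef:1x1x384x384:147456s147456s384s1 "
    "in4_f16:3:strided:undef:1x16x384x64:393216s24576s64s1 "
    "out0_f16:6:strided:undef:1x16x384x64:393216s24576s64s1,"
    "fpm:strict,sdp_primitive_kernel_t,dnnl_backend,0.24292"
)

sdpa = parse_verbose_line(sdpa_line)
partition = parse_verbose_line(graph_line)
print(sdpa["primitive"], sdpa["implementation"], sdpa["exec_time"])
print(partition["partition_kind"], partition["implementation"], partition["exec_time"])
```

Filtering the parsed records on `kind == "primitive"` and `primitive == "sdpa"` is one way to confirm that the fused kernel is reported under the graph API.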

@TaoLv TaoLv requested review from a team as code owners March 17, 2026 06:45
@github-actions github-actions bot added component:graph-api Codeowner: @oneapi-src/onednn-graph component:tests Codeowner: @oneapi-src/onednn-arch component:common labels Mar 17, 2026

TaoLv commented Mar 17, 2026

make test
disable benchdnn_all
enable benchdnn_graph
