graph: backend: dnnl: executables: call internal sdpa iface #4839
make test
Below is the verbose output of the sdpa case after the change:
$ ONEDNN_VERBOSE=1 ./tests/benchdnn/benchdnn --graph --engine=gpu --case=complex_fusion/mha/sdpa-plain-simplified-f16-f32.json
onednn_verbose,v1,info,oneDNN v3.12.0 (commit d23923d)
onednn_verbose,v1,info,cpu,runtime:OpenMP,nthr:224
onednn_verbose,v1,info,cpu,isa:Intel AVX-512 with float16, Intel DL Boost and bfloat16 support and Intel AMX with bfloat16 and 8-bit integer support
onednn_verbose,v1,info,gpu,runtime:OpenCL
onednn_verbose,v1,info,gpu,engine,opencl device count:4
onednn_verbose,v1,info,gpu,engine,0,name:Intel(R) Data Center GPU Max 1100,driver_version:25.18.33578,binary_kernels:enabled
onednn_verbose,v1,info,gpu,engine,1,name:Intel(R) Data Center GPU Max 1100,driver_version:25.18.33578,binary_kernels:enabled
onednn_verbose,v1,info,gpu,engine,2,name:Intel(R) Data Center GPU Max 1100,driver_version:25.18.33578,binary_kernels:enabled
onednn_verbose,v1,info,gpu,engine,3,name:Intel(R) Data Center GPU Max 1100,driver_version:25.18.33578,binary_kernels:enabled
onednn_verbose,v1,info,graph,backend,0:dnnl_backend
onednn_verbose,v1,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,v1,graph,info,template:operation,engine,partition_id,partition_kind,op_names,data_formats,logical_tensors,fpmath_mode,implementation,backend,exec_time
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abdc::f0 dst:f32::blocked:abdc::f0,,,1x16x64x384,7.03687
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x16x384x64,5.90308
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x1x1x1,0.00292969
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x16x384x384,7.69385
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x1x384x384,7.81201
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x16x384x384,7.85107
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x16x384x384,7.81689
onednn_verbose,v1,primitive,exec,cpu,reorder,jit:uni,undef,src:f32::blocked:abdc::f0 dst:f32::blocked:abcd::f0,,,1x16x384x64,4.12207
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x16x384x384,7.68701
onednn_verbose,v1,primitive,exec,gpu:0,matmul,jit:gemm:any,undef,src:f32::blocked:abcd::f0 wei:f32::blocked:abdc::f0 dst:f32::blocked:abcd::f0,,,1x16x384x64:1x16x64x384,2.3479
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x1x1x1,0.000976562
onednn_verbose,v1,primitive,exec,gpu:0,binary,ocl:xe,undef,src:f32::blocked:abcd::f0 src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,alg:binary_div,1x16x384x384:1x1x1x1,0.13501
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x1x384x384,7.95605
onednn_verbose,v1,primitive,exec,gpu:0,binary,ocl:xe,undef,src:f32::blocked:abcd::f0 src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,alg:binary_add,1x16x384x384:1x1x384x384,0.10498
onednn_verbose,v1,primitive,exec,gpu:0,softmax,ocl:reusable,forward_training,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,alg:softmax_accurate_inf_as_zero axis:3,1x16x384x384,0.0480957
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f16::blocked:abcd::f0,,,1x16x384x384,7.1499
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f16::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x16x384x384,3.37402
onednn_verbose,v1,primitive,exec,gpu:0,matmul,jit:gemm:any,undef,src:f32::blocked:abcd::f0 wei:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x16x384x384:1x16x384x64,0.15918
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f16::blocked:abcd::f0,,,1x16x384x64,5.99219
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abdc::f0 dst:f16::blocked:abdc::f0,,,1x16x64x384,8.19507
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f16::blocked:abcd::f0,,,1x1x1x1,0.00390625
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f16::blocked:abcd::f0,,,1x1x384x384,7.16113
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f16::blocked:abcd::f0,,,1x16x384x64,8.4751
onednn_verbose,v1,primitive,exec,gpu:0,sdpa,ocl:micro:reusable,undef,query:f16::blocked:abcd::f0 key:f16::blocked:abdc::f0 val:f16::blocked:abcd::f0 msk:f16::blocked:abcd::f0 dst:f16::blocked:abcd::f0,attr-scratchpad:user,alg:softmax_accurate_inf_as_zero msk:2d scl:div:f16:device,1x16x384x64:1x16x64x384:1x16x384x64,0.169922
onednn_verbose,v1,graph,exec,gpu,100002,sdp,matmul_qk;scale_div;mask_add;softmax;matmul_v,,in0_f16:1:strided:undef:1x16x384x64:393216s24576s64s1 in1_f16:2:strided:undef:1x16x384x64:393216s24576s64s1 in2_f16:4:strided:constant:1:1 in3_f16:5:strided:undef:1x1x384x384:147456s147456s384s1 in4_f16:3:strided:undef:1x16x384x64:393216s24576s64s1 out0_f16:6:strided:undef:1x16x384x64:393216s24576s64s1,fpm:strict,sdp_primitive_kernel_t,dnnl_backend,0.24292
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f32::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x16x384x64,6.11304
onednn_verbose,v1,primitive,exec,cpu,reorder,jit_direct_copy:uni,undef,src:f16::blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,1x16x384x64,8.68799
0:PASSED (2264 ms) __REPRO: --graph --engine=gpu --case=complex_fusion/mha/sdpa-plain-simplified-f16-f32.json
tests:1 passed:1 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total: 2.28s; create_pd: 0.12s (5%); create_prim: 0.84s (37%); fill: 0.00s (0%); execute: 0.01s (0%); compute_ref: 0.00s (0%); compare: 0.00s (0%);
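For reference, the `sdp` partition on the graph verbose line fuses five ops (`matmul_qk;scale_div;mask_add;softmax;matmul_v`). Read together with the logical tensors on that line, they compute the expression below. The symbol names are ours, and the mapping of `in0`..`in4` to Q, K, scale, mask, and V is inferred from the shapes in the log rather than stated in the PR. The key appears to the `sdpa` primitive as `1x16x64x384` in `abdc` format, a transposed view of `in1`. That view gives the K^T term. The `1x1x384x384` mask is broadcast over the 16 heads (`msk:2d`), and the scale divides the scores (`scl:div`).

```math
O = \operatorname{softmax}_{\text{last axis}}\!\left(\frac{Q K^{\top}}{s} + M\right) V,
\qquad Q, K, V, O \in \mathbb{R}^{1 \times 16 \times 384 \times 64},\quad
M \in \mathbb{R}^{1 \times 1 \times 384 \times 384},\quad s \in \mathbb{R}
```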
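In a longer log, it can help to filter for just the graph partition line and the GPU primitive lines to check that the partition dispatched to the `sdpa` primitive. The sketch below uses awk and takes its field positions from the two `template:` info lines in the output above. The helper name `summarize_verbose` is hypothetical. It also assumes that no verbose field contains a comma, which holds for the lines shown here.

```shell
# Hypothetical helper: summarize oneDNN verbose lines read from stdin.
# Field positions follow the "template:" info lines printed by the library:
#   graph exec:     $6 partition_id, $7 partition_kind, $8 op_names,
#                   $12 implementation, $14 exec_time
#   primitive exec: $5 engine, $6 primitive, $7 implementation, $13 exec_time
# Assumption: no field contains a comma (true for the log above).
summarize_verbose() {
  awk -F, '
    # Graph partition executions: which ops were fused and which kernel ran.
    $3 == "graph" && $4 == "exec" {
      printf "graph %s partition %s: %s -> %s, time=%s\n", $7, $6, $8, $12, $14
    }
    # Primitive executions on a GPU engine only (CPU reorders are skipped).
    $3 == "primitive" && $4 == "exec" && $5 ~ /^gpu/ {
      printf "primitive %s on %s: %s, time=%s\n", $6, $5, $7, $13
    }
  '
}
```

For the log above, piping the benchdnn output into `summarize_verbose` would list the GPU matmul, binary, and softmax primitives, the `sdpa` primitive with `ocl:micro:reusable`, and the `sdp` partition handled by `sdp_primitive_kernel_t`.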