[QNN] MatMulAddFusion and Reshape Related Fusion #22494

centwang · 2024-10-18T06:18:21Z

The QNN EP relies on the Gemm op to map onto QNN's FullyConnected op, which runs much faster than MatMul+Add. This PR fuses MatMul+Add into Gemm whenever MatMul's second input is a 2D initializer, regardless of the rank of the first input. If the first input is not a 2D tensor, Reshape nodes are added.
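The rewrite rests on a simple identity: because the weight is 2D, the leading dimensions of the first input can be flattened into a single row dimension, one Gemm applied, and the leading dimensions restored afterwards. A minimal pure-Python sketch of that identity for a 3D input follows. The helper names (`matmul_3d`, `add_bias`, `gemm_2d`, `reshape_gemm_reshape`) are illustrative only and are not taken from this PR or from ONNX Runtime.

```python
def matmul_3d(x, w):
    """MatMul of a [B, S, K] input with a 2D [K, N] weight."""
    k_dim, n_dim = len(w), len(w[0])
    return [
        [[sum(row[k] * w[k][j] for k in range(k_dim)) for j in range(n_dim)]
         for row in batch]
        for batch in x
    ]


def add_bias(y, bias):
    """Add a 1D [N] bias to the last dimension of a [B, S, N] tensor."""
    return [[[v + bias[j] for j, v in enumerate(row)] for row in batch]
            for batch in y]


def gemm_2d(a, w, bias):
    """Gemm on a 2D input: a is [M, K], w is [K, N], bias is [N]."""
    k_dim, n_dim = len(w), len(w[0])
    return [
        [sum(row[k] * w[k][j] for k in range(k_dim)) + bias[j]
         for j in range(n_dim)]
        for row in a
    ]


def reshape_gemm_reshape(x, w, bias):
    """Reshape [B, S, K] -> [B*S, K], run Gemm, reshape back to [B, S, N]."""
    b, s = len(x), len(x[0])
    flat = [row for batch in x for row in batch]
    out = gemm_2d(flat, w, bias)
    return [out[i * s:(i + 1) * s] for i in range(b)]


x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # shape [2, 2, 2]
w = [[1, 0, 2], [0, 1, 3]]                 # shape [2, 3]
bias = [10, 20, 30]

# Both paths produce the same [2, 2, 3] result.
assert reshape_gemm_reshape(x, w, bias) == add_bias(matmul_3d(x, w), bias)
```

Integer inputs are used so the two results can be compared exactly; with floating-point data the comparison would need a tolerance.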

On the QNN EP, memory is allocated for each activation tensor, so Reshape/Squeeze/Unsqueeze is not a no-op. This PR also adds fusions that try to remove redundant Reshape nodes. For some QNN AI Hub models on specific devices, the graph fails to finalize at execution time unless these Reshape nodes are removed; once they are removed, the models run correctly.
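One simple kind of redundant Reshape is a chain of back-to-back Reshape nodes, where only the final target shape matters as long as the intermediate tensor has no other consumer. The sketch below illustrates that idea on a toy node list. It is a hypothetical illustration, not the logic in `reshape_fusion.cc` or `reshape_gemm_fusion.cc`; the `Node` type and `collapse_reshape_chains` helper are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    op: str
    input: str
    output: str
    shape: tuple = ()  # target shape, only meaningful for Reshape


def collapse_reshape_chains(nodes, graph_outputs=()):
    """Rewire Reshape(Reshape(x)) to read x directly, then drop unused Reshapes."""
    nodes = [Node(n.op, n.input, n.output, n.shape) for n in nodes]  # work on a copy
    changed = True
    while changed:
        changed = False
        producer = {n.output: n for n in nodes}
        for n in nodes:
            src = producer.get(n.input)
            # A 0 in the target shape copies a dim from the direct input,
            # so such a Reshape cannot safely skip its producer.
            if (n.op == "Reshape" and src is not None
                    and src.op == "Reshape" and 0 not in n.shape):
                n.input = src.input
                changed = True
        # A Reshape whose output nobody reads is dead and can be removed.
        used = {n.input for n in nodes} | set(graph_outputs)
        kept = [n for n in nodes if n.op != "Reshape" or n.output in used]
        if len(kept) != len(nodes):
            nodes, changed = kept, True
    return nodes


graph = [
    Node("Reshape", "x", "r1", (6, 4)),
    Node("Reshape", "r1", "r2", (2, 3, 4)),
    Node("Relu", "r2", "y"),
]
for node in collapse_reshape_chains(graph, graph_outputs=("y",)):
    print(node)
```

In this toy graph the first Reshape ends up with no consumer and is dropped, leaving one fewer activation tensor to allocate.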

The models below were run with and without the change (left: with the change; right: without the change):
swin_tiny: Average inference time cost: 12.8077 ms | Average inference time cost: 23.956 ms
swin_base: Average inference time cost: 27.0639 ms | Average inference time cost: 57.6608 ms
convnext_tiny: Average inference time cost: 3.42956 ms | Average inference time cost: 16.1848 ms
openai_clip_CLIPTextEncoder: Average inference time cost: 5.96104 ms | Average inference time cost: 220.406 ms
openai_clip_CLIPImageEncoder: Average inference time cost: 41.8206 ms | Average inference time cost: 919.712 ms
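Reading the first figure on each line as the time with the change and the second as the time without it (an assumption based on the "with and without" ordering in the sentence that introduces the list), the speedup is the ratio of the two. A short script that computes it from the reported figures:

```python
# (with_change_ms, without_change_ms), copied from the figures reported above.
timings_ms = {
    "swin_tiny": (12.8077, 23.956),
    "swin_base": (27.0639, 57.6608),
    "convnext_tiny": (3.42956, 16.1848),
    "openai_clip_CLIPTextEncoder": (5.96104, 220.406),
    "openai_clip_CLIPImageEncoder": (41.8206, 919.712),
}


def speedup(with_ms, without_ms):
    """Return how many times faster the run with the change is."""
    return without_ms / with_ms


for model, (with_ms, without_ms) in timings_ms.items():
    print(f"{model}: {speedup(with_ms, without_ms):.2f}x")
```

Under that reading, the improvement ranges from a little under 2x for swin_tiny to roughly 37x for the CLIP text encoder.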

NOTE: the current change skips the Attention pattern, because fusing it would prevent AttentionFusion from working. Ideally AttentionFusion would be adjusted to support the Gemm pattern, but that requires big changes. We may do this in the future, for example when we want to run transformer models on QNN: since QNN has no Attention op, we would still want to fuse MatMul+Add inside the Attention pattern so that QNN can use FullyConnected.

onnxruntime/core/optimizer/matmul_add_fusion.cc

adrianlizarraga · 2024-11-06T17:40:25Z

@centwang Thank you for the PR. It looks like many unit tests and pipelines are still not passing. Could you please address those issues first?

onnxruntime/core/optimizer/reshape_fusion.cc

onnxruntime/test/optimizer/graph_transform_test.cc

onnxruntime/test/providers/qnn/gemm_op_test.cc

onnxruntime/core/providers/qnn/builder/qnn_node_group/reshape_gemm_fusion.cc

onnxruntime/test/providers/qnn/qnn_basic_test.cc

adrianlizarraga · 2025-02-12T04:02:51Z

@HectorSVC could you please take a look at this PR?

adrianlizarraga · 2025-02-18T21:18:22Z

Hi @skottmckay, I think there are some unresolved comments. Would you be able to take another look?

centwang force-pushed the weicwang/matmul_add_fusion branch from 7d3d515 to 0a05430 Compare October 21, 2024 03:34

snnn previously approved these changes Oct 21, 2024

centwang requested review from adrianlizarraga and skottmckay October 22, 2024 02:08

skottmckay reviewed Oct 22, 2024

onnxruntime/core/optimizer/matmul_add_fusion.cc (outdated, resolved)

onnxruntime/core/optimizer/matmul_add_fusion.cc (outdated, resolved)

centwang dismissed snnn’s stale review via ca59611 October 29, 2024 11:51

centwang force-pushed the weicwang/matmul_add_fusion branch from a8388b7 to ca59611 Compare October 29, 2024 11:51

centwang changed the title ~~Add More Cases to MatMulAddFusion~~ [QNN] MatMulAddFusion and Reshape Related Fusion Oct 29, 2024

centwang requested review from jywu-msft and cloudhan October 29, 2024 11:52

skottmckay reviewed Nov 8, 2024

centwang added 6 commits November 22, 2024 11:45

matmul add fusion

9405087

fix ut failure

c685f39

fix compile error

e5146ce

fix attn pattern

787b4fb

reshape related fusion

8606668

resolve comments

47d4755

centwang force-pushed the weicwang/matmul_add_fusion branch from ca59611 to 47d4755 Compare November 25, 2024 05:49

centwang added 2 commits November 25, 2024 14:16

fix build error

d063df5

fix test failure

8d75e0a

skottmckay reviewed Nov 28, 2024

centwang added 7 commits December 5, 2024 10:37

Merge branch 'main' into weicwang/matmul_add_fusion

dff068b

resolve comments

81d3fe9

Merge branch 'main' into weicwang/matmul_add_fusion

e1c77da

fix merge error

9b09618

use constant

91258aa

Merge branch 'main' into weicwang/matmul_add_fusion

582eb3f

enforce constant initializer for qnn

4cf44f6

skottmckay previously approved these changes Jan 17, 2025

adrianlizarraga reviewed Jan 27, 2025

onnxruntime/core/providers/qnn/builder/qnn_node_group/reshape_gemm_fusion.cc (outdated, resolved)

Merge main and fix conflicts

d8260b8

adrianlizarraga dismissed skottmckay’s stale review via d8260b8 February 7, 2025 00:44

adrianlizarraga added 3 commits February 6, 2025 16:52

lintrunner fix

1e3dda9

signed comparison fix for qnn built as a shared lib

0ab6369

Merge main and fix conflicts

dd08cdc

adrianlizarraga previously approved these changes Feb 12, 2025

adrianlizarraga requested review from HectorSVC and removed request for cloudhan February 12, 2025 04:02

adrianlizarraga added 2 commits February 13, 2025 17:32

Merge in main branch and fix conflicts

47259bc

Add include to fix optional GitHub actions linter suggestion

0f9700d

adrianlizarraga dismissed their stale review via 0f9700d February 14, 2025 01:33

adrianlizarraga approved these changes Feb 14, 2025

HectorSVC approved these changes Feb 14, 2025

adrianlizarraga added the ep:QNN issues related to QNN execution provider label Feb 14, 2025

jywu-msft merged commit 03c6c2e into main Feb 18, 2025
96 of 98 checks passed

jywu-msft deleted the weicwang/matmul_add_fusion branch February 18, 2025 21:22
