[midend][tests] Add broadcast BatchMatMul optimization pass and tests. #187
Conversation
//===- BatchMatMulOptimize.cpp
//-------------------------------------------------===//
Please reformat the title comment for neatness.
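For reference, the conventional LLVM-style banner would look roughly like this (the description line is hypothetical, not taken from this PR):

```cpp
//===- BatchMatMulOptimize.cpp --------------------------------*- C++ -*-===//
//
// This file implements the optimization for the BatchMatMul operation.
//
//===----------------------------------------------------------------------===//
```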
Value N = rewriter.create<memref::DimOp>(loc, B, 2); // b_col
Value K = rewriter.create<memref::DimOp>(loc, B, 1); // b_row

// build loop body
Please make sure the comments are decent (i.e., capitalized and expressed formally). You should also double-check the other comments in this PR.
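For instance, the quoted snippet above might read along these lines (a sketch of the requested style, not the final code):

```cpp
// Obtain the dimensions of input B: N is the number of columns (b_col)
// and K is the number of rows (b_row).
Value N = rewriter.create<memref::DimOp>(loc, B, 2);
Value K = rewriter.create<memref::DimOp>(loc, B, 1);

// Build the loop body.
```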
Thanks! @EllisLambda I have left some suggestions to make it better, and I think these are enough for this PR. More verifications in real scenarios (e.g., simultaneously process ...). @zhanghb97 you may also see buddy-compiler/buddy-benchmark#73.
@EllisLambda Why add a batch_matmul pass to opt? I think it is inappropriate to add a pass for one specific op in opt.
Nevertheless, it would be better if ...
In fact, there is some truth to this, but I think this PR should explain what optimizations have been made and where they were applied.
The optimization styles of Matmul and BatchMatmul are quite distinct, and the performance of these two operations may differ across hardware. Keeping them separate might be better?
Ok, I will add the explanation.
Yes, we decided to develop ...

Yumin (@EllisLambda), about the "bug" you mention in your description: we can unify the support of floating point and integer with another interface. If not, starting a new discussion about integer operations would be more appropriate, so that the relevant comments in your description could be removed, as this PR only needs to take floating-point operations into account.
Ok! I see.
@xlinsist I have amended my PR.
… optimization pass and tests.
Add "-batchmatmul-optimize" option in buddy-opt, the Matmul part is similar to MatMulBroadcast.mlir in buddy-benchmark. The batch-level loop utilizes affine.parallel. In the case of multiple batches, OpenMP can be utilized for multi-threaded acceleration. The A[batch_idx, 0, 0] which broadcasts in the Matmul will use
affine.prefetch
to prefetch in the outer loop. It could make the Op slightly faster. There's a bug,if (A_elementType.isIntOrFloat())
cannot work properly. Even if the input type is floating-point, the integer branch will still execute. So I modified toif (A_elementType.isIntOrFloat() && 0)
to skip the branch. So far the integer matrix is unavailable.
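For context, `isIntOrFloat()` returns true for both integer and floating-point types, so it cannot by itself select the integer branch. A minimal sketch of a dispatch that distinguishes the two cases (hypothetical structure, not the exact code in this pass):

```cpp
// Sketch only: A_elementType stands for the element type of input A.
// isIntOrFloat() is true for BOTH IntegerType and FloatType, so a branch
// guarded by it alone also fires for floating-point inputs.
if (A_elementType.isa<IntegerType>()) {
  // Integer lowering path, e.g. arith.muli / arith.addi.
} else if (A_elementType.isa<FloatType>()) {
  // Floating-point lowering path, e.g. arith.mulf / arith.addf.
}
```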