
[midend][tests] Add broadcast BatchMatMul optimization pass and tests. #187


Merged: 1 commit merged into buddy-compiler:main on Sep 9, 2023

Conversation

EllisLambda
Contributor

@EllisLambda EllisLambda commented Aug 19, 2023

Add a "-batchmatmul-optimize" option to buddy-opt. The matmul part is similar to MatMulBroadcast.mlir in buddy-benchmark. The batch-level loop uses affine.parallel, so in the multi-batch case OpenMP can provide multi-threaded acceleration. The element A[batch_idx, 0, 0], which is broadcast inside the matmul, is prefetched in the outer loop with affine.prefetch; this makes the op slightly faster. There is a known issue: the check if (A_elementType.isIntOrFloat()) does not work as intended. Even when the input type is floating-point, the integer branch still executes (isIntOrFloat() returns true for both integer and floating-point types). I changed it to if (A_elementType.isIntOrFloat() && 0) to skip that branch, so integer matrices are currently unsupported.
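For reference, the broadcast-style kernel described above can be sketched in NumPy. This is an illustrative model of the computation only, not the actual MLIR lowering; the function name is hypothetical:

```python
import numpy as np

def broadcast_batch_matmul(A, B):
    """Reference model of the broadcast-style batched matmul:
    each scalar A[b, i, k] is broadcast across a row of B and
    accumulated into a row of C (hypothetical sketch, not the pass)."""
    batch, M, K = A.shape
    _, K2, N = B.shape
    assert K == K2
    C = np.zeros((batch, M, N), dtype=A.dtype)
    # The pass parallelizes this batch-level loop with affine.parallel.
    for b in range(batch):
        for i in range(M):
            for k in range(K):
                # A[b, i, k] is broadcast across the lanes of B[b, k, :].
                C[b, i, :] += A[b, i, k] * B[b, k, :]
    return C
```

The real pass emits this structure as affine loops and vectorizes the inner broadcast; invocation would look like `buddy-opt input.mlir -batchmatmul-optimize` (illustrative usage of the flag named in this PR).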

Comment on lines +1 to +2
//===- BatchMatMulOptimize.cpp
//-------------------------------------------------===//
Collaborator


Please reformat the title comment for neatness.

Value N = rewriter.create<memref::DimOp>(loc, B, 2); // b_col
Value K = rewriter.create<memref::DimOp>(loc, B, 1); // b_row

// build loop body
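For context on the snippet above: if B has shape [batch, b_row, b_col], then dimension 1 is the row count K and dimension 2 is the column count N. A minimal NumPy illustration of this indexing, with hypothetical shapes:

```python
import numpy as np

# Hypothetical B of shape [batch, b_row, b_col] = [8, 4, 6].
B = np.zeros((8, 4, 6))

# Mirrors memref::DimOp(loc, B, 1) and memref::DimOp(loc, B, 2):
K = B.shape[1]  # b_row
N = B.shape[2]  # b_col
```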
Collaborator


Please make sure the comments are decent (i.e., capitalized and expressed formally). You should also double-check the other comments in this PR.

@xlinsist
Collaborator

Thanks! @EllisLambda I have left some suggestions to make it better, and I think these are enough for this PR. More verification in real scenarios (e.g., processing linalg.batch_matmul of multiple sizes simultaneously) may be needed in the future.

@zhanghb97 you may also see buddy-compiler/buddy-benchmark#73.

@linuxlonelyeagle
Member

@EllisLambda Why add a batch_matmul pass to opt? I think it is inappropriate to add a pass for a specific op in opt.

@xlinsist
Collaborator

@EllisLambda Why add a batch_matmul pass to opt? I think it is inappropriate to add a pass for a specific op in opt.

batch_matmul is a typical operation that is widely used in transformer models, so I think it is reasonable to provide a tailor-made pass for it.

Nevertheless, it would be better if the matmul-optimize pass could incorporate the lowering optimizations of both linalg.matmul and linalg.batch_matmul. WDYT @EllisLambda ?

@linuxlonelyeagle
Member

In fact, there is some truth to this, but I think this PR should explain what optimizations have been made and where they were applied.

@EllisLambda EllisLambda requested a review from xlinsist August 20, 2023 16:22
@EllisLambda
Contributor Author

In fact, there is some truth to this, but I think this PR should explain what optimizations have been made and where they were applied.

@EllisLambda Why add a batch_matmul pass to opt? I think it is inappropriate to add a pass for a specific op in opt.

batch_matmul is a typical operation that is widely used in transformer models, so I think it is reasonable to provide a tailor-made pass for it.

Nevertheless, it would be better if the matmul-optimize pass could incorporate the lowering optimizations of both linalg.matmul and linalg.batch_matmul. WDYT @EllisLambda ?

The optimization styles of Matmul and BatchMatmul are quite distinct, and the performance of the two operations may differ on different hardware. Separating might be better?

@EllisLambda
Contributor Author

In fact, there is some truth to this, but I think this PR should explain what optimizations have been made and where they were applied.

Ok, I will add the explanation.

@xlinsist
Collaborator

xlinsist commented Aug 26, 2023

Separating might be better?

Yes, we have decided to develop the batchmatmul-optimize pass specifically for linalg.batch_matmul for now.

Yumin @EllisLambda, about the "bug" you mention in your description: we could unify support for floating-point and integer types with another interface. If not, starting a new discussion about integer operations would be more appropriate, so that the related remarks could be removed from your description, since this PR only needs to handle floating-point operations.

@EllisLambda
Contributor Author

Separating might be better?

Yes, we have decided to develop the batchmatmul-optimize pass specifically for linalg.batch_matmul for now.

Yumin @EllisLambda, about the "bug" you mention in your description: we could unify support for floating-point and integer types with another interface. If not, starting a new discussion about integer operations would be more appropriate, so that the related remarks could be removed from your description, since this PR only needs to handle floating-point operations.

Ok! I see.

@EllisLambda
Contributor Author

@xlinsist I have amended my PR.

@xlinsist xlinsist merged commit a8f1653 into buddy-compiler:main Sep 9, 2023