[MatmulLoopPipeline] Populate LoadOp mask to PrefetchOp #4030

Merged
whitneywhtsang merged 3 commits into main from whitneywhtsang/prefetch_pipeline
Apr 28, 2025

Conversation

@whitneywhtsang whitneywhtsang commented Apr 27, 2025

This PR enhances MatmulLoopPipeline so that each PrefetchOp it creates carries the mask of its associated LoadOp.
Benchmark CI:
https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/14697631543
https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/14716472373
(No performance regressions.)

Note: this change comes partially from #3634.
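
To illustrate the idea of carrying a LoadOp's mask over to its PrefetchOp, here is a minimal Python sketch. It is not the pass's actual C++ code: the `LoadOp` and `PrefetchOp` dataclasses, `create_prefetch`, and `touched_addresses` are hypothetical stand-ins invented for this example, and the premise that a masked prefetch skips the same lanes as the masked load is an assumption about the intended semantics.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-ins for the IR operations. These are NOT the real
# Triton/MLIR classes, only enough structure to show the idea.
@dataclass
class LoadOp:
    ptrs: List[int]                     # addresses the load reads
    mask: Optional[List[bool]] = None   # None means "all lanes active"


@dataclass
class PrefetchOp:
    ptrs: List[int]
    mask: Optional[List[bool]] = None


def create_prefetch(load: LoadOp) -> PrefetchOp:
    """Build a prefetch for `load`, copying the load's mask if it has one."""
    mask = None if load.mask is None else list(load.mask)
    return PrefetchOp(ptrs=list(load.ptrs), mask=mask)


def touched_addresses(op) -> List[int]:
    """Addresses a (possibly masked) op would access, under the assumed semantics."""
    if op.mask is None:
        return list(op.ptrs)
    return [p for p, active in zip(op.ptrs, op.mask) if active]


load = LoadOp(ptrs=[0, 4, 8, 12], mask=[True, True, False, False])
prefetch = create_prefetch(load)
print(touched_addresses(prefetch))  # [0, 4]
```

In this toy model, a prefetch built without the mask would touch all four addresses, including the two the load itself skips. Copying the mask keeps the two operations consistent.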

@whitneywhtsang whitneywhtsang self-assigned this Apr 27, 2025
@whitneywhtsang whitneywhtsang marked this pull request as ready for review April 28, 2025 00:18
@whitneywhtsang whitneywhtsang requested review from alexbaden, etiotto, chengjunlu and a team April 28, 2025 00:21
@whitneywhtsang whitneywhtsang requested a review from etiotto April 28, 2025 16:45
@whitneywhtsang whitneywhtsang merged commit d4699e1 into main Apr 28, 2025
10 checks passed
@whitneywhtsang whitneywhtsang deleted the whitneywhtsang/prefetch_pipeline branch April 28, 2025 21:44
@etiotto etiotto linked an issue May 2, 2025 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

[Performance] Enable prefetching for tt.load with tensor of pointer
4 participants