DID loop split for SDPA #3711
base: main
Conversation
!test
self.define_tensor(
    shape=[b, h, s, e // h],
    dtype=DataType.BFloat16,
    stride_order=stride_order,
Is this needed? I believe it'll get overwritten by `set_allocation_as_loop`, so we can simply start with the default stride_order.
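A minimal sketch of the suggestion, assuming nvFuser's Python frontend; the tensor name `q`, the extents, and the surrounding scheduling calls are placeholders and assumptions, not code from this PR:

```python
import nvfuser
from nvfuser import DataType, FusionDefinition

b, h, s, e = 2, 8, 1024, 1024  # placeholder extents (assumed)


class Model(FusionDefinition):
    def definition(self):
        # Start with the default stride_order (no explicit argument), since
        # the allocation domain is decided later in the schedule anyway.
        self.q = self.define_tensor(
            shape=[b, h, s, e // h],
            dtype=DataType.BFloat16,
        )
        self.add_output(self.q)

    def multidevice_schedule(self):
        # ... device-mesh setup and DID loop split of self.q would go here ...
        # set_allocation_as_loop derives the allocation domain from the
        # scheduled loop domain, overwriting any stride_order given above.
        self.sched.set_allocation_as_loop(self.q)
```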
Co-authored-by: Jingyue Wu <[email protected]>
PR Reviewer Guide 🔍 (Review updated until commit 026229d)
Here are some key observations to aid the review process:
In this PR, I explicitly parallelize the outputs `attn` and `log_sumexp` of `sdpfa_fwd`. Sharding propagation for loop split does not work correctly in this case at the moment.
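As a rough illustration of that, here is a hedged sketch of explicitly parallelizing the `sdpfa_fwd` outputs. The device-mesh setup, the `split`/`parallelize`/`set_allocation_as_loop` calls, and the `sdpfa_fwd` argument list follow the general pattern of nvFuser's multidevice tests and are assumptions, not a copy of this PR's test:

```python
import nvfuser
from nvfuser import DataType, FusionDefinition

d = 2                          # number of devices (assumed)
b, h, s, e = 2, 8, 1024, 1024  # placeholder extents (assumed)


class SdpaFwd(FusionDefinition):
    def definition(self):
        self.q, self.k, self.v = [
            self.define_tensor(shape=[b, h, s, e // h], dtype=DataType.BFloat16)
            for _ in range(3)
        ]
        dropout_p = self.define_scalar(0.0, dtype=DataType.Double)
        is_causal = self.define_scalar(True, dtype=DataType.Bool)
        # Assumed to return (attn, log_sumexp, philox_seed, philox_offset).
        self.attn, self.log_sumexp, _seed, _offset = self.ops.sdpfa_fwd(
            self.q, self.k, self.v, dropout_p, is_causal, None  # scale left unset
        )
        self.add_output(self.attn)
        self.add_output(self.log_sumexp)

    def multidevice_schedule(self):
        mesh = nvfuser.DeviceMesh(range(d))
        # The outputs attn and log_sumexp are parallelized explicitly, in
        # addition to the inputs, because sharding propagation for loop split
        # does not yet handle them.
        for t in [self.q, self.k, self.v, self.attn, self.log_sumexp]:
            self.sched._set_device_mesh(t, mesh)
            # DID loop split on the head dimension: h -> [d, h // d].
            self.sched.split(t, 1, d, False)
            self.sched.parallelize(t, 1, nvfuser.ParallelType.mesh_x)
            self.sched.set_allocation_as_loop(t)
```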