@Aya-ZIbra
Contributor

Summary:
This diff adds support for local masks to the decode attention implementation. It adds `window_left` and `window_right` parameters to the decode function, adds a `Mask` template parameter to the `GenRunner` class and to the `collective_builder`, and adds `window_size_left` and `window_size_right` parameters to the `load_cpasync_warpspecialized` class.

Currently, softmax is applied in a 3-loop setting.
Next: Optimize these iterations and benchmark performance.
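
For reviewers, here is a minimal host-side sketch of what a local mask with a left and right window might compute for a single decode query. It is not taken from the diff: the helper names `is_key_visible` and `count_visible` are hypothetical, and it assumes that the decode query sits at the last key position (`seqlen_k - 1`) and that a negative window size means "unbounded on that side". The kernel's actual conventions may differ.

```cpp
#include <cstdint>

// Hypothetical helper (not from the diff): true when key index `k_idx` is
// visible to the single decode query under a local window mask.
// Assumptions: the query is aligned with the last key (seqlen_k - 1), and a
// negative window size disables the bound on that side.
constexpr bool is_key_visible(std::int32_t k_idx, std::int32_t seqlen_k,
                              std::int32_t window_left,
                              std::int32_t window_right) {
  if (k_idx < 0 || k_idx >= seqlen_k) {
    return false;  // outside the valid key range
  }
  const std::int32_t q_pos = seqlen_k - 1;
  const bool left_ok = window_left < 0 || k_idx >= q_pos - window_left;
  const bool right_ok = window_right < 0 || k_idx <= q_pos + window_right;
  return left_ok && right_ok;
}

// Hypothetical helper: number of keys the decode query attends to.
constexpr std::int32_t count_visible(std::int32_t seqlen_k,
                                     std::int32_t window_left,
                                     std::int32_t window_right) {
  std::int32_t n = 0;
  for (std::int32_t k = 0; k < seqlen_k; ++k) {
    n += is_key_visible(k, seqlen_k, window_left, window_right) ? 1 : 0;
  }
  return n;
}
```

Under these assumptions, `window_left = 3` and `window_right = 0` with 10 keys leaves the query attending to keys 6 through 9, and a negative `window_left` falls back to attending to every key.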

Differential Revision: D84778050

Aya Ibrahim and others added 5 commits October 9, 2025 15:16
Summary:
This diff enables BF16 support with the latest CUTLASS version.

The changes update `blackwell_gen_impl.cu` and `collective/sm100_fmha_gen_mainloop_warpspecialized.hpp` to support the BF16 data type.

The `fmha.hpp` file also includes a check to ensure that the SMEM usage does not exceed the capacity.
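
A compile-time shared-memory check of this kind could look roughly like the sketch below. It is only an illustration, not the code in `fmha.hpp`: the constant `kSmemCapacityBytes`, the helper `fits_in_smem`, and both storage structs are hypothetical, and the capacity value is an assumed figure chosen for the example.

```cpp
#include <cstddef>

// Assumed per-block shared-memory budget, for illustration only; the real
// limit depends on the target architecture and launch configuration.
constexpr std::size_t kSmemCapacityBytes = 227 * 1024;

// Hypothetical helper: true when a kernel's shared-storage struct fits.
template <class SharedStorage>
constexpr bool fits_in_smem() {
  return sizeof(SharedStorage) <= kSmemCapacityBytes;
}

// Hypothetical storage layouts standing in for a kernel's SMEM buffers.
struct ExampleStorage {
  alignas(16) unsigned char q_tile[16 * 1024];
  alignas(16) unsigned char kv_tiles[128 * 1024];
};

struct OversizedStorage {
  alignas(16) unsigned char kv_tiles[300 * 1024];
};

// Fails the build instead of failing at kernel launch time.
static_assert(fits_in_smem<ExampleStorage>(),
              "shared storage exceeds the assumed SMEM capacity");
```

Checking at compile time surfaces an oversized tile configuration during the build rather than as a launch failure.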

Differential Revision: D84624233
Summary:
Add a stand-alone Blackwell decode op.
Supported mask:
   BlockDiagonalCausalWithOffsetPaddedKeysMask

Differential Revision: D84630701
Summary:
This diff adds support for local masks to the decode attention implementation. It adds `window_left` and `window_right` parameters to the decode function, adds a `Mask` template parameter to the `GenRunner` class and to the `collective_builder`, and adds `window_size_left` and `window_size_right` parameters to the `load_cpasync_warpspecialized` class.

Currently, softmax is applied in a 3-loop setting.
Next: Optimize these iterations and benchmark performance.

Differential Revision: D84778050
@meta-codesync
Contributor

meta-codesync bot commented Oct 16, 2025

@Aya-ZIbra has exported this pull request. If you are a Meta employee, you can view the originating Diff in D84778050.

@netlify

netlify bot commented Oct 16, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit 06d6424
🔍 Latest deploy log https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68f123b8248d2900089c65e7
😎 Deploy Preview https://deploy-preview-5015--pytorch-fbgemm-docs.netlify.app
