[JAX] Support segment_ids/pos as FA inputs #1406

zlsh80826 · 2025-01-13T10:21:04Z

Description

This PR adds limited support for segment_ids/pos and deprecates the fused_attn_thd API.

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactor

Changes

Add a new SequenceDescriptor class to cover the different sequence-description scenarios:
- from_seqlens for non-THD
- from_seqlens_and_offsets for THD
- from_segment_ids_and_pos for THD + ring attn (not yet implemented)
Change the old fused_attn mask parameter to a SequenceDescriptor. Passing a mask as the positional argument will keep working for a while but emits a deprecation warning.
Deprecate the fused_attn_thd API, since the refactored fused_attn also supports the THD format.
Remove the small inputs in test_fused_attn.py, since the long-sequence inputs already cover them.
Add tests for the different sequence input formats in test_fused_attn.py.
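
To make the relationship between the descriptions concrete, here is a minimal pure-Python sketch of how a (seqlens, offsets) pair maps to segment_ids and segment_pos for one row. It is an illustration only, not the code in this PR: the helper name is hypothetical, plain lists stand in for jnp arrays, and it assumes that segment id 0 marks padding and that positions restart at 0 in every segment.

```python
def seqlens_and_offsets_to_segment_ids_and_pos(seqlens, offsets, max_seqlen):
    """Expand per-segment (length, start) pairs into per-token ids and positions.

    Hypothetical helper for illustration. Assumes id 0 means padding and
    that segments do not overlap.
    """
    segment_ids = [0] * max_seqlen
    segment_pos = [0] * max_seqlen
    for seg_index, (length, start) in enumerate(zip(seqlens, offsets)):
        for p in range(length):
            segment_ids[start + p] = seg_index + 1  # ids start at 1, 0 is padding
            segment_pos[start + p] = p              # position restarts per segment
    return segment_ids, segment_pos


# Non-THD style: one sequence of length 5 per row, starting at offset 0.
bshd_ids, bshd_pos = seqlens_and_offsets_to_segment_ids_and_pos([5], [0], 8)

# THD style: two packed segments of lengths 3 and 2 in the same row.
thd_ids, thd_pos = seqlens_and_offsets_to_segment_ids_and_pos([3, 2], [0, 3], 8)
```

In the THD case the row holds two segments followed by three padding tokens, which is the information that either description has to carry to the attention kernel.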

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

zlsh80826 · 2025-01-13T10:36:42Z

/te-ci jax L1

Signed-off-by: Reese Wang <[email protected]>

zlsh80826 · 2025-01-14T10:02:10Z

/te-ci jax L1

mgoldfarb-nvidia · 2025-01-14T15:24:08Z

tests/jax/test_fused_attn.py

@@ -709,10 +727,7 @@ def check_dqkv(primitive, reference, pad):
 @pytest.mark.parametrize(
    "b, s_q, s_kv, h_q, h_kv, d, dtype",
    [
-        pytest.param(4, 128, 128, 16, 16, 64, jnp.bfloat16, id="4-128-128-16-16-64-BF16-SELF"),


Are these removals meant to cut down on the number of [redundant] test cases?

Yes, I think those tests are unnecessary because we no longer use max_512 kernels for seqlen < 512. The kernels for seqlen=128 are the same as seqlen=2048, so the 2048 test cases already cover the 128 cases.

mgoldfarb-nvidia · 2025-01-14T15:37:03Z

tests/jax/test_fused_attn.py

+@pytest.mark.parametrize(
+    "seq_desc_format",
+    [
+        pytest.param(SeqDescFormat.Mask, id="Mask"),


Would it help to cut down on test cases by creating a standalone unit test for the SequenceDesc to cover and check all of the cases? Then in this unit test we can use either Seqlens or SegmentIDs depending on THD or BSHD?
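
A standalone check along these lines could look roughly like the sketch below. It is a hedged illustration rather than a test from this PR: the helper names are hypothetical, plain lists stand in for jnp arrays, and it assumes that True in a mask marks a masked-out token and that segment id 0 marks padding.

```python
def seqlen_from_mask(mask_row):
    # Assumption: True marks a masked-out (padding) token.
    return sum(1 for masked in mask_row if not masked)


def seqlen_from_segment_ids(segment_ids_row):
    # Assumption: segment id 0 marks padding.
    return sum(1 for seg in segment_ids_row if seg != 0)


def check_formats_agree(seqlen, max_seqlen):
    """Build the same row in two formats and check both recover seqlen."""
    mask_row = [i >= seqlen for i in range(max_seqlen)]
    segment_ids_row = [1 if i < seqlen else 0 for i in range(max_seqlen)]
    assert seqlen_from_mask(mask_row) == seqlen
    assert seqlen_from_segment_ids(segment_ids_row) == seqlen
    return True


# Cover the empty, single-token, partial, and full-length cases.
results = [check_formats_agree(n, max_seqlen=8) for n in (0, 1, 5, 8)]
```

With the format equivalence checked in isolation, the attention test itself would only need one format per layout.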

mgoldfarb-nvidia · 2025-01-14T15:48:37Z

transformer_engine/jax/attention.py

+        else:
+
+            def generate_default_pos(segment_ids):
+                seqlen = segment_ids.shape[-1]


I couldn't see whether we apply max_segments_per_seq anywhere when generating the seqlen and offset for cuDNN. I found it very useful for limiting the overhead of the JAX code that sets up seqlen/offset, rather than assuming max_seq_len (see _get_seqlens_and_offsets). We should do a quick benchmark to see how much overhead, if any, is incurred.

I added fea6b0e and cfe00a8 to reduce the seqlen and offset shape to (batch, max_segments_per_seq)
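
The effect of that shape reduction can be sketched in pure Python as follows. This is not the code from fea6b0e or cfe00a8: the function name is hypothetical, plain lists stand in for jnp arrays, and the sketch assumes that segment id 0 marks padding, that each segment is contiguous, and that unused slots are filled with -1.

```python
def pack_to_max_segments(segment_ids_batch, max_segments_per_seq, fill=-1):
    """Return seqlens and offsets with shape (batch, max_segments_per_seq).

    Hypothetical helper for illustration. Assumes id 0 is padding and
    that -1 marks an unused segment slot.
    """
    seqlens_batch, offsets_batch = [], []
    for row in segment_ids_batch:
        seqlens, offsets = [], []
        prev = 0
        for i, seg in enumerate(row):
            if seg != 0 and seg != prev:
                # A new segment starts at token i.
                offsets.append(i)
                seqlens.append(1)
            elif seg != 0:
                # Still inside the current segment.
                seqlens[-1] += 1
            prev = seg
        if len(seqlens) > max_segments_per_seq:
            raise ValueError("row has more segments than max_segments_per_seq")
        pad = max_segments_per_seq - len(seqlens)
        seqlens_batch.append(seqlens + [fill] * pad)
        offsets_batch.append(offsets + [fill] * pad)
    return seqlens_batch, offsets_batch


batch = [
    [1, 1, 1, 2, 2, 0, 0, 0],  # two segments, three padding tokens
    [1, 1, 2, 2, 2, 3, 0, 0],  # three segments, two padding tokens
]
seqlens, offsets = pack_to_max_segments(batch, max_segments_per_seq=3)
```

Here each output row has 3 entries instead of 8, so the per-row bookkeeping scales with the segment count rather than with the sequence length.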

Signed-off-by: Reese Wang <[email protected]>

zlsh80826 requested review from mgoldfarb-nvidia and phu0ngng January 13, 2025 14:07

zlsh80826 added 17 commits January 14, 2025 09:57

POC for segment_ids/segment_pos

0733c76

Signed-off-by: Reese Wang <[email protected]>

Change segment_pos position

1c3cbb4

Signed-off-by: Reese Wang <[email protected]>

Use RemainingArgs to solve number of parameters mismatches

b321d41

Signed-off-by: Reese Wang <[email protected]>

Test mask_descriptor for accommodating different mask representations

e2a61ae

Signed-off-by: Reese Wang <[email protected]>

Fix bugs

51a9e90

Signed-off-by: Reese Wang <[email protected]>

Use descriptor in bwd

e27a476

Signed-off-by: Reese Wang <[email protected]>

Primitives only accepts pure jnp array

9c3a597

Signed-off-by: Reese Wang <[email protected]>

segment_ids/pos support POC

c7759b0

Signed-off-by: Reese Wang <[email protected]>

Move seqlens/offsets generation to mask descriptor

a83cded

Signed-off-by: Reese Wang <[email protected]>

Rename MaskDescriptor to SequenceDescriptor

945228b

Signed-off-by: Reese Wang <[email protected]>

Generalize get_seqlens_and_offsets

6cf9aa1

Signed-off-by: Reese Wang <[email protected]>

Utilize sequence desc on FA bwd

082b630

Signed-off-by: Reese Wang <[email protected]>

Migrate to new API

7825750

Signed-off-by: Reese Wang <[email protected]>

Add docstrings

606259f

Signed-off-by: Reese Wang <[email protected]>

Remove small inputs and test different input format

5441f8f

Signed-off-by: Reese Wang <[email protected]>

Fix lint

0382c68

Signed-off-by: Reese Wang <[email protected]>

Fix seed shardings

e62c049

Signed-off-by: Reese Wang <[email protected]>

zlsh80826 force-pushed the rewang/test-segment-ids branch from 08a7582 to e62c049 January 14, 2025 09:57

mgoldfarb-nvidia reviewed Jan 14, 2025

zlsh80826 added 4 commits January 15, 2025 10:04

Optimize sequence converting overhead

fea6b0e

Signed-off-by: Reese Wang <[email protected]>

Optimize seq_offsets calculation

cfe00a8

Signed-off-by: Reese Wang <[email protected]>

Fix up

e04d72a

Signed-off-by: Reese Wang <[email protected]>

fix lint

b8255e3

Signed-off-by: Reese Wang <[email protected]>
