Paged Stashing #2690
Draft: nanz-nv wants to merge 68 commits into NVIDIA:dev from vasunvidia:paged_offloading
Commits
96051cc  Add --moe-use-device-initiated-grouped-gemm to allow token_per_expert…  (QiZhangNV)
92e247f  Initial change for packed offloading  (vasunvidia)
fa8da97  Bug fix
31b2ba9  Mem Opt  (vasunvidia)
78dfacd  Handle MXFP8Tensor offload
0c0a75e  Enable Packed offloading to CPU pinned memory with PACKED_OFFLOAD_CPU=1
6703445  Enable activation truncation for first step
955fbba  Overflow check and assert
cf7b68b  Check in temporary solution for detecing overflow in receiving buffer  (nanz-nv)
dc4e973  Reconstruct the stash buffer into a 2D structure  (nanz-nv)
683d283  Refactor the code to check overflow in HybridEP receiving buffer  (nanz-nv)
9c65eea  Use CPU offloading context manager as a WAR for now to WAR the proble…  (nanz-nv)
7c2aa7c  Add support for paged stashing  (nanz-nv)
f44a426  Add the feature of speculative CE stashing  (nanz-nv)
1bbaf54  Fix PP schedule
629bf22  Use common buffer across VP for paged stashing  (vasunvidia)
50c6c17  Disable Packed Offloading for validation
32fbc15  Fixe perf issue in packed stash/pop kernels  (nanz-nv)
bff7e8b  Minor fix for tensor allocation and padding requirement on budget  (nanz-nv)
94c14bc  Packed/paged offloading is current not stream-safe. Need to put stash…  (nanz-nv)
7b0ef46  add new hybrid ep  (Autumn1998)
6905e2c  Remove the overflow check in framework because it is now done by hybr…  (nanz-nv)
9c056df  Fix one merge conflict  (nanz-nv)
669d9f7  Code cleanup  (vasunvidia)
66ebb1e  Add second autograd to avoid triple buffering  (vasunvidia)
535b277  Avoid unnecessary wait_stream for reload in case of 1f1b  (vasunvidia)
cb71c66  Check in dynamic-shape-aware SwiGLU triton kernel  (nanz-nv)
c308899  Major cleanup and refactor  (nanz-nv)
4c1b01b  Check in paged_stash.py that was omited in the previous commit  (nanz-nv)
3536250  Remove d2d page feature for now  (nanz-nv)
90b02d5  Update added arguments and add compatibility check  (nanz-nv)
fb8fc21  refine overflow check  (nanz-nv)
27352b5  Fixing lint issues  (nanz-nv)
84ba8b8  Minor refactor  (vasunvidia)
e32a28b  Add unit test for Paged Stashing  (vasunvidia)
6ca4a01  1. allocate stashing buffer based on avg token count if STASH_BUFFER_…  (nanz-nv)
e88df64  Reenable overlapping of stashing kernels  (nanz-nv)
10ed85b  Remove a buggy/redundant reset  (nanz-nv)
62ffb30  Cleanup moe-expert-rank-capacity-factor argument.  (vasunvidia)
19b62d2  Update moe_use_device_initiated_grouped_gemm check for paged stashing…  (vasunvidia)
7c868e9  Remove the WAR of running warmup on a side stream  (nanz-nv)
b815f99  Fix for data_iterator type check in Paged Stashing fallback  (vasunvidia)
ac42b99  Change to support eager-mode fallback for validation  (vasunvidia)
5cff7a9  Revert "Check in dynamic-shape-aware SwiGLU triton kernel"  (nanz-nv)
6dd213b  Fixed some minor issues  (nanz-nv)
b28f812  Fix the unit test  (nanz-nv)
2e92588  Initial commit for spill to cpu feature  (nanz-nv)
58a97c1  Move paged stashing knobs from env vars to transformer_config knobs  (nanz-nv)
79522cc  Refactor the knobs a bit so it is more intuitive  (nanz-nv)
b3be4de  Use get_attr_wrapped_model util to access moe and mtp layers  (vasunvidia)
3fc366e  Refactor the unit test for paged stashing  (nanz-nv)
7a23c78  Clean up after rebase  (nanz-nv)
b4e1e56  skip routed expert padding  (zhongbozhu)
06d8a85  Refactor/clean-up logging  (nanz-nv)
fb620fc  Resolve review feedback  (nanz-nv)
09ef7af  Fix fallback data read for PP=1  (vasunvidia)
f9a5fcf  Paged stashing refactor  (vasunvidia)
25be640  Remove logical_shape check  (vasunvidia)
1d3755a  Remove paged_stash_set_last_layer  (vasunvidia)
f227fd6  Cleanup PadUnpadFunction  (vasunvidia)
b5cb760  Remove stash modules and remove stashing code for non-fused grouped gemm  (nanz-nv)
de6c6eb  Remove dead code  (nanz-nv)
84d1803  Fix TE import problem in experts.py  (nanz-nv)
b49c1a0  Fixed merge conflict  (nanz-nv)
2617ff9  Address reviewer's comments  (nanz-nv)
a037128  Review comments  (vasunvidia)
05ea747  Add PagedStashRunner for overflow detection for pure M-LM training  (vasunvidia)
a133251  Release stashing buffer before fallback to restore the memory  (nanz-nv)