Mamba tiktoken extend #150
Merged: daviswer merged 19 commits into foundation-model-stack:mamba-tiktoken-extend from daviswer:mamba-tiktoken-extend on Jun 13, 2025.
Conversation

No description provided.
Squashed commits:

* Add code of conduct (foundation-model-stack#132) Add Foundation Model Stack community code of conduct. Signed-off-by: Sahdev Zala <[email protected]>
* minimal cp
* print model sanity check
* more print tests
* better print test
* print test fix
* use foreach = false by default
* comment
* remove foreach option
* cp_impl
* context_parallel -> cp
* cp_in_node
* cp_in_node default False
* cp_in_node -> cp_over_world
* add else case
* print check
* rm print check
* explicit device meshes
* comment
* Pass dp_degree to dataloader
* Apply CP chunking
* add cp_{attn,mamba}_impl configs
* allreduce -> allgather typo fix
* Corrected dp ws
* Diag save
* Close brackets
* Better save
* Also save per-token loss
* grad norm print test
* fix throughput computation for cp
* rm local grad norm print
* another local grad norm print test
* print loss, too
* add local num params print
* rm test code
* Add 8x cfg
* Upstream fhandler
* Upstream fhandler / edge case fixes
* Rope theta name
* Add 32x
* rework mesh logic
* low_cpu_fsdp for mamba
* wrap embedding and lm head for mamba
* fms_to_hf_mamba_transformers.py
* remove "sharded" in hf ckpt util
* copy over hf conversion script
* Add docslicedataset
* Hard disable countfile (mixed dataset)
* Add 500k cfg and zloss
* Diag skip dslice
* Diag swap back buffers
* Readd loader changes
* Diag dslice off
* Revert
* Diag print
* Swap back buffers, doc fragging
* Add constant sched
* Up slice rate
* Soft filtering @8k
* Add 16x
* Add filter_exp as arg
* Passthrough filter_exp from cfg
* Add filter_exp and target_doclen
* Passthrough target_doclen

--------

Signed-off-by: Sahdev Zala <[email protected]>
Co-authored-by: Sahdev Zala <[email protected]>
Co-authored-by: Garrett Goon <[email protected]>
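The later commits ("Soft filtering @8k", "Add filter_exp as arg", "Add filter_exp and target_doclen") add a length-based soft filter to the dataloader, with `target_doclen` and `filter_exp` passed through from the config. The PR view above does not show the actual criterion, so the following is only a minimal sketch of one plausible scheme, assuming short documents are downsampled with a probability shaped by those two parameters; the function name `keep_document` and the exact formula are illustrative, not taken from this PR.

```python
import random

def keep_document(doc_len: int, target_doclen: int = 8192, filter_exp: float = 1.0) -> bool:
    """Decide whether to keep a document, softly filtering short ones.

    Hypothetical rule: documents with at least `target_doclen` tokens are
    always kept; shorter documents are kept with probability
    (doc_len / target_doclen) ** filter_exp, so a larger `filter_exp`
    discards short documents more aggressively without hard-rejecting
    any length outright.
    """
    if doc_len >= target_doclen:
        return True
    return random.random() < (doc_len / target_doclen) ** filter_exp

# Example: with target_doclen=8192 ("@8k") and filter_exp=2.0, a
# 2048-token document survives with probability (1/4) ** 2 = 1/16.
```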
daviswer merged commit 1171cf2 into foundation-model-stack:mamba-tiktoken-extend. 0 of 4 checks passed.