Conversation

@daviswer
Collaborator

No description provided.

daviswer and others added 19 commits October 22, 2024 13:44
* Add code of conduct (foundation-model-stack#132)

Add Foundation Model Stack community code of conduct.

Signed-off-by: Sahdev Zala <[email protected]>

* minimal cp

* print model sanity check

* more print tests

* better print test

* print test fix

* use foreach = false by default

* comment

* remove foreach option

* cp_impl

* context_parallel -> cp

* cp_in_node

* cp_in_node default False

* cp_in_node -> cp_over_world

* add else case

* print check

* rm print check

* explicit device meshes

* comment

* Pass dp_degree to dataloader

* Apply CP chunking

* add cp_{attn,mamba}_impl configs

* allreduce -> allgather typo fix

* Corrected dp ws

* Diag save

* Close brackets

* Better save

* Also save per-token loss

* grad norm print test

* fix throughput computation for cp

* rm local grad norm print

* another local grad norm print test

* print loss, too

* add local num params print

* rm test code

* Add 8x cfg

* Upstream fhandler

* Upstream fhandler / edge case fixes

* Rope theta name

* Add 32x

* rework mesh logic

* low_cpu_fsdp for mamba

* wrap embedding and lm head for mamba

* fms_to_hf_mamba_transformers.py

* remove "sharded" in hf ckpt util

* copy over hf conversion script

* Add docslicedataset

* Hard disable countfile (mixed dataset)

* Add 500k cfg and zloss

* Diag skip dslice

* Diag swap back buffers

* Readd loader changes

* Diag dslice off

* Revert

* Diag print

* Swap back buffers, doc fragging

* Add constant sched

* Up slice rate

* Soft filtering @8k

* Add 16x

* Add filter_exp as arg

* Passthrough filter_exp from cfg

* Add filter_exp and target_doclen

* Passthrough target_doclen

---------

Signed-off-by: Sahdev Zala <[email protected]>
Co-authored-by: Sahdev Zala <[email protected]>
Co-authored-by: Garrett Goon <[email protected]>
@daviswer daviswer merged commit 1171cf2 into foundation-model-stack:mamba-tiktoken-extend Jun 13, 2025
0 of 4 checks passed
@daviswer daviswer deleted the mamba-tiktoken-extend branch June 13, 2025 22:51