Mamba tiktoken extend #150
Merged: daviswer merged 19 commits into foundation-model-stack:mamba-tiktoken-extend from daviswer:mamba-tiktoken-extend on Jun 13, 2025.
Conversation

No description provided.
Squashed commits:

* Add code of conduct (foundation-model-stack#132) Add Foundation Model Stack community code of conduct. Signed-off-by: Sahdev Zala <[email protected]>
* minimal cp
* print model sanity check
* more print tests
* better print test
* print test fix
* use foreach = false by default
* comment
* remove foreach option
* cp_impl
* context_parallel -> cp
* cp_in_node
* cp_in_node default False
* cp_in_node -> cp_over_world
* add else case
* print check
* rm print check
* explicit device meshes
* comment
* Pass dp_degree to dataloader
* Apply CP chunking
* add cp_{attn,mamba}_impl configs
* allreduce -> allgather typo fix
* Corrected dp ws
* Diag save
* Close brackets
* Better save
* Also save per-token loss
* grad norm print test
* fix throughput computation for cp
* rm local grad norm print
* another local grad norm print test
* print loss, too
* add local num params print
* rm test code
* Add 8x cfg
* Upstream fhandler
* Upstream fhandler / edge case fixes
* Rope theta name
* Add 32x
* rework mesh logic
* low_cpu_fsdp for mamba
* wrap embedding and lm head for mamba
* fms_to_hf_mamba_transformers.py
* remove "sharded" in hf ckpt util
* copy over hf conversion script
* Add docslicedataset
* Hard disable countfile (mixed dataset)
* Add 500k cfg and zloss
* Diag skip dslice
* Diag swap back buffers
* Readd loader changes
* Diag dslice off
* Revert
* Diag print
* Swap back buffers, doc fragging
* Add constant sched
* Up slice rate
* Soft filtering @8k
* Add 16x
* Add filter_exp as arg
* Passthrough filter_exp from cfg
* Add filter_exp and target_doclen
* Passthrough target_doclen

--------

Signed-off-by: Sahdev Zala <[email protected]>
Co-authored-by: Sahdev Zala <[email protected]>
Co-authored-by: Garrett Goon <[email protected]>
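The later commits ("Soft filtering @8k", "Add filter_exp as arg", "Add filter_exp and target_doclen") add a length-based soft filter to the dataloader, with `target_doclen` and `filter_exp` passed through from the config. The PR view above does not show the actual criterion, so the following is only a minimal sketch of one plausible scheme, assuming short documents are downsampled with a probability shaped by those two parameters; the function name `keep_document` and the exact formula are illustrative, not taken from this PR.

```python
import random

def keep_document(doc_len: int, target_doclen: int = 8192, filter_exp: float = 1.0) -> bool:
    """Decide whether to keep a document, softly filtering short ones.

    Hypothetical rule: documents with at least `target_doclen` tokens are
    always kept; shorter documents are kept with probability
    (doc_len / target_doclen) ** filter_exp, so a larger `filter_exp`
    discards short documents more aggressively without hard-rejecting
    any length outright.
    """
    if doc_len >= target_doclen:
        return True
    return random.random() < (doc_len / target_doclen) ** filter_exp

# Example: with target_doclen=8192 ("@8k") and filter_exp=2.0, a
# 2048-token document survives with probability (1/4) ** 2 = 1/16.
```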
daviswer merged commit 1171cf2 into foundation-model-stack:mamba-tiktoken-extend. 0 of 4 checks passed.