[Feature] XTuner Lite #974

Merged
merged 104 commits into InternLM:lite from xpuyu on Dec 27, 2024

Conversation

pppppM (Collaborator) commented Dec 16, 2024

No description provided.

pppppM and others added 24 commits November 4, 2024 10:52
* first commit: support internlm3 moe streaming dataset

* move codes
* first commit: support internlm3 moe streaming dataset

* move codes

* rmsnorm kernel support low version flash_attn

* add barrier
* add internvl

* fix bug

* remove dup code

* support liger of internvl

* fix bug

* add get_repo_git_info

* fix

* add minicpmv

* add minicpmv dispatch
* fix dpo error

* fix sp error

* update dataset

* fix
* sample ratio greater than 1.0 and trunc max len

* accelerating the counting of tokens

* log reduced loss

* fix micro bs greater than 1
* [Enhancement] Ensure data integrity when the sampling ratio is more than 1 (#24)

* repeat dataset

* fixup

* fix typos

* fix typos
* add prefetch

* update prefetch

* add janus

* add janus

* fix

* fix

* fix llama position id error

* fix ProcessPoolExecutor

* update

* fix llama

* delete cache
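
The commits "sample ratio greater than 1.0 and trunc max len" and "repeat dataset" above concern oversampling a dataset past its original size. A minimal sketch of that idea in plain Python, with a hypothetical `oversample` helper that is not XTuner Lite's actual API:

```python
import random

def oversample(samples, ratio, max_len, seed=0):
    """Repeat a dataset by `ratio` (which may exceed 1.0): keep every sample
    for each whole repeat, draw the fractional remainder without replacement
    so nothing is duplicated within it, then truncate to `max_len` tokens."""
    repeated = samples * int(ratio)                # whole repeats
    remainder = round((ratio % 1) * len(samples))  # fractional part
    repeated += random.Random(seed).sample(samples, remainder)
    return [s[:max_len] for s in repeated]

# 100 sequences at ratio 2.5 -> 250 sequences, each capped at 4096 tokens
data = [[tok] * 5000 for tok in range(100)]
assert len(oversample(data, 2.5, max_len=4096)) == 250
```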
@pppppM pppppM closed this Dec 26, 2024
@pppppM pppppM reopened this Dec 27, 2024
@pppppM pppppM merged commit e443aa9 into InternLM:lite Dec 27, 2024
1 of 3 checks passed
@pppppM pppppM deleted the xpuyu branch December 27, 2024 08:36
pppppM added a commit that referenced this pull request Jan 21, 2025
* [Feature] XTuner Lite (#974)

* minimum dependency sft

* fix dispatch

* add timer

* add tgs

* internlm2 tp

* rms support tp

* gradient checkpointing

* lazy load pretrain

* temp

* fix bugs

* add data pipeline example

* fix lints

* remove useless code

* fix hard pack bug

* add comments

* clean code

* add shard strategy

* support cpu offload

* support cpu offload

* trust remote code

* fix soft packer bug

* fix soft packer bug

* fix soft packer bug

* refactor data pipeline

* fixup

* fix pad tokens bug

* check input_ids and labels

* check input_ids and labels in collator

* fix load local datasets bug

* fix load cached datasets

* restore dset order

* save cached infos

* accelerate start up

* avoid all gather cached datasets

* fixup

* fix cache bug

* Support group length (#4)

* replace rmsnorm kernel

* support ftdp ds

* support load_bin

* support group by maxlen

* add fsdp_ftdp_sft and fix fsdp_sft

* support ftdp ds

* add lr min

* fix bugs

* fix bugs

* delete

* support llava

* support packer cache

* refactor dist load

* Add sp tp (#5)

* support sp and tp

* add fsdp_tp_sft and modify fsdp_sft

* move chat_template

* fix load_ds

* delete useless codes

* delete useless codes

* fix jsonl load

* refactor

* fix bug

* fix lr scheduler

* refactor setup parallel

* update data load

* fix bugs

* move fsdp

* adapt new parallel load

* fix setup_parallel (#7)

* fix some bugs

* add remote codes

* add convert script

* support load image from ceph

* support load image from ceph

* fix cache dataset bugs

* support multiple images

* support llava interleave

* fix load timeout

* refactor datasets: optimize the cache mechanism and clean up code

* distinguish dataset components based on algorithms

* support fsdp2+3d parallel

* fix lints

* support contiguous batching

* refactor parallel

* zero wasting ppo

* support ascend npu

* fix openai convert

* fix npu bugs

* fix npu bug

* dispatch npu flash attn

* adapt ascend npu

* fix ppo losses

* steady increase in reward

* faster ppo

* fix top-p generate

* support internlm3

* baseline 2.5

* fix internlm3

* (WIP) support hard pack

* support qwen2

* fix dataset bugs

* baseline

* del ppo.py

* fixup

* support hybrid sp

* fix hybrid sp

* qwen2 + hybrid sp

* fix requirements

* avoid re-initialize dist

* support group pack

* pretrain (#13)

* first commit: support internlm3 moe streaming dataset

* move codes

* Moe pretrain (#14)

* first commit: support internlm3 moe streaming dataset

* move codes

* rmsnorm kernel support low version flash_attn

* add barrier

* support prompt length control (#15)

* support VLM Base (#16)

* add internvl

* fix bug

* remove dup code

* support liger of internvl

* fix bug

* add get_repo_git_info

* fix

* add minicpmv

* add minicpmv dispatch

* accelerate tokenize

* Update InternVL (#17)

* fix dpo error

* fix sp error

* update dataset

* fix

* fix rand sampler (#18)

* llama support transformers >= 4.45 (#19)

* convert fsdp1 to fsdp2 in sft.py

* [Feature] Support Liger Kernel (#20)

* filter data by max length (#21)

* fix causal forward, prefetch, and remote code (#22)

* [Enhancement] Accelerating Data Pipeline (#23)

* sample ratio greater than 1.0 and trunc max len

* accelerating the counting of tokens

* log reduced loss

* fix micro bs greater than 1

* [Enhancement] Ensure data integrity when the sampling ratio is more than 1 (#24)

* repeat dataset

* fixup

* fix typos

* fix typos

* [Fix] Pass in temperature during generation (#25)

* Support Janus and fix some error (#27)

* add prefetch

* update prefetch

* add janus

* add janus

* fix

* fix

* fix llama position id error

* fix ProcessPoolExecutor

* update

* fix llama

* delete cache

* remove useless code

---------

Co-authored-by: whcao <[email protected]>
Co-authored-by: Happy <[email protected]>
Co-authored-by: Haian Huang(深度眸) <[email protected]>

* support mlu (#984)

* cleanup

* add internlm3 remote code

* cleanup

* auto patch

* remove useless code

---------

Co-authored-by: whcao <[email protected]>
Co-authored-by: Happy <[email protected]>
Co-authored-by: Haian Huang(深度眸) <[email protected]>
Co-authored-by: Lantian Zhang <[email protected]>
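
One of the squashed commits above is "[Feature] Support Liger Kernel (#20)". Liger Kernel patches Hugging Face model classes with fused Triton kernels; a minimal sketch of applying such a patch with the liger-kernel package's public helpers (the checkpoint name is illustrative, and this PR's actual integration may differ):

```python
# Requires `pip install liger-kernel transformers`.
from liger_kernel.transformers import apply_liger_kernel_to_llama
from transformers import AutoModelForCausalLM

# Monkey-patch the HF Llama implementation with fused kernels
# (RMSNorm, RoPE, SwiGLU, fused cross-entropy) before loading the model.
apply_liger_kernel_to_llama()

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
```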