Conversation


@skr3178 skr3178 commented Jan 31, 2025

The current model does not fit on my single 12 GB GPU.
Please at least specify the GPU VRAM requirement.

@andreaskoepf
Member

Hey @skr3178, thanks for raising this. Training on a smaller GPU would be a really cool thing to have, but instead of using a micro model I would prefer to go the following route:

  1. use Liger Kernel to reduce overall peak memory required
  2. implement split group sampling & gradient accumulation (micro-batching)
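The gradient-accumulation idea in step 2 rests on a simple identity: the mean gradient over a full batch equals the weighted sum of per-micro-batch mean gradients, so the optimizer step is unchanged while peak memory scales with the micro-batch size. A minimal sketch in plain Python (hypothetical helper names, a toy squared-error model, not the project's actual training loop):

```python
# Toy model: loss = (w*x - y)^2, gradient d/dw = 2*(w*x - y)*x.
def grad_mse(w, xs, ys):
    """Mean gradient of (w*x - y)^2 over a batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate gradients micro-batch by micro-batch.

    Each micro-batch gradient is scaled by its fraction of the full
    batch, so the accumulated sum equals the full-batch mean gradient.
    """
    total, n = 0.0, len(xs)
    for i in range(0, n, micro_batch_size):
        mx = xs[i:i + micro_batch_size]
        my = ys[i:i + micro_batch_size]
        total += grad_mse(w, mx, my) * (len(mx) / n)
    return total

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5
full = grad_mse(w, xs, ys)
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)
assert abs(full - accum) < 1e-9  # identical gradient, smaller working set
```

In a real PyTorch loop the same effect comes from calling `loss.backward()` per micro-batch (with the loss scaled by the accumulation factor) and stepping the optimizer only every N micro-batches.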

To make the model configurable, adding lightweight command-line argument parsing would be nice.
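A lightweight sketch of what that argument parsing could look like with the standard-library `argparse` module; the flag names below are assumptions for illustration, not the project's actual options:

```python
import argparse

def parse_args(argv=None):
    # Hypothetical training flags; defaults are placeholders.
    p = argparse.ArgumentParser(description="Training configuration")
    p.add_argument("--micro-batch-size", type=int, default=1,
                   help="samples per forward/backward pass")
    p.add_argument("--grad-accum-steps", type=int, default=8,
                   help="micro-batches to accumulate per optimizer step")
    p.add_argument("--use-liger-kernel", action="store_true",
                   help="patch in Liger kernels to reduce peak memory")
    return p.parse_args(argv)

args = parse_args(["--micro-batch-size", "2", "--grad-accum-steps", "4"])
effective_batch = args.micro_batch_size * args.grad_accum_steps  # 8
```

Passing `argv` explicitly keeps the parser testable; in the training script `parse_args()` would read `sys.argv` as usual.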

Note: A PR should contain only files that belong to the project.
