Conversation


@skr3178 skr3178 commented Jan 31, 2025

The current model does not fit on my single 12 GB GPU.
Please at least specify the GPU VRAM requirement.

@andreaskoepf
Member

Hey @skr3178, thanks for raising this. Training on a smaller GPU would be a really cool thing to have, but instead of using a micro model I would prefer to go the following route:

  1. use Liger Kernel to reduce overall peak memory required
  2. implement split group sampling & gradient accumulation (micro-batching)
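The gradient-accumulation idea in step 2 rests on a simple identity: the mean gradient over a full batch equals the weighted sum of per-micro-batch mean gradients, so the optimizer step is unchanged while peak memory scales with the micro-batch size. A minimal sketch in plain Python (hypothetical helper names, a toy squared-error model, not the project's actual training loop):

```python
# Toy model: loss = (w*x - y)^2, gradient d/dw = 2*(w*x - y)*x.
def grad_mse(w, xs, ys):
    """Mean gradient of (w*x - y)^2 over a batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate gradients micro-batch by micro-batch.

    Each micro-batch gradient is scaled by its fraction of the full
    batch, so the accumulated sum equals the full-batch mean gradient.
    """
    total, n = 0.0, len(xs)
    for i in range(0, n, micro_batch_size):
        mx = xs[i:i + micro_batch_size]
        my = ys[i:i + micro_batch_size]
        total += grad_mse(w, mx, my) * (len(mx) / n)
    return total

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5
full = grad_mse(w, xs, ys)
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)
assert abs(full - accum) < 1e-9  # identical gradient, smaller working set
```

In a real PyTorch loop the same effect comes from calling `loss.backward()` per micro-batch (with the loss scaled by the accumulation factor) and stepping the optimizer only every N micro-batches.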

To make the model configurable, adding lightweight command-line argument parsing would be nice.
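A lightweight sketch of what that argument parsing could look like with the standard-library `argparse` module; the flag names below are assumptions for illustration, not the project's actual options:

```python
import argparse

def parse_args(argv=None):
    # Hypothetical training flags; defaults are placeholders.
    p = argparse.ArgumentParser(description="Training configuration")
    p.add_argument("--micro-batch-size", type=int, default=1,
                   help="samples per forward/backward pass")
    p.add_argument("--grad-accum-steps", type=int, default=8,
                   help="micro-batches to accumulate per optimizer step")
    p.add_argument("--use-liger-kernel", action="store_true",
                   help="patch in Liger kernels to reduce peak memory")
    return p.parse_args(argv)

args = parse_args(["--micro-batch-size", "2", "--grad-accum-steps", "4"])
effective_batch = args.micro_batch_size * args.grad_accum_steps  # 8
```

Passing `argv` explicitly keeps the parser testable; in the training script `parse_args()` would read `sys.argv` as usual.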

Note: A PR should contain only files that belong to the project.
