Support frozen weights #183

Closed
@jlamypoirier

Description

🎯 Goal (What & Why)

Fast-LLM creates gradient and optimizer state buffers for all parameters, even frozen ones. This hurts both memory usage and the speedup expected from freezing weights, and is a blocker for LoRA (#149).
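For context, the intended effect is roughly what plain PyTorch gives you when frozen parameters are kept out of autograd and out of the optimizer: no gradient buffer and no Adam state for them. The sketch below is plain PyTorch, not Fast-LLM's actual sharded-buffer code, and the layer split is purely illustrative; Fast-LLM would need to apply the same filtering when allocating its gradient and optimizer-state shards.

```python
import torch

# Minimal illustration (plain PyTorch, not Fast-LLM internals): freezing a
# parameter and excluding it from the optimizer avoids both its gradient
# buffer and its optimizer state.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),  # pretend this layer is frozen
    torch.nn.Linear(1024, 1024),  # this one stays trainable
)
for p in model[0].parameters():
    p.requires_grad_(False)  # no .grad buffer will be created for these

# Only pass trainable parameters to the optimizer, so no Adam moments
# (exp_avg / exp_avg_sq) are ever allocated for the frozen ones.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

loss = model(torch.randn(8, 1024)).sum()
loss.backward()
assert all(p.grad is None for p in model[0].parameters())  # frozen: no grads
optimizer.step()
```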

🚀 Execution Plan

Step 1: What is the smallest working version?

Step 2: What additional optimizations are possible (later, out of scope for now)?

  • Avoid storing a separate full-precision copy (shard) of the frozen weights when the 16-bit copy is enough. This prevents excessive state memory usage when using a small number of GPUs (up to ~3x savings in the single-GPU case); see the sketch after this list.
  • Avoid reconstructing the frozen weights on every training step when they haven't changed. This removes a large amount of unnecessary communication and potential network overhead with ZeRO-1/2.
  • Weight freezing is not considered part of the architecture, yet it necessarily changes the weight layout. We'll need additional safety checks to prevent accidental misuse (e.g. loading distributed checkpoints in the wrong format). Note: this further blurs the architecture/non-architecture split, making things like Add non-architecture Huggingface conversion parameters #166 and [Prototype] Make the model config override the pretrained config #171 more relevant.
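
To make the first optimization above concrete: with mixed precision, a frozen parameter currently costs roughly 2 bytes (16-bit training copy) plus 4 bytes (full-precision shard) per element, so dropping the full-precision shard leaves only the 2-byte copy, which is where the up-to-~3x figure for the single-GPU case comes from. The sketch below illustrates the idea with made-up names (`ParamSpec`, `allocate_shards`); it is not Fast-LLM's actual shard-allocation code.

```python
from dataclasses import dataclass

import torch


@dataclass
class ParamSpec:
    """Hypothetical description of one parameter group in a shard (illustrative only)."""
    numel: int
    frozen: bool


def allocate_shards(specs: list[ParamSpec], device: str = "cpu") -> dict[str, torch.Tensor]:
    """Sketch: allocate the 16-bit training copy for all parameters, but a
    full-precision (fp32) master copy only for the trainable ones."""
    shards = {
        "bf16_weights": torch.empty(
            sum(s.numel for s in specs), dtype=torch.bfloat16, device=device
        )
    }
    trainable = sum(s.numel for s in specs if not s.frozen)
    if trainable:
        shards["fp32_master"] = torch.empty(trainable, dtype=torch.float32, device=device)
    return shards


# Rough per-element cost for a fully frozen model on a single GPU:
#   before: 2 B (bf16) + 4 B (fp32 shard) = 6 B   after: 2 B  ->  ~3x saving.
specs = [ParamSpec(numel=1024 * 1024, frozen=True), ParamSpec(numel=1024, frozen=False)]
print({k: v.numel() * v.element_size() for k, v in allocate_shards(specs).items()})
```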

📌 Acceptance Criteria (Must-Haves for Completion)

  • Frozen weights no longer get gradient or optimizer state buffers, and the expected memory and speed improvements are observed.

🛠️ Project Management

  • Assign the project to the Fast-LLM project.
  • Set the Estimate field (in days) in the GitHub project.
  • Use the Size field to categorize the PR size (Small/Medium/Large).
  • Assign an owner when opening the issue.

Metadata

Labels

enhancement (New feature or request)
