Dynamic transformer #275

Merged · 8 commits merged into main on Jun 12, 2025

Conversation

@jlamypoirier (Collaborator) commented May 27, 2025

✨ Description

Adjust the rotary embedding, PEFT, and normalization layers to use the new dynamic classes. Do some cleanup and refactoring of the rotary embeddings. Add an option to disable normalization layers, because why not.

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

@jlamypoirier marked this pull request as ready for review on May 28, 2025 00:06
@RaymondLi0 (Contributor) left a review:

Some small comments here and there, but LGTM otherwise, thank you!

    torch.cat([torch.full((x,), i) for i, x in enumerate(sample_lens)])
    for sample_lens in sequence_lengths
]
[torch.cat([torch.arange(x) for x in sample_lens]) for sample_lens in sequence_lengths]
@RaymondLi0 (Contributor):

Won't this break the document_mask below?

@jlamypoirier (Collaborator, Author):

Broken merge, good catch!
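
For context on the snippet above, here is a minimal sketch (with hypothetical variable names and data; not Fast-LLM's exact code) of what the two comprehensions compute and how a document mask of the kind mentioned in the comment is typically derived from them:

```python
import torch

# Hypothetical packed batch: the first sequence packs samples of length 3 and 2,
# the second a single sample of length 5.
sequence_lengths = [[3, 2], [5]]

# One document id per token: the index of the sample each token belongs to.
document_ids = [
    torch.cat([torch.full((x,), i) for i, x in enumerate(sample_lens)])
    for sample_lens in sequence_lengths
]  # [tensor([0, 0, 0, 1, 1]), tensor([0, 0, 0, 0, 0])]

# One position per token, restarting at 0 for every sample.
position_ids = [
    torch.cat([torch.arange(x) for x in sample_lens]) for sample_lens in sequence_lengths
]  # [tensor([0, 1, 2, 0, 1]), tensor([0, 1, 2, 3, 4])]

# A document mask restricts attention to tokens of the same sample,
# which is why it depends on the document ids rather than the positions.
doc = document_ids[0]
document_mask = doc[:, None] == doc[None, :]  # (5, 5) block-diagonal boolean mask
```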

        return ramp_func

    def _get_correction(self, beta: float, dim: int) -> float:
        return math.floor(
@RaymondLi0 (Contributor):

The original implementation uses floor for low, but ceil for high.
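
For reference, a sketch of that correction-range computation as it appears in the reference YaRN implementation (names and signatures here are illustrative, not Fast-LLM's): the low boundary is rounded down with floor, the high boundary is rounded up with ceil.

```python
import math

def correction_dim(num_rotations: float, dim: int, base: float, max_position: int) -> float:
    # Head-dimension index whose rotary frequency completes `num_rotations`
    # full rotations over `max_position` tokens.
    return dim * math.log(max_position / (num_rotations * 2 * math.pi)) / (2 * math.log(base))

def correction_range(beta_fast: float, beta_slow: float, dim: int, base: float, max_position: int) -> tuple[int, int]:
    # Reference YaRN: floor for the low end, ceil for the high end, clamped to valid indices.
    low = math.floor(correction_dim(beta_fast, dim, base, max_position))
    high = math.ceil(correction_dim(beta_slow, dim, base, max_position))
    return max(low, 0), min(high, dim - 1)
```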

@jlamypoirier requested a review from RaymondLi0 on June 12, 2025 21:02
@RaymondLi0 (Contributor) left a review:

LGTM thanks!

desc="The type of normalization to use, for example Layer Norm or RMS Norm.",
hint=FieldHint.architecture,
)
@abc.abstractmethod
@RaymondLi0 (Contributor):

@abc.abstractmethod is not needed here?
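
For illustration, the pattern under discussion looks roughly like the following (class names are hypothetical; only the RMSNorm import is taken from the diff). The question is whether the abstract marker on the base declaration is needed at all.

```python
import abc

class NormalizationConfig(abc.ABC):  # hypothetical base class, for illustration only
    @abc.abstractmethod
    def module_class(self):
        """Each concrete config returns the layer class it builds."""

class RMSNormalizationConfig(NormalizationConfig):  # hypothetical concrete config
    def module_class(self):
        from fast_llm.layers.common.normalization import RMSNorm  # import as shown in the diff

        return RMSNorm
```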

@@ -194,66 +160,50 @@ class TransformerPeftConfig(PeftConfig):
)
@RaymondLi0 (Contributor):

Not sure this is the right place to raise this concern, but I think it's better not to have layer freezing as part of PEFT, and to keep only a single layer-freezing mechanism (e.g. the current lr scaling for different components and layers).

Hence, if a user wants to use PEFT (e.g. LoRA) in conjunction with layer freezing, they could do it by explicitly freezing the layers with the lr_scale parameter.

This way we keep the LoRA/PEFT logic simple and disentangle parameter freezing from PEFT.

@jlamypoirier (Collaborator, Author):

That's a fair point, but LoRA is almost always used together with freezing, so it's really more convenient to do them together. Also, I don't think the arbitrary lr-scaling parameters are a good long-term solution, and in any case there is still only one freezing mechanism in the background, so it's not too bad.
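
As a concrete illustration of that point (plain PyTorch, not Fast-LLM's PEFT implementation): a LoRA adapter is almost always wrapped around a frozen base layer, so the freezing step naturally sits next to the adapter logic.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA wrapper: the base weights are frozen, only the low-rank update trains."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for param in self.base.parameters():
            param.requires_grad = False  # freezing almost always accompanies LoRA
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_a.t() @ self.lora_b.t())
```

The alternative raised above would keep the wrapper adapter-only and leave all freezing to a single mechanism (e.g. an lr scale of zero on the frozen components).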

    def module_class(self):
        from fast_llm.layers.common.normalization import RMSNorm

        return RMSNorm


@config_class()
class PeftConfig(BaseModelConfig):
@RaymondLi0 (Contributor):

no default handling (from_dict)?

@jlamypoirier (Collaborator, Author):

We don't really use that one, and it doesn't have a registry; those are in TransformerPeftConfig instead.
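
For readers unfamiliar with the reference, "default handling (from_dict)" refers to dynamic config selection of roughly this shape (an illustrative sketch only, not Fast-LLM's actual API): a registry maps a type key to a concrete config class, with a default when the key is absent.

```python
_registry: dict[str, type] = {}

def register(name: str):
    # Record a concrete config class under a type name.
    def wrap(cls: type) -> type:
        _registry[name] = cls
        return cls
    return wrap

@register("none")
class NoPeftConfig:
    pass

@register("lora")
class LoRAPeftConfig:
    def __init__(self, rank: int = 8):
        self.rank = rank

def config_from_dict(data: dict):
    # Pick the concrete class from the "type" key, falling back to a default.
    cls = _registry[data.pop("type", "none")]
    return cls(**data)

peft = config_from_dict({"type": "lora", "rank": 4})  # -> LoRAPeftConfig(rank=4)
```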

@jlamypoirier merged commit 016a308 into main on Jun 12, 2025
3 of 4 checks passed
@jlamypoirier deleted the dynamic_transformer branch on June 12, 2025 22:12