Gemma 4: sanitize() duplicates 'model.' prefix, all weights load as zero #912
Description
Bug
Gemma 4 models (E2B, E4B, 31B) load with all-zero weights on mlx-vlm 0.4.3. The model produces only <pad> tokens.
Root Cause
In `mlx_vlm/models/gemma4/gemma4.py`, the `sanitize()` method has:

```python
if new_key.startswith("language_model."):
    rest = new_key[len("language_model."):]
    new_key = "language_model.model." + rest
```

But the safetensors weights already carry the full `language_model.model.` prefix:

```
language_model.model.embed_tokens.weight
language_model.model.layers.0.self_attn.q_proj.weight
...
```
So `sanitize()` transforms `language_model.model.embed_tokens.weight` → `language_model.model.model.embed_tokens.weight`, which matches no model parameter. The weights silently fail to load, and every parameter stays at its zero initialization.
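As a standalone illustration (this is not the mlx-vlm source itself, just the quoted logic extracted into plain functions), the buggy and fixed transforms side by side:

```python
def sanitize_buggy(key: str) -> str:
    # Mirrors the current logic: unconditionally re-inserts "model.",
    # even when the key already has the full prefix.
    if key.startswith("language_model."):
        rest = key[len("language_model."):]
        return "language_model.model." + rest
    return key


def sanitize_fixed(key: str) -> str:
    # Proposed fix: leave already-correct keys untouched.
    if key.startswith("language_model.model."):
        return key
    if key.startswith("language_model."):
        rest = key[len("language_model."):]
        return "language_model.model." + rest
    return key


key = "language_model.model.embed_tokens.weight"
print(sanitize_buggy(key))  # language_model.model.model.embed_tokens.weight
print(sanitize_fixed(key))  # language_model.model.embed_tokens.weight
```

The fixed version is idempotent: applying it to an already-sanitized key is a no-op, so it is safe regardless of which prefix convention a checkpoint uses.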
Fix
```python
if new_key.startswith("language_model.model."):
    pass  # already has the full prefix; leave unchanged
elif new_key.startswith("language_model."):
    rest = new_key[len("language_model."):]
    new_key = "language_model.model." + rest
```

Reproduction
```python
from mlx_vlm import load, generate

model, processor = load("mlx-community/gemma-4-e4b-it-8bit")
prompt = processor.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    tokenize=False, add_generation_prompt=True,
)
output = generate(model, processor, prompt=prompt, max_tokens=10, verbose=False)
print(output.text)  # All <pad> tokens
```

Environment
- mlx-vlm 0.4.3
- mlx-lm 0.31.1
- macOS 15.5, M3 Ultra
- Models tested: gemma-4-e4b-it-4bit, gemma-4-e4b-it-8bit (both from mlx-community)
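A diagnostic that could catch this class of bug earlier: after sanitizing, compare the weight keys against the model's parameter keys and report any that match nothing, instead of letting them drop silently. The helper below is hypothetical (not part of mlx-vlm); loading with a strict mode, where the loading path supports it, would achieve the same thing by raising on mismatched keys.

```python
def unmatched_keys(weight_keys, param_keys):
    """Return sanitized weight keys that correspond to no model parameter."""
    params = set(param_keys)
    return [k for k in weight_keys if k not in params]


# What the model actually exposes vs. what the buggy sanitize() produces:
param_keys = ["language_model.model.embed_tokens.weight"]
sanitized = ["language_model.model.model.embed_tokens.weight"]

print(unmatched_keys(sanitized, param_keys))
# ['language_model.model.model.embed_tokens.weight']
```

Even a one-line warning listing these keys at load time would have surfaced the duplicated prefix immediately rather than producing all-`<pad>` output.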