Gemma 4: sanitize() duplicates 'model.' prefix, all weights load as zero #912

@raullenchai

Bug

Gemma 4 models (E2B, E4B, 31B) load with all-zero weights on mlx-vlm 0.4.3. The model produces only <pad> tokens.

Root Cause

In mlx_vlm/models/gemma4/gemma4.py, the sanitize() method has:

if new_key.startswith("language_model."):
    rest = new_key[len("language_model."):]
    new_key = "language_model.model." + rest

But the safetensors weights already have the full prefix language_model.model.:

language_model.model.embed_tokens.weight
language_model.model.layers.0.self_attn.q_proj.weight
...

So sanitize() transforms language_model.model.embed_tokens.weight → language_model.model.model.embed_tokens.weight, which doesn't match any model parameter. The weights silently fail to load and everything is zero.
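The duplication can be shown in isolation with a minimal sketch. `buggy_rename` is a hypothetical stand-in for just the branch quoted above, not the full sanitize() method, and the keys are the two listed in this report:

```python
# Hypothetical stand-in for the renaming branch quoted above (not the real sanitize()).
def buggy_rename(key: str) -> str:
    if key.startswith("language_model."):
        rest = key[len("language_model."):]
        # "rest" already begins with "model." for these checkpoints,
        # so the prefix ends up doubled.
        return "language_model.model." + rest
    return key

# Keys as they appear in the safetensors files, per this report.
checkpoint_keys = [
    "language_model.model.embed_tokens.weight",
    "language_model.model.layers.0.self_attn.q_proj.weight",
]

renamed = [buggy_rename(k) for k in checkpoint_keys]
for old, new in zip(checkpoint_keys, renamed):
    print(f"{old} -> {new}")
```

Every renamed key contains `.model.model.`, which is the mismatch described above.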

Fix

if new_key.startswith("language_model.model."):
    pass  # already correct
elif new_key.startswith("language_model."):
    rest = new_key[len("language_model."):]
    new_key = "language_model.model." + rest
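A quick way to check the proposed branch is to confirm it is idempotent: keys with and without the `model.` segment should map to the same name, and applying it twice should change nothing. The sketch below uses a hypothetical helper `fixed_rename` that mirrors only the branch above. The short-form key is an assumed example of what the original branch was written for and does not come from a real checkpoint.

```python
# Hypothetical helper mirroring only the proposed branch, not the full sanitize().
def fixed_rename(key: str) -> str:
    if key.startswith("language_model.model."):
        return key  # already has the full prefix
    if key.startswith("language_model."):
        return "language_model.model." + key[len("language_model."):]
    return key  # keys outside language_model are left alone

full = "language_model.model.embed_tokens.weight"   # form seen in this report
short = "language_model.embed_tokens.weight"        # assumed short form

# Both forms should end up at the same parameter name.
print(fixed_rename(full) == full)
print(fixed_rename(short) == full)
# Applying the rename twice should not add another "model." segment.
print(fixed_rename(fixed_rename(short)) == full)
```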

Reproduction

from mlx_vlm import load, generate

model, processor = load("mlx-community/gemma-4-e4b-it-8bit")
prompt = processor.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    tokenize=False, add_generation_prompt=True,
)
output = generate(model, processor, prompt=prompt, max_tokens=10, verbose=False)
print(output.text)  # All <pad> tokens
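Beyond the <pad> output, the mismatch can be confirmed by comparing sanitized checkpoint keys against the model's parameter names. The sketch below is plain Python with illustrative values; `unmatched_keys` is a hypothetical helper, and in practice the parameter names could presumably be collected with something like `mlx.utils.tree_flatten(model.parameters())`, which is an assumption and is not shown here.

```python
# Hypothetical diagnostic helper: list checkpoint keys that match no parameter.
def unmatched_keys(weight_keys, param_names):
    names = set(param_names)
    return sorted(k for k in weight_keys if k not in names)

# Illustrative values only: one real parameter name, one key after the buggy rename.
param_names = ["language_model.model.embed_tokens.weight"]
weight_keys = ["language_model.model.model.embed_tokens.weight"]

missing = unmatched_keys(weight_keys, param_names)
print(missing)
```

A non-empty result means some weights have nowhere to go, which is consistent with the silent load failure described under Root Cause.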

Environment

  • mlx-vlm 0.4.3
  • mlx-lm 0.31.1
  • macOS 15.5, M3 Ultra
  • Models tested: gemma-4-e4b-it-4bit, gemma-4-e4b-it-8bit (both from mlx-community)
