Conversation

@LukaDarsalia LukaDarsalia commented May 19, 2025

Checklist

  • Read CONTRIBUTING.md, and accept the CLA by including the provided snippet. We will not accept a PR without this.
  • Run pre-commit hook.
  • If you changed Rust code, run cargo check, cargo clippy, cargo test.

PR Description

Fixes #271 (Quantized model fails to load on Windows/Linux)

Summary

This PR fixes a crash when loading the quantized PyTorch Moshi models on Windows/Linux, caused by meta tensors being passed into int8_vectorwise_quant.

The crash occurs because replace_linear_with_qlinear() was called before the real weights were loaded, so meta tensors (which carry no data) were handed to bitsandbytes for quantization, which it does not support.
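For context, here is a minimal reproduction of the underlying failure mode in plain PyTorch (not the Moshi code itself): a meta tensor has a shape and dtype but no storage, so any operation that needs concrete values fails.

```python
import torch

# Meta tensors carry only shape/dtype metadata and no storage, so kernels
# that need real weight values (such as int8 quantization) cannot run on them.
linear = torch.nn.Linear(4, 8, device="meta")
print(linear.weight.is_meta)  # True: the weight has a shape but no data

# Reading an actual value out of a meta tensor fails, which is the kind of
# error the quantization pass was hitting.
try:
    linear.weight.float().sum().item()
    materialized = True
except Exception:
    materialized = False
print("materialized:", materialized)  # False
```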

Fix Details

  • In QLinear.__init__, we now check whether the weight is on the meta device.
  • If it is, we create dummy CB and SCB tensors with the correct shapes and set a flag self.is_meta = True.
  • Quantization is deferred until the real weights are loaded.
  • The forward pass checks the meta status and raises a clear error if the layer is used before real weights are available.
  • Once weights are loaded, _check_meta_status turns off meta mode and ensures the scale tensors are float32.
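The steps above can be sketched roughly as follows. Names such as QLinear, CB, SCB, and _check_meta_status come from this PR description, but the per-row absmax quantizer and the dequantized matmul are illustrative stand-ins, not the actual Moshi/bitsandbytes code:

```python
import torch
from torch import nn


class QLinear(nn.Module):
    """Sketch of the deferred-quantization scheme described above."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        weight = linear.weight
        if weight.is_meta:
            # Real weights are not loaded yet: allocate placeholder CB/SCB
            # with the correct shapes and defer quantization.
            self.CB = torch.zeros(weight.shape, dtype=torch.int8, device="meta")
            self.SCB = torch.zeros(weight.shape[0], dtype=torch.float32, device="meta")
            self.weight = weight  # kept so load_state_dict can fill it in later
            self.is_meta = True
        else:
            self.CB, self.SCB = self._quantize(weight)
            self.is_meta = False

    @staticmethod
    def _quantize(weight: torch.Tensor):
        # Stand-in for int8_vectorwise_quant: per-row absmax scaling to int8.
        scb = weight.abs().amax(dim=1).float().clamp(min=1e-8)
        cb = torch.round(weight / scb[:, None] * 127).to(torch.int8)
        return cb, scb

    def _check_meta_status(self):
        # Once real weights have been loaded, quantize them and leave meta mode.
        if self.is_meta and not self.weight.is_meta:
            self.CB, self.SCB = self._quantize(self.weight)
            self.SCB = self.SCB.float()  # scale tensors must be float32
            self.is_meta = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        self._check_meta_status()
        if self.is_meta:
            raise RuntimeError("QLinear used before real weights were loaded")
        # Dequantize for clarity; a real path would use an int8 matmul kernel.
        return x @ (self.CB.float() * self.SCB[:, None] / 127).T
```

Calling the layer while it still holds meta weights raises a clear RuntimeError instead of crashing inside the quantization kernel, matching the behavior described above.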

CLA

I, LukaDarsalia, confirm that I have read and understood the terms of the CLA of Kyutai-labs, as outlined in the repository's CONTRIBUTING.md, and I agree to be bound by these terms.

