Checklist
cargo check, cargo clippy, cargo test.

PR Description
Fixes #271
Summary
This PR fixes a crash in the quantized PyTorch Moshi models when loaded on Windows/Linux, caused by meta tensors being passed into int8_vectorwise_quant.
The issue happens because replace_linear_with_qlinear() was called before real weights were loaded, resulting in meta tensors being quantized, which bitsandbytes does not support.

Fix Details
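A minimal sketch of the meta-device handling this PR describes, for illustration only. It is not the code from the PR: SketchQLinear and fake_vectorwise_quant are hypothetical names, the quantizer is a simplified stand-in for bitsandbytes' int8_vectorwise_quant, and the point at which _check_meta_status runs (here, at the start of forward) is an assumption.

```python
import torch
import torch.nn as nn


def fake_vectorwise_quant(weight: torch.Tensor):
    """Hypothetical stand-in for int8_vectorwise_quant (row-wise absmax)."""
    if weight.is_meta:
        # A meta tensor carries shape and dtype but no data to quantize.
        raise RuntimeError("cannot quantize a meta tensor")
    scale = weight.abs().amax(dim=1).float().clamp(min=1e-8)
    codes = torch.round(weight.float() / scale[:, None] * 127).to(torch.int8)
    return codes, scale


class SketchQLinear(nn.Module):
    """Illustrative quantized linear layer that tolerates meta weights."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        weight = linear.weight
        self.is_meta = weight.is_meta
        if self.is_meta:
            # No real weights yet: allocate correctly shaped placeholders.
            cb = torch.empty(weight.shape, dtype=torch.int8, device="meta")
            scb = torch.empty(weight.shape[0], dtype=torch.float32, device="meta")
        else:
            cb, scb = fake_vectorwise_quant(weight.data)
        self.register_buffer("CB", cb)
        self.register_buffer("SCB", scb)

    def _check_meta_status(self):
        # Leave meta mode once real data has replaced the placeholders.
        if self.is_meta and not self.CB.is_meta:
            self.is_meta = False
        # Keep the scale tensor in float32 once it holds real data.
        if not self.is_meta and self.SCB.dtype != torch.float32:
            self.SCB = self.SCB.float()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        self._check_meta_status()
        if self.is_meta:
            raise RuntimeError("quantized weights have not been loaded")
        # Dequantize for the sketch; a real kernel would stay in int8.
        w = self.CB.float() * (self.SCB[:, None] / 127)
        return x @ w.t()
```

In this sketch, a layer built from nn.Linear(4, 3, bias=False, device="meta") starts with is_meta set, and the flag clears on the first forward call after real CB and SCB tensors have been assigned.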
In QLinear.__init__, we now check if the weight is on the meta device.
If it is, we create CB and SCB tensors with correct shapes and set a flag self.is_meta = True.
_check_meta_status will turn off meta mode and ensure scale tensors are in float32.

CLA
I, LukaDarsalia, confirm that I have read and understood the terms of the CLA of Kyutai-labs, as outlined in the repository's CONTRIBUTING.md, and I agree to be bound by these terms.