GPTQModel v1.8.0

Pre-release
@Qubitium released this 07 Feb 17:07
· 101 commits to main since this release
e876a49

What's Changed

DeepSeek v3/R1 model support.
⚡ New flexible weight packing: quantized weights can now be packed to int32, int16, or int8 dtypes. The Triton and Torch kernels support the full range of the new QuantizeConfig.pack_dtype (see the sketch after this list).
⚡ New auto_gc: bool control in quantize() which can reduce quantization time for small models when there is no risk of OOM (see the quantize() sketch after this list).
⚡ New GPTQModel.push_to_hub() API for easily pushing quantized models to a HF repo (example after this list).
⚡ New buffered_fwd: bool control in model.quantize() (covered in the quantize() sketch after this list).
🐛 Fixed bits=3 packing regression in v1.7.4.
🐛 Fixed Google Colab install requiring two install passes.
🐛 Fixed Python 3.10 compatibility.
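
A minimal sketch of the new pack_dtype option, assuming QuantizeConfig accepts a torch integer dtype for it; the model id, calibration text, and save path are placeholders, not values from this release:

```python
# Hypothetical sketch: pack quantized weights into int16 instead of the
# default int32. Model id, calibration text, and save path are placeholders.
import torch
from gptqmodel import GPTQModel, QuantizeConfig

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    pack_dtype=torch.int16,  # assumed to accept torch.int32 / int16 / int8
)

model = GPTQModel.load("facebook/opt-125m", quant_config)
model.quantize(["gptqmodel is an easy-to-use llm quantization toolkit."])
model.save("opt-125m-4bit-int16packed")
```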
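
A sketch of the two new quantize() controls together. Both are assumed to be plain keyword arguments; the values shown are illustrative, not confirmed defaults:

```python
# Hypothetical sketch of the new quantize() controls. auto_gc and
# buffered_fwd are assumed to be plain keyword arguments; the values
# below are guesses for illustration, not confirmed API defaults.
from gptqmodel import GPTQModel, QuantizeConfig

calibration_dataset = [
    "gptqmodel is an easy-to-use llm quantization toolkit.",
]

model = GPTQModel.load("facebook/opt-125m", QuantizeConfig(bits=4, group_size=128))

model.quantize(
    calibration_dataset,
    auto_gc=False,      # skip aggressive garbage collection to speed up small models
    buffered_fwd=True,  # assumed toggle for buffered forward passes during quantization
)
```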
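
And a sketch of the new push_to_hub() API. The parameter names (repo_id, quantized_path) are assumptions based on common Hugging Face upload conventions, not a confirmed signature:

```python
# Hypothetical sketch: push a locally saved quantized model to the
# Hugging Face Hub. Parameter names are assumptions, not confirmed.
from gptqmodel import GPTQModel

GPTQModel.push_to_hub(
    repo_id="your-user/opt-125m-gptq-4bit",      # placeholder target repo
    quantized_path="opt-125m-4bit-int16packed",  # local save dir from the sketch above
)
```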

Full Changelog: v1.7.4...v1.8.0