Releases: ModelCloud/GPTQModel
GPTQModel v1.2.0
Note:
v1.2.0 was released with an incorrect version value of 1.2.1-dev.
We are re-releasing v1.2.0 correctly as v1.2.1.
GPTQModel v1.1.0
What's Changed
Added IBM Granite model support. Enabled fully automatic, buildless wheel install from PyPI. Reduced peak CPU memory usage by more than 20% during quantization. Reached 100% CI model/feature coverage. Updated HF integration support for the latest transformers.
Fully deprecated: liger-kernel support and the exllama v1 quant kernel.
- Fix deprecated by @CSY-ModelCloud in #447
- [COMPAT] [FIX] vllm params by @ZYC-ModelCloud in #448
- add estimate-vram by @PZS-ModelCloud in #452
- add field uri by @ZYC-ModelCloud in #449
- auto infer model base name from model files by @ZYC-ModelCloud in #451
- remove exllama v1 by @PZS-ModelCloud in #453
- [SECURITY] drop support of loading unsafe .bin weights by @ZYC-ModelCloud in #460
- [MODEL] add granite support by @LRL-ModelCloud in #466
- Split base.py file by @ZYC-ModelCloud in #465
- Move save_quantized function into saver.py by @ZYC-ModelCloud in #467
- remove deprecated exllama v1 code by @Qubitium in #473
- [MISC] move model def file to model_def folder by @PZS-ModelCloud in #479
- [FIX] Fix unit test by @PZS-ModelCloud in #480
- Download whl in setup.py by @CSY-ModelCloud in #481
- [Fix] cpu memory leak by @ZX-ModelCloud in #485
- [CI] set ninja threads to 4 by @CSY-ModelCloud in #487
- [FIX] sharded model loading error by @ZX-ModelCloud in #490
- add internlm test by @PZS-ModelCloud in #491
- remove needless function by @ZYC-ModelCloud in #494
- Fix unit test by @ZYC-ModelCloud in #495
- [FIX] fix test_integration by @PZS-ModelCloud in #497
- [Test] add codegen and xverse test by @PZS-ModelCloud in #496
Full Changelog: v1.0.9...v1.1.0
GPTQModel v1.0.9
What's Changed
Fixed HF integration to work with the latest transformers. Moved AutoRound to an optional dependency. Updated flaky CI tests.
- [FIX] mark auto_round extras_require by @LRL-ModelCloud in #430
- [BUILD] update compile flags by @Qubitium in #428
- [FIX] failed test_transformers_integration.py by @ZX-ModelCloud in #435
Full Changelog: v1.0.8...v1.0.9
GPTQModel v1.0.8
What's Changed
Moved QBits to an optional dependency. Added Python 3.12 wheels and fixed wheel generation for CUDA 11.8.
- [PKG] update vllm/sglang optional depends by @PZS-ModelCloud in #423
- [FIX] autoround depend causing torch-cpu to be installed by @Qubitium in #422
Full Changelog: v1.0.7...v1.0.8
GPTQModel v1.0.7
What's Changed
Fixed the Marlin (faster) kernel not being auto-selected for some models, and AutoRound
quantization saves throwing JSON errors.
- [FIX] marlin_inference_linear not correctly auto selected for eligible models by @ZX-ModelCloud in #413
- [FIX] remove "scale" and "zp" Tensor from layer_config by @ZX-ModelCloud in #414
- [FIX] Failed unit test by @ZX-ModelCloud in #420
Full Changelog: v1.0.6...v1.0.7
GPTQModel v1.0.6
What's Changed
Patch release to fix loading of quantized Llama 3.2 Vision models.
- [FIX] mllama loader by @LRL-ModelCloud in #404
Full Changelog: v1.0.5...v1.0.6
GPTQModel v1.0.5
What's Changed
Added partial quantization support for the Llama 3.2 Vision model. v1.0.5 allows quantization of the text layers (the layers responsible for text generation) only; vision-layer support will be added shortly. A Llama 3.2 11B Vision Instruct model quantizes to ~50% of its original size in 4-bit mode. Once vision-layer support is added, the size will drop to the expected ~1/4.
- [MODEL] Add Llama 3.2 Vision (mllama)* support by @LRL-ModelCloud in #401
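The size figures above follow from simple arithmetic, sketched below. The assumption that text layers hold roughly 2/3 of the weights is illustrative only (it makes the ~50% figure come out exactly), not a measured split, and per-group scale/zero-point overhead is ignored:

```python
def quantized_size_fraction(quantized_frac, bits=4, baseline_bits=16):
    """Estimated model size as a fraction of the unquantized (16-bit) size,
    when `quantized_frac` of the weights are stored at `bits` bits each.
    Ignores per-group scale/zero-point overhead."""
    ratio = bits / baseline_bits  # 4-bit weights take 1/4 the bytes of 16-bit
    return 1 - quantized_frac * (1 - ratio)

# Quantizing all layers in 4-bit -> the expected ~1/4 of original size.
print(quantized_size_fraction(1.0))    # 0.25
# If text layers are ~2/3 of the weights, text-only quantization -> ~50%.
print(quantized_size_fraction(2 / 3))  # ~0.5
```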
Full Changelog: v1.0.4...v1.0.5
GPTQModel v1.0.4
What's Changed
Added Liger Kernel support for a ~50% VRAM reduction during the quantization stage for some models. Added a toggle to disable parallel packing to avoid OOM on larger models. Updated the transformers dependency to 4.45.0 for Llama 3.2 support.
- [FEATURE] add a parallel_packing toggle by @LRL-ModelCloud in #393
- [FEATURE] add liger_kernel support by @LRL-ModelCloud in #394
Full Changelog: v1.0.3...v1.0.4
GPTQModel v1.0.3
What's Changed
- [MODEL] Add minicpm3 by @LDLINGLINGLING in #385
- [FIX] fix minicpm3 support by @LRL-ModelCloud in #387
- [MODEL] Added GRIN-MoE support by @LRL-ModelCloud in #388
New Contributors
- @LDLINGLINGLING made their first contribution in #385
- @mrT23 made their first contribution in #386
Full Changelog: v1.0.2...v1.0.3
GPTQModel v1.0.2
What's Changed
Upgraded the AutoRound package to v0.3.0. Pre-built WHL and PyPI source releases are now available. Install by downloading our pre-built WHL or via `pip install gptqmodel --no-build-isolation`.
- [CORE] Autoround v0.3 by @LRL-ModelCloud in #368
- [CI] Lots of CI fixups by @CSY-ModelCloud
Full Changelog: v1.0.0...v1.0.2