Releases: ModelCloud/GPTQModel
GPTQModel v1.0.0
What's Changed
40% faster multi-threaded packing, new lm_eval api, and fixed Python 3.9 compatibility.
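The new lm_eval api hooks GPTQModel into lm-evaluation-harness. A minimal sketch of the equivalent harness flow over a quantized checkpoint, assuming access to the wrapped HF model via .model (the model id and task list are placeholders; the wrapper's own entry point is not reproduced here):

```python
from transformers import AutoTokenizer
from gptqmodel import GPTQModel
from lm_eval import simple_evaluate
from lm_eval.models.huggingface import HFLM

model_id = "ModelCloud/opt-125m-gptq-4bit"  # placeholder quantized checkpoint

# Load the quantized model and its tokenizer.
model = GPTQModel.from_quantized(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Wrap the underlying HF model so lm-evaluation-harness can drive it.
lm = HFLM(pretrained=model.model, tokenizer=tokenizer, batch_size=8)

# Run an illustrative task list and print the per-task metrics.
results = simple_evaluate(model=lm, tasks=["arc_easy", "hellaswag"])
print(results["results"])
```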
- Add lm_eval api by @PZS-ModelCloud in #338
- Multi-threaded packing in quantization by @PZS-ModelCloud in #354
- [CI] Add TGI unit test by @PZS-ModelCloud in #348
- [CI] Updates by @CSY-ModelCloud in #347, #352, #353, #355, #357
- Fix python 3.9 compat by @PZS-ModelCloud in #358
Full Changelog: v0.9.11...v1.0.0
GPTQModel v0.9.11
What's Changed
Added LG EXAONE 3.0 model support. New dynamic per-layer/module flexible quantization, where each layer/module can use different bits/params. Added proper sharding support to backend.BITBLAS. Auto-heal quantization errors caused by small damp values.
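A minimal sketch of the new dynamic per-module override, assuming the dynamic mapping keys are regexes matched against module names and the values override the base bits/params (the exact key/value schema, model id, and pattern are assumptions):

```python
from transformers import AutoTokenizer
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "facebook/opt-125m"  # small placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)

# A real run needs a proper calibration set; one tokenized sample keeps the sketch short.
calibration_dataset = [tokenizer("GPTQModel is an LLM quantization toolkit.")]

# Base 4-bit config with per-module overrides via `dynamic`: modules whose
# names match the regex get their own bits/group_size.
quantize_config = QuantizeConfig(
    bits=4,
    group_size=128,
    dynamic={r".*\.mlp\..*": {"bits": 8, "group_size": 64}},
)

model = GPTQModel.from_pretrained(model_id, quantize_config)
model.quantize(calibration_dataset)
model.save_quantized("opt-125m-gptq-dynamic")
```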
- [CORE] add support for pack and shard to bitblas by @LRL-ModelCloud in #316
- Add dynamic bits by @PZS-ModelCloud in #311, #319, #321, #323, #327
- [MISC] Adjust the validate order of QuantLinear when BACKEND is AUTO by @ZX-ModelCloud in #318
- add save_quantized log model total size by @PZS-ModelCloud in #320
- Auto damp recovery by @CSY-ModelCloud in #326
- [FIX] add missing original_infeatures by @CSY-ModelCloud in #337
- Update Transformers to 4.44.0 by @Qubitium in #336
- [MODEL] add exaone model support by @LRL-ModelCloud in #340
- [CI] Upload wheel to local server by @CSY-ModelCloud in #339
- [MISC] Fix assert by @CSY-ModelCloud in #342
Full Changelog: v0.9.10...v0.9.11
GPTQModel v0.9.10
What's Changed
Ported the vllm/nm gptq_marlin inference kernel with expanded bits (8-bit), group_size (64, 32), and desc_act support for all GPTQ models with format = FORMAT.GPTQ. Auto-calculate auto-round nsamples/seqlen parameters based on the calibration dataset. Fixed save_quantized() when called on pre-quantized models with non-supported backends. HF Transformers dependency updated to ensure Llama 3.1 fixes are correctly applied to both the quantization and inference stages.
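A minimal sketch of loading an existing FORMAT.GPTQ checkpoint on the ported gptq_marlin kernel; the BACKEND enum spelling and member name are assumptions for this version, and the model id is a placeholder:

```python
from transformers import AutoTokenizer
from gptqmodel import GPTQModel, BACKEND

model_id = "TheBloke/Llama-2-7B-Chat-GPTQ"  # placeholder FORMAT.GPTQ checkpoint

# Explicitly select the marlin inference kernel; 8-bit, group_size 64/32,
# and desc_act checkpoints are now accepted by this backend.
model = GPTQModel.from_quantized(model_id, backend=BACKEND.MARLIN)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```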
- [CORE] add marlin inference kernel by @ZX-ModelCloud in #310
- [CI] Increase timeout to 40m by @CSY-ModelCloud in #295, #299
- [FIX] save_quantized() by @ZX-ModelCloud in #296
- [FIX] autoround nsample/seqlen to be actual size of calibration_dataset by @LRL-ModelCloud in #297, #298
- Update HF transformers to 4.43.3 by @Qubitium in #305
- [CI] remove test_marlin_hf_cache_serialization() by @ZX-ModelCloud in #314
Full Changelog: v0.9.9...v0.9.10
GPTQModel v0.9.9
What's Changed
Added Llama-3.1 support, Gemma2 27B quant inference support via vLLM, auto pad_token normalization, fixed auto-round quant compat for vLLM/SGLang.
- [CI] by @CSY-ModelCloud in #238, #236, #237, #241, #242, #243, #246, #247, #250
- [FIX] explicitly call torch.no_grad() by @LRL-ModelCloud in #239
- Bitblas update by @Qubitium in #249
- [FIX] calib avg for calib dataset arg passed as tensors by @Qubitium, @LRL-ModelCloud in #254, #258
- [MODEL] gemma2 27b can load with vLLM now by @LRL-ModelCloud in #257
- [OPTIMIZE] to optimize vllm inference, set an environment variable 'VLLM_ATTENTI… by @LRL-ModelCloud in #260
- [FIX] hard set batch_size to 1 for transformers 4.43.0 due to compat/regression by @LRL-ModelCloud in #279
- FIX vllm llama 3.1 support by @Qubitium in #280
- Use better default values for quantization config by @Qubitium in #281
- [REFACTOR] Cleanup backend and model_type usage by @LRL-ModelCloud in #276
- [FIX] allow auto_round lm_head quantization by @LRL-ModelCloud in #282
- [FIX] [MODEL] Llama-3.1-8B-Instruct's eos_token_id is a list by @CSY-ModelCloud in #284
- [FIX] add release_vllm_model, and import destroy_model_parallel in release_vllm_model by @LRL-ModelCloud in #288
- [FIX] autoround quants compat with vllm/sglang by @Qubitium in #287
Full Changelog: v0.9.8...v0.9.9
GPTQModel v0.9.8
What's Changed
- Marlin end-to-end in/out feature padding for max model support
- Run quantized models (FORMAT.GPTQ) directly using the fast vLLM backend! (see the sketch after this list)
- Run quantized models (FORMAT.GPTQ) directly using the fast SGLang backend!
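A minimal sketch of running a FORMAT.GPTQ checkpoint through the new vLLM backend; the BACKEND enum spelling and member names are assumptions for this release, the model id is a placeholder, and the sampling kwargs follow vLLM conventions here as an assumption:

```python
from gptqmodel import GPTQModel, BACKEND

# Load an existing GPTQ-format checkpoint directly on the vLLM engine
# (use BACKEND.SGLANG for the SGLang engine).
model = GPTQModel.from_quantized(
    "TheBloke/Llama-2-7B-Chat-GPTQ",  # placeholder FORMAT.GPTQ checkpoint
    backend=BACKEND.VLLM,
)

# Generation is routed to the selected engine.
print(model.generate(prompts="The capital of France is", temperature=0.8, top_p=0.95))
```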
- 🚀 🚀 [CORE] Marlin end-to-end in/out feature padding by @LRL-ModelCloud in #183 #192
- 🚀 🚀 [CORE] Add vLLM Backend for FORMAT.GPTQ by @PZS-ModelCloud in #190
- 🚀 🚀 [CORE] Add SGLang Backend by @PZS-ModelCloud in #191
- 🚀 [CORE] Use Triton v2 to pack gptq/gptqv2 formats by @LRL-ModelCloud in #202
- ✨ [CLEANUP] remove triton warmup by @Qubitium in #200
- 👾 [FIX] 8bit choosing wrong packer by @Qubitium in #199
- ✨ [CI] [CLEANUP] Improve Unit Tests by CSY, PSY, and ZYC
- ✨ [DOC] Consolidate Examples by ZYC in #225
Full Changelog: v0.9.7...v0.9.8
GPTQModel v0.9.7
What's Changed
- 🚀 [MODEL] InternLM 2.5 support by @LRL-ModelCloud in #182
Full Changelog: v0.9.6...v0.9.7
GPTQModel v0.9.6
What's Changed
Intel/AutoRound QUANT_METHOD support added for potentially higher-quality quantization, with lm_head module quantization support for even more VRAM reduction; format export to FORMAT.GPTQ for max inference compatibility.
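A minimal sketch of selecting AutoRound as the quantizer with lm_head quantization enabled; the config class name, its import path, and the lm_head flag are assumptions for this version, and the model id is a placeholder:

```python
from transformers import AutoTokenizer
from gptqmodel import GPTQModel
# Class name and import path below are assumptions for this version.
from gptqmodel.quantization.config import AutoRoundQuantizeConfig

model_id = "facebook/opt-125m"  # small placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
calibration_dataset = [tokenizer("GPTQModel is an LLM quantization toolkit.")]

# AutoRound quantizer, 4-bit, with lm_head quantization enabled
# (lm_head flag name assumed); the export format stays FORMAT.GPTQ.
quantize_config = AutoRoundQuantizeConfig(bits=4, group_size=128, lm_head=True)

model = GPTQModel.from_pretrained(model_id, quantize_config)
model.quantize(calibration_dataset)
model.save_quantized("opt-125m-autoround-4bit")
```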
- 🚀 [CORE] Add AutoRound as Quantizer option by @LRL-ModelCloud in #166
- 👾 [FIX] [CI] Update test by @CSY-ModelCloud in #177
- 👾 Cleanup Triton by @Qubitium in #178
Full Changelog: v0.9.5...v0.9.6
GPTQModel v0.9.5
What's Changed
Another large update with added support for Intel/QBits quantization/inference on CPU. CUDA kernels have been fully deprecated in favor of the better-performing Exllama (v1/v2), Marlin, and Triton kernels.
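A minimal sketch of CPU inference on the new Intel QBits path; the BACKEND enum spelling, the QBITS member name, the device handling, and the model id are all assumptions:

```python
from transformers import AutoTokenizer
from gptqmodel import GPTQModel, BACKEND

model_id = "ModelCloud/opt-125m-gptq-4bit"  # placeholder quantized checkpoint

# Run a [2, 3, 4, 8]-bit quantized checkpoint on CPU via the Intel QBits kernels.
model = GPTQModel.from_quantized(model_id, device="cpu", backend=BACKEND.QBITS)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("GPTQModel runs on CPU via QBits:", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```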
- 🚀🚀 [KERNEL] Added Intel QBits support with [2, 3, 4, 8] bits quantization/inference on CPU by @CSY-ModelCloud in #137
- ✨ [CORE] BaseQuantLinear add SUPPORTED_DEVICES by @ZX-ModelCloud in #174
- ✨ [DEPRECATION] Remove Backend.CUDA and Backend.CUDA_OLD by @ZX-ModelCloud in #165
- 👾 [CI] FIX test perplexity by @ZYC-ModelCloud in #160
Full Changelog: v0.9.4...v0.9.5
GPTQModel v0.9.4
What's Changed
- 🚀 [FEATURE] Added Transformers Integration via monkeypatch by @ZX-ModelCloud in #147
- 👾 [FIX] Typo causing Gemma 2 errors by @LRL-ModelCloud in #158
Full Changelog: v0.9.3...v0.9.4
GPTQModel v0.9.3
What's Changed
- 🚀 [MODEL] Add Gemma 2 support by @LRL-ModelCloud in #131
- 🚀 [OTHER] Calculate ppl on gpu by @ZYC-ModelCloud in #135
- ✨ [REFACTOR] BaseQuantLinear and avoid using shared QuantLinear cls name by @PZS-ModelCloud in #116
- ✨ [KERNEL] Bitblas cache stability by @Qubitium in #129
- 👾 [FIX] Export TORCH_CUDA_ARCH_LIST in install.sh by @LeiWang1999 in #133
- 👾 [FIX] Limit Bitblas numexpr thread usage by @Qubitium in #125
- 👾 [FIX] Revert "Skip opt fc1/fc2 for quantization" due to inference regressions (#118) by @Qubitium in #149
- ✨ [REFACTOR] remove max_memory arg by @CL-ModelCloud in #144
- 🤖 [CI] Fix test was skipped by @CSY-ModelCloud in #145
- 🤖 [CI] Add GPU selector for runner by @CSY-ModelCloud in #148
New Contributors
- @LeiWang1999 made their first contribution in #133
Full Changelog: v0.9.2...v0.9.3