Release v0.4.0 · vllm-project/llm-compressor

What's Changed

Record config file name as test suite property by @dbarbuzzi in #947
Update setup.py by @dsikka in #975
Deprecate OBCQ Helpers by @kylesayrs in #977
KV Cache, E2E Tests by @horheynm in #742
Use 1 GPU for offloading examples by @dsikka in #979
Replace tokenizer with processor by @kylesayrs in #955
Revert "KV Cache, E2E Tests (#742)" by @dsikka in #989
Fix SmoothQuant offload bug by @dsikka in #978
Add LM Eval Configs by @dsikka in #980
Fix test_model_reload test by @kylesayrs in #1005
Calibration and Compression Contexts by @kylesayrs in #998
Add info for clarity by @dsikka in #1009
[Bugfix] Pass trust_remote_code_model=True for deepseek examples by @dsikka in #1012
Vision Datasets by @kylesayrs in #943
Add example for fp8 kv cache of phi3.5 and gemma2 by @mgoin in #991
Update ReadMe and test for cpu_offloading by @dsikka in #1013
Adding amdsmi for AMD gpus by @citrix123 in #1018
CompressionLogger add time units by @kylesayrs in #1026
patch_tied_tensors_bug: support malformed model definitions by @kylesayrs in #1014
Add: 2of4 example with/without fp8 quantization by @rahul-tuli in #1033
Remove unnecessary step in 2of4 Example by @dsikka in #1034
Remove Neural Magic copyright from files by @kylesayrs in #992
VLM Support via GPTQ Hooks and Data Pipelines by @kylesayrs in #914
[E2E Testing] KV-Cache by @horheynm in #1004
[E2E Testing] Add recipe check vllm e2e by @horheynm in #929
[MoE] GPTQ compress using callback not hook by @kylesayrs in #1049
Explicit dataset tokenizer text kwarg by @kylesayrs in #1031
Fix smoothquant ignore, Fix typing, Add glm mappings by @kylesayrs in #1015
[Test Fix] Quant model reload by @horheynm in #974
Remove old examples by @dsikka in #1062
VLM: Fix typo bug in TraceableLlavaForConditionalGeneration by @kylesayrs in #1065
Add tests for "examples/sparse_2of4_[...]" by @dbarbuzzi in #1067
VLM Image Examples by @kylesayrs in #1064
Add quick warning for DeepSeek with transformers 4.48.0 by @dsikka in #1066
[KV Cache] kv-cache end to end unit tests by @horheynm in #141
[E2E Testing] Fix HF upload by @horheynm in #1061
[Test Fix] Fix/update test_run_compressed by @horheynm in #970
Revert "[Test Fix] Fix/update test_run_compressed" by @mgoin in #1071
Sparse 2:4 + FP8 Quantization e2e vLLM tests by @dsikka in #1073
[Test Patch] Remove redundant code for "Fix/update test_run_compressed" by @horheynm in #1072
bump; set ct version by @dsikka in #1076

New Contributors

@citrix123 made their first contribution in #1018

Full Changelog: 0.3.1...0.4.0
