What's Changed
- Record config file name as test suite property by @dbarbuzzi in #947
- Update setup.py by @dsikka in #975
- Deprecate OBCQ Helpers by @kylesayrs in #977
- KV Cache, E2E Tests by @horheynm in #742
- Use 1 GPU for offloading examples by @dsikka in #979
- Replace tokenizer with processor by @kylesayrs in #955
- Revert "KV Cache, E2E Tests (#742)" by @dsikka in #989
- Fix SmoothQuant offload bug by @dsikka in #978
- Add LM Eval Configs by @dsikka in #980
- Fix `test_model_reload` test by @kylesayrs in #1005
- Calibration and Compression Contexts by @kylesayrs in #998
- Add info for clarity by @dsikka in #1009
- [Bugfix] Pass `trust_remote_code_model=True` for deepseek examples by @dsikka in #1012
- Vision Datasets by @kylesayrs in #943
- Add example for fp8 kv cache of phi3.5 and gemma2 by @mgoin in #991
- Update ReadMe and test for cpu_offloading by @dsikka in #1013
- Adding amdsmi for AMD gpus by @citrix123 in #1018
- CompressionLogger add time units by @kylesayrs in #1026
- patch_tied_tensors_bug: support malformed model definitions by @kylesayrs in #1014
- Add: 2of4 example with/without fp8 quantization by @rahul-tuli in #1033
- Remove unnecessary step in 2of4 Example by @dsikka in #1034
- Remove Neural Magic copyright from files by @kylesayrs in #992
- VLM Support via GPTQ Hooks and Data Pipelines by @kylesayrs in #914
- [E2E Testing] KV-Cache by @horheynm in #1004
- [E2E Testing] Add recipe check vllm e2e by @horheynm in #929
- [MoE] GPTQ compress using callback not hook by @kylesayrs in #1049
- Explicit dataset tokenizer `text` kwarg by @kylesayrs in #1031
- Fix smoothquant ignore, Fix typing, Add glm mappings by @kylesayrs in #1015
- [Test Fix] Quant model reload by @horheynm in #974
- Remove old examples by @dsikka in #1062
- VLM: Fix typo bug in TraceableLlavaForConditionalGeneration by @kylesayrs in #1065
- Add tests for "examples/sparse_2of4_[...]" by @dbarbuzzi in #1067
- VLM Image Examples by @kylesayrs in #1064
- Add quick warning for DeepSeek with transformers 4.48.0 by @dsikka in #1066
- [KV Cache] kv-cache end to end unit tests by @horheynm in #141
- [E2E Testing] Fix HF upload by @horheynm in #1061
- [Test Fix] Fix/update test_run_compressed by @horheynm in #970
- Revert "[Test Fix] Fix/update test_run_compressed" by @mgoin in #1071
- Sparse 2:4 + FP8 Quantization e2e vLLM tests by @dsikka in #1073
- [Test Patch] Remove redundant code for "Fix/update test_run_compressed" by @horheynm in #1072
- bump; set ct version by @dsikka in #1076
New Contributors
- @citrix123 made their first contribution in #1018
Full Changelog: 0.3.1...0.4.0