A flexible toolkit for customizable transformer model quantization
QuantKit is a modular and extensible framework for building and exporting quantized transformer models with fine-grained control. Whether you're targeting edge deployment, reducing inference latency, or experimenting with quantization strategies, QuantKit gives you all the knobs.
Designed with researchers and engineers in mind, QuantKit supports layer-wise quantization, asymmetric/symmetric schemes, 4/8-bit precision, and LoRA integration for efficient fine-tuning.
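For background on the two schemes: symmetric quantization maps values with a zero-centered scale, while asymmetric quantization adds a zero point so a skewed value range can use the full integer grid. A minimal illustration in plain Python — this is a sketch of the underlying math, not QuantKit's actual API; all function names here are hypothetical:

```python
# Illustrative only: the math behind symmetric vs. asymmetric 8-bit
# quantization. These helpers are hypothetical, not QuantKit's API.

def quantize_symmetric(values, bits=8):
    """Map floats to signed integers using a zero-centered scale."""
    qmax = 2 ** (bits - 1) - 1              # 127 for 8-bit
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) for v in values], scale

def quantize_asymmetric(values, bits=8):
    """Map floats to unsigned integers using a scale and a zero point."""
    qmax = 2 ** bits - 1                    # 255 for 8-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)
    return [round(v / scale) + zero_point for v in values], scale, zero_point

def dequantize(q, scale, zero_point=0):
    """Recover approximate float values from quantized integers."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-0.4, -0.1, 0.0, 0.25, 0.8]
q_sym, s = quantize_symmetric(weights)
q_asym, s2, zp = quantize_asymmetric(weights)
```

The asymmetric variant pays for the extra zero-point bookkeeping with better coverage of ranges that are not centered on zero, which is why activation tensors are often quantized asymmetrically while weights use the symmetric scheme.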
- Layer-wise, selective, or full-model quantization
- 4-bit, 8-bit, and mixed-precision support
- Symmetric and asymmetric quantization schemes
- Compatible with Hugging Face `transformers` and `bitsandbytes`
- Easy saving and loading of quantized models
- Works with LoRA/PEFT for efficient fine-tuning
- Python API and CLI for full control
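The idea behind layer-wise and mixed-precision quantization from the list above can be sketched in plain Python — assigning each layer its own bit width. This is a hedged illustration with made-up names (`bit_config`, the layer names, and `quantize_symmetric` are all hypothetical, not QuantKit's API):

```python
# Illustrative sketch of layer-wise mixed precision. All names here
# (bit_config, layer names, quantize_symmetric) are hypothetical.

def quantize_symmetric(values, bits):
    """Symmetric quantization: zero-centered scale, signed integer range."""
    qmax = 2 ** (bits - 1) - 1          # 127 for 8-bit, 7 for 4-bit
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) for v in values], scale

# Toy "model": one flat weight list per layer, each with its own precision.
layers = {
    "attention": [-0.5, 0.25, 1.0],
    "mlp": [0.1, -0.9, 0.3],
}
bit_config = {"attention": 8, "mlp": 4}  # mixed precision per layer

quantized = {
    name: quantize_symmetric(weights, bit_config[name])
    for name, weights in layers.items()
}
```

Keeping sensitive layers (often attention projections) at higher precision while pushing the rest to 4-bit is a common way to trade a small accuracy cost for a large memory saving.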
```bash
git clone https://github.com/your-username/quantkit.git
cd quantkit
pip install -e .
```