[Usage] How to do KV cache quantization? #111
I followed the Activation quantization to fp8 example and got an FP8 quantized model. I also want to run the model with an FP8 E4M3 KV cache. So my question is: how do I set the kv_cache_scheme when I quantize the model?

Comments
@CharlesRiggins I am working on this right now and will share the PR for the example later today. For a quick example, it is just a new entry inside of QuantizationModifier, so you can add it to a recipe like this:

recipe = """
quant_stage:
quant_modifiers:
QuantizationModifier:
ignore: ["lm_head"]
config_groups:
group_0:
weights:
num_bits: 8
type: float
strategy: tensor
dynamic: false
symmetric: true
input_activations:
num_bits: 8
type: float
strategy: tensor
dynamic: false
symmetric: true
targets: ["Linear"]
kv_cache_scheme:
num_bits: 8
type: float
strategy: tensor
dynamic: false
symmetric: true
""" |
Take a look here: #113
Great. I tried the example and it worked. Thank you!
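
(As a side note on the second half of the original question, running the quantized model with an FP8 E4M3 KV cache: a minimal sketch of loading the resulting checkpoint in vLLM, assuming the output directory produced by the step above; the path and prompt are placeholders.)

# Minimal sketch: serve the FP8-quantized checkpoint with an FP8 KV cache
# in vLLM. The model path and prompt are illustrative placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TinyLlama-1.1B-Chat-v1.0-FP8-KV",  # directory written by oneshot above
    kv_cache_dtype="fp8",                     # FP8 (E4M3 on NVIDIA GPUs) KV cache
)

outputs = llm.generate(
    ["What does KV cache quantization do?"],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)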