Questions about the details of LLM.int8 #1400
LLM.int8 quantizes all of the weights to int8 precision. When the activations (input features) are also quantized to int8, outlier channels are held back in fp16. Instead of requiring a copy of the original weights, the weight rows corresponding to the activation outliers are dequantized and computed in fp16, while the rest of the computation happens in int8. In the decomposition phase, the input is split into its outlier columns, which take the fp16 path, and the remaining columns, which take the int8 path.
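For what it's worth, here is a rough PyTorch sketch of that decomposition. The function name, the 6.0 threshold, and the tensor layouts are illustrative assumptions, not the actual bitsandbytes kernels (which run the two paths in fp16 and int8; the matmuls below are emulated in fp32 for portability):

```python
import torch

def llm_int8_matmul_sketch(X, W_int8, w_scale, threshold=6.0):
    """X: (tokens, in_features) activations.
    W_int8: (in_features, out_features) weights quantized per output column (absmax).
    w_scale: (out_features,) scales such that W ≈ W_int8 * w_scale."""
    # 1. Outlier detection: input dimensions where any activation magnitude
    #    crosses the threshold are kept in higher precision.
    outlier_cols = X.abs().amax(dim=0) >= threshold

    # 2. Outlier path: dequantize only the matching weight rows and multiply in
    #    floating point (fp16 in the real kernels; fp32 here for portability).
    W_outlier = W_int8[outlier_cols].float() * w_scale
    out_outlier = X[:, outlier_cols].float() @ W_outlier

    # 3. Regular path: row-wise absmax quantization of the remaining activations,
    #    integer matmul (emulated in fp32 here), then dequantization of the result.
    X_rest = X[:, ~outlier_cols].float()
    x_scale = X_rest.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    X_int8 = torch.round(X_rest / x_scale).to(torch.int8)
    out_int8 = (X_int8.float() @ W_int8[~outlier_cols].float()) * x_scale * w_scale

    # 4. The two partial results are summed to give the layer output.
    return out_outlier + out_int8

# Example usage with synthetic tensors and one injected outlier dimension.
W = torch.randn(1024, 4096)
w_scale = W.abs().amax(dim=0) / 127.0
W_int8 = torch.round(W / w_scale).to(torch.int8)
X = torch.randn(8, 1024)
X[:, 3] *= 20.0
out = llm_int8_matmul_sketch(X, W_int8, w_scale)
```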
So my understanding is: LLM.int8 directly quantizes the weights W into int8. During the forward pass, it identifies the dimensions of the input X that contain outliers and decomposes the input accordingly. The corresponding part of the weights is dequantized back to fp16, and the subsequent calculations are performed.
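To make that concrete, a small sketch (with hypothetical helper names, not the bitsandbytes internals) of how the weights can be quantized once up front, with only the rows that line up with outlier input dimensions dequantized again at forward time:

```python
import torch

def quantize_weight_absmax(W):
    """Per-output-column absmax quantization: W ≈ W_int8 * scale."""
    scale = W.abs().amax(dim=0).clamp(min=1e-8) / 127.0  # one scale per output column
    W_int8 = torch.round(W / scale).to(torch.int8)
    return W_int8, scale

def dequantize_outlier_rows(W_int8, scale, outlier_mask):
    """Recover floating-point weights only for the input dimensions (rows)
    that carried activation outliers; everything else stays int8."""
    return W_int8[outlier_mask].float() * scale
```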
@bg51717 That's correct!
@bg51717 Does that answer your questions fully? Please close the issue if yes. Thanks 🤗
I'm curious about one thing: LLM.int8 seems to require the input X to determine which weights need to retain fp16 precision and which can be quantized to int8, yet models can be quantized directly by bitsandbytes without any input information. Is it possible that all models have their emergent features in the same location?
Thanks for your reply!
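For context, this is the kind of load-time, weight-only quantization the question refers to (no calibration inputs are involved; the threshold is applied to the activations at runtime). A hedged example; the model id is illustrative and the argument names reflect the transformers/bitsandbytes integration as I understand it:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,        # weights are quantized to int8 when the model is loaded
    llm_int8_threshold=6.0,   # outlier threshold applied to activations during each forward pass
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",      # any causal LM checkpoint; chosen here only as an example
    quantization_config=bnb_config,
    device_map="auto",
)
```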