
Questions about the details of LLM.int8 #1400

Closed
bg51717 opened this issue Oct 23, 2024 · 5 comments

Comments

@bg51717

bg51717 commented Oct 23, 2024

I'm curious about a detail of LLM.int8: it seems to require the input X to determine which weights need to retain fp16 precision and which can be quantized to int8, yet bitsandbytes can quantize a model directly, without any input information. Is it possible that all models have their emergent features in the same locations?
Thanks for your reply!

@Titus-von-Koeller
Collaborator

cc @matthewdouglas

@matthewdouglas
Member

LLM.int8 quantizes all of the weights to int8 precision. When the activations (input features) are also quantized to int8, outlier channels are held back in fp16. Instead of requiring a copy of the original weights, the weight rows corresponding to the activation outliers are dequantized and computed in fp16, while the rest of the computation happens in int8.
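A minimal NumPy sketch of the absmax weight quantization described above (the function names are illustrative, not the actual bitsandbytes API): each output column of W gets its own scale so that its largest-magnitude entry maps to 127, and dequantization recovers a floating-point approximation of the original weights.

```python
import numpy as np

def quantize_weights_int8(W):
    """Per-output-column absmax quantization (illustrative, not the
    bitsandbytes API): scale each column so its max |value| maps to 127."""
    scales = np.abs(W).max(axis=0, keepdims=True) / 127.0  # shape (1, out)
    W_int8 = np.round(W / scales).astype(np.int8)
    return W_int8, scales

def dequantize_weights_int8(W_int8, scales):
    """Recover a floating-point approximation of the original weights."""
    return W_int8.astype(np.float32) * scales

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8)).astype(np.float32)
W_int8, scales = quantize_weights_int8(W)
W_hat = dequantize_weights_int8(W_int8, scales)
# Elementwise reconstruction error is bounded by half a quantization step.
```

Note that no inputs are needed for this step, which is why bitsandbytes can quantize a model up front; the input-dependent part only happens later, at matmul time.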

In the decomposition phase, the $X_{F16}$ inputs are retained exactly, but the $W_{F16}$ recovered by dequantization may carry some quantization error. The important part is that the focus is on the emergence of outliers in the activations.
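To make the decomposition concrete, here is a hedged NumPy sketch (names are illustrative and the int8 path is emulated on CPU; the real kernels run on GPU). Outlier feature dimensions are detected from the activations using the paper's default threshold of 6.0, the matching weight rows are dequantized and multiplied in floating point, and everything else goes through an integer matmul:

```python
import numpy as np

def quantize_int8(M, axis):
    """Absmax quantization along the given axis (illustrative helper)."""
    scales = np.abs(M).max(axis=axis, keepdims=True) / 127.0
    return np.round(M / scales).astype(np.int8), scales

def llm_int8_matmul(X, W_int8, w_scales, threshold=6.0):
    """Sketch of the LLM.int8() mixed-precision decomposition for X @ W,
    where W_int8 (in, out) was quantized with per-column scales w_scales."""
    # 1. Outlier feature dims come from the *activations*, not the weights.
    outliers = np.any(np.abs(X) >= threshold, axis=0)   # (in_features,)
    regular = ~outliers

    # 2. fp path: outlier input columns times the corresponding
    #    dequantized weight rows.
    W_out = W_int8[outliers].astype(np.float32) * w_scales
    out_fp = X[:, outliers] @ W_out

    # 3. int8 path: row-wise quantize the remaining activations, do an
    #    integer matmul, then dequantize with the product of the scales.
    X_int8, x_scales = quantize_int8(X[:, regular], axis=1)
    acc = X_int8.astype(np.int32) @ W_int8[regular].astype(np.int32)
    out_int = acc.astype(np.float32) * x_scales * w_scales

    return out_fp + out_int
```

The key property this illustrates is that only the activations at runtime decide which weight rows get the fp16 treatment; the stored int8 weights never change.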

(attached image)

@bg51717
Author

bg51717 commented Oct 24, 2024

So my understanding is: LLM.int8 directly quantizes the weights W to int8. During the forward pass, it identifies the dimensions corresponding to outliers in the input X and decomposes the input accordingly. The corresponding part of the weights is dequantized back to fp16 for those dimensions, while the remaining computation is performed in int8.

@matthewdouglas
Member

@bg51717 That's correct!

@Titus-von-Koeller
Collaborator

@bg51717 Does that answer your questions fully? Please close the issue if yes. Thanks 🤗
