[QA] Where to download DeepSeek-R1 gptq model? #1267

Open
Rane2021 opened this issue Feb 12, 2025 · 12 comments

@Rane2021

How do I download a DeepSeek-R1 GPTQ-quantized model?

@Qubitium
Collaborator

You can visit https://huggingface.co/models?search=gptq to download our DeepSeek R1 distilled 7B model, but we currently do not provide the full R1 model. You can use our toolkit to quantize your own R1 model.
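
For reference, a minimal loading sketch, assuming a recent GPTQModel release with the `GPTQModel.load` / `model.generate` API; the model id below is a placeholder, not a confirmed repo name, so substitute whichever distilled-R1 GPTQ repo you find via the search link above:

```python
# Minimal sketch (assumption: recent GPTQModel API with GPTQModel.load / model.generate).
# The model id is a placeholder; pick a real GPTQ repo from the Hugging Face search above.
from gptqmodel import GPTQModel

model_id = "ModelCloud/DeepSeek-R1-Distill-Qwen-7B-gptq-4bit"  # placeholder id

model = GPTQModel.load(model_id)                  # download + load the quantized weights
tokens = model.generate("Why is the sky blue?")[0]
print(model.tokenizer.decode(tokens))
```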

@Rane2021
Author

You can visit https://huggingface.co/models?search=gptq to download our DeepSeek R1 distilled 7B model, but we currently do not provide the full R1 model. You can use our toolkit to quantize your own R1 model.

DeepSeek has released the FP8 version. Can your toolkit work with it directly?
Have you considered releasing a DeepSeek R1 GPTQ-quantized version? It should be very popular.

@Qubitium
Collaborator

You can use the BF16 version of R1 for GPTQ quantization. We do not have large H100+ GPUs to test FP8 model loading, and the 4090 has too little VRAM.

https://huggingface.co/unsloth/DeepSeek-R1-BF16/tree/main
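
As a rough illustration only, here is a quantization sketch assuming GPTQModel's `load` / `quantize` / `save` API; the calibration set, bit-width, and group size are illustrative choices, and the full R1 is a very large MoE model, so this is not a tested single-GPU recipe:

```python
# Rough sketch: GPTQ-quantize the BF16 R1 checkpoint with GPTQModel.
# Assumptions: GPTQModel's load/quantize/save API; illustrative calibration data and
# settings; far more memory is needed for full R1 than a consumer GPU provides.
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

# Small illustrative calibration set (dataset choice and size are assumptions).
calibration = load_dataset(
    "allenai/c4", data_files="en/c4-train.00001-of-01024.json.gz", split="train"
).select(range(512))["text"]

quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load("unsloth/DeepSeek-R1-BF16", quant_config)
model.quantize(calibration, batch_size=1)
model.save("DeepSeek-R1-gptq-4bit")  # output directory (placeholder name)
```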

@Rane2021
Author

Great, thanks!

@Rane2021
Author

One more question, have you tested if there are any issues with DeepSeek R1 GPTQ inference? Can it be used for inference with the vllm serve --quantization gptq method?

@Qubitium
Collaborator

Qubitium commented Feb 12, 2025

One more question, have you tested if there are any issues with DeepSeek R1 GPTQ inference? Can it be used for inference with the vllm serve --quantization gptq method?

There are no technical reasons why a GPTQ-quantized R1 cannot run on vLLM or SGLang.
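
As a usage sketch (not a tested setup: the model path is a placeholder for wherever the GPTQ checkpoint lives, and `tensor_parallel_size` depends on your hardware), the offline equivalent of `vllm serve <model> --quantization gptq` looks roughly like this:

```python
# Sketch: offline inference of a GPTQ checkpoint with vLLM's Python API,
# equivalent in spirit to `vllm serve <model> --quantization gptq`.
# The model path and tensor_parallel_size are placeholders/assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="DeepSeek-R1-gptq-4bit",   # placeholder: local path or HF id of the GPTQ quant
    quantization="gptq",
    trust_remote_code=True,
    tensor_parallel_size=8,          # depends on available GPUs
)

params = SamplingParams(temperature=0.6, max_tokens=256)
outputs = llm.generate(["Why is the sky blue?"], params)
print(outputs[0].outputs[0].text)
```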

@hsb1995

hsb1995 commented Feb 24, 2025

@Qubitium @Rane2021

Hello, I am quite interested in your work. I would like to ask you a few questions:

  1. Does this link provide the model compressed by your algorithm? https://huggingface.co/OPEA/DeepSeek-R1-int4-gptq-sym-inc
  2. I saw in the demo that it supports 3-bit quantization. Can it go to a lower bit-width?
  3. What is the difference between your work and https://github.com/IST-DASLab/gptq? I would like to see the technical details in your paper.

@hsb1995

hsb1995 commented Feb 24, 2025

You can visit https://huggingface.co/models?search=gptq to download our DeepSeek R1 distilled 7B model, but we currently do not provide the full R1 model. You can use our toolkit to quantize your own R1 model.

Could you please tell me which DeepSeek 7B model you can compress? If convenient, please provide a link to the 7B model.

@Qubitium
Collaborator

Qubitium commented Feb 24, 2025

@hsb1995

  1. The link you referred to is a GPTQ-format model made by AutoRound. However, that model has not been benchmarked as far as I am aware, so I can't say one way or the other how good it is. AutoRound does not use the same algorithm, but it generates a model format that is compatible with GPTQ.
  2. Please check https://github.com/ModelCloud/GPTQModel#citation for links to the papers. We use the same original GPTQ algorithm pioneered by IST-DASLab.
  3. Please check our readme for a link to our quantized DeepSeek 7B model with full benchmarks: https://github.com/ModelCloud/GPTQModel#quality-gptq-4bit-50-bpw-can-match-bf16

@hsb1995

hsb1995 commented Feb 24, 2025

https://arxiv.org/abs/2210.17323
Hello professor, is this the paper behind your project?

@Qubitium
Collaborator

https://arxiv.org/abs/2210.17323 Hello professor, is this the paper behind your project?

This paper was written by the original GPTQ researchers. GPTQModel is code based on the original research team's implementation, plus many modifications to usage, inference, and quantization.
