[MODEL] Support OpenVLA #1350
Comments
@billamiable We support image quantization for the Qwen2-VL and Ovis-VL models. Please check https://github.com/ModelCloud/GPTQModel/blob/main/tests/models/test_qwen2_vl.py. Other VL models require a PR; if you are willing to help add the support, please submit a PR and we can make it available for others to use. VL models require special processing due to the separate tokenization requirement for image pixels. Also, VL model quantization is only supported when using …
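As a rough sketch of the supported path described above, quantizing Qwen2-VL with GPTQModel might look like the following, assuming the `GPTQModel.load` / `quantize` / `save` API used in the repo's examples (the model id, calibration set, and output path are illustrative, not prescribed):

```python
# Minimal sketch, modeled on the repo's quantization examples; the
# image-specific calibration handling lives in the linked Qwen2-VL test.
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "Qwen/Qwen2-VL-2B-Instruct"         # a supported VL model
quant_path = "Qwen2-VL-2B-Instruct-gptq-4bit"  # illustrative output dir

# Plain-text calibration data; VL models additionally need their image
# pixels tokenized separately, as the linked test demonstrates.
calibration = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(256))["text"]

model = GPTQModel.load(model_id, QuantizeConfig(bits=4, group_size=128))
model.quantize(calibration, batch_size=2)
model.save(quant_path)
```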
Thanks for the timely response! By saying "very very simple quantization" using …

Update from my side: I tried deleting all of the generated tokenizer-related files and directly copying the tokenizer files from the original unquantized model (see the files below):

added_tokens.json
special_tokens_map.json
tokenizer_config.json
tokenizer.json
tokenizer.model

Surprisingly, the model is then able to run inference. However, it is much slower than before: the average speed used to be ~1.8 s/token, but it now takes ~5.1 s/token. I think this might be caused by quantizing only the layers inside the llama backbone, because I defined …
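A minimal sketch of that tokenizer-file copy (both directory names are placeholders, not the paths actually used):

```python
# Copy the tokenizer files from the original unquantized model into the
# quantized output directory; the paths are hypothetical placeholders.
import shutil
from pathlib import Path

orig_model_dir = Path("openvla-7b")             # original unquantized model
quant_model_dir = Path("openvla-7b-gptq-4bit")  # quantized output

for name in [
    "added_tokens.json",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.json",
    "tokenizer.model",
]:
    src = orig_model_dir / name
    if src.exists():  # not every model ships all five files
        shutil.copy2(src, quant_model_dir / name)
```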
@billamiable You need to provide us with code snippets showing how you quantize and run inference with your model, including any changes you made to GPTQModel. I do not have enough information to answer your question, since your model, code, and env are different from ours. I noticed you work for Intel. If this is an internal project with Intel, we can work together to get OpenVLA officially added. We have an active working relationship with the Intel AI team in Shanghai, so working with the Beijing branch just completes the circle. =)
I've attached the code above, both for quantization and inference. I made no changes to GPTQModel.
@billamiable I will reach out to you via Teams from …
Original issue: OpenVLA
Hi, I would like to try GPTQ on OpenVLA. I found that most of the examples use purely language input (e.g., https://github.com/ModelCloud/GPTQModel/blob/main/examples/quantization/transformers_usage.py). Do you support models with both visual and language inputs?
I tried to quantize with the following script:
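As an illustration only (not the author's actual script), a minimal attempt with GPTQModel's generic API might look like this; the model id is OpenVLA's public checkpoint, and the calibration text is a placeholder that ignores the image pathway:

```python
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "openvla/openvla-7b"      # public OpenVLA checkpoint
quant_path = "openvla-7b-gptq-4bit"  # placeholder output dir

# Text-only calibration; OpenVLA's image inputs would need the special
# VL handling described in the comments above.
calibration = ["In: What action should the robot take to pick up the cup?\nOut:"] * 32

# trust_remote_code is assumed here because OpenVLA ships custom model code.
model = GPTQModel.load(model_id, QuantizeConfig(bits=4, group_size=128),
                       trust_remote_code=True)
model.quantize(calibration)
model.save(quant_path)
```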
Then I tried to run inference with the quantized model on an Intel A770 with torch 2.5.1, using the following script:
but I got the following error:
It seems to be caused by an unexpected tokenizer input. Any idea how to fix this? Thanks!
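For reference, a minimal sketch of what the inference attempt might look like with GPTQModel's load/generate API (the path and the "xpu" device string are assumptions; real OpenVLA inference also requires an image input, which this text-only sketch omits):

```python
from gptqmodel import GPTQModel

quant_path = "openvla-7b-gptq-4bit"  # placeholder path from the sketch above

# "xpu" targets Intel GPUs such as the A770; whether GPTQModel.load accepts
# this device string here is an assumption based on its Intel XPU support.
model = GPTQModel.load(quant_path, device="xpu", trust_remote_code=True)

prompt = "In: What action should the robot take to pick up the cup?\nOut:"
tokens = model.generate(prompt)[0]
print(model.tokenizer.decode(tokens))
```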