Can't load a just-built Pixtral quant; RuntimeError: start (0) + length (1280) exceeds dimension size (1024). #1127
Comments
I suspect this is related to the most recent updates to Pixtral in vLLM and transformers. You may have to update to the most recent transformers version. I will attempt to verify on my side.
FWIW, I couldn't get it to build unless I was on
Same error with Transformers 0de15c988b0d27758ce360adb2627e9ea99e91b3
I was able to replicate this issue; working on a fix.
This is an ongoing issue with saving the Pixtral config, which is being tracked here. In the meantime, you can patch your config with these options.
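For reference, a minimal sketch of patching the saved config by hand is below; the specific options referred to above are not quoted in this thread, so the field name and value used here are hypothetical placeholders only.

```python
import json
from pathlib import Path

# Hypothetical sketch: edit the quantized checkpoint's config.json in place.
# "head_dim" and the value 128 are placeholders, not the actual options
# referenced above.
config_path = Path("/home/kyle/llm-compressor/pixtral-12b-W4A16-G128/config.json")
config = json.loads(config_path.read_text())

config.setdefault("text_config", {})["head_dim"] = 128  # placeholder field/value

config_path.write_text(json.dumps(config, indent=2))
```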
Then run with vLLM:

```python
from vllm import LLM

llm = LLM(
    "/home/kyle/llm-compressor/pixtral-12b-W4A16-G128",
    gpu_memory_utilization=0.95,
    max_model_len=8192,
)
```
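As a quick smoke test once the model loads (not from the original comment; the prompt and sampling settings are illustrative), generation can be exercised like this:

```python
from vllm import SamplingParams

# Text-only sanity check against the loaded model; values are illustrative.
params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["Summarize what Pixtral is in one sentence."], params)
print(outputs[0].outputs[0].text)
```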
Thanks, adding
Describe the bug
Just built a Pixtral quant using the example script and the git HEAD of llm-compressor. I can't load it with vLLM at HEAD; loading fails with:
RuntimeError: start (0) + length (1280) exceeds dimension size (1024).
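For context, this message is what PyTorch raises when a slice requests more elements than the dimension holds; a toy reproduction of the same message shape (not the actual failing call inside vLLM) looks like this:

```python
import torch

# Requesting 1280 elements from a dimension of size 1024 triggers the same
# kind of RuntimeError as in the report; in the real failure the mismatched
# sizes come from the saved Pixtral config, not from a toy tensor like this.
x = torch.zeros(1024)
x.narrow(0, 0, 1280)  # RuntimeError: start (0) + length (1280) exceeds dimension size (1024).
```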
Expected behavior
Expected model to run correctly.
Environment
Include all relevant environment information:
f7245c8
caee1c8
To Reproduce
Exact steps to reproduce the behavior:
Build a Pixtral quant and observe that vLLM can't load it.
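For reproduction purposes, a rough sketch of the kind of W4A16 GPTQ run the Pixtral example performs is shown below; the repo's actual example script also handles multimodal calibration data and model tracing, which are omitted here, and the model ID, dataset, and ignore patterns are assumptions rather than quotes from this issue.

```python
from transformers import AutoProcessor, LlavaForConditionalGeneration
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "mistral-community/pixtral-12b"  # assumed source checkpoint
SAVE_DIR = "pixtral-12b-W4A16-G128"

model = LlavaForConditionalGeneration.from_pretrained(MODEL_ID, torch_dtype="auto")
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Quantize the language model's Linear layers to 4-bit weights / 16-bit
# activations, leaving the vision tower, projector, and LM head untouched.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["re:.*lm_head", "re:.*vision_tower.*", "re:.*multi_modal_projector.*"],
)

# Text-only calibration for simplicity; the real example uses multimodal data.
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=256,
)

model.save_pretrained(SAVE_DIR, save_compressed=True)
processor.save_pretrained(SAVE_DIR)
```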
Errors
If applicable, add a full print-out of any errors or exceptions that are raised or include screenshots to help explain your problem.
Additional context
Add any other context about the problem here. Also include any relevant files.