[Bug] Mixtral inference encounters error about tensor location #963
Comments
Hello, we have not tested whether this runs on Mixtral for quite a while, and the error looks like it is caused by the weights and the input not being on the same device.
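A quick way to confirm such a mismatch (a generic PyTorch helper, not part of ktransformers):

```python
import torch

def report_device_mismatches(model: torch.nn.Module, input_ids: torch.Tensor) -> None:
    """Print every parameter that does not live on the same device as the input."""
    for name, param in model.named_parameters():
        if param.device != input_ids.device:
            print(f"{name} is on {param.device}, input is on {input_ids.device}")
```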
Okay, then what should I change in the config or parameters so that everything ends up on the proper device?
Also, if I use the local chat module instead of the ktransformers server module, it just works.
It seems like the input tokens somehow end up on different devices under different backends.
I have also noticed that the implementation of the generation loop differs between the server backend and the local chat module: one passes the input ids to the model directly, while the other (local_chat) uses prefill_and_generate.
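Roughly, the two paths look like this (an illustrative sketch; the function names below are placeholders, not the actual ktransformers signatures):

```python
import torch

def server_backend_step(model: torch.nn.Module, input_ids: torch.Tensor) -> torch.Tensor:
    # Server-backend style: the token tensor is fed to the model as-is,
    # so a CPU-resident tensor can reach GPU-resident weights.
    return model(input_ids)

def local_chat_step(model: torch.nn.Module, input_ids: torch.Tensor) -> torch.Tensor:
    # local_chat style: the tensor is moved onto the model's device before
    # generation, so every forward call sees consistent placement.
    device = next(model.parameters()).device
    return model(input_ids.to(device))
```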
Changing ktransformers/server/backend/interfaces/ktransformers.py from line 115 seems to fix the problem. I think the generation methods need some code unification.
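The idea of the change is to move the input ids onto the model's device before the forward call. A minimal sketch, assuming a standard PyTorch model object (the real backend code differs):

```python
import torch

def align_inputs(model: torch.nn.Module, input_ids: torch.Tensor) -> torch.Tensor:
    """Return the token tensor relocated to the device of the model weights."""
    return input_ids.to(next(model.parameters()).device)
```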
Checklist
Describe the bug
Loading Mixtral with the default config works, but any message results in the following error:
Reproduction
Model is https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/blob/main/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf
Command is:
ktransformers --model_path mistralai/Mixtral-8x7B-Instruct-v0.1 --gguf_path /home/pl752/mxtrl8b/ --optimize_config_path ktransformers/optimize/optimize_rules/Mixtral.yaml --web True
Config is the default.
Environment
Ubuntu 24.10, Python 3.11, torch 2.8 nightly, CUDA 12.8, RTX 3060 mobile, Ryzen 5800H, ktransformers latest (git main)