After deploying the model with `sh scripts/run_assistant_server.sh`, will it be much slower than vLLM? #506
Comments
No, it is just as fast; under the hood it calls vLLM.
How do I change the model path in `sh scripts/run_assistant_server.sh --served-model-name Qwen2-7B-Instruct --model path/to/weights`? With this command, the model path it reads is automatically redirected to the ModelScope download location instead of my local path. My model lives in my own local directory, so I get the error: `requests.exceptions.HTTPError: The request model: /workspace/model/llm/Qwen/Qwen2-7B-Instruct/ does not exist!`
@zzhangpurdue vi /opt/conda/lib/python3.10/site-packages/vllm/config.py
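Before editing that file, a quick way to locate the relevant switch (a hedged sketch: it assumes the ModelScope handling lives in the config.py mentioned above, which may differ between vLLM versions):

```sh
# Find where the installed vLLM reads the ModelScope switch before editing it
# (path taken from the comment above; adjust for your own install).
grep -n "VLLM_USE_MODELSCOPE" /opt/conda/lib/python3.10/site-packages/vllm/config.py
```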
When we tested this earlier we did use the ModelScope download path and didn't cover non-ModelScope paths; we'll look into how to fix this.
I just tried moving the model out of the ModelScope download path and still couldn't reproduce this issue. Could you tell me your vLLM version?
GPU environment image (python3.10): ubuntu22.04-cuda12.1.0-py310-torch2.1.2-tf2.14.0-1.13.1. The vLLM in this official image is 0.3.0.
@zzhangpurdue |
On my side, when running the script I `export VLLM_USE_MODELSCOPE=false` by default, which should solve this problem.
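Putting the two together, a minimal sketch of the workaround (assumptions: the weights sit at the local path from the error message above, and the script forwards `--model` to vLLM unchanged):

```sh
# Disable ModelScope resolution so vLLM loads the weights straight from disk;
# the env var comes from the comment above, the path from the reported error.
export VLLM_USE_MODELSCOPE=false
sh scripts/run_assistant_server.sh \
    --served-model-name Qwen2-7B-Instruct \
    --model /workspace/model/llm/Qwen/Qwen2-7B-Instruct
```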
sh scripts/run_assistant_server.sh --served-model-name Qwen2-7B-Instruct --model path/to/weights
Is this slower than vLLM inference?