Has anyone tried local deployment? #828
Replies: 20 comments
-
I'm trying it now, but I can't connect to a privately deployed model.
-
It can connect. I'm currently connected to a locally deployed Ollama, and qwen3:14b works fine. The problem is that 14b is just too weak; it's like hiring someone incompetent to do the work.
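As a sketch of how such a local Ollama connection is typically wired up for qwen-code (the variable names below follow qwen-code's OpenAI-compatible settings; the exact values are placeholders to adapt):

```shell
# Hypothetical .env for pointing qwen-code at a local Ollama instance.
OPENAI_BASE_URL="http://localhost:11434/v1/"   # note the trailing /v1/
OPENAI_API_KEY="ollama"                        # Ollama ignores the key, but it must be non-empty
OPENAI_MODEL="qwen3:14b"                       # must match a name shown by `ollama list`
```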
-
Doesn't local Ollama throw errors for you? I deployed qwen coder 2.5 with Ollama, and connecting gives a 400 error saying the model doesn't support tool calls.
-
Your model name isn't correct. Mine looks like this:
-
I imported the model from a gguf file, and that's the name I gave it; I didn't download it via `ollama pull`. After seeing that you connected qwen3 14B successfully, I also updated Ollama and pulled qwen3 14B with `ollama pull`. Now it reports: OpenAI API error: connection error. Could I ask you about the details of your deployment?
-
My privately deployed Qwen Coder's connection settings look like this:
-
I got ✕ [API Error: OpenAI API error: 404 404 page not found] with Ollama.
-
Hi, I read the Ollama documentation today and successfully deployed it locally in a completely offline environment.

First, the BASE URL everyone is most concerned about should be http://localhost:11434/v1/. Note that the last part is 'v1/' rather than 'v1'; otherwise you can log in normally but cannot connect to the API.

Also note: if you have internet access, you can run `ollama pull` to fetch or update the model. If instead you create a model from a gguf file and import it into Ollama, the Modelfile you use must include a template with tool support. For example, the template for Qwen 2.5 Coder 32B is at https://ollama.com/library/qwen2.5-coder:32b/blobs/1e65450c3067, and other models that support tool calls are listed at https://ollama.com/search?c=tools. If you create an Ollama model from a gguf without a tool-capable template in the Modelfile, you will still get a 400 error saying the model does not support tool calls, even though the API connection succeeds.

Finally, I created a .env in the project and filled it in, where <my model name> is the name shown by the `ollama list` command.

To add: I am currently using the Qwen 2.5 Coder 32B Instruct FP16 model imported from GGUF, with only the template from the link above configured in the Modelfile. The remaining problem is that Qwen Code keeps repeating the same operation for a given input and then force-exits after detecting an infinite loop of tool calls. I have not found the root cause of this yet.
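For reference, a minimal Modelfile for the GGUF import described above might look like the sketch below. The file name is a placeholder, and the TEMPLATE body must be copied from the Ollama library page linked above; a hand-written template will not enable tool calls:

```shell
# Hypothetical Modelfile: import a local GGUF and attach a tool-capable template.
FROM ./Qwen2.5-Coder-32B-Instruct-FP16.gguf

# Paste the full tool-call TEMPLATE block from
# https://ollama.com/library/qwen2.5-coder:32b/blobs/1e65450c3067 here:
TEMPLATE """<paste template here>"""
```

The model is then created with `ollama create <my model name> -f Modelfile` and should afterwards appear in `ollama list`.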
-
I get the following error when building an image from the Dockerfile: how are you all building your images? @pengyichao @xueshuai0922
-
Installed it directly with npm; didn't use a Docker image.
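For anyone following along, the npm route is roughly the following (package and command names are taken from the qwen-code README as I understand it; verify against the repository before running):

```shell
# Install qwen-code globally via npm instead of building a Docker image.
npm install -g @qwen-code/qwen-code

# Sanity check that the CLI (assumed entry point: `qwen`) is on PATH.
qwen --version
```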
-
Deployed qwen3-coder locally with vLLM, enabled the tool-call parsing template, and it works normally with qwen-code.
-
Tool calls connect fine, but the responses always loop infinitely; the behavior is different from the hosted API. Has anyone run into this and solved it? @Arthur-WWW have you hit a similar problem? My launch arguments are:
-
Hi, could you share the full command you use to start vLLM? I deployed DeepSeek myself, and right now it can't support tool calls, which is frustrating.
-
I can call Ollama models normally, and I've tried many of them, but most cannot call tools. Only the just-released QWEN3:30B works really well and calls tools correctly. However, none of the models keep any conversation history. I searched the web and found no solution for this; can anyone help?
-
I have a bold idea:
-
With a vLLM-deployed model, the result breaks whenever a tool_call happens under streaming output. So my sessions always have problems: either it repeats itself or produces no result. Has anyone else seen the same thing? vLLM works fine with non-streaming output, but qwen-code uses streaming by default.
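One way to confirm that the problem is streaming-specific is to hit the vLLM OpenAI-compatible endpoint directly with and without streaming (endpoint, port, and model name below are placeholders for your own deployment):

```shell
# Non-streaming request: the case reported to work correctly.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen3-Coder", "stream": false,
       "messages": [{"role": "user", "content": "hello"}]}'

# Streaming request: the case where tool_call output reportedly breaks.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen3-Coder", "stream": true,
       "messages": [{"role": "user", "content": "hello"}]}'
```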
-
Ollama's qwen3:30b doesn't support tool calls. You can instead serve a qwen3:30b GGUF with llama.cpp to back qwen code. Separately, I hit a case where vLLM serves qwen3:30b but qwen code can't call the service: https://github.com/QwenLM/qwen-code/issues/132. I'm on vLLM 0.9.x, which uses the --enable-auto-tool-choice --tool-call-parser hermes flags to enable tool calls. Does anyone know how to configure vLLM?
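Based on the flags mentioned above, a vLLM 0.9.x launch for a tool-calling model would look roughly like this (the model identifier and port are assumptions; whether the hermes parser matches your model's tool-call format needs checking against the vLLM docs):

```shell
# Hypothetical vLLM launch enabling tool calls via the hermes parser,
# using the --enable-auto-tool-choice / --tool-call-parser flags cited above.
vllm serve Qwen/Qwen3-30B-A3B \
  --port 8000 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```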
-
vLLM seems to require a GPU, which I haven't tried... A question: I set up llama.cpp as the local backend with `llama-server -m /7t/ai/models/Qwen3/Qwen3-Coder-30B-A3B/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf -t 64 --host 0.0.0.0 --port 7788 -c 81920 --jinja -b 8192 -tb 128`, and I noticed the prompts qwen code sends over are huge. Even typing just two characters, "你好" (hello), produces 12019 input tokens: `slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 81920, n_keep = 0, n_prompt_tokens = 12019`. Does anyone else see this? How can it be fixed? Thanks!


-
Has anyone tried a fully local setup, deploying qwen-code with Docker and serving the Qwen3-coder model with vLLM? How well does it work?