Has anyone tried local deployment? #828
Replies: 20 comments
-
I'm trying it now, but I can't connect to a privately deployed model.
-
It can connect. I'm currently connected to a locally deployed Ollama, and qwen3:14b works fine. The problem is that 14b is just too weak; it's like hiring someone incompetent to do the work.
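As a sketch of how such a local Ollama connection is typically wired up for qwen-code (the variable names below follow qwen-code's OpenAI-compatible settings; the exact values are placeholders to adapt):

```shell
# Hypothetical .env for pointing qwen-code at a local Ollama instance.
OPENAI_BASE_URL="http://localhost:11434/v1/"   # note the trailing /v1/
OPENAI_API_KEY="ollama"                        # Ollama ignores the key, but it must be non-empty
OPENAI_MODEL="qwen3:14b"                       # must match a name shown by `ollama list`
```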
-
Doesn't local Ollama throw errors for you? I deployed qwen coder 2.5 with Ollama, and connecting gives a 400 error saying the model doesn't support tool calls.
-
Your model name isn't correct. Mine looks like this:
-
I imported the model from a gguf file, and that's the name I gave it; I didn't download it via `ollama pull`. After seeing that you connected qwen3 14B successfully, I also updated Ollama and pulled qwen3 14B with `ollama pull`. Now it reports: OpenAI API error: connection error. Could I ask you about the details of your deployment?
-
My privately deployed Qwen Coder's connection settings look like this:
-
I got ✕ [API Error: OpenAI API error: 404 404 page not found] with Ollama.
-
Hi, I read the Ollama documentation today and successfully deployed it locally in a completely offline environment.

First, the BASE URL everyone is most concerned about should be http://localhost:11434/v1/. Note that the last part is 'v1/' rather than 'v1'; otherwise you can log in normally but cannot connect to the API.

Also note: if you have internet access, you can run `ollama pull` to fetch or update the model. If instead you create a model from a gguf file and import it into Ollama, the Modelfile you use must include a template with tool support. For example, the template for Qwen 2.5 Coder 32B is at https://ollama.com/library/qwen2.5-coder:32b/blobs/1e65450c3067, and other models that support tool calls are listed at https://ollama.com/search?c=tools. If you create an Ollama model from a gguf without a tool-capable template in the Modelfile, you will still get a 400 error saying the model does not support tool calls, even though the API connection succeeds.

Finally, I created a .env in the project and filled it in, where <my model name> is the name shown by the `ollama list` command.

To add: I am currently using the Qwen 2.5 Coder 32B Instruct FP16 model imported from GGUF, with only the template from the link above configured in the Modelfile. The remaining problem is that Qwen Code keeps repeating the same operation for a given input and then force-exits after detecting an infinite loop of tool calls. I have not found the root cause of this yet.
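For reference, a minimal Modelfile for the GGUF import described above might look like the sketch below. The file name is a placeholder, and the TEMPLATE body must be copied from the Ollama library page linked above; a hand-written template will not enable tool calls:

```shell
# Hypothetical Modelfile: import a local GGUF and attach a tool-capable template.
FROM ./Qwen2.5-Coder-32B-Instruct-FP16.gguf

# Paste the full tool-call TEMPLATE block from
# https://ollama.com/library/qwen2.5-coder:32b/blobs/1e65450c3067 here:
TEMPLATE """<paste template here>"""
```

The model is then created with `ollama create <my model name> -f Modelfile` and should afterwards appear in `ollama list`.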
-
I get the following error when building an image from the Dockerfile: how are you all building your images? @pengyichao @xueshuai0922
-
Installed it directly with npm; didn't use a Docker image.
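For anyone following along, the npm route is roughly the following (package and command names are taken from the qwen-code README as I understand it; verify against the repository before running):

```shell
# Install qwen-code globally via npm instead of building a Docker image.
npm install -g @qwen-code/qwen-code

# Sanity check that the CLI (assumed entry point: `qwen`) is on PATH.
qwen --version
```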
-
Deployed qwen3-coder locally with vLLM, enabled the tool-call parsing template, and it works normally with qwen-code.
-
Tool calls connect fine, but the responses always loop infinitely; the behavior is different from the hosted API. Has anyone run into this and solved it? @Arthur-WWW have you hit a similar problem? My launch arguments are:
-
Hi, could you share the full command you use to start vLLM? I deployed DeepSeek myself, and right now it can't support tool calls, which is frustrating.
-
I can call Ollama models normally, and I've tried many of them, but most cannot call tools. Only the just-released QWEN3:30B works really well and calls tools correctly. However, none of the models keep any conversation history. I searched the web and found no solution for this; can anyone help?
-
I have a bold idea:
-
With a vLLM-deployed model, the result breaks whenever a tool_call happens under streaming output. So my sessions always have problems: either it repeats itself or produces no result. Has anyone else seen the same thing? vLLM works fine with non-streaming output, but qwen-code uses streaming by default.
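One way to confirm that the problem is streaming-specific is to hit the vLLM OpenAI-compatible endpoint directly with and without streaming (endpoint, port, and model name below are placeholders for your own deployment):

```shell
# Non-streaming request: the case reported to work correctly.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen3-Coder", "stream": false,
       "messages": [{"role": "user", "content": "hello"}]}'

# Streaming request: the case where tool_call output reportedly breaks.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen3-Coder", "stream": true,
       "messages": [{"role": "user", "content": "hello"}]}'
```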
-
Ollama's qwen3:30b doesn't support tool calls. You can instead serve a qwen3:30b GGUF with llama.cpp to back qwen code. Separately, I hit a case where vLLM serves qwen3:30b but qwen code can't call the service: https://github.com/QwenLM/qwen-code/issues/132. I'm on vLLM 0.9.x, which uses the --enable-auto-tool-choice --tool-call-parser hermes flags to enable tool calls. Does anyone know how to configure vLLM?
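Based on the flags mentioned above, a vLLM 0.9.x launch for a tool-calling model would look roughly like this (the model identifier and port are assumptions; whether the hermes parser matches your model's tool-call format needs checking against the vLLM docs):

```shell
# Hypothetical vLLM launch enabling tool calls via the hermes parser,
# using the --enable-auto-tool-choice / --tool-call-parser flags cited above.
vllm serve Qwen/Qwen3-30B-A3B \
  --port 8000 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```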
-
vLLM seems to require a GPU, which I haven't tried... A question: I set up llama.cpp as the local backend with `llama-server -m /7t/ai/models/Qwen3/Qwen3-Coder-30B-A3B/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf -t 64 --host 0.0.0.0 --port 7788 -c 81920 --jinja -b 8192 -tb 128`, and I noticed the prompts qwen code sends over are huge. Even typing just two characters, "你好" (hello), produces 12019 input tokens: `slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 81920, n_keep = 0, n_prompt_tokens = 12019`. Does anyone else see this? How can it be fixed? Thanks!


-
Has anyone tried a fully local setup, deploying qwen-code with Docker and serving the Qwen3-coder model with vLLM? How well does it work?