OpenAI-API-compatible model not support streaming mode #12143

Closed
5 tasks done
jifei opened this issue Dec 27, 2024 · 8 comments · Fixed by #12171

Comments

@jifei
Contributor

jifei commented Dec 27, 2024

Self Checks

  • This is only for bug report, if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.14.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

My-LLM is deployed with MS-Swift and supports streaming mode when I send POST requests to it directly. Through Dify, blocking mode succeeds but streaming mode fails. Note that an older version of Dify works without issue.
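To check the direct streaming behavior outside Dify, a request like the following could be used. This is only a sketch: the helper names `build_stream_request` and `print_raw_stream` are hypothetical, the host, model name, and prompt are taken from the log below, and it assumes the endpoint needs no API key.

```python
import json
import urllib.request


def build_stream_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a streaming chat-completions POST request (hypothetical helper)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # ask the server for a chunked event stream
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def print_raw_stream(request: urllib.request.Request, timeout: float = 30.0) -> None:
    """Send the request and print every raw line, so the exact prefix is visible."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        for raw_line in response:
            print(repr(raw_line.decode("utf-8", errors="replace")))
```

Calling `print_raw_stream(build_stream_request("http://10.150.60.47:8000", "my-llm", "hello"))` prints each line with `repr`, which makes whitespace differences in the stream easy to spot.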

Fail image and log
image

[on_llm_before_invoke]
Model: my-llm
Parameters:
Stream: True
User: eea58032-acf5-4b50-8b89-8b14c867e4c7
Prompt messages:
role: user
content: hello

[on_llm_new_chunk]
2024-12-27 02:41:52,791.791 DEBUG [Thread-225 (_generate_worker)] [connectionpool.py:243] - Starting new HTTP connection (1): 10.150.60.47:8000
2024-12-27 02:41:52,910.910 DEBUG [Thread-225 (_generate_worker)] [connectionpool.py:546] - http://10.150.60.47:8000 "POST /v1/chat/completions HTTP/11" 200 None
2024-12-27 02:41:53,331.331 INFO [Thread-224 (process_request_thread)] [_internal.py:97] - 172.20.0.9 - - [27/Dec/2024 02:41:53] "POST /console/api/apps/31001cf9-6636-467c-8591-069c106c856d/chat-messages HTTP/1.1" 200 -

[on_llm_after_invoke]
Content:
Model: my-llm
Usage: prompt_tokens=1 prompt_unit_price=Decimal('0') prompt_price_unit=Decimal('0') prompt_price=Decimal('0E-7') completion_tokens=0 completion_unit_price=Decimal('0') completion_price_unit=Decimal('0') completion_price=Decimal('0E-7') total_tokens=1 total_price=Decimal('0E-7') currency='USD' latency=0.5560645920049865
System Fingerprint: None

✔️ Expected Behavior

No response

❌ Actual Behavior

no response


dosubot bot commented Dec 27, 2024

Hey @jifei! I'm here to help you with bugs, questions, and more. Let's work together to solve the issue you're facing.

I found a similar unsolved discussion regarding the Azure OpenAI API Stream options setting [1]. This might be related to the issue you're experiencing with streaming mode not working in Dify version 0.14.2.

For your specific issue, ensure that the stream_function_calling and stream_mode_delimiter settings are correctly configured in your openai_api_compatible.yaml file. Set stream_function_calling to supported and configure the stream_mode_delimiter as needed [2]. Additionally, note that tool calls are not supported in stream mode, and if tools are provided, an InvokeBadRequestError will be raised [3].

If these configurations are correct and the issue persists, it might be beneficial to compare the configurations and code between the older working version and the current version to identify any differences that could be causing the problem.


@jifei
Contributor Author

jifei commented Dec 27, 2024

Dify 0.8.3 works correctly:
image

@yihong0618
Contributor

Hi, is this a workflow?
Can you share your DSL?

@jifei
Contributor Author

jifei commented Dec 27, 2024

hi is this a workflow? can you share your dsl?

Conversation Assistant.

@jifei
Contributor Author

jifei commented Dec 31, 2024

I have identified the cause of the issue. It occurs when serving a Qwen2.5 model with vllm version 0.5.1 and ms-swift version 2.4.2.post1 in stream mode. The returned content looks like this:
data:{"model": "my-llm", "choices": [{"index": 0, "delta": {"role": "assistant", "content": "Hello! How ", "tool_calls": null}, "finish_reason": null, "logprobs": null}], "usage": {"prompt_tokens": 30, "completion_tokens": 7, "total_tokens": 37}, "id": "chatcmpl-d802c95d766a41a5bf9491aafdba2b36", "object": "chat.completion.chunk", "created": 1735303124}

Specifically, there is no space after "data:". The underlying cause is the outdated version of vllm or ms-swift, but the direct cause of the incompatibility and failure is Dify pull request #11272. My pull request #12171 addresses this problem. @yihong0618 @leslie2046 please review it, thank you!
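For illustration, a parser that tolerates both "data: {...}" and "data:{...}" could look like the sketch below. It is not the code from #12171; the function names `extract_sse_payload` and `delta_content` are hypothetical, and the chunk layout is assumed to match the sample above.

```python
import json
from typing import Optional

SSE_DATA_PREFIX = "data:"


def extract_sse_payload(line: str) -> Optional[str]:
    """Return the payload of an SSE data line, or None for any other line."""
    line = line.strip()
    if not line.startswith(SSE_DATA_PREFIX):
        return None
    payload = line[len(SSE_DATA_PREFIX):]
    # The space after the colon is optional, so drop it only if present.
    if payload.startswith(" "):
        payload = payload[1:]
    return payload


def delta_content(line: str) -> Optional[str]:
    """Extract choices[0].delta.content from one streamed line, if any."""
    payload = extract_sse_payload(line)
    if payload is None or payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    choices = chunk.get("choices") or []
    if not choices:
        return None
    return (choices[0].get("delta") or {}).get("content")
```

With this approach, the chunk shown above and the same chunk written with a space after "data:" both yield "Hello! How ", while comment lines and the "[DONE]" sentinel yield None.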

@leslie2046
Contributor

Do moonshot and stepfun also start with "data:"?

@jifei
Contributor Author

jifei commented Dec 31, 2024

Do moonshot and stepfun also start with "data:"?

Maybe not, but the new code is compatible and maintains code consistency.

@leslie2046
Contributor

I see, you are right.
