
[Bug] Remove compulsory include_usage when stream=true in gateway #757

Closed

Conversation

gau-nernst (Author)

Pull Request Description

When stream=true, the OpenAI API does not require stream_options to be specified. This request works:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
  "model": "gpt-4o",
  "stream": true,
  "messages": [{"role": "user", "content": "help me write a random generator in python"}]
}'

However, when stream=true, the AIBrix gateway currently requires stream_options={"include_usage":true} to be present. This PR simply removes that check.
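
For comparison, a minimal sketch of the same request via the OpenAI Python client, but with the stream_options field that the current gateway check insists on. The base_url and api_key are placeholders for a local gateway deployment, not values from this PR:

# Same streaming request, plus the stream_options field the gateway currently requires.
# base_url and api_key are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8888/v1", api_key="sk-placeholder")

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "help me write a random generator in python"}],
    stream=True,
    stream_options={"include_usage": True},  # what the gateway currently mandates
)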

Note from @Jeffwan

Some features, such as the heterogeneous GPU feature, rely on usage being reported. We probably need doc changes on the feature page stating that include_usage is required.

Related Issues

Resolves: #[Insert issue number(s)]


Jeffwan requested a review from varungup90 — February 27, 2025 06:01
Jeffwan (Collaborator) commented Feb 27, 2025

/assign @varungup90

gau-nernst force-pushed the gateway_stream_include_usage branch from 8413b57 to 4db03cb — February 27, 2025 06:07
varungup90 (Collaborator) commented

If the user has enabled rpm/tpm validation, then we need include_usage. Making include_usage optional will require a check on whether the user has enabled the rpm/tpm limit check.

Jeffwan (Collaborator) commented Feb 27, 2025

For features that rely on usage statistics, can we ask users in the documentation to enable it explicitly? The heterogeneous feature needs it as well. By default, the behavior should be clean.

gau-nernst (Author) commented

@varungup90 @Jeffwan Let me know how you want me to add the checks and how to test them. I'm eager to contribute, but if it's too complicated, I can close this PR and you can open your own.

Another question: when include_usage is required, is it possible to send include_usage=true to the inference pods, but have the gateway post-process the response so it complies with include_usage=false if the request asked for that? What I'm seeing is that if AIBrix users enable features that require include_usage (rpm/tpm validation and heterogeneous GPUs), the server is not exactly OpenAI-compatible.
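
A rough sketch of that post-processing idea, in Python for illustration only. The actual gateway is written in Go, and filter_sse_line / record_usage_for_limits are hypothetical names, not AIBrix code:

import json

def record_usage_for_limits(usage):
    # Hypothetical hook: where the gateway would account tokens for rpm/tpm limits.
    pass

def filter_sse_line(line, client_wants_usage):
    """Return the SSE data line to forward to the client, or None to drop it."""
    if not line.startswith("data: ") or line.strip() == "data: [DONE]":
        return line
    chunk = json.loads(line[len("data: "):])
    if client_wants_usage:
        return line  # the client asked for usage, forward everything unchanged
    if chunk.get("usage") is not None and not chunk.get("choices"):
        # Extra final chunk that only carries usage: keep it for accounting,
        # but hide it from a client that did not set include_usage.
        record_usage_for_limits(chunk["usage"])
        return None
    # Other chunks carry "usage": null when include_usage was forced upstream;
    # drop the field so the response looks like a plain stream=true reply.
    chunk.pop("usage", None)
    return "data: " + json.dumps(chunk)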

Jeffwan (Collaborator) commented Feb 28, 2025

@varungup90 could you give more suggestions on the tpm check? Let's get @gau-nernst onboard.

varungup90 (Collaborator) commented

  1. I want to understand where the blocker is if we mandate including stream usage. For clients that do not want to consume the usage report, it is still fine to include it in the request.

  2. For the implementation, there are two alternatives. The first is to add another header, similar to "routing-strategy", which I feel would make the input request bulky or complicated. The second is to mandate the stream_options usage check only if the user has enabled rpm/tpm validation or request tracing (see the sketch after this list).

  3. If we decide to make include_usage optional, the major changes will be in HandleResponseBody to adjust the rpm/tpm check and the heterogeneous tracing feature. Given the current lack of e2e tests, the implementation needs to be done carefully.
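
To illustrate the second option, a minimal sketch in Python pseudocode. This is not the actual Go gateway code; validate_stream_options and its flags are hypothetical names:

def validate_stream_options(body, rpm_tpm_enabled, tracing_enabled):
    """Return an error message if include_usage must be set, else None."""
    if not body.get("stream"):
        return None  # non-streaming requests are unaffected
    needs_usage = rpm_tpm_enabled or tracing_enabled
    include_usage = (body.get("stream_options") or {}).get("include_usage", False)
    if needs_usage and not include_usage:
        return 'stream_options={"include_usage": true} is required when rpm/tpm limits or request tracing are enabled'
    return None  # include_usage stays optional otherwise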

gau-nernst (Author) commented

I want to understand where the blocker is if we mandate including stream usage.

I think the biggest issue is that it's not 100% OpenAI-compatible. Client code that does not expect include_usage=true might not work: from what I understand, there will be an extra last chunk with empty choices and non-null usage, and if client code does not handle this, it may break. I actually discovered this issue when trying to use SGLang's sglang.bench_serving to benchmark AIBrix. Of course I could modify SGLang's code, but the issue with general client code remains, and sometimes it's not possible to modify the client at all.

From the OpenAI docs (https://platform.openai.com/docs/api-reference/chat/create):

If set, an additional chunk will be streamed before the data: [DONE] message. The usage field on this chunk shows the token usage statistics for the entire request, and the choices field will always be an empty array. All other chunks will also include a usage field, but with a null value.
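
Client code that guards against an empty choices list handles this extra chunk cleanly. A minimal sketch with the OpenAI Python client; base_url and api_key are placeholders for a gateway deployment:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8888/v1", api_key="sk-placeholder")

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hello"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    if chunk.choices:  # normal content chunks
        print(chunk.choices[0].delta.content or "", end="")
    elif chunk.usage is not None:  # extra final chunk: empty choices, non-null usage
        print(f"\nprompt_tokens={chunk.usage.prompt_tokens}, completion_tokens={chunk.usage.completion_tokens}")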

Perhaps another option is to always send include_usage=true to the inference pods (vLLM), but have the gateway skip the last usage-statistics chunk if the client did not request it?

varungup90 (Collaborator) commented Mar 4, 2025

I have started a PR to make include_usage an optional param by default. If the user's TPM limit is configured, then include_usage is required.

The heterogeneous use case is not supported with streaming right now. Once that feature is added, include_usage should be enabled as well.

gau-nernst closed this Mar 4, 2025
gau-nernst deleted the gateway_stream_include_usage branch March 4, 2025 00:33