fix(passthrough): fix incorrect first_token_ms measurement in WSv2 mode #2444
Open
astro-ge wants to merge 1 commit into
Contributor
All contributors have signed the CLA. ✅
Author
I have read the CLA Document and I hereby sign the CLA
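In WSv2 passthrough mode, every request's first_token_ms equals its duration_ms. In the admin UI this shows up as "first token ≈ duration" (「首 TOKEN ≈ 耗时」), and the wrong value is persisted to usage_logs.first_token_ms through the AfterTurn callback. The ctx_pool mode and the HTTP /responses path are not affected.

Reproduction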
In a streaming response, the real OpenAI Responses API sends an event sequence roughly like this:
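A minimal Go sketch of such a stream follows. The payloads are hypothetical and heavily trimmed (real events carry many more fields and there may be more event types in between), and extractResponseID is an illustrative helper, not the function in passthrough_relay.go. It checks the three locations this PR names: response.id, response_id, and the top-level id.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sampleStream is a hypothetical, trimmed-down event sequence:
// response.created, three deltas, response.completed.
var sampleStream = []string{
	`{"type":"response.created","response":{"id":"resp_1"}}`,
	`{"type":"response.output_text.delta","delta":"Hel"}`,
	`{"type":"response.output_text.delta","delta":"lo"}`,
	`{"type":"response.output_text.delta","delta":"!"}`,
	`{"type":"response.completed","response":{"id":"resp_1"}}`,
}

// event captures only the fields this sketch needs.
type event struct {
	Type       string `json:"type"`
	ID         string `json:"id"`
	ResponseID string `json:"response_id"`
	Response   struct {
		ID string `json:"id"`
	} `json:"response"`
}

// extractResponseID returns the event type and the first non-empty value
// among response.id, response_id and the top-level id ("" if none is set).
func extractResponseID(raw string) (typ, id string) {
	var e event
	if err := json.Unmarshal([]byte(raw), &e); err != nil {
		return "", ""
	}
	switch {
	case e.Response.ID != "":
		return e.Type, e.Response.ID
	case e.ResponseID != "":
		return e.Type, e.ResponseID
	default:
		return e.Type, e.ID
	}
}

func main() {
	for _, raw := range sampleStream {
		typ, id := extractResponseID(raw)
		fmt.Printf("%-28s responseID=%q\n", typ, id)
	}
}
```

In this sketch only the first and last events yield a non-empty ID; all three delta events yield an empty string.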
Key point: response.output_text.delta events carry no response_id at any level of the payload; only response.created / response.completed have one.

Root cause
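Two pieces of logic in observeUpstreamMessage in backend/internal/service/openai_ws_v2/passthrough_relay.go combine to cause this:
1. responseID is extracted from response.id, response_id, or the top-level id. Delta events have none of these three fields, so responseID == "".
2. turnTiming.firstTokenMs is updated only when responseID != "".

As a result, the per-turn firstTokenMs is never recorded when a delta arrives. The update is first reached when a terminal event that carries response.id, such as response.completed, shows up, and by then now - turnTiming.startAt equals the duration of the whole turn.

In addition, line 781 also counts response.completed / response.done as isTokenEvent. This makes it look as if "firstTokenMs at least has a value" and masks the actual error.

state.firstTokenMs (global, written to RelayResult.FirstTokenMs) is correct, because line 548 records it directly as now - startAt without depending on responseID. RelayTurnResult.FirstTokenMs (per-turn, used by openai_ws_v2_passthrough_adapter.go for logging and DB writes) is wrong.

Fix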
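Add an activeTurn *relayTurnTiming pointer to relayState that points at the turn currently in flight.
1. When getOrInitTurnTiming creates a new entry, it also sets state.activeTurn = timing.
2. When observeUpstreamMessage records state.firstTokenMs, it also checks state.activeTurn: if it is non-nil and its firstTokenMs has not been set yet, it records now - activeTurn.startAt directly, whether or not the event carries a responseID.
3. When openAIWSRelayDeleteTurnTiming deletes an entry, it sets state.activeTurn to nil if it points at the timing being deleted.

I used a pointer rather than a responseID string because turns on a single WSv2 connection are serial: only one turn is active at any moment, so no map lookup is needed.

Tests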
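New test TestRelay_OnTurnComplete_RealOpenAIStream_FirstTokenMs: it simulates a real OpenAI event stream (response.created → 3 response.output_text.delta events without response_id → response.completed) and asserts turn.FirstTokenMs < turn.Duration.

Failure message before the fix (an exact reproduction of the production symptom):
The test passes after the fix.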
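The activeTurn approach and the property the new test asserts can be sketched together in Go. This is a simplified model under stated assumptions, not the code in passthrough_relay.go: turnTiming, startTurn, observeToken, endTurn and simulateTurn are hypothetical names, the clock is passed in explicitly so the result is deterministic, and the 100 ms / 500 ms timings are made up.

```go
package main

import (
	"fmt"
	"time"
)

// turnTiming is a simplified stand-in for relayTurnTiming.
type turnTiming struct {
	startAt      time.Time
	firstTokenMs int64 // -1 means "not recorded yet"
}

// relayState keeps only the field this sketch needs.
type relayState struct {
	activeTurn *turnTiming
}

// startTurn creates a timing entry and marks it as the active turn,
// mirroring the idea of getOrInitTurnTiming setting state.activeTurn.
func (s *relayState) startTurn(now time.Time) *turnTiming {
	t := &turnTiming{startAt: now, firstTokenMs: -1}
	s.activeTurn = t
	return t
}

// observeToken records the first token for the active turn.
// It never looks at a response ID.
func (s *relayState) observeToken(now time.Time) {
	if t := s.activeTurn; t != nil && t.firstTokenMs < 0 {
		t.firstTokenMs = now.Sub(t.startAt).Milliseconds()
	}
}

// endTurn clears activeTurn if it still points at the finished turn
// and returns the turn duration in milliseconds.
func (s *relayState) endTurn(t *turnTiming, now time.Time) int64 {
	if s.activeTurn == t {
		s.activeTurn = nil
	}
	return now.Sub(t.startAt).Milliseconds()
}

// simulateTurn replays turn start, three deltas and turn end on a fake clock.
func simulateTurn() (firstTokenMs, durationMs int64) {
	base := time.Unix(0, 0)
	s := &relayState{}
	t := s.startTurn(base)
	for i := 1; i <= 3; i++ {
		s.observeToken(base.Add(time.Duration(i*100) * time.Millisecond))
	}
	durationMs = s.endTurn(t, base.Add(500*time.Millisecond))
	return t.firstTokenMs, durationMs
}

func main() {
	first, dur := simulateTurn()
	fmt.Printf("firstTokenMs=%d durationMs=%d\n", first, dur)
}
```

With these made-up timings the first delta fixes firstTokenMs at 100 while the turn lasts 500, so the FirstTokenMs < Duration assertion holds in the model.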
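The existing TestRelay_OnTurnComplete_ProvidesTurnMetrics did not catch this bug because it manually injects "response_id":"resp_metric" into the delta payload, which is not real OpenAI behavior. The new test does not inject this field and so reflects the real stream.

Only the per-turn metric of passthrough mode (RelayTurnResult.FirstTokenMs) is affected.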
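To illustrate why the existing test passed while production values were wrong, here is a simplified Go model of the pre-fix gating as this PR describes it. The types, the function bodies, and the timings are assumptions made for illustration; only the isTokenEvent name and the "update only when responseID is non-empty" rule come from the description above.

```go
package main

import "fmt"

// ev is a simplified event: its type, its arrival time in ms since turn
// start, and the response ID extracted from it ("" if none).
type ev struct {
	typ        string
	atMs       int64
	responseID string
}

// isTokenEvent follows the described pre-fix behavior: terminal events
// are counted as token events too.
func isTokenEvent(typ string) bool {
	switch typ {
	case "response.output_text.delta", "response.completed", "response.done":
		return true
	}
	return false
}

// preFixFirstTokenMs records the per-turn first token only for token
// events that carry a response ID. It returns -1 if nothing was recorded.
func preFixFirstTokenMs(events []ev) int64 {
	first := int64(-1)
	for _, e := range events {
		if !isTokenEvent(e.typ) || e.responseID == "" {
			continue
		}
		if first < 0 {
			first = e.atMs
		}
	}
	return first
}

// stream builds a turn whose delta events carry deltaID as response ID.
func stream(deltaID string) []ev {
	return []ev{
		{"response.created", 0, "resp_metric"},
		{"response.output_text.delta", 100, deltaID},
		{"response.output_text.delta", 200, deltaID},
		{"response.completed", 500, "resp_metric"},
	}
}

func main() {
	// Deltas without an ID, as in a real stream: the value equals the duration.
	fmt.Println("real stream:      ", preFixFirstTokenMs(stream("")))
	// Deltas with an injected ID, as in the old test: the bug stays hidden.
	fmt.Println("injected ID stream:", preFixFirstTokenMs(stream("resp_metric")))
}
```

In this model the stream without IDs on its deltas reports 500, the full turn duration, while the stream with the injected ID reports 100.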