Skip to content

fix(passthrough): 修复WSv2模式下first_token_ms测量错误#2444

Open
astro-ge wants to merge 1 commit into
Wei-Shaw:mainfrom
astro-ge:fix/passthrough-first-token-ms
Open

fix(passthrough): 修复WSv2模式下first_token_ms测量错误#2444
astro-ge wants to merge 1 commit into
Wei-Shaw:mainfrom
astro-ge:fix/passthrough-first-token-ms

Conversation

@astro-ge
Copy link
Copy Markdown

WSv2 passthrough 模式下,每个请求的 first_token_ms 都等于 duration_ms —— 在 admin UI 表现为「首 TOKEN ≈ 耗时」,并且这个错误值会通过 AfterTurn 回调持久化到 usage_logs.first_token_ms

ctx_pool 模式和 HTTP /responses 路径不受影响。

复现

真实 OpenAI Responses-API 在流式响应中发送的事件序列大致是这样:

{"type":"response.created","response":{"id":"resp_X"}}
{"type":"response.output_text.delta","delta":"He"}
{"type":"response.output_text.delta","delta":"llo"}
{"type":"response.completed","response":{"id":"resp_X","usage":{...}}}

关键点:response.output_text.delta 事件在任何层级都不携带 response_id,只有 response.created / response.completed 才有。

根因

backend/internal/service/openai_ws_v2/passthrough_relay.goobserveUpstreamMessage 的两段逻辑:

  1. 行 538-545responseIDresponse.id / response_id / 顶层 id 提取。delta 事件这三个字段都没有,所以 responseID == ""
  2. 行 560-568:只在 responseID != "" 时才更新 turnTiming.firstTokenMs

结果就是delta 来的时候,per-turn 的 firstTokenMs 永远不会被打点,一直等到 response.completed 这种带 response.id 的终结事件才进入这段逻辑 —— 此时 now - turnTiming.startAt 就等于整个 turn 的 duration。

另外行 781 把 response.completed / response.done 也算成了 isTokenEvent,让 bug 表面看起来"firstTokenMs 至少有值",掩盖了实际错误。

state.firstTokenMs(全局,会写入 RelayResult.FirstTokenMs)是正确的,因为它在行 548 直接用 now - startAt 打点,不依赖 responseID。但 RelayTurnResult.FirstTokenMs(per-turn,被 openai_ws_v2_passthrough_adapter.go 用于日志和 DB 写入)是错的。

修复

relayState 加一个 activeTurn *relayTurnTiming 指针,指向当前 in-flight 的 turn。

  • getOrInitTurnTiming 新建条目时,同步设置 state.activeTurn = timing
  • observeUpstreamMessage 在打 state.firstTokenMs 的同时,如果 state.activeTurn != nil 且其 firstTokenMs 尚未设置,就用 now - activeTurn.startAt 直接打点 —— 不依赖事件里有没有 responseID
  • openAIWSRelayDeleteTurnTiming 删除条目时,如果 state.activeTurn 指向被删的那个 timing,置为 nil

我用指针而不是 responseID string是因为WSv2 一个连接上 turn 是串行的,同一时刻只有一个 active turn,不需要 map 查找

测试

新增 TestRelay_OnTurnComplete_RealOpenAIStream_FirstTokenMs:模拟真实 OpenAI 事件流(response.created → 3 个 response.output_text.delta不带 response_id)→ response.completed),断言 turn.FirstTokenMs < turn.Duration

修复前的失败信息(精确复现生产症状):

Error: "120" is not less than "120"
per-turn FirstTokenMs (120ms) should be strictly less than Duration (120ms);
equality indicates the bug where first_token is mistakenly stamped on the
terminal event

修复后通过。

现有的 TestRelay_OnTurnComplete_ProvidesTurnMetrics 之所以没发现这个 bug,是因为它在 delta payload 里手动注入了 "response_id":"resp_metric" 这不是真实 OpenAI 行为。新测试不注入这个字段,反映真实流。

仅影响 passthrough 模式的 per-turn 指标(RelayTurnResult.FirstTokenMs

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 14, 2026

All contributors have signed the CLA. ✅
Posted by the CLA Assistant Lite bot.

@astro-ge
Copy link
Copy Markdown
Author

I have read the CLA Document and I hereby sign the CLA

github-actions Bot added a commit that referenced this pull request May 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant