fix: sticky session binding breaks on streaming path + multi-user isolation with body.user #188

Open
The-five-stooges wants to merge 14 commits into dwgx:master from The-five-stooges:master

The-five-stooges commented on May 25, 2026

Summary

Fixes two critical sticky-session problems in multi-user proxy setups, and adds Dashboard-configurable options and an example script.

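Changes

🐛 Bug fixes

  1. src/handlers/chat.js (core fix): the waitForAccountFn call on the streaming retry path was missing its 5th argument, callerKey, so callerKey was null in auth.js and sticky session binding was completely disabled. With the argument passed, the chain works again.

  2. src/handlers/messages.js: fixes the :user: segment being appended to callerKey twice (when the proxy script has already injected body.user, messages.js appended a second one). Adds a check that skips the append when callerKey already contains :user:.

  3. src/dashboard/index.html: fixes the 401 error on Dashboard log download. The password is now read from the component's own this.password instead of the always-empty sessionStorage.

  4. examples/proxy-user-inject.js: fixes the example proxy script hardcoding method: 'POST', which made GET /v1/models return 404. The script now preserves the original HTTP method and injects body.user only for POST/PUT/PATCH.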

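✨ New features

  1. src/account/sticky-session.js: adds support for the stickyBindByUserOnly switch:

    • When enabled, the model dimension of the binding key is fixed to *, so all models requested by the same user share one upstream account
    • When disabled, binding uses the (callerKey, modelKey) pair (existing behavior)
  2. src/account/sticky-session.js + src/auth.js + src/handlers/chat.js: adds the stickyNoFallback switch:

    • When enabled, if the sticky-bound account is rate-limited, errors out, or does not have the model available, the request does not fall back to other accounts in the pool; an error is returned to the client directly
    • Three lines of defense: auth.js getApiKey returns null immediately, chat.js waitForAccount fails fast, and the chat.js retry loops break instead of continue
    • Ensures each user strictly consumes only the quota of their own bound account
  3. src/runtime-config.js: registers the experimental switches stickyBindByUserOnly and stickyNoFallback, both defaulting to false

  4. src/caller-key.js: when body.user is present, extractBodyCallerSubKey hashes only body.user and no longer concatenates other metadata fields, which keeps callerKey stable

  5. src/dashboard/index.html: the Dashboard panel gains two toggles, "粘性会话按用户绑定" (sticky session: bind by user) and "粘性会话禁止回退" (sticky session: no fallback)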

📋 Debug logging

  1. src/auth.js, sticky-session.js, caller-key.js: adds log.info()-level [sticky] and [caller-key] diagnostic logs, viewable live in the Dashboard panel

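📄 Documentation

  1. examples/proxy-user-inject.js: a complete example proxy script for multi-user isolation, including a problem statement, usage guide, systemd example, and security notes

  2. .gitignore: adds PR-FLOW.md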

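Testing

  • Verified via logs: [sticky] CHECK no longer shows callerKey=null
  • The MISS → SET → HIT cycle is complete; subsequent requests from the same user correctly hit the cached binding
  • With stickyBindByUserOnly=true, the binding key ends in * (the model dimension is ignored)
  • With stickyNoFallback=true, a failure on the bound account returns an error directly instead of switching accounts and consuming their quota
  • Through the proxy scripts, GET /v1/models returns 200 on ports 3004/3005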

…lusively when present

extractBodyCallerSubKey concatenates multiple body fields (user, metadata.session_id, conversation_id, etc.) into a single hash via candidates.join('|'). When a proxy script injects body.user='user_a', but Claude Code carries varying metadata.session_id across turns, the concatenated hash changes, producing a different callerKey on every request. The sticky session manager then sees a 'new' caller each time and re-rolls the account binding, defeating the purpose of STICKY_SESSION_ENABLED and CASCADE_REUSE_BY_CALLER. This fix short-circuits: when body.user is present, use ONLY its sha256 hash as the caller subkey, discarding all other metadata fields. The multi-field fallback still applies when body.user is absent (native Claude Code / no proxy), preserving backward compatibility.
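
A minimal sketch of the short-circuit described above, not the code in src/caller-key.js. The sha256Hex helper is hypothetical, and the fallback covers only the two fields the commit message names explicitly; the real candidate list is longer ("etc.").

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: hex-encoded sha256 of a string.
function sha256Hex(text) {
  return createHash('sha256').update(String(text)).digest('hex');
}

// Sketch only: the real function reads more fields than shown here.
function extractBodyCallerSubKey(body) {
  if (!body || typeof body !== 'object') return null;

  // Short-circuit: a non-empty body.user is the only input to the hash.
  if (typeof body.user === 'string' && body.user.length > 0) {
    return sha256Hex(body.user);
  }

  // Fallback when body.user is absent: hash the remaining fields together.
  const candidates = [body.metadata?.session_id, body.conversation_id]
    .filter((value) => typeof value === 'string' && value.length > 0);
  if (candidates.length === 0) return null;
  return sha256Hex(candidates.join('|'));
}

// Same user, different session ids across turns: the subkey stays stable.
const turn1 = extractBodyCallerSubKey({ user: 'user_a', metadata: { session_id: 's1' } });
const turn2 = extractBodyCallerSubKey({ user: 'user_a', metadata: { session_id: 's2' } });
console.log(turn1 === turn2); // true
```

Without body.user, the two turns above would hash different session ids and yield different subkeys, which is the re-rolling behavior the commit describes.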
…orage

The exportLogs() function reads the dashboard password from sessionStorage.getItem('dashboard_password'), but the login flow writes it to this.password (backed by localStorage.getItem('dp')). sessionStorage.dashboard_password is never set anywhere in the codebase, so it is always empty, causing every log download request to fail with HTTP 401. This patch changes the source to this.password, which is reliably populated after a successful login and survives page reloads via localStorage.
…er injection

Add a fully documented reference proxy script that injects a per-user 'user' field into chat completion requests before forwarding to WindsurfAPI. This enables two or more developers sharing a single WindsurfAPI instance to maintain isolated sticky sessions and independent upstream account bindings. The script includes: problem statement, prerequisites (STICKY_SESSION_ENABLED + CASCADE_REUSE_BY_CALLER + independent LS instances via tinyproxy), step-by-step usage guide, systemd service example, verification steps, and security notes. All examples use placeholder values; no real credentials, IPs, or account emails are included.
…nding

When STICKY_SESSION_ENABLED=1, the sticky binding key is callerKey + modelKey. This means a single user requesting different models (e.g. opus vs haiku) may get bound to different upstream accounts, because modelKey changes. Add a dashboard experimental toggle 'stickyBindByUserOnly' that, when enabled, forces the binding key to callerKey + '*' regardless of model, so all model requests from the same user share one upstream account. Default OFF (preserves per-model isolation). Debug log in caller-key.js reports body.user/subKey for troubleshooting.
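
The key construction can be sketched as follows. The function name buildBindingKey and the '|' separator are assumptions made for illustration; the commit only states that the key is callerKey plus modelKey, or callerKey plus '*' when the toggle is on.

```javascript
// Hypothetical helper: name and separator are illustrative, not taken from
// src/account/sticky-session.js.
function buildBindingKey(callerKey, modelKey, options = {}) {
  const { stickyBindByUserOnly = false } = options;
  // With the toggle on, the model dimension collapses to a wildcard.
  const modelPart = stickyBindByUserOnly ? '*' : modelKey;
  return `${callerKey}|${modelPart}`;
}

const perModelOpus = buildBindingKey('caller:user:ab12', 'opus');
const perModelHaiku = buildBindingKey('caller:user:ab12', 'haiku');
const sharedOpus = buildBindingKey('caller:user:ab12', 'opus', { stickyBindByUserOnly: true });
const sharedHaiku = buildBindingKey('caller:user:ab12', 'haiku', { stickyBindByUserOnly: true });

console.log(perModelOpus === perModelHaiku); // false
console.log(sharedOpus === sharedHaiku); // true
```

Since both models map to one key when the toggle is on, a binding stored for the first model is found again when the same user requests the second.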
Add [sticky] HIT/MISS/SET/CLEAR log entries (filtered to callerKeys containing ':user:') so the Dashboard log panel can show the full lifecycle of user-scoped bindings.
… already injected body.user

messages.js extractCallerSubKey() reads metadata.user_id (Claude Code internal device/session id) and appends it as a second ':user:xxxx' segment to the callerKey. When a proxy script has already injected body.user (resulting in callerKey ending with ':user:<hash>'), this double-stamp changes the callerKey on sub-agent calls where metadata.user_id differs, causing sticky session MISS and cross-account binding. Fix: skip the append when callerKey already contains ':user:'.
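
A small sketch of the guard, assuming the append happens in one place. appendUserSegment is a hypothetical name; the actual logic lives inside messages.js and may be structured differently.

```javascript
// Hypothetical helper illustrating the "skip if already stamped" rule.
function appendUserSegment(callerKey, metadataUserHash) {
  if (!metadataUserHash) return callerKey;
  // A proxy-injected body.user has already produced a ':user:' segment,
  // so appending metadata.user_id would make the key vary per sub-agent.
  if (callerKey.includes(':user:')) return callerKey;
  return `${callerKey}:user:${metadataUserHash}`;
}

const proxied = 'api:k1:user:aaaa';
console.log(appendUserSegment(proxied, 'dev1') === appendUserSegment(proxied, 'dev2')); // true
console.log(appendUserSegment('api:k1', 'dev1')); // api:k1:user:dev1
```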
…erations

Remove the callerKey.includes(':user:') filter from all [sticky] log entries so every sticky operation is visible in dashboard logs. Also add a SKIP log when callerKey is empty/null to detect code paths that bypass sticky entirely.
…yBinding

Even with the ENABLED/callerKey guards removed from filter conditions, the haiku sub-agent request hitting 838591845@qq.com shows NO [sticky] log at all (not even MISS). This unconditional ENTER log at the very top of getStickyBinding will confirm whether the function is actually being invoked for every request.
…ranch

Add logging at the getApiKey entry point to definitively determine whether sticky session is being skipped because callerKey is falsy or isStickyEnabled() returns false. Also add SCHECK log in the else branch for clarity.
The streaming retry loop at _handleChatCompletionsInner (~L2926) calls waitForAccountFn with only 4 arguments, omitting the 5th callerKey parameter. This causes waitForAccount to default callerKey=null, which makes getApiKey skip the sticky session check (callerKey && isStickyEnabled() → null && true → falsy). As a result, sticky session bindings are never looked up for new conversations on the streaming path, so every first request re-rolls a random account. The non-streaming path at L1868 correctly passes callerKey. This one-line fix restores parity.
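
The failure mode can be reproduced in isolation. In the sketch below only two things come from the commit message: callerKey is the 5th parameter with a default of null, and the guard is callerKey && isStickyEnabled(). The first four parameter names are placeholders, and isStickyEnabled is stubbed to return true.

```javascript
// Stub for the real feature check (assumed true for this demonstration).
const isStickyEnabled = () => true;

// Mirrors the guard quoted in the commit message.
function stickyLookupRuns(callerKey) {
  return Boolean(callerKey && isStickyEnabled());
}

// Placeholder signature: only the position and default of callerKey are
// taken from the commit message.
function waitForAccount(tried, signal, maxWaitMs, modelKey, callerKey = null) {
  return { stickyLookupRuns: stickyLookupRuns(callerKey) };
}

// Four arguments, as on the streaming path before the fix.
const before = waitForAccount([], null, 30000, 'some-model');
// Five arguments, as on the non-streaming path and after the fix.
const after = waitForAccount([], null, 30000, 'some-model', 'api:k1:user:aaaa');

console.log(before.stickyLookupRuns, after.stickyLookupRuns); // false true
```

A default parameter hides the omission: the call succeeds with no error, so the only symptom is that the sticky lookup is skipped.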
The-five-stooges changed the title from "Fix sticky session handling and update log export password source" to "fix: sticky session binding breaks on streaming path + multi-user isolation with body.user" on May 25, 2026
…rotation on bound account failure

When stickyNoFallback is enabled (Dashboard toggle):
- auth.js getApiKey(): bound account unavailable → return null immediately,
  don't clear binding, don't fall through to normal selection
- chat.js waitForAccount(): skip 30s wait loop, fail fast
- chat.js non-stream retry: rate_limit/transient/model_not_available
  from sticky account → break instead of continue
- chat.js stream retry: same pattern, break on any model error

This ensures each user strictly consumes only their bound account's quota
instead of burning through other accounts in the pool.
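
A simplified sketch of the selection rule in the first bullet, under stated assumptions: pickAccount, isUsable, and the flat pool array are hypothetical stand-ins, and the real getApiKey() also handles binding bookkeeping that is omitted here.

```javascript
// Hypothetical selection function; not the signature of getApiKey().
function pickAccount({ boundAccount, pool, isUsable, stickyNoFallback }) {
  if (boundAccount) {
    if (isUsable(boundAccount)) return boundAccount;
    // Strict mode: report failure instead of borrowing another account.
    if (stickyNoFallback) return null;
  }
  // Default mode (or no binding yet): normal selection from the pool.
  return pool.find((account) => isUsable(account)) ?? null;
}

const pool = ['acct-a', 'acct-b'];
const isUsable = (account) => account !== 'acct-a'; // acct-a is rate-limited

const strict = pickAccount({ boundAccount: 'acct-a', pool, isUsable, stickyNoFallback: true });
const relaxed = pickAccount({ boundAccount: 'acct-a', pool, isUsable, stickyNoFallback: false });
console.log(strict, relaxed); // null acct-b
```

Returning null pushes the failure to the caller, which is why the commit also changes waitForAccount and both retry loops: without those changes the caller would wait or retry and eventually reach another account anyway.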
…ody.user for POST/PUT/PATCH

Previously the proxy scripts hardcoded method:'POST' for all upstream
requests, which broke GET /v1/models responses with a 404 on proxied
ports. GET/HEAD/DELETE requests are now forwarded as-is (no body
injection), while POST/PUT/PATCH still inject body.user for multi-user
isolation.
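
The method handling can be sketched as a pure function, separate from any HTTP server code. buildUpstreamRequest is a hypothetical name, and forwarding an unparseable body unchanged is an assumption; the commit does not say how the scripts treat invalid JSON.

```javascript
// Methods whose JSON body receives the injected user field.
const BODY_METHODS = new Set(['POST', 'PUT', 'PATCH']);

// Hypothetical helper: decides what to send upstream for one request.
function buildUpstreamRequest(method, rawBody, user) {
  const upstreamMethod = String(method || 'GET').toUpperCase();

  // GET/HEAD/DELETE and anything else: forward as-is, no injection.
  if (!BODY_METHODS.has(upstreamMethod)) {
    return { method: upstreamMethod, body: rawBody };
  }

  let parsed;
  try {
    parsed = JSON.parse(rawBody || '{}');
  } catch {
    // Assumption: bodies that are not valid JSON are forwarded untouched.
    return { method: upstreamMethod, body: rawBody };
  }
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    return { method: upstreamMethod, body: rawBody };
  }
  return { method: upstreamMethod, body: JSON.stringify({ ...parsed, user }) };
}

const get = buildUpstreamRequest('GET', '', 'user_a');
const post = buildUpstreamRequest('POST', '{"model":"m"}', 'user_a');
console.log(get.method); // GET
console.log(post.body); // {"model":"m","user":"user_a"}
```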

- scripts/proxy-user-mantou.js → :3005, injects user='mantou'
- scripts/proxy-user-maskja.js → :3004, injects user='maskja'
- examples/proxy-user-inject.js → reference example updated to match
- Level 1 (stickyBindByUserOnly + stickyNoFallback): deterministic SHA256(callerKey) sharding, ignoring all health metrics

- Level 2 (default): soft sharding only when top two candidates are perfectly tied

Fixes cross-user account contamination when multiple users share the same pool
@The-five-stooges
Author

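New commit f3e3375: feat(auth): add user-aware account sharding tiebreaker in getApiKey

Problem

When two users (maskja on port 9090, antimantou on port 8080) share one account pool, the stable sort in getApiKey() always picks the same account whenever all health metrics are exactly tied, so both users end up on the same account.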

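Solution

After sorting the candidates, getApiKey() now applies a two-level user sharding step:

  • Level 1 (strict lock, stickyBindByUserOnly=true + stickyNoFallback=true): ignores all health metrics and deterministically pins each user to a specific account slot via SHA256(callerKey) % candidates.length
  • Level 2 (soft sharding, the default mode): reorders only when the top two candidates are tied on all of inflight / quotaScore (5% buckets) / rpmRatio / lastUsed; it does not override legitimate load-balancing differences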

Result

With strict lock enabled, maskja and antimantou are each pinned to a different upstream account and no longer cross over.
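
A sketch of the Level 1 slot computation, assuming the first 32 bits of the digest are reduced modulo the candidate count. The comment only gives the formula SHA256(callerKey) % candidates.length, so how the commit turns the digest into an integer is not known; shardSlot and pickStrict are hypothetical names.

```javascript
const { createHash } = require('node:crypto');

// Assumption: use the first 4 bytes of the digest as an unsigned integer.
function shardSlot(callerKey, candidateCount) {
  if (candidateCount <= 0) return -1;
  const digest = createHash('sha256').update(String(callerKey)).digest();
  return digest.readUInt32BE(0) % candidateCount;
}

// Level 1: pick the slot directly, ignoring health metrics.
function pickStrict(callerKey, candidates) {
  const slot = shardSlot(callerKey, candidates.length);
  return slot === -1 ? null : candidates[slot];
}

const accounts = ['acct-a', 'acct-b', 'acct-c'];
const first = pickStrict('caller:user:maskja', accounts);
const second = pickStrict('caller:user:maskja', accounts);
console.log(first === second); // true
```

The mapping is deterministic per callerKey, but a hash modulo a small pool does not by itself guarantee that two different users land on different slots, and the slot for a given user can move when the candidate list changes length or order.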
