
fix: model routing bugs, per-model cost tracking, and quickstart doc #2

Closed
HelloAnner wants to merge 4 commits into billion-token-one-task:v6-release from HelloAnner:fix/model-routing-and-cost-tracking

Conversation

@HelloAnner

Summary

  • Fix jq syntax error in post-tool.sh that silently broke metrics reporting to dashboard
  • Split cost tracking: complex and simple models now use independent pricing variables in openclaw.json
  • Add per-model cost variable substitution in restart.sh
  • Fix budget exhaustion check (>= to >) to avoid premature agent shutdown
  • Fix metrics overview cost estimation to respect per-model pricing instead of using a single fallback
  • Make GITHUB_USERNAME configurable in dashboard-reporter hook and GitHub sync (was hardcoded to BillionClaw)
  • Add docs/quickstart.md covering setup, model switching, budget control, and dashboard usage
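The per-model placeholder substitution in restart.sh can be illustrated with a short TypeScript sketch. This is a hypothetical rendering, not the script itself: it assumes each `__NAME__` token in openclaw.json is replaced by the environment variable `NAME` (for example `__INPUT_COST_PER_M_COMPLEX__` by `INPUT_COST_PER_M_COMPLEX`), and the function name `renderPlaceholders` is invented for this example.

```typescript
// Hypothetical sketch of restart.sh-style placeholder rendering for openclaw.json.
// Assumption: a token __NAME__ is replaced by the value of environment variable NAME.
function renderPlaceholders(
  template: string,
  env: Record<string, string | undefined>,
): string {
  // The lazy quantifier keeps back-to-back tokens such as __A____B__ separate.
  return template.replace(/__([A-Z0-9_]+?)__/g, (token: string, name: string) => {
    const value = env[name];
    // Leave unknown tokens in place so a missing variable stays visible in the output.
    return value === undefined ? token : value;
  });
}

// Complex and simple tiers get independent values instead of sharing one placeholder.
const rendered = renderPlaceholders(
  '{"complex": __INPUT_COST_PER_M_COMPLEX__, "simple": __INPUT_COST_PER_M_SIMPLE__}',
  { INPUT_COST_PER_M_COMPLEX: "3.0", INPUT_COST_PER_M_SIMPLE: "0.3" },
);
console.log(rendered); // {"complex": 3.0, "simple": 0.3}
```

Leaving unresolved tokens untouched is one design choice; a stricter version could fail the restart when any `__NAME__` remains after rendering.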

Test plan

  • Docker container test: DeepSeek model switching verified end-to-end
  • Agent successfully discovered issue, implemented fix, forked, pushed, and created PR using DeepSeek
  • Dashboard API returns correct per-model pricing in health-check and metrics
  • Verify restart.sh correctly substitutes __INPUT_COST_PER_M_COMPLEX__ placeholders
  • Verify budget exhaustion triggers at > threshold (not >=)

🤖 Generated with Claude Code

- Fix jq syntax error in post-tool.sh that silently broke metrics reporting
- Split cost tracking: complex and simple models now use independent pricing
  variables (__INPUT_COST_PER_M_COMPLEX/SIMPLE__) in openclaw.json
- Add per-model cost variable substitution in restart.sh
- Fix budget exhaustion check (>= to >) to avoid premature shutdown
- Fix metrics overview cost estimation to respect per-model pricing
- Make GITHUB_USERNAME configurable in dashboard-reporter and github sync
- Add docs/quickstart.md with setup, model switching, budget, and dashboard guide

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cla-assistant

cla-assistant Bot commented Apr 11, 2026

CLA assistant check
All committers have signed the CLA.

HelloAnner and others added 3 commits April 11, 2026 15:59
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a MODEL_TOKEN_BUDGETS config (env var + dashboard settings) that caps
cumulative input+output token usage per model, matched by bare model name so
the same model served by multiple providers (e.g. z-ai/glm-4.6 and
openrouter/glm-4.6) shares a single counter. When a model exceeds its cap, the
health-check endpoint emits a MODEL TOKEN BUDGET EXHAUSTED directive picked up
by the heartbeat loop, and a non-dismissible red banner appears at the top of
every dashboard page.

- Add bareModelName() helper for cross-provider model matching
- Extend /api/agent/health-check with per-model aggregation, directive
  injection, and a new modelBudgets response field (exhausted/usage/caps)
- Add ModelBudgetBanner client component, mounted globally in app/layout.tsx
- Accept and normalize modelTokenBudgets in /api/settings PUT
- Wire NEXT_PUBLIC_LLM_MODEL_COMPLEX into the header model label
- Document MODEL_TOKEN_BUDGETS semantics in docs/model-routing.md and
  .env.example, emphasizing bare-name matching across providers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
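Bare-name matching and the per-model cap check can be sketched as follows. This is a hypothetical TypeScript illustration: it assumes `bareModelName()` keeps only the text after the last `/` and lowercases it (the commit does not spell out the exact rule), and `UsageRecord` and `checkModelBudgets` are invented names. A model counts as exhausted only when its usage goes past the cap, matching the `>` semantics proposed in this PR.

```typescript
// Assumption: the bare name is whatever follows the last "/", lowercased.
function bareModelName(model: string): string {
  const trimmed = model.trim();
  return trimmed.slice(trimmed.lastIndexOf("/") + 1).toLowerCase();
}

// Hypothetical shape of one usage sample (input + output tokens for one call).
interface UsageRecord {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

interface ModelBudgetStatus {
  usage: Record<string, number>; // cumulative tokens per bare model name
  caps: Record<string, number>; // configured caps per bare model name
  exhausted: string[]; // bare names whose usage exceeds their cap
}

function checkModelBudgets(
  records: UsageRecord[],
  budgets: Record<string, number>,
): ModelBudgetStatus {
  const caps: Record<string, number> = {};
  for (const [model, cap] of Object.entries(budgets)) {
    caps[bareModelName(model)] = cap;
  }
  // Providers serving the same model share one counter.
  const usage: Record<string, number> = {};
  for (const r of records) {
    const key = bareModelName(r.model);
    usage[key] = (usage[key] ?? 0) + r.inputTokens + r.outputTokens;
  }
  const exhausted = Object.entries(caps)
    .filter(([name, cap]) => (usage[name] ?? 0) > cap)
    .map(([name]) => name);
  return { usage, caps, exhausted };
}

const budgetStatus = checkModelBudgets(
  [
    { model: "z-ai/glm-4.6", inputTokens: 600, outputTokens: 200 },
    { model: "openrouter/glm-4.6", inputTokens: 150, outputTokens: 100 },
  ],
  { "glm-4.6": 1000 },
);
// usage is 1050 for "glm-4.6", so it is reported as exhausted
console.log(budgetStatus.usage["glm-4.6"], budgetStatus.exhausted);
```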
The dashboard's connection state only reflects whether the
`dashboard-reporter` hook has fired heartbeats, and that hook only fires on
`agent_end`. In continuous mode the main session never ends (stopReason=toolUse
loop), so a running agent whose first LLM call is failing would appear
`disconnected` even though the container is perfectly alive — and an agent
hitting repeated upstream 401/quota errors would be indistinguishable from
one whose container has died.

Add a separate LLM health signal derived directly from the openclaw session
jsonl, and surface it as its own top-of-page banner so the two failure modes
are visually distinct.

- Add /api/agent/llm-health — tails latest session jsonl, walks back from
  the most recent assistant message to classify the LLM as ok/errored/unknown
  and report the most recent error message and timestamps
- Embed the LLM health block inside /api/connection-status so existing
  consumers get both dimensions in a single request
- Add LlmErrorBanner client component, mounted alongside ModelBudgetBanner
  in the root layout. Renders only when llm.state === "errored", showing the
  upstream error text and "last ok/fail" relative timestamps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
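The walk-back classification behind /api/agent/llm-health might look roughly like the sketch below. Only the ok/errored/unknown states, the latest error message, and the last ok/fail timestamps come from the commit message; the jsonl field names (`role`, `stopReason`, `errorMessage`, `timestamp`), the `"error"` stop reason, and the function name `classifyLlmHealth` are assumptions about the openclaw session format, not taken from the code.

```typescript
type LlmState = "ok" | "errored" | "unknown";

// Assumed shape of one session jsonl line; real openclaw entries may differ.
interface SessionEntry {
  role?: string;
  stopReason?: string;
  errorMessage?: string;
  timestamp?: string;
}

interface LlmHealth {
  state: LlmState;
  lastError: string | null;
  lastOkAt: string | null;
  lastFailAt: string | null;
}

function classifyLlmHealth(jsonlTail: string): LlmHealth {
  const entries: SessionEntry[] = [];
  for (const line of jsonlTail.split("\n")) {
    if (line.trim() === "") continue;
    try {
      entries.push(JSON.parse(line) as SessionEntry);
    } catch {
      // A tail read can start mid-line; skip anything that is not valid JSON.
    }
  }

  const health: LlmHealth = { state: "unknown", lastError: null, lastOkAt: null, lastFailAt: null };
  let seenOk = false;
  let seenFail = false;
  // Walk back from the newest entry: the most recent assistant message decides the state,
  // and the loop continues only until both the latest ok and latest failure are found.
  for (let i = entries.length - 1; i >= 0; i--) {
    const e = entries[i];
    if (e.role !== "assistant") continue;
    const failed = e.stopReason === "error" || e.errorMessage !== undefined;
    if (health.state === "unknown") health.state = failed ? "errored" : "ok";
    if (failed && !seenFail) {
      seenFail = true;
      health.lastFailAt = e.timestamp ?? null;
      health.lastError = e.errorMessage ?? null;
    } else if (!failed && !seenOk) {
      seenOk = true;
      health.lastOkAt = e.timestamp ?? null;
    }
    if (seenOk && seenFail) break;
  }
  return health;
}

const tail = [
  '"partial line from the middle of the file',
  JSON.stringify({ role: "assistant", stopReason: "stop", timestamp: "2026-04-11T08:00:00Z" }),
  JSON.stringify({ role: "user", timestamp: "2026-04-11T08:01:00Z" }),
  JSON.stringify({ role: "assistant", stopReason: "error", errorMessage: "401 Unauthorized", timestamp: "2026-04-11T08:02:00Z" }),
].join("\n");
const health = classifyLlmHealth(tail);
// Mirrors the banner rule in the commit: render only when the state is "errored".
const showBanner = health.state === "errored";
console.log(health.state, health.lastError, showBanner);
```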
Protocol-zero-0 added a commit that referenced this pull request Apr 17, 2026
Integrates HelloAnner's fix/model-routing-and-cost-tracking branch with
alpha/v0.1.0's extra features preserved:

  - Generic LLM_PROVIDER / LLM_BASE_URL / LLM_API_KEY / LLM_MODEL_* env scheme
    supersedes hardcoded minimax/MiniMax-M2.7 defaults. CLAWOSS_PRIMARY_MODEL
    etc. kept as optional legacy overrides.
  - Per-tier pricing (INPUT_COST_PER_M_COMPLEX / _SIMPLE, with flat fallback)
    flows through post-tool.sh, handler.ts, and dashboard-sync.sh.
  - heartbeat.every=5m and LLM_* placeholders land in config/openclaw.json;
    acp.defaultAgent=codex, loopDetection, logging.level from alpha preserved.
  - restart.sh now also exports LLM_* and MODEL_TOKEN_BUDGETS into the
    deployed config env block, alongside alpha's CLAWOSS_ROOT / RECORD_* vars.
  - Dashboard: llm-health probe, model-budget banner, cost-models lookup,
    llm-error banner all merged; env-driven model display in header/gateway.
  - .env.example rewritten with LLM_* primary + legacy CLAWOSS_* section and
    full Provider Quick Reference block (Gemini, Mistral, DeepSeek, MiniMax,
    Moonshot, GLM).

Not a clean replay: 10 conflict files resolved by hand. See the PR body for
the dimension-by-dimension integration notes.
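Per-tier pricing with a flat fallback can be sketched as below. The tiered input variables `INPUT_COST_PER_M_COMPLEX` and `INPUT_COST_PER_M_SIMPLE` come from this commit; the flat fallback name `INPUT_COST_PER_M`, the `OUTPUT_COST_PER_M*` counterparts, the helper names, and the unit (assumed to be USD per million tokens) are all assumptions for illustration.

```typescript
type Tier = "COMPLEX" | "SIMPLE";
type Env = Record<string, string | undefined>;

// Returns the first listed variable that holds a finite number, else 0.
function firstPrice(env: Env, names: string[]): number {
  for (const name of names) {
    const raw = env[name];
    if (raw === undefined || raw.trim() === "") continue;
    const value = Number(raw);
    if (Number.isFinite(value)) return value;
  }
  return 0;
}

// Assumed naming: <KIND>_COST_PER_M_<TIER>, falling back to the flat <KIND>_COST_PER_M.
function pricePerM(env: Env, kind: "INPUT" | "OUTPUT", tier: Tier): number {
  return firstPrice(env, [`${kind}_COST_PER_M_${tier}`, `${kind}_COST_PER_M`]);
}

function estimateCostUsd(env: Env, tier: Tier, inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens * pricePerM(env, "INPUT", tier) + outputTokens * pricePerM(env, "OUTPUT", tier)) /
    1_000_000
  );
}

const env: Env = { INPUT_COST_PER_M_COMPLEX: "3", INPUT_COST_PER_M: "1", OUTPUT_COST_PER_M: "2" };
console.log(estimateCostUsd(env, "COMPLEX", 1_000_000, 500_000)); // 4
console.log(estimateCostUsd(env, "SIMPLE", 1_000_000, 500_000)); // 2
```

Returning 0 when no price is configured under-reports cost rather than failing; a stricter variant could throw so that a missing price is noticed at startup.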
Protocol-zero-0 added a commit that referenced this pull request May 17, 2026
Baseline for the ClawOSS v1.0 freeze.

Reviewed and accepted as the infrastructure layer for Issue #10.

Known issue carried forward: budget exhaustion uses `>=` (off-by-one). Will be fixed in a follow-up commit on v6-release, crediting HelloAnner's discovery in PR #2.
Protocol-zero-0 added a commit that referenced this pull request May 17, 2026
Previously, budget was marked exhausted when usage exactly equaled the
budget cap, which prematurely paused the agent before it could spend the
full budget. Changed to `>` so the agent can use the entire configured
budget before pausing.

Credit: HelloAnner first identified this in PR #2 (closed; superseded by
the telemetry-driven budget rewrite in PR #9). Carrying the fix forward.

Refs #10
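The difference between the two comparisons is easiest to see at the boundary. The function names below are hypothetical; the sketch only contrasts the old `>=` check with the corrected `>` check described in this commit.

```typescript
// Old check: pauses the agent as soon as spend reaches the cap exactly.
function isExhaustedOld(spent: number, budget: number): boolean {
  return spent >= budget;
}

// Corrected check: pauses only once spend goes past the cap, so the full budget is usable.
function isExhausted(spent: number, budget: number): boolean {
  return spent > budget;
}

const budget = 10;
console.log(isExhaustedOld(10, budget), isExhausted(10, budget)); // true false
console.log(isExhausted(10.01, budget)); // true
```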
Protocol-zero-0 added a commit that referenced this pull request May 17, 2026
Adds a user-facing quickstart covering setup, model switching, budget
control, and dashboard usage. PR #9's README does not cover these flows,
so we are landing this doc on v6-release independently to keep new
contributors onboarded.

Original author: HelloAnner. Original PR: #2 (closed; bug-fix portion
superseded by PR #9's telemetry-driven runtime rewrite, but docs were
not absorbed).

Co-Authored-By: HelloAnner <helloanner@gmail.com>

Refs #2 #10
Protocol-zero-0 pushed a commit that referenced this pull request May 17, 2026
@Protocol-zero-0
Contributor

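@HelloAnner Thank you for the very solid work. We carefully compared your 7 fixes against the later refactor in PR #9, and here is a summary of what was absorbed and what was superseded: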

Carried into v6-release (with credit retained)

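| Fix | Handling |
| --- | --- |
| Budget off-by-one (`>=` to `>`) | ✅ Carried into v6-release in commit bf36581. The PR #9 refactor had reverted this change, so your finding still holds. |
| docs/quickstart.md | ✅ The entire document was cherry-picked into v6-release in commit bca65fd, with `Co-Authored-By: HelloAnner`. |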

Superseded by a more thorough approach in PR #9 (not absorbed separately)

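| Fix | PR #9 approach |
| --- | --- |
| jq `env.LLM_PROVIDER` failing silently | Injects the value with `--arg model` instead; an equivalent fix |
| complex/simple models sharing one placeholder | Drops the complex/simple distinction in favor of telemetry-driven pricing (each telemetry record carries its own pricing, which sidesteps this whole class of bug) |
| Metrics overview using a single price (5-20x error) | Same as above: telemetry-driven pricing uses the actual input/output token counts and unit prices |
| Hardcoded GITHUB_USERNAME | PR #9 introduces the CLAW_AGENT_USERNAME env var (some remnants remain in the prompt templates and will be cleaned up in a later minor release) |
| restart.sh tightly coupled to macOS | The Docker suite from PR #3 has been cherry-picked into deploy/docker/, so Linux deployments no longer depend on launchctl |
| Architectural flaw in token estimation | Telemetry-driven tracking reads the real input_tokens/output_tokens/cost_usd instead of estimating with JSON.stringify(...).length / 4 |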
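Telemetry-driven cost accounting, as described in the table above, can be sketched like this. The field names `input_tokens`, `output_tokens`, and `cost_usd` come from the comment; the `pricing` object's shape, the example model names, and all function names are assumptions, and this is not PR #9's actual code. The characters/4 heuristic is included only for contrast.

```typescript
// Hypothetical telemetry record; only input_tokens, output_tokens and cost_usd are named in the discussion.
interface TelemetryRecord {
  model: string;
  input_tokens: number;
  output_tokens: number;
  cost_usd?: number; // cost as reported upstream, when available
  pricing?: { input_per_m: number; output_per_m: number }; // assumed per-record unit prices
}

// Old-style estimate for contrast: roughly 4 characters per token of serialized payload.
function heuristicTokens(payload: unknown): number {
  const text: string | undefined = JSON.stringify(payload);
  return Math.ceil((text ?? "").length / 4);
}

// Prefer the reported cost; otherwise price the real token counts with the record's own unit prices.
function recordCostUsd(r: TelemetryRecord): number {
  if (r.cost_usd !== undefined) return r.cost_usd;
  if (r.pricing !== undefined) {
    return (r.input_tokens * r.pricing.input_per_m + r.output_tokens * r.pricing.output_per_m) / 1_000_000;
  }
  return 0; // no price information in this record
}

function totalCostUsd(records: TelemetryRecord[]): number {
  return records.reduce((sum, r) => sum + recordCostUsd(r), 0);
}

const records: TelemetryRecord[] = [
  { model: "deepseek-chat", input_tokens: 1000, output_tokens: 500, cost_usd: 0.5 },
  { model: "glm-4.6", input_tokens: 1_000_000, output_tokens: 0, pricing: { input_per_m: 3, output_per_m: 6 } },
];
console.log(totalCostUsd(records)); // 3.5
console.log(heuristicTokens({ a: 1 })); // 2
```

Because each record carries its own numbers, mixing models with different prices no longer depends on choosing the right shared placeholder.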

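End-to-end verification

The end-to-end evidence from your run on huggingface/transformers#45371 is one of the few verification records in the ClawOSS repository showing a real upstream PR actually landing. We will cite it in the v1.0 release notes.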

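v1.0 freeze and next steps

v6-release will soon be tagged v1.0 as the final freeze of the "self-hosted gateway + dashboard + placeholder rendering" architecture. In parallel we are working on Phase 2: rewriting ClawOSS as a Claude Code Skill plus a template repository. The goal is to shrink the current ~3,700 lines of infrastructure to ~500 while sidestepping at least 5 of the 7 bugs you diagnosed (they are essentially by-products of building our own infrastructure and simply would not exist in the Skill form).

If this direction interests you, you are welcome to join the Phase 2 design. In that context, the quality of your diagnosis of the 7 bugs becomes the strongest argument for why we should not build things this way.

Based on the above, this PR is being closed. Thanks again for your work.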
