test(e2e): migrate Kimi compatibility to vitest#5555

Merged
cv merged 16 commits into main from e2e-migrate/test-kimi-inference-compat on Jun 20, 2026
Conversation

@cv

@cv cv commented Jun 19, 2026

Collaborator

Summary

Migrates the Kimi compatibility E2E into a typed live Vitest scenario. The new test uses a local OpenAI-compatible endpoint with the Kimi model id, onboards a real sandbox, verifies Kimi compat/plugin wiring, checks the managed inference.local model route, and drives an OpenClaw agent smoke turn through the fake endpoint.

Related Issue

Refs #5098

Changes

  • Add a typed live Vitest replacement for test/e2e/test-kimi-inference-compat.sh.
  • Wire a free-standing dispatchable Vitest job into .github/workflows/e2e-vitest-scenarios.yaml.
  • Preserve legacy shell deletion and any legacy shell workflow cleanup for Phase 11 per Epic: Migrate legacy bash E2E into the Vitest E2E system #5098 migration governance.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • PR description includes the DCO sign-off declaration and every commit appears as Verified in GitHub
  • Git hooks passed during commit and push, or npx prek run --from-ref main --to-ref HEAD passes
  • Targeted tests pass for changed behavior
  • Full npm test passes (broad runtime changes only)
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Targeted commands run:

  • npx biome check --write test/e2e-scenario/live/kimi-inference-compat.test.ts
  • NEMOCLAW_RUN_E2E_SCENARIOS=1 npx vitest run --project e2e-scenarios-live test/e2e-scenario/live/kimi-inference-compat.test.ts -t __compile_only_nomatch__ --silent=false --reporter=default --passWithNoTests
  • npx vitest run --project e2e-vitest-support test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts
  • npx tsx scripts/check-test-file-size-budget.ts test/e2e-scenario/live/kimi-inference-compat.test.ts
  • npx tsc --noEmit --strict --moduleResolution bundler --module preserve --target ES2022 --types node --allowImportingTsExtensions test/e2e-scenario/live/kimi-inference-compat.test.ts
  • git diff --check

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Summary by CodeRabbit

  • Tests
    • Added a live end-to-end scenario for Kimi-compatible inference, verifying sandbox onboarding, inference provider/base URL wiring, Kimi model compatibility settings, enabled plugin routing, and exposed /v1/models.
    • Validates end-to-end tool execution behavior, including correct tool-splitting/trajectory output and that chat requests return tool results.
  • Chores
    • Added a dedicated CI job to run the Kimi compatibility scenario and upload Vitest artifacts.
    • Updated pull request status reporting to include the new job.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv cv self-assigned this Jun 19, 2026
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a new live E2E Vitest scenario (kimi-inference-compat.test.ts) that validates Kimi-compatible inference endpoint wiring in an OpenClaw sandbox. The test starts a fake OpenAI-compatible server, runs nemoclaw onboard, and asserts configuration (provider, base URL, model compat flags, plugin enablement, tool-search state) and agent behavior (basic output, tool execution with trajectory validation). A corresponding CI workflow job (kimi-inference-compat-vitest) is added with Kimi-specific environment configuration, artifact upload, and inclusion in the report-to-pr aggregation. The E2E workflow boundary test is updated to support free-standing scenario enumeration.

Changes

Kimi Inference Compat E2E Scenario

| Layer | File(s) | Summary |
|---|---|---|
| Mock server and helper utilities | test/e2e-scenario/live/kimi-inference-compat.test.ts | Establishes test imports, environment construction with Kimi-specific variables, the startKimiMock() HTTP server that mocks /v1/models and /v1/chat/completions with request recording, and the parseConfig() helper to extract provider/route/plugin/tool-search state from sandbox configuration. |
| Test orchestration and configuration assertions | test/e2e-scenario/live/kimi-inference-compat.test.ts | Implements the test execution flow: starting the mock server, provisioning/cleaning up the OpenClaw sandbox, running nemoclaw onboard with the mock endpoint, reading openclaw.json, and asserting inference provider setup, base URL, API type, model compat flags, plugin enablement, tool-search disabled, and plugin path presence. |
| Agent behavior and tool execution validation | test/e2e-scenario/live/kimi-inference-compat.test.ts | Validates the models endpoint, runs a basic OpenClaw agent expecting OK output, then runs a second agent configured to trigger tool execution with hostname/date/uptime tools. Asserts successful completion, verifies tool-trajectory JSONL artifacts, and confirms the mock server received chat completion requests with tool enablement and tool-result messages. |
| CI job and PR result reporting wiring | .github/workflows/e2e-vitest-scenarios.yaml | Adds the kimi-inference-compat-vitest job with conditional execution gating, Kimi-specific env vars, OpenShell installation, Vitest invocation, and artifact upload (14-day retention). Extends report-to-pr's needs list to include the new job. |
| E2E workflow boundary test infrastructure updates | test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts | Reorders module imports so workflow-boundary utilities precede testTimeoutOptions. Adds explicit 60-second timeout configuration to the free-standing inventory derivation test call. |
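
The mock-server layer summarized above can be pictured with a minimal sketch. This is not the PR's `startKimiMock()` implementation: the `RecordedRequest` shape, the `routeMockRequest` and `createMockServer` names, and the response bodies are assumptions made for illustration. Only the two routes and the idea of request recording come from the summary.

```typescript
import { createServer, type Server } from "node:http";

// Hypothetical shape; the PR's actual request record may carry more fields.
interface RecordedRequest {
  method: string;
  path: string;
  authOk: boolean;
}

interface MockReply {
  status: number;
  body: unknown;
}

// Pure routing step: records the request, then picks a canned reply.
function routeMockRequest(
  method: string,
  path: string,
  authorization: string | undefined,
  modelId: string,
  apiKey: string,
  log: RecordedRequest[],
): MockReply {
  log.push({ method, path, authOk: authorization === `Bearer ${apiKey}` });
  if (method === "GET" && path === "/v1/models") {
    return { status: 200, body: { object: "list", data: [{ id: modelId, object: "model" }] } };
  }
  if (method === "POST" && path === "/v1/chat/completions") {
    return {
      status: 200,
      body: {
        object: "chat.completion",
        model: modelId,
        choices: [{ index: 0, message: { role: "assistant", content: "OK" }, finish_reason: "stop" }],
      },
    };
  }
  return { status: 404, body: { error: { message: `no route for ${method} ${path}` } } };
}

// Thin node:http wrapper around the routing step.
function createMockServer(modelId: string, apiKey: string, log: RecordedRequest[]): Server {
  return createServer((req, res) => {
    req.resume(); // drain the body; this sketch does not inspect it
    req.on("end", () => {
      const path = new URL(req.url ?? "/", "http://localhost").pathname;
      const reply = routeMockRequest(req.method ?? "GET", path, req.headers.authorization, modelId, apiKey, log);
      res.writeHead(reply.status, { "content-type": "application/json" });
      res.end(JSON.stringify(reply.body));
    });
  });
}
```

Keeping the routing step pure lets it be exercised without opening a socket; a caller would pass the returned server to `listen()` with whatever port and bind address the scenario needs.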

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#5401: Main PR's new kimi-inference-compat E2E Vitest scenario validates configuration wiring for the nemoclaw-kimi-inference-compat plugin, which is enhanced by the retrieved PR's improvements to managed inference model recognition.
  • NVIDIA/NemoClaw#5413: Both PRs target the Kimi inference compatibility end-to-end test's tool-call and trajectory validation logic, with the main PR adding live Kimi tool execution assertions and the retrieved PR adjusting the trajectory checker for multiturn tool calls.
  • NVIDIA/NemoClaw#5370: Updates the selector-inventory derivation and boundary logic in the same workflow file that gates free-standing jobs like kimi-inference-compat-vitest via inputs.jobs/inputs.scenarios.

Suggested labels

area: e2e, chore

Poem

🐇 A Kimi endpoint, shiny and new,
A fake server spun from morning dew,
The sandbox onboards, the config aligns,
The agent says "OK" — all good signs!
With artifacts saved and the PR table bright,
This bunny hops on into the night. 🌙

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately summarizes the main change: migrating a Kimi compatibility test from shell script to Vitest. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch e2e-migrate/test-kimi-inference-compat

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-code-quality

github-code-quality Bot commented Jun 19, 2026

Contributor

Code Coverage Overview

Languages: TypeScript

TypeScript / code-coverage/plugin

The overall coverage in the branch is 96%. Coverage data for the branch is not yet available.

Show a code coverage summary of the most covered files.
| File | 3fb46c5 | +/- |
|---|---|---|
| nemoclaw/src/se...cret-scanner.ts | 100% | |
| nemoclaw/src/commands/slash.ts | 100% | |
| nemoclaw/src/li...bprocess-env.ts | 100% | |
| nemoclaw/src/bl...eprint/state.ts | 98% | |
| nemoclaw/src/onboard/config.ts | 98% | |
| nemoclaw/src/bl...int/snapshot.ts | 97% | |
| nemoclaw/src/bl...print/runner.ts | 95% | |
| nemoclaw/src/co...ration-state.ts | 94% | |
| nemoclaw/src/bl...ate-networks.ts | 94% | |
| nemoclaw/src/index.ts | 94% | |

TypeScript / code-coverage/cli

The overall coverage in the branch is 46%. Coverage data for the branch is not yet available.

Show a code coverage summary of the most covered files.
| File | 3fb46c5 | +/- |
|---|---|---|
| src/lib/state/o...oard-session.ts | 91% | |
| src/lib/inference/local.ts | 76% | |
| src/lib/sandbox/config.ts | 72% | |
| src/lib/actions...dbox/rebuild.ts | 67% | |
| src/lib/onboard/preflight.ts | 64% | |
| src/lib/actions...licy-channel.ts | 56% | |
| src/lib/state/sandbox.ts | 55% | |
| src/lib/policy/index.ts | 49% | |
| src/lib/onboard...er-gpu-patch.ts | 44% | |
| src/lib/onboard.ts | 18% | |

Updated June 20, 2026 23:00 UTC
Code Coverage is in Public Preview. Learn more and provide us with your feedback.

@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

E2E Advisor Recommendation

Required E2E: kimi-inference-compat-vitest
Optional E2E: inference-routing-vitest, cloud-inference-vitest

Dispatch hint: kimi-inference-compat-vitest

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • kimi-inference-compat-vitest (medium): This PR introduces and wires this live E2E job; it should run to validate the new workflow entry, fake compatible endpoint setup, onboarding path, sandbox inference.local route, plugin config, and OpenClaw agent/tool-call flow.

Optional E2E

  • inference-routing-vitest (medium): Adjacent confidence for inference.local routing and provider/error-path behavior, but the PR only adds a new Kimi-specific scenario rather than changing shared inference runtime code.
  • cloud-inference-vitest (medium): Optional broader smoke coverage for live onboard plus sandbox inference.local agent requests, adjacent to the new compatible-endpoint scenario.

New E2E recommendations

  • None.

Dispatch hint

  • Workflow: .github/workflows/e2e-vitest-scenarios.yaml
  • jobs input: kimi-inference-compat-vitest

@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: kimi-inference-compat-vitest
Optional Vitest E2E scenarios: None

Dispatch required Vitest E2E scenarios:

  • gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field jobs=kimi-inference-compat-vitest

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required Vitest E2E scenarios

  • kimi-inference-compat-vitest: Focused free-standing Vitest job wired for changed live test test/e2e-scenario/live/kimi-inference-compat.test.ts.
    • Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field jobs=kimi-inference-compat-vitest

Optional Vitest E2E scenarios

  • None.

Relevant changed files

  • .github/workflows/e2e-vitest-scenarios.yaml
  • test/e2e-scenario/live/kimi-inference-compat-helpers.ts
  • test/e2e-scenario/live/kimi-inference-compat.test.ts

@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

PR Review Advisor

Findings: 0 needs attention, 1 worth checking, 0 nice ideas
Since last review: 2 prior items resolved, 0 still apply, 1 new item found

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Preserve streamed Kimi chat request coverage (test/e2e-scenario/live/kimi-inference-compat.test.ts:125): The legacy Kimi shell scenario required the mock to observe two authenticated streamed `/v1/chat/completions` requests, covering both the tool-call turn and final-answer/tool-result turn. The new mock supports streaming, but `KimiRequest` does not record the `stream` field and the final assertions only require a request with tools and a request with a tool result. A regression that changed the agent/provider path to non-streaming could still pass this migrated test.
    • Recommendation: Record `body.stream` in the mock request metadata and assert the tool-call and tool-result chat requests are authenticated and streamed, matching the legacy K6 behavior.
    • Evidence: New assertions check `request.authOk`, `/chat/completions`, model, `hasTools`, and `hasToolResult`, but not `stream`. The retained legacy `check_upstream_observed_agent_traffic` counted `POST /v1/chat/completions auth=ok stream=True` and required at least two such requests.
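
One way to picture the recommendation is a minimal sketch that records `stream` next to the existing flags and reports what is missing. The `KimiRequestMeta` shape and both function names are hypothetical; the PR's actual `KimiRequest` type and assertions may look different. The body fields read here (`stream`, `tools`, and messages with role `tool`) follow the OpenAI-style chat completions request format.

```typescript
// Hypothetical metadata shape; `stream` is the field the finding asks to record.
interface KimiRequestMeta {
  path: string;
  authOk: boolean;
  stream: boolean;
  hasTools: boolean;
  hasToolResult: boolean;
}

// Derive metadata from an OpenAI-style chat completions request body.
function toRequestMeta(path: string, authOk: boolean, body: Record<string, unknown>): KimiRequestMeta {
  const tools = body["tools"];
  const messages = body["messages"];
  return {
    path,
    authOk,
    stream: body["stream"] === true,
    hasTools: Array.isArray(tools) && tools.length > 0,
    hasToolResult:
      Array.isArray(messages) &&
      messages.some((m) => typeof m === "object" && m !== null && (m as { role?: unknown }).role === "tool"),
  };
}

// Returns a list of coverage gaps; an empty list means the streamed contract holds.
function streamedCoverageGaps(requests: KimiRequestMeta[]): string[] {
  const streamed = requests.filter((r) => r.path.endsWith("/chat/completions") && r.authOk && r.stream);
  const gaps: string[] = [];
  if (streamed.length < 2) {
    gaps.push(`expected at least 2 authenticated streamed chat requests, saw ${streamed.length}`);
  }
  if (!streamed.some((r) => r.hasTools)) gaps.push("no streamed request offered tools");
  if (!streamed.some((r) => r.hasToolResult)) gaps.push("no streamed request carried a tool result");
  return gaps;
}
```

A test could then assert that `streamedCoverageGaps(recorded)` is empty, so a regression to non-streaming requests fails with a readable message instead of passing silently.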

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation**: Kimi mock endpoint is reachable from the sandbox/gateway through `https://inference.local/v1/models` after onboarding with a `http://host.openshell.internal:${port}/v1` provider URL. The changed behavior crosses workflow dispatch, host mock networking, sandbox lifecycle, OpenShell managed inference routing, and OpenClaw agent/tool-call execution; runtime validation is appropriate even though the code structure is straightforward.
  • **Runtime validation**: Kimi agent chat and tool-splitting requests reach the same host mock through managed inference and record authenticated `/v1/chat/completions` requests with both tool calls and tool results. The changed behavior crosses workflow dispatch, host mock networking, sandbox lifecycle, OpenShell managed inference routing, and OpenClaw agent/tool-call execution; runtime validation is appropriate even though the code structure is straightforward.
  • **Runtime validation**: Kimi-compatible Vitest records streamed tool-call and tool-result chat requests through the host mock. The changed behavior crosses workflow dispatch, host mock networking, sandbox lifecycle, OpenShell managed inference routing, and OpenClaw agent/tool-call execution; runtime validation is appropriate even though the code structure is straightforward.
  • **Runtime validation**: Workflow selectors dispatch `scenarios=kimi-inference-compat` and `jobs=kimi-inference-compat-vitest` to the free-standing Kimi job. The changed behavior crosses workflow dispatch, host mock networking, sandbox lifecycle, OpenShell managed inference routing, and OpenClaw agent/tool-call execution; runtime validation is appropriate even though the code structure is straightforward.
  • **Runtime validation**: Kimi trajectory assertion rejects a `model.completed` record with `promptErrorSource` set. The changed behavior crosses workflow dispatch, host mock networking, sandbox lifecycle, OpenShell managed inference routing, and OpenClaw agent/tool-call execution; runtime validation is appropriate even though the code structure is straightforward.
  • **Preserve streamed Kimi chat request coverage**: Record `body.stream` in the mock request metadata and assert the tool-call and tool-result chat requests are authenticated and streamed, matching the legacy K6 behavior.
  • **Acceptance clause:** No linked issue clauses were available in the deterministic review context; add test evidence or identify existing coverage. `linkedIssues` was empty in the validation context. The PR body references `Refs Epic: Migrate legacy bash E2E into the Vitest E2E system #5098`, but the issue body/comments were not available for literal clause extraction.
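
The trajectory item in the list above (rejecting a `model.completed` record that has `promptErrorSource` set) could be checked along these lines. This is a sketch under stated assumptions: the function name is hypothetical, and it assumes each JSONL line is an object whose record kind lives in a `type` field, which the source does not confirm.

```typescript
// Scan tool-trajectory JSONL text and report records that should fail the check.
// Assumption: the record kind is stored under `type`; the real field name may differ.
function findPromptErrors(jsonl: string): string[] {
  const problems: string[] = [];
  jsonl.split("\n").forEach((line, index) => {
    if (line.trim() === "") return; // tolerate blank lines and a trailing newline
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      problems.push(`line ${index + 1}: not valid JSON`);
      return;
    }
    if (typeof parsed !== "object" || parsed === null) {
      problems.push(`line ${index + 1}: not a JSON object`);
      return;
    }
    const record = parsed as { type?: unknown; promptErrorSource?: unknown };
    if (record.type === "model.completed" && record.promptErrorSource != null) {
      problems.push(`line ${index + 1}: model.completed has promptErrorSource set`);
    }
  });
  return problems;
}
```

Returning the list of problems rather than throwing on the first one keeps the failure message useful when several records are affected.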
Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv cv added the v0.0.66 Release target label Jun 19, 2026

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e-scenario/live/kimi-inference-compat.test.ts`:
- Around line 77-83: The JSON.parse call in the req.on("end", ...) callback
within startKimiMock is unguarded and will crash the Vitest worker if a
malformed payload is received. Wrap the JSON.parse(raw || "{}") call in a
try-catch block to handle parsing errors gracefully. When a parsing error
occurs, send an appropriate error response to the client (such as a 400 Bad
Request) instead of allowing the exception to propagate and crash the worker
process.
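
The guard this comment asks for might look like the following sketch. `parseJsonBody` and its result type are hypothetical names, not code from the PR; the idea is that the request handler answers with the returned status instead of letting a `SyntaxError` escape the `end` callback.

```typescript
// Result of parsing a request body: either the parsed object or a 400-style failure.
type ParsedBody =
  | { ok: true; value: Record<string, unknown> }
  | { ok: false; status: 400; error: string };

// Parse a raw request body without throwing; an empty body is treated as {}.
function parseJsonBody(raw: string): ParsedBody {
  try {
    const value: unknown = JSON.parse(raw || "{}");
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      return { ok: false, status: 400, error: "request body must be a JSON object" };
    }
    return { ok: true, value: value as Record<string, unknown> };
  } catch {
    return { ok: false, status: 400, error: "malformed JSON body" };
  }
}
```

In a handler, the failure branch would translate into `res.writeHead(400)` followed by a small JSON error payload, while the success branch continues with the parsed body.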
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ea0b1764-88a0-42cb-aa42-177f95d41d25

📥 Commits

Reviewing files that changed from the base of the PR and between 3d47296 and 138b4b2.

📒 Files selected for processing (1)
  • test/e2e-scenario/live/kimi-inference-compat.test.ts

Comment thread test/e2e-scenario/live/kimi-inference-compat.test.ts Outdated
@cv cv linked an issue Jun 19, 2026 that may be closed by this pull request
79 tasks
cv added 3 commits June 19, 2026 14:07
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
…-inference-compat

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
cv added 7 commits June 19, 2026 14:44
…-inference-compat

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
…compat' into e2e-migrate/test-kimi-inference-compat

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
…-inference-compat

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv

cv commented Jun 20, 2026

Collaborator Author

Merged latest origin/main and resolved the workflow conflict by keeping both free-standing jobs (kimi-inference-compat-vitest and hermes-inference-switch-vitest) wired into report-to-pr.needs.

Also addressed the still-valid Kimi endpoint feedback:

  • startKimiMock() now binds the mock server on 0.0.0.0 so it is host-reachable from the sandbox/gateway path.
  • The advertised provider URL now defaults to http://host.openshell.internal:<port>/v1, matching the retained legacy shell boundary. NEMOCLAW_KIMI_MOCK_HOST can override the advertised host if needed.
  • The mock chat body parser already handles malformed JSON with a 400 response instead of letting parse errors escape.
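
The bind-versus-advertise split described in the first two bullets can be sketched as below. The constant and function names are hypothetical and the PR's helper may be structured differently; the `0.0.0.0` bind address, the `host.openshell.internal` default, and the `NEMOCLAW_KIMI_MOCK_HOST` override are taken from this comment.

```typescript
// Address the mock listens on: all interfaces, so the sandbox/gateway path can reach it.
const MOCK_BIND_ADDRESS = "0.0.0.0";

// Host name advertised to onboarding when no override is set.
const DEFAULT_ADVERTISED_HOST = "host.openshell.internal";

// Build the provider base URL handed to onboarding for a mock listening on `port`.
function advertisedBaseUrl(port: number, env: Record<string, string | undefined>): string {
  const override = env["NEMOCLAW_KIMI_MOCK_HOST"]?.trim();
  const host = override ? override : DEFAULT_ADVERTISED_HOST;
  return `http://${host}:${port}/v1`;
}
```

For example, `advertisedBaseUrl(8123, process.env)` would yield `http://host.openshell.internal:8123/v1` unless the override variable is set, while the server itself would listen on `MOCK_BIND_ADDRESS`.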

Validation rerun:

npx biome check --write .github/workflows/e2e-vitest-scenarios.yaml test/e2e-scenario/live/kimi-inference-compat.test.ts test/e2e-scenario/live/kimi-inference-compat-helpers.ts test/e2e-scenario/live/hermes-inference-switch.test.ts test/e2e-scenario/live/hermes-inference-switch-helpers.ts
npm run typecheck:cli
npx vitest run test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts test/e2e-scenario/support-tests/e2e-live-project-config.test.ts

Push note: I pushed the signed merge/update commits over SSH because this branch updates workflow files.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv

cv commented Jun 20, 2026

Collaborator Author

Pushed a signed empty refresh commit after resolving conflicts to force GitHub to recompute mergeability. Local verification confirms the PR branch contains current origin/main:

git merge-base --is-ancestor origin/main HEAD

returned success locally before the refresh push.

@cv cv merged commit b025230 into main Jun 20, 2026
40 checks passed
@cv cv deleted the e2e-migrate/test-kimi-inference-compat branch June 20, 2026 23:02
Labels

v0.0.66 Release target

1 participant