Skip to content

test(e2e): migrate Hermes inference switch to vitest#5553

Merged
cv merged 16 commits into
mainfrom
e2e-migrate/test-hermes-inference-switch
Jun 20, 2026
Merged

test(e2e): migrate Hermes inference switch to vitest#5553
cv merged 16 commits into
mainfrom
e2e-migrate/test-hermes-inference-switch

Conversation

@cv

@cv cv commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

Summary

Migrates the Hermes inference switch E2E into a typed live Vitest scenario. The new test onboards a real Hermes sandbox, switches the inference route, verifies OpenShell route state plus Hermes config/registry state, and sends a live Hermes API chat after the switch.

Related Issue

Refs #5098

Changes

  • Add a typed live Vitest replacement for test/e2e/test-hermes-inference-switch.sh.
  • Wire a free-standing dispatchable Vitest job into .github/workflows/e2e-vitest-scenarios.yaml.
  • Preserve legacy shell deletion and any legacy shell workflow cleanup for Phase 11 per Epic: Migrate legacy bash E2E into the Vitest E2E system #5098 migration governance.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • PR description includes the DCO sign-off declaration and every commit appears as Verified in GitHub
  • Git hooks passed during commit and push, or npx prek run --from-ref main --to-ref HEAD passes
  • Targeted tests pass for changed behavior
  • Full npm test passes (broad runtime changes only)
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Targeted commands run:

  • npx biome check --write test/e2e-scenario/live/hermes-inference-switch.test.ts
  • NEMOCLAW_RUN_E2E_SCENARIOS=1 npx vitest run --project e2e-scenarios-live test/e2e-scenario/live/hermes-inference-switch.test.ts -t __compile_only_nomatch__ --silent=false --reporter=default --passWithNoTests
  • npx vitest run --project e2e-vitest-support test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts
  • npx tsx scripts/check-test-file-size-budget.ts test/e2e-scenario/live/hermes-inference-switch.test.ts
  • npx tsc --noEmit --strict --moduleResolution bundler --module preserve --target ES2022 --types node --allowImportingTsExtensions test/e2e-scenario/live/hermes-inference-switch.test.ts
  • git diff --check

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

  • CI / Automation
    • Added a new gated end-to-end Vitest job for the Hermes inference switch and included its results in the PR reporting summary.
    • Runs non-interactively with Hermes/NemoClaw switch settings and publishes scenario artifacts.
  • Tests
    • Added a live E2E scenario that installs Hermes, switches the inference set, and verifies gateway process stability and health.
    • Validates gateway routing/config (model/provider/base URL and API mode), preserves .env and sandbox/session metadata, checks config hash files/permissions, and probes chat-completions on both external and local endpoints.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv cv self-assigned this Jun 19, 2026
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

A new live Vitest E2E test validates that switching Hermes inference provider and model updates routing and configuration while preserving the gateway process PID and .env contents. The test is integrated into the CI workflow as a hermes-inference-switch-vitest job and added to PR reporting.

Changes

Hermes Inference Switch E2E

Layer / File(s) Summary
Test configuration and helper functions
test/e2e-scenario/live/hermes-inference-switch.test.ts
Defines sandbox name, switch provider/model/API from env vars, retry count, global 45-minute timeout, and four local helpers: env() constructs process/sandbox environment with third-party acceptance flags; bestEffort() swallows cleanup errors; parseHermesModelBlock() parses the model: YAML section into key/value pairs; chatContent() robustly extracts chat text from Anthropic-style content arrays or OpenAI-style first-choice message fields.
Test scaffolding, Docker validation, and Hermes installation
test/e2e-scenario/live/hermes-inference-switch.test.ts
Implements gated test structure with artifact creation and registered best-effort cleanup (nemoclaw destroy + OpenShell sandbox delete). Pre-run cleanup runs best-effort destroy/delete. Checks Docker availability via docker info. Installs Hermes by running install.sh with retry up to a computed limit, backing off only on transient provider-validation errors. Captures Hermes gateway PID and SHA-256 hash of /sandbox/.hermes/.env before the switch operation.
Inference switch execution and comprehensive validation
test/e2e-scenario/live/hermes-inference-switch.test.ts
Executes inference-set switch via nemoclaw CLI and re-captures PID to assert process stability. Validates Hermes health endpoint returns "ok", checks OpenShell inference route/config contains switched provider and model. Reads and parses /sandbox/.hermes/config.yaml: asserts model.default equals switched model, provider is "custom", base_url matches expected endpoint shape (with or without /v1 based on SWITCH_API), api_mode is set for known APIs or unset otherwise, and api_key is present; verifies no unexpected models: stanza. Verifies config hash-check files under /etc/nemoclaw/ and /sandbox/.hermes/ with sha256sum -c and validates strict hash file has no write bits. Asserts .env hash after switch matches pre-switch value when pre-switch hash is non-empty. Validates ~/.nemoclaw/sandboxes.json and ~/.nemoclaw/onboard-session.json metadata for agent, provider, model, and sandbox name. Probes external inference.local chat endpoint and Hermes gateway at http://localhost:8642/, extracting chat response text and asserting both return "PONG".
CI workflow job and PR reporting
.github/workflows/e2e-vitest-scenarios.yaml
Adds the hermes-inference-switch-vitest job with workflow_dispatch gating, Hermes/NemoClaw/non-interactive env flags, OpenShell installation via scripts/install-openshell.sh, Vitest test execution, and artifact upload from e2e-artifacts/vitest/hermes-inference-switch/. Extends report-to-pr job's needs list to include the new job so test results are included in the PR comment table.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#5243: Both PRs modify .github/workflows/e2e-vitest-scenarios.yaml to wire free-standing Vitest E2E jobs into the workflow_dispatch job-selector scheme and update report-to-pr.needs.
  • NVIDIA/NemoClaw#5256: Both PRs add Hermes-specific free-standing Vitest jobs to .github/workflows/e2e-vitest-scenarios.yaml and wire Hermes results into the report-to-pr job via needs.
  • NVIDIA/NemoClaw#5498: Both PRs add a free-standing live Vitest job to .github/workflows/e2e-vitest-scenarios.yaml and extend report-to-pr.needs to include the new job's results.

Suggested labels

area: e2e, chore

Poem

🐇 Hop, hop, the gateway stays alive,
A new provider switched — PID survives!
I curl the sandbox, PONG comes back to me,
The config YAML checked most carefully.
No flaky timing, just TypeScript and glee! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately describes the main change: migrating an existing Hermes inference switch E2E test from shell script to a typed Vitest scenario, which is the primary objective of this changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch e2e-migrate/test-hermes-inference-switch

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-code-quality

github-code-quality Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Code Coverage Overview

Languages: TypeScript

TypeScript / code-coverage/plugin

The overall coverage in the branch is 96%. Coverage data for the branch is not yet available.

Show a code coverage summary of the most covered files.
File 15c05a3 +/-
nemoclaw/src/se...cret-scanner.ts 100%
nemoclaw/src/commands/slash.ts 100%
nemoclaw/src/li...bprocess-env.ts 100%
nemoclaw/src/bl...eprint/state.ts 98%
nemoclaw/src/onboard/config.ts 98%
nemoclaw/src/bl...int/snapshot.ts 97%
nemoclaw/src/bl...print/runner.ts 95%
nemoclaw/src/co...ration-state.ts 94%
nemoclaw/src/bl...ate-networks.ts 94%
nemoclaw/src/index.ts 94%

TypeScript / code-coverage/cli

The overall coverage in the branch is 46%. Coverage data for the branch is not yet available.

Show a code coverage summary of the most covered files.
File 15c05a3 +/-
src/lib/state/o...oard-session.ts 91%
src/lib/inference/local.ts 76%
src/lib/sandbox/config.ts 72%
src/lib/actions...dbox/rebuild.ts 67%
src/lib/onboard/preflight.ts 64%
src/lib/actions...licy-channel.ts 56%
src/lib/state/sandbox.ts 55%
src/lib/policy/index.ts 49%
src/lib/onboard...er-gpu-patch.ts 44%
src/lib/onboard.ts 18%

Updated June 20, 2026 20:22 UTC
Code Coverage is in Public Preview. Learn more and provide us with your feedback.

@github-actions

github-actions Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: hermes-inference-switch-vitest, openclaw-inference-switch-vitest, inference-routing-vitest

Dispatch hint: hermes-inference-switch-vitest

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • None. No merge-blocking E2E is required because the changed files only add or modify E2E workflow/test code and do not alter installer, onboarding, sandbox, credentials, network policy, inference routing implementation, deployment, or assistant runtime code.

Optional E2E

  • hermes-inference-switch-vitest (high): Useful to validate the newly added workflow job and live scenario wiring end-to-end. This is not merge-blocking for runtime safety because the PR changes only E2E workflow/test assets, but it is the direct confidence check for the new coverage.
  • openclaw-inference-switch-vitest (high): Adjacent existing inference-switch scenario for the OpenClaw agent; optional comparison coverage if reviewers want confidence that the new Hermes-specific scenario remains aligned with the established inference-switch flow.
  • inference-routing-vitest (medium): Optional lower-scope inference routing check for provider route classification and cleanup behavior adjacent to the new Hermes inference-switch scenario.

New E2E recommendations

  • None.

Dispatch hint

  • Workflow: .github/workflows/e2e-vitest-scenarios.yaml
  • jobs input: hermes-inference-switch-vitest

@github-actions

github-actions Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: hermes-inference-switch-vitest
Optional Vitest E2E scenarios: None

Dispatch required Vitest E2E scenarios:

  • gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field jobs=hermes-inference-switch-vitest

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required Vitest E2E scenarios

  • hermes-inference-switch-vitest: Focused free-standing Vitest job wired for changed live test test/e2e-scenario/live/hermes-inference-switch.test.ts.
    • Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field jobs=hermes-inference-switch-vitest

Optional Vitest E2E scenarios

  • None.

Relevant changed files

  • .github/workflows/e2e-vitest-scenarios.yaml
  • test/e2e-scenario/live/hermes-inference-switch-helpers.ts
  • test/e2e-scenario/live/hermes-inference-switch.test.ts

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
.github/workflows/e2e-vitest-scenarios.yaml (1)

745-792: ⚡ Quick win

Consider adding Docker Hub authentication to avoid rate limiting.

The hermes-inference-switch-vitest job pulls Docker images during install.sh but lacks Docker Hub authentication, unlike similar jobs such as issue-2478-crash-loop-recovery-vitest (lines 3537-3566). Anonymous Docker Hub pulls are subject to rate limiting, which could cause intermittent CI failures.

If Hermes image pulls are expected to hit Docker Hub, consider adding the same Docker auth pattern used by other Docker-dependent jobs in this workflow.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/e2e-vitest-scenarios.yaml around lines 745 - 792, The
hermes-inference-switch-vitest job is missing Docker Hub authentication
credentials, which could lead to rate limiting failures when pulling Docker
images during the Install OpenShell CLI step. Add a new step before the "Install
OpenShell CLI" step that authenticates to Docker Hub using GitHub secrets
(similar to the pattern used in the issue-2478-crash-loop-recovery-vitest job).
This step should log in to Docker Hub with appropriate credentials to prevent
anonymous pull rate limiting issues.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In @.github/workflows/e2e-vitest-scenarios.yaml:
- Around line 745-792: The hermes-inference-switch-vitest job is missing Docker
Hub authentication credentials, which could lead to rate limiting failures when
pulling Docker images during the Install OpenShell CLI step. Add a new step
before the "Install OpenShell CLI" step that authenticates to Docker Hub using
GitHub secrets (similar to the pattern used in the
issue-2478-crash-loop-recovery-vitest job). This step should log in to Docker
Hub with appropriate credentials to prevent anonymous pull rate limiting issues.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1d11cc31-1634-4f4a-a5d4-c2f71a7bc123

📥 Commits

Reviewing files that changed from the base of the PR and between 9d4d1f5 and bea95ca.

📒 Files selected for processing (2)
  • .github/workflows/e2e-vitest-scenarios.yaml
  • test/e2e-scenario/live/hermes-inference-switch.test.ts

@github-actions

github-actions Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 8 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 5 still apply, 1 new item found

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Source-of-truth review needed: Sandbox curl payload shell quoting: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `quotePayload()` manually replaces single quotes and both curl command helpers interpolate the result into `trustedSandboxShellScript(...)`.
  • Source-of-truth review needed: Best-effort Hermes sandbox cleanup: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `bestEffort()` catches with an empty `catch {}`, and cleanup only invokes NemoClaw destroy plus OpenShell sandbox delete.
  • Avoid artifacting the full Hermes config while asserting generated API key shape (test/e2e-scenario/live/hermes-inference-switch.test.ts:105): The test captures `/sandbox/.hermes/config.yaml` into shell-probe artifacts while separately asserting that the generated `api_key` exists. The hosted NVIDIA key is explicitly redacted and the fixture regex currently covers long `sk-...` tokens, but the generated local Hermes/API server key is not discovered and passed as an explicit redaction value. If the generated key shape or length changes, the artifact could expose a live local auth secret.
    • Recommendation: Parse the needed non-sensitive config fields inside the sandbox and return only assertion output, or discover generated `api_key` and `.env` `API_SERVER_KEY` values before artifact creation and pass them as explicit `redactionValues`. At minimum, redact `api_key:` values from the config artifact while preserving the malformed/missing-key assertion.
    • Evidence: `sandbox.exec(SANDBOX_NAME, ["cat", "/sandbox/.hermes/config.yaml"], { artifactName: "hermes-config-yaml", redactionValues: [apiKey] })` writes the full config, while `apiKeyShape(sandbox)` proves an `sk-...` key exists in that same file.
  • Avoid shell-string construction for sandbox curl payloads (test/e2e-scenario/live/hermes-inference-switch-helpers.ts:388): The in-sandbox curl probes build a JSON payload string, manually escape single quotes, and interpolate it into a `sh -lc` script. The current payload is mostly test-controlled, but `SWITCH_MODEL` can come from the environment and `trustedSandboxShellScript()` only brands non-empty strings; it does not sanitize shell syntax. This is a sandbox-boundary maintenance risk if future payload fields become less controlled.
    • Recommendation: Write the JSON payload through stdin or a temp file and use `--data-binary @file`, or centralize a covered shell-quoting helper. Add regression coverage for single quotes, semicolons, command substitutions, backticks, and newlines in model/prompt values.
    • Evidence: `quotePayload()` only does `payload.replace(/'/gu, ...)`, and both `inferenceLocalCommand()` and `hermesApiCommand()` interpolate the result into strings passed to `trustedSandboxShellScript(...)`.
  • Keep the legacy transient HTTP handling in live chat probes (test/e2e-scenario/live/hermes-inference-switch-helpers.ts:388): The Vitest replacement sends one bare curl request and then parses stdout as JSON. The legacy shell test captured HTTP status, retried three times, and treated curl timeout 28 and live-provider HTTP 502/503/504 as transient after route/config assertions had already passed. Dropping that behavior makes this live third-party inference scenario more likely to fail on provider flake rather than on the migration behavior under test.
    • Recommendation: Restore the legacy status marker, retry loop, and transient classification around both `inference.local` and Hermes API chat probes, or document why the Vitest replacement should fail hard on the first transient provider response.
    • Evidence: `inferenceLocalCommand()` and `hermesApiCommand()` return plain `curl -sS` commands. The retained legacy `check_inference_local()` and `check_hermes_api_chat()` used `curl -w '%{http_code}'`, retry attempts, and transient classification for 502/503/504 and curl exit 28.
  • Preserve cleanup failure evidence for Hermes sandbox teardown (test/e2e-scenario/live/hermes-inference-switch-helpers.ts:51): Cleanup is intentionally best-effort, but the helper catches and discards all cleanup errors. It also does not mirror the legacy script's gateway destroy and post-destroy registry removal check, so stale sandbox/gateway state can be hidden and make later live runs harder to diagnose.
    • Recommendation: Keep expected not-found cleanup tolerant, but record cleanup command failures in artifacts or return a structured cleanup result. Consider mirroring the legacy `openshell gateway destroy -g nemoclaw` and registry-removal check, or document why NemoClaw destroy plus sandbox delete is sufficient for this Vitest path.
    • Evidence: `bestEffort()` has an empty `catch {}` and `cleanupHermesSwitch()` only runs `node ... destroy --yes` plus `openshell sandbox delete`. The legacy shell test also runs `openshell gateway destroy -g nemoclaw` and checks that the sandbox name is no longer present in `~/.nemoclaw/sandboxes.json`.
  • Treat the new secret-bearing workflow job as a trusted-code boundary (.github/workflows/e2e-vitest-scenarios.yaml:777): The new free-standing job checks out branch-controlled code, installs dependencies, builds the CLI, runs the branch-controlled OpenShell installer, and then executes branch-controlled Vitest code with `NVIDIA_INFERENCE_API_KEY` available. This follows the existing live E2E pattern and uses pinned actions with `persist-credentials: false`, but it is still a new credential-bearing dispatch path.
    • Recommendation: Ensure this job remains restricted to trusted-code dispatches and keep secret use as late and narrow as possible. If the workflow may be run against untrusted PR code, split static/compile validation from secret-bearing live execution or add an explicit trusted-ref guard.
    • Evidence: The job runs `npm run build:cli`, `bash scripts/install-openshell.sh`, and `npx vitest ... hermes-inference-switch.test.ts` with `NVIDIA_INFERENCE_API_KEY: ${{ secrets.NVIDIA_INFERENCE_API_KEY }}`.
  • Cover auth-negative behavior around generated Hermes and compatible-provider keys (test/e2e-scenario/live/hermes-inference-switch-helpers.ts:131): The new scenario proves positive chat paths, but it does not verify that the Hermes API rejects missing or wrong `API_SERVER_KEY`, and the mock compatible Anthropic provider accepts requests without checking the configured credential. That limits confidence that the migration preserves credential forwarding and local API authorization boundaries.
    • Recommendation: Add negative-path assertions for missing/wrong Hermes API server keys and, for the compatible Anthropic mock mode, record and assert the expected authorization header or explicit credential behavior.
    • Evidence: `hermesApiCommand()` always sends `Authorization: Bearer ${API_SERVER_KEY:-}` for the positive request, while `startMockAnthropicProvider()` returns responses for `/v1/messages` without validating authorization.

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation** — Verify `scenarios=hermes-inference-switch` selects only `hermes-inference-switch-vitest` and leaves the registry scenario matrix empty.. Static review can verify wiring and assertions, but changed behavior depends on workflow dispatch selection, Docker/OpenShell installation, sandbox lifecycle, Hermes runtime state, generated config, `inference.local`, and live inference credentials.
  • **Runtime validation** — Verify `jobs=hermes-inference-switch-vitest` selects only the Hermes switch free-standing job and leaves the registry scenario matrix empty.. Static review can verify wiring and assertions, but changed behavior depends on workflow dispatch selection, Docker/OpenShell installation, sandbox lifecycle, Hermes runtime state, generated config, `inference.local`, and live inference credentials.
  • **Runtime validation** — Verify compatible Anthropic mode with `NEMOCLAW_SWITCH_PROVIDER=compatible-anthropic-endpoint`, `NEMOCLAW_SWITCH_INFERENCE_API=anthropic-messages`, `NEMOCLAW_SWITCH_MOCK_ANTHROPIC=1`, and model `mock-anthropic-model` registers the provider before `inference set` and probes `/v1/messages` with the `anthropic-version` header.. Static review can verify wiring and assertions, but changed behavior depends on workflow dispatch selection, Docker/OpenShell installation, sandbox lifecycle, Hermes runtime state, generated config, `inference.local`, and live inference credentials.
  • **Runtime validation** — Verify the live `inference.local` and Hermes API chat probes capture HTTP status, retry, and classify HTTP 502/503/504 plus curl timeout 28 as transient rather than failing on the first transient provider response.. Static review can verify wiring and assertions, but changed behavior depends on workflow dispatch selection, Docker/OpenShell installation, sandbox lifecycle, Hermes runtime state, generated config, `inference.local`, and live inference credentials.
  • **Runtime validation** — Verify generated Hermes `api_key` and `.env` `API_SERVER_KEY` are never written to artifacts while the test still fails if the generated key is missing or malformed.. Static review can verify wiring and assertions, but changed behavior depends on workflow dispatch selection, Docker/OpenShell installation, sandbox lifecycle, Hermes runtime state, generated config, `inference.local`, and live inference credentials.
  • **Cover auth-negative behavior around generated Hermes and compatible-provider keys** — Add negative-path assertions for missing/wrong Hermes API server keys and, for the compatible Anthropic mock mode, record and assert the expected authorization header or explicit credential behavior.
  • **Acceptance clause:** Refs Epic: Migrate legacy bash E2E into the Vitest E2E system #5098 — add test evidence or identify existing coverage. The deterministic validation context reported `linkedIssues: []`, so issue Epic: Migrate legacy bash E2E into the Vitest E2E system #5098 body/comments were not available to extract and map.
  • **Acceptance clause:** Add a typed live Vitest replacement for `test/e2e/test-hermes-inference-switch.sh`. — add test evidence or identify existing coverage. The PR adds `test/e2e-scenario/live/hermes-inference-switch.test.ts` and helper coverage for Docker, install, route/config, hashes, registry/session, `inference.local`, and Hermes API chat. Remaining replacement drift: live chat probes dropped legacy status/retry/transient classification, and cleanup no longer destroys the gateway or verifies registry removal.
Since last review details

Current findings:

  • Source-of-truth review needed: Sandbox curl payload shell quoting: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `quotePayload()` manually replaces single quotes and both curl command helpers interpolate the result into `trustedSandboxShellScript(...)`.
  • Source-of-truth review needed: Best-effort Hermes sandbox cleanup: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `bestEffort()` catches with an empty `catch {}`, and cleanup only invokes NemoClaw destroy plus OpenShell sandbox delete.
  • Avoid artifacting the full Hermes config while asserting generated API key shape (test/e2e-scenario/live/hermes-inference-switch.test.ts:105): The test captures `/sandbox/.hermes/config.yaml` into shell-probe artifacts while separately asserting that the generated `api_key` exists. The hosted NVIDIA key is explicitly redacted and the fixture regex currently covers long `sk-...` tokens, but the generated local Hermes/API server key is not discovered and passed as an explicit redaction value. If the generated key shape or length changes, the artifact could expose a live local auth secret.
    • Recommendation: Parse the needed non-sensitive config fields inside the sandbox and return only assertion output, or discover generated `api_key` and `.env` `API_SERVER_KEY` values before artifact creation and pass them as explicit `redactionValues`. At minimum, redact `api_key:` values from the config artifact while preserving the malformed/missing-key assertion.
    • Evidence: `sandbox.exec(SANDBOX_NAME, ["cat", "/sandbox/.hermes/config.yaml"], { artifactName: "hermes-config-yaml", redactionValues: [apiKey] })` writes the full config, while `apiKeyShape(sandbox)` proves an `sk-...` key exists in that same file.
  • Avoid shell-string construction for sandbox curl payloads (test/e2e-scenario/live/hermes-inference-switch-helpers.ts:388): The in-sandbox curl probes build a JSON payload string, manually escape single quotes, and interpolate it into a `sh -lc` script. The current payload is mostly test-controlled, but `SWITCH_MODEL` can come from the environment and `trustedSandboxShellScript()` only brands non-empty strings; it does not sanitize shell syntax. This is a sandbox-boundary maintenance risk if future payload fields become less controlled.
    • Recommendation: Write the JSON payload through stdin or a temp file and use `--data-binary @file`, or centralize a covered shell-quoting helper. Add regression coverage for single quotes, semicolons, command substitutions, backticks, and newlines in model/prompt values.
    • Evidence: `quotePayload()` only does `payload.replace(/'/gu, ...)`, and both `inferenceLocalCommand()` and `hermesApiCommand()` interpolate the result into strings passed to `trustedSandboxShellScript(...)`.
  • Keep the legacy transient HTTP handling in live chat probes (test/e2e-scenario/live/hermes-inference-switch-helpers.ts:388): The Vitest replacement sends one bare curl request and then parses stdout as JSON. The legacy shell test captured HTTP status, retried three times, and treated curl timeout 28 and live-provider HTTP 502/503/504 as transient after route/config assertions had already passed. Dropping that behavior makes this live third-party inference scenario more likely to fail on provider flake rather than on the migration behavior under test.
    • Recommendation: Restore the legacy status marker, retry loop, and transient classification around both `inference.local` and Hermes API chat probes, or document why the Vitest replacement should fail hard on the first transient provider response.
    • Evidence: `inferenceLocalCommand()` and `hermesApiCommand()` return plain `curl -sS` commands. The retained legacy `check_inference_local()` and `check_hermes_api_chat()` used `curl -w '%{http_code}'`, retry attempts, and transient classification for 502/503/504 and curl exit 28.
  • Preserve cleanup failure evidence for Hermes sandbox teardown (test/e2e-scenario/live/hermes-inference-switch-helpers.ts:51): Cleanup is intentionally best-effort, but the helper catches and discards all cleanup errors. It also does not mirror the legacy script's gateway destroy and post-destroy registry removal check, so stale sandbox/gateway state can be hidden and make later live runs harder to diagnose.
    • Recommendation: Keep expected not-found cleanup tolerant, but record cleanup command failures in artifacts or return a structured cleanup result. Consider mirroring the legacy `openshell gateway destroy -g nemoclaw` and registry-removal check, or document why NemoClaw destroy plus sandbox delete is sufficient for this Vitest path.
    • Evidence: `bestEffort()` has an empty `catch {}` and `cleanupHermesSwitch()` only runs `node ... destroy --yes` plus `openshell sandbox delete`. The legacy shell test also runs `openshell gateway destroy -g nemoclaw` and checks that the sandbox name is no longer present in `~/.nemoclaw/sandboxes.json`.
  • Treat the new secret-bearing workflow job as a trusted-code boundary (.github/workflows/e2e-vitest-scenarios.yaml:777): The new free-standing job checks out branch-controlled code, installs dependencies, builds the CLI, runs the branch-controlled OpenShell installer, and then executes branch-controlled Vitest code with `NVIDIA_INFERENCE_API_KEY` available. This follows the existing live E2E pattern and uses pinned actions with `persist-credentials: false`, but it is still a new credential-bearing dispatch path.
    • Recommendation: Ensure this job remains restricted to trusted-code dispatches and keep secret use as late and narrow as possible. If the workflow may be run against untrusted PR code, split static/compile validation from secret-bearing live execution or add an explicit trusted-ref guard.
    • Evidence: The job runs `npm run build:cli`, `bash scripts/install-openshell.sh`, and `npx vitest ... hermes-inference-switch.test.ts` with `NVIDIA_INFERENCE_API_KEY: ${{ secrets.NVIDIA_INFERENCE_API_KEY }}`.
  • Cover auth-negative behavior around generated Hermes and compatible-provider keys (test/e2e-scenario/live/hermes-inference-switch-helpers.ts:131): The new scenario proves positive chat paths, but it does not verify that the Hermes API rejects missing or wrong `API_SERVER_KEY`, and the mock compatible Anthropic provider accepts requests without checking the configured credential. That limits confidence that the migration preserves credential forwarding and local API authorization boundaries.
    • Recommendation: Add negative-path assertions for missing/wrong Hermes API server keys and, for the compatible Anthropic mock mode, record and assert the expected authorization header or explicit credential behavior.
    • Evidence: `hermesApiCommand()` always sends `Authorization: Bearer ${API_SERVER_KEY:-}` for the positive request, while `startMockAnthropicProvider()` returns responses for `/v1/messages` without validating authorization.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv cv added the v0.0.66 Release target label Jun 19, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e-scenario/live/hermes-inference-switch.test.ts`:
- Around line 271-279: The inferenceLocalPayload and curl request are hardcoded
to use the OpenAI Chat Completions format with /v1/chat/completions endpoint,
but the test validates both anthropic-messages and openai-responses configs.
Build the probe request dynamically based on the SWITCH_API configuration being
tested instead of using a hardcoded OpenAI format, or restrict this probe to
only execute against chat-completions compatible configs to ensure the test
validates the correct endpoint and payload format for each supported switch
mode.
- Line 226: The assertion using toMatch() on model.api_key can expose the actual
API key value in test output when the assertion fails. Replace the toMatch()
assertion with a boolean predicate that checks if the API key matches the
pattern (such as using a regex test method) and assert the resulting boolean
value instead, so only true or false is printed in failed assertions rather than
the secret key itself.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ee870947-11fa-479a-bd1f-9aee2ee1dba5

📥 Commits

Reviewing files that changed from the base of the PR and between bea95ca and d965382.

📒 Files selected for processing (1)
  • test/e2e-scenario/live/hermes-inference-switch.test.ts

Comment thread test/e2e-scenario/live/hermes-inference-switch.test.ts Outdated
Comment thread test/e2e-scenario/live/hermes-inference-switch.test.ts Outdated
@cv cv linked an issue Jun 19, 2026 that may be closed by this pull request
79 tasks
cv added 3 commits June 19, 2026 14:07
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
…es-inference-switch

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
cv added 7 commits June 19, 2026 14:44
…es-inference-switch

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
…e-switch' into e2e-migrate/test-hermes-inference-switch

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Comment thread test/e2e-scenario/live/hermes-inference-switch-helpers.ts Fixed
Comment thread test/e2e-scenario/live/hermes-inference-switch-helpers.ts Fixed
cv and others added 2 commits June 20, 2026 11:07
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
…es-inference-switch

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv cv merged commit 7bcd559 into main Jun 20, 2026
40 checks passed
@cv cv deleted the e2e-migrate/test-hermes-inference-switch branch June 20, 2026 22:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

v0.0.66 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants