test(e2e): migrate Hermes inference switch to vitest by cv · Pull Request #5553 · NVIDIA/NemoClaw

cv · 2026-06-19T19:40:44Z

Summary

Migrates the Hermes inference switch E2E into a typed live Vitest scenario. The new test onboards a real Hermes sandbox, switches the inference route, verifies OpenShell route state plus Hermes config/registry state, and sends a live Hermes API chat after the switch.

Related Issue

Refs #5098

Changes

Add a typed live Vitest replacement for test/e2e/test-hermes-inference-switch.sh.
Wire a free-standing dispatchable Vitest job into .github/workflows/e2e-vitest-scenarios.yaml.
Preserve legacy shell deletion and any legacy shell workflow cleanup for Phase 11 per Epic: Migrate legacy bash E2E into the Vitest E2E system #5098 migration governance.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

Targeted commands run:

npx biome check --write test/e2e-scenario/live/hermes-inference-switch.test.ts
NEMOCLAW_RUN_E2E_SCENARIOS=1 npx vitest run --project e2e-scenarios-live test/e2e-scenario/live/hermes-inference-switch.test.ts -t __compile_only_nomatch__ --silent=false --reporter=default --passWithNoTests
npx vitest run --project e2e-vitest-support test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts
npx tsx scripts/check-test-file-size-budget.ts test/e2e-scenario/live/hermes-inference-switch.test.ts
npx tsc --noEmit --strict --moduleResolution bundler --module preserve --target ES2022 --types node --allowImportingTsExtensions test/e2e-scenario/live/hermes-inference-switch.test.ts
git diff --check

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

CI / Automation
- Added a new gated end-to-end Vitest job for the Hermes inference switch and included its results in the PR reporting summary.
- Runs non-interactively with Hermes/NemoClaw switch settings and publishes scenario artifacts.
Tests
- Added a live E2E scenario that installs Hermes, switches the inference set, and verifies gateway process stability and health.
- Validates gateway routing/config (model/provider/base URL and API mode), preserves .env and sandbox/session metadata, checks config hash files/permissions, and probes chat-completions on both external and local endpoints.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

coderabbitai · 2026-06-19T19:41:00Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

A new live Vitest E2E test validates that switching Hermes inference provider and model updates routing and configuration while preserving the gateway process PID and .env contents. The test is integrated into the CI workflow as a hermes-inference-switch-vitest job and added to PR reporting.

Changes

Hermes Inference Switch E2E

Layer / File(s)	Summary
Test configuration and helper functions `test/e2e-scenario/live/hermes-inference-switch.test.ts`	Defines sandbox name, switch provider/model/API from env vars, retry count, global 45-minute timeout, and four local helpers: `env()` constructs process/sandbox environment with third-party acceptance flags; `bestEffort()` swallows cleanup errors; `parseHermesModelBlock()` parses the `model:` YAML section into key/value pairs; `chatContent()` robustly extracts chat text from Anthropic-style `content` arrays or OpenAI-style first-choice message fields.
Test scaffolding, Docker validation, and Hermes installation `test/e2e-scenario/live/hermes-inference-switch.test.ts`	Implements gated test structure with artifact creation and registered best-effort cleanup (nemoclaw destroy + OpenShell sandbox delete). Pre-run cleanup runs best-effort destroy/delete. Checks Docker availability via `docker info`. Installs Hermes by running `install.sh` with retry up to a computed limit, backing off only on transient provider-validation errors. Captures Hermes gateway PID and SHA-256 hash of `/sandbox/.hermes/.env` before the switch operation.
Inference switch execution and comprehensive validation `test/e2e-scenario/live/hermes-inference-switch.test.ts`	Executes inference-set switch via nemoclaw CLI and re-captures PID to assert process stability. Validates Hermes health endpoint returns "ok", checks OpenShell inference route/config contains switched provider and model. Reads and parses `/sandbox/.hermes/config.yaml`: asserts `model.default` equals switched model, `provider` is "custom", `base_url` matches expected endpoint shape (with or without `/v1` based on SWITCH_API), `api_mode` is set for known APIs or unset otherwise, and `api_key` is present; verifies no unexpected `models:` stanza. Verifies config hash-check files under `/etc/nemoclaw/` and `/sandbox/.hermes/` with `sha256sum -c` and validates strict hash file has no write bits. Asserts `.env` hash after switch matches pre-switch value when pre-switch hash is non-empty. Validates `~/.nemoclaw/sandboxes.json` and `~/.nemoclaw/onboard-session.json` metadata for agent, provider, model, and sandbox name. Probes external `inference.local` chat endpoint and Hermes gateway at `http://localhost:8642/`, extracting chat response text and asserting both return "PONG".
CI workflow job and PR reporting `.github/workflows/e2e-vitest-scenarios.yaml`	Adds the `hermes-inference-switch-vitest` job with `workflow_dispatch` gating, Hermes/NemoClaw/non-interactive env flags, OpenShell installation via `scripts/install-openshell.sh`, Vitest test execution, and artifact upload from `e2e-artifacts/vitest/hermes-inference-switch/`. Extends `report-to-pr` job's `needs` list to include the new job so test results are included in the PR comment table.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

NVIDIA/NemoClaw#5243: Both PRs modify .github/workflows/e2e-vitest-scenarios.yaml to wire free-standing Vitest E2E jobs into the workflow_dispatch job-selector scheme and update report-to-pr.needs.
NVIDIA/NemoClaw#5256: Both PRs add Hermes-specific free-standing Vitest jobs to .github/workflows/e2e-vitest-scenarios.yaml and wire Hermes results into the report-to-pr job via needs.
NVIDIA/NemoClaw#5498: Both PRs add a free-standing live Vitest job to .github/workflows/e2e-vitest-scenarios.yaml and extend report-to-pr.needs to include the new job's results.

Suggested labels

area: e2e, chore

Poem

🐇 Hop, hop, the gateway stays alive,
A new provider switched — PID survives!
I curl the sandbox, PONG comes back to me,
The config YAML checked most carefully.
No flaky timing, just TypeScript and glee! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title accurately describes the main change: migrating an existing Hermes inference switch E2E test from shell script to a typed Vitest scenario, which is the primary objective of this changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch e2e-migrate/test-hermes-inference-switch

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-code-quality · 2026-06-19T19:41:19Z

Code Coverage Overview

Languages: TypeScript

TypeScript / code-coverage/plugin

The overall coverage in the e2e-migrate/test-her... branch is 96%. Coverage data for the main branch is not yet available.

Show a code coverage summary of the most covered files.

File	main	e2e-migrate/test-her... `15c05a3`	+/-
`nemoclaw/src/se...cret-scanner.ts`	—	100%	—
`nemoclaw/src/commands/slash.ts`	—	100%	—
`nemoclaw/src/li...bprocess-env.ts`	—	100%	—
`nemoclaw/src/bl...eprint/state.ts`	—	98%	—
`nemoclaw/src/onboard/config.ts`	—	98%	—
`nemoclaw/src/bl...int/snapshot.ts`	—	97%	—
`nemoclaw/src/bl...print/runner.ts`	—	95%	—
`nemoclaw/src/co...ration-state.ts`	—	94%	—
`nemoclaw/src/bl...ate-networks.ts`	—	94%	—
`nemoclaw/src/index.ts`	—	94%	—

TypeScript / code-coverage/cli

The overall coverage in the e2e-migrate/test-her... branch is 46%. Coverage data for the main branch is not yet available.

Show a code coverage summary of the most covered files.

File	main	e2e-migrate/test-her... `15c05a3`	+/-
`src/lib/state/o...oard-session.ts`	—	91%	—
`src/lib/inference/local.ts`	—	76%	—
`src/lib/sandbox/config.ts`	—	72%	—
`src/lib/actions...dbox/rebuild.ts`	—	67%	—
`src/lib/onboard/preflight.ts`	—	64%	—
`src/lib/actions...licy-channel.ts`	—	56%	—
`src/lib/state/sandbox.ts`	—	55%	—
`src/lib/policy/index.ts`	—	49%	—
`src/lib/onboard...er-gpu-patch.ts`	—	44%	—
`src/lib/onboard.ts`	—	18%	—

_{Updated June 20, 2026 20:22 UTC
Code Coverage is in Public Preview. Learn more and provide us with your feedback.}

github-actions · 2026-06-19T19:41:42Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: hermes-inference-switch-vitest, openclaw-inference-switch-vitest, inference-routing-vitest

Dispatch hint: hermes-inference-switch-vitest

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

None. No merge-blocking E2E is required because the changed files only add or modify E2E workflow/test code and do not alter installer, onboarding, sandbox, credentials, network policy, inference routing implementation, deployment, or assistant runtime code.

Optional E2E

hermes-inference-switch-vitest (high): Useful to validate the newly added workflow job and live scenario wiring end-to-end. This is not merge-blocking for runtime safety because the PR changes only E2E workflow/test assets, but it is the direct confidence check for the new coverage.
openclaw-inference-switch-vitest (high): Adjacent existing inference-switch scenario for the OpenClaw agent; optional comparison coverage if reviewers want confidence that the new Hermes-specific scenario remains aligned with the established inference-switch flow.
inference-routing-vitest (medium): Optional lower-scope inference routing check for provider route classification and cleanup behavior adjacent to the new Hermes inference-switch scenario.

New E2E recommendations

None.

Dispatch hint

Workflow: .github/workflows/e2e-vitest-scenarios.yaml
jobs input: hermes-inference-switch-vitest

github-actions · 2026-06-19T19:41:43Z

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: hermes-inference-switch-vitest
Optional Vitest E2E scenarios: None

Dispatch required Vitest E2E scenarios:

gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field jobs=hermes-inference-switch-vitest

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required Vitest E2E scenarios

hermes-inference-switch-vitest: Focused free-standing Vitest job wired for changed live test test/e2e-scenario/live/hermes-inference-switch.test.ts.
- Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field jobs=hermes-inference-switch-vitest

Optional Vitest E2E scenarios

None.

Relevant changed files

.github/workflows/e2e-vitest-scenarios.yaml
test/e2e-scenario/live/hermes-inference-switch-helpers.ts
test/e2e-scenario/live/hermes-inference-switch.test.ts

coderabbitai

🧹 Nitpick comments (1)

.github/workflows/e2e-vitest-scenarios.yaml (1)
745-792: ⚡ Quick win

Consider adding Docker Hub authentication to avoid rate limiting.

The hermes-inference-switch-vitest job pulls Docker images during install.sh but lacks Docker Hub authentication, unlike similar jobs such as issue-2478-crash-loop-recovery-vitest (lines 3537-3566). Anonymous Docker Hub pulls are subject to rate limiting, which could cause intermittent CI failures.

If Hermes image pulls are expected to hit Docker Hub, consider adding the same Docker auth pattern used by other Docker-dependent jobs in this workflow.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/e2e-vitest-scenarios.yaml around lines 745 - 792, The
hermes-inference-switch-vitest job is missing Docker Hub authentication
credentials, which could lead to rate limiting failures when pulling Docker
images during the Install OpenShell CLI step. Add a new step before the "Install
OpenShell CLI" step that authenticates to Docker Hub using GitHub secrets
(similar to the pattern used in the issue-2478-crash-loop-recovery-vitest job).
This step should log in to Docker Hub with appropriate credentials to prevent
anonymous pull rate limiting issues.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In @.github/workflows/e2e-vitest-scenarios.yaml:
- Around line 745-792: The hermes-inference-switch-vitest job is missing Docker
Hub authentication credentials, which could lead to rate limiting failures when
pulling Docker images during the Install OpenShell CLI step. Add a new step
before the "Install OpenShell CLI" step that authenticates to Docker Hub using
GitHub secrets (similar to the pattern used in the
issue-2478-crash-loop-recovery-vitest job). This step should log in to Docker
Hub with appropriate credentials to prevent anonymous pull rate limiting issues.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1d11cc31-1634-4f4a-a5d4-c2f71a7bc123

📥 Commits

Reviewing files that changed from the base of the PR and between 9d4d1f5 and bea95ca.

📒 Files selected for processing (2)

.github/workflows/e2e-vitest-scenarios.yaml
test/e2e-scenario/live/hermes-inference-switch.test.ts

github-actions · 2026-06-19T19:44:53Z

PR Review Advisor

Findings: 0 needs attention, 8 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 5 still apply, 1 new item found

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Source-of-truth review needed: Sandbox curl payload shell quoting: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `quotePayload()` manually replaces single quotes and both curl command helpers interpolate the result into `trustedSandboxShellScript(...)`.
Source-of-truth review needed: Best-effort Hermes sandbox cleanup: The advisor marked localized patch analysis as missing.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `bestEffort()` catches with an empty `catch {}`, and cleanup only invokes NemoClaw destroy plus OpenShell sandbox delete.
Avoid artifacting the full Hermes config while asserting generated API key shape (test/e2e-scenario/live/hermes-inference-switch.test.ts:105): The test captures `/sandbox/.hermes/config.yaml` into shell-probe artifacts while separately asserting that the generated `api_key` exists. The hosted NVIDIA key is explicitly redacted and the fixture regex currently covers long `sk-...` tokens, but the generated local Hermes/API server key is not discovered and passed as an explicit redaction value. If the generated key shape or length changes, the artifact could expose a live local auth secret.
- Recommendation: Parse the needed non-sensitive config fields inside the sandbox and return only assertion output, or discover generated `api_key` and `.env` `API_SERVER_KEY` values before artifact creation and pass them as explicit `redactionValues`. At minimum, redact `api_key:` values from the config artifact while preserving the malformed/missing-key assertion.
- Evidence: `sandbox.exec(SANDBOX_NAME, ["cat", "/sandbox/.hermes/config.yaml"], { artifactName: "hermes-config-yaml", redactionValues: [apiKey] })` writes the full config, while `apiKeyShape(sandbox)` proves an `sk-...` key exists in that same file.
Avoid shell-string construction for sandbox curl payloads (test/e2e-scenario/live/hermes-inference-switch-helpers.ts:388): The in-sandbox curl probes build a JSON payload string, manually escape single quotes, and interpolate it into a `sh -lc` script. The current payload is mostly test-controlled, but `SWITCH_MODEL` can come from the environment and `trustedSandboxShellScript()` only brands non-empty strings; it does not sanitize shell syntax. This is a sandbox-boundary maintenance risk if future payload fields become less controlled.
- Recommendation: Write the JSON payload through stdin or a temp file and use `--data-binary @file`, or centralize a covered shell-quoting helper. Add regression coverage for single quotes, semicolons, command substitutions, backticks, and newlines in model/prompt values.
- Evidence: `quotePayload()` only does `payload.replace(/'/gu, ...)`, and both `inferenceLocalCommand()` and `hermesApiCommand()` interpolate the result into strings passed to `trustedSandboxShellScript(...)`.
Keep the legacy transient HTTP handling in live chat probes (test/e2e-scenario/live/hermes-inference-switch-helpers.ts:388): The Vitest replacement sends one bare curl request and then parses stdout as JSON. The legacy shell test captured HTTP status, retried three times, and treated curl timeout 28 and live-provider HTTP 502/503/504 as transient after route/config assertions had already passed. Dropping that behavior makes this live third-party inference scenario more likely to fail on provider flake rather than on the migration behavior under test.
- Recommendation: Restore the legacy status marker, retry loop, and transient classification around both `inference.local` and Hermes API chat probes, or document why the Vitest replacement should fail hard on the first transient provider response.
- Evidence: `inferenceLocalCommand()` and `hermesApiCommand()` return plain `curl -sS` commands. The retained legacy `check_inference_local()` and `check_hermes_api_chat()` used `curl -w '%{http_code}'`, retry attempts, and transient classification for 502/503/504 and curl exit 28.
Preserve cleanup failure evidence for Hermes sandbox teardown (test/e2e-scenario/live/hermes-inference-switch-helpers.ts:51): Cleanup is intentionally best-effort, but the helper catches and discards all cleanup errors. It also does not mirror the legacy script's gateway destroy and post-destroy registry removal check, so stale sandbox/gateway state can be hidden and make later live runs harder to diagnose.
- Recommendation: Keep expected not-found cleanup tolerant, but record cleanup command failures in artifacts or return a structured cleanup result. Consider mirroring the legacy `openshell gateway destroy -g nemoclaw` and registry-removal check, or document why NemoClaw destroy plus sandbox delete is sufficient for this Vitest path.
- Evidence: `bestEffort()` has an empty `catch {}` and `cleanupHermesSwitch()` only runs `node ... destroy --yes` plus `openshell sandbox delete`. The legacy shell test also runs `openshell gateway destroy -g nemoclaw` and checks that the sandbox name is no longer present in `~/.nemoclaw/sandboxes.json`.
Treat the new secret-bearing workflow job as a trusted-code boundary (.github/workflows/e2e-vitest-scenarios.yaml:777): The new free-standing job checks out branch-controlled code, installs dependencies, builds the CLI, runs the branch-controlled OpenShell installer, and then executes branch-controlled Vitest code with `NVIDIA_INFERENCE_API_KEY` available. This follows the existing live E2E pattern and uses pinned actions with `persist-credentials: false`, but it is still a new credential-bearing dispatch path.
- Recommendation: Ensure this job remains restricted to trusted-code dispatches and keep secret use as late and narrow as possible. If the workflow may be run against untrusted PR code, split static/compile validation from secret-bearing live execution or add an explicit trusted-ref guard.
- Evidence: The job runs `npm run build:cli`, `bash scripts/install-openshell.sh`, and `npx vitest ... hermes-inference-switch.test.ts` with `NVIDIA_INFERENCE_API_KEY: ${{ secrets.NVIDIA_INFERENCE_API_KEY }}`.
Cover auth-negative behavior around generated Hermes and compatible-provider keys (test/e2e-scenario/live/hermes-inference-switch-helpers.ts:131): The new scenario proves positive chat paths, but it does not verify that the Hermes API rejects missing or wrong `API_SERVER_KEY`, and the mock compatible Anthropic provider accepts requests without checking the configured credential. That limits confidence that the migration preserves credential forwarding and local API authorization boundaries.
- Recommendation: Add negative-path assertions for missing/wrong Hermes API server keys and, for the compatible Anthropic mock mode, record and assert the expected authorization header or explicit credential behavior.
- Evidence: `hermesApiCommand()` always sends `Authorization: Bearer ${API_SERVER_KEY:-}` for the positive request, while `startMockAnthropicProvider()` returns responses for `/v1/messages` without validating authorization.

🌱 Nice ideas

None.

Consider writing more tests for

**Runtime validation** — Verify `scenarios=hermes-inference-switch` selects only `hermes-inference-switch-vitest` and leaves the registry scenario matrix empty.. Static review can verify wiring and assertions, but changed behavior depends on workflow dispatch selection, Docker/OpenShell installation, sandbox lifecycle, Hermes runtime state, generated config, `inference.local`, and live inference credentials.
**Runtime validation** — Verify `jobs=hermes-inference-switch-vitest` selects only the Hermes switch free-standing job and leaves the registry scenario matrix empty.. Static review can verify wiring and assertions, but changed behavior depends on workflow dispatch selection, Docker/OpenShell installation, sandbox lifecycle, Hermes runtime state, generated config, `inference.local`, and live inference credentials.
**Runtime validation** — Verify compatible Anthropic mode with `NEMOCLAW_SWITCH_PROVIDER=compatible-anthropic-endpoint`, `NEMOCLAW_SWITCH_INFERENCE_API=anthropic-messages`, `NEMOCLAW_SWITCH_MOCK_ANTHROPIC=1`, and model `mock-anthropic-model` registers the provider before `inference set` and probes `/v1/messages` with the `anthropic-version` header.. Static review can verify wiring and assertions, but changed behavior depends on workflow dispatch selection, Docker/OpenShell installation, sandbox lifecycle, Hermes runtime state, generated config, `inference.local`, and live inference credentials.
**Runtime validation** — Verify the live `inference.local` and Hermes API chat probes capture HTTP status, retry, and classify HTTP 502/503/504 plus curl timeout 28 as transient rather than failing on the first transient provider response.. Static review can verify wiring and assertions, but changed behavior depends on workflow dispatch selection, Docker/OpenShell installation, sandbox lifecycle, Hermes runtime state, generated config, `inference.local`, and live inference credentials.
**Runtime validation** — Verify generated Hermes `api_key` and `.env` `API_SERVER_KEY` are never written to artifacts while the test still fails if the generated key is missing or malformed.. Static review can verify wiring and assertions, but changed behavior depends on workflow dispatch selection, Docker/OpenShell installation, sandbox lifecycle, Hermes runtime state, generated config, `inference.local`, and live inference credentials.
**Cover auth-negative behavior around generated Hermes and compatible-provider keys** — Add negative-path assertions for missing/wrong Hermes API server keys and, for the compatible Anthropic mock mode, record and assert the expected authorization header or explicit credential behavior.
**Acceptance clause:** Refs Epic: Migrate legacy bash E2E into the Vitest E2E system #5098 — add test evidence or identify existing coverage. The deterministic validation context reported `linkedIssues: []`, so issue Epic: Migrate legacy bash E2E into the Vitest E2E system #5098 body/comments were not available to extract and map.
**Acceptance clause:** Add a typed live Vitest replacement for `test/e2e/test-hermes-inference-switch.sh`. — add test evidence or identify existing coverage. The PR adds `test/e2e-scenario/live/hermes-inference-switch.test.ts` and helper coverage for Docker, install, route/config, hashes, registry/session, `inference.local`, and Hermes API chat. Remaining replacement drift: live chat probes dropped legacy status/retry/transient classification, and cleanup no longer destroys the gateway or verifies registry removal.

Since last review details

Current findings:

Source-of-truth review needed: Sandbox curl payload shell quoting: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `quotePayload()` manually replaces single quotes and both curl command helpers interpolate the result into `trustedSandboxShellScript(...)`.
Source-of-truth review needed: Best-effort Hermes sandbox cleanup: The advisor marked localized patch analysis as missing.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `bestEffort()` catches with an empty `catch {}`, and cleanup only invokes NemoClaw destroy plus OpenShell sandbox delete.
Avoid artifacting the full Hermes config while asserting generated API key shape (test/e2e-scenario/live/hermes-inference-switch.test.ts:105): The test captures `/sandbox/.hermes/config.yaml` into shell-probe artifacts while separately asserting that the generated `api_key` exists. The hosted NVIDIA key is explicitly redacted and the fixture regex currently covers long `sk-...` tokens, but the generated local Hermes/API server key is not discovered and passed as an explicit redaction value. If the generated key shape or length changes, the artifact could expose a live local auth secret.
- Recommendation: Parse the needed non-sensitive config fields inside the sandbox and return only assertion output, or discover generated `api_key` and `.env` `API_SERVER_KEY` values before artifact creation and pass them as explicit `redactionValues`. At minimum, redact `api_key:` values from the config artifact while preserving the malformed/missing-key assertion.
- Evidence: `sandbox.exec(SANDBOX_NAME, ["cat", "/sandbox/.hermes/config.yaml"], { artifactName: "hermes-config-yaml", redactionValues: [apiKey] })` writes the full config, while `apiKeyShape(sandbox)` proves an `sk-...` key exists in that same file.
Avoid shell-string construction for sandbox curl payloads (test/e2e-scenario/live/hermes-inference-switch-helpers.ts:388): The in-sandbox curl probes build a JSON payload string, manually escape single quotes, and interpolate it into a `sh -lc` script. The current payload is mostly test-controlled, but `SWITCH_MODEL` can come from the environment and `trustedSandboxShellScript()` only brands non-empty strings; it does not sanitize shell syntax. This is a sandbox-boundary maintenance risk if future payload fields become less controlled.
- Recommendation: Write the JSON payload through stdin or a temp file and use `--data-binary @file`, or centralize a covered shell-quoting helper. Add regression coverage for single quotes, semicolons, command substitutions, backticks, and newlines in model/prompt values.
- Evidence: `quotePayload()` only does `payload.replace(/'/gu, ...)`, and both `inferenceLocalCommand()` and `hermesApiCommand()` interpolate the result into strings passed to `trustedSandboxShellScript(...)`.
Keep the legacy transient HTTP handling in live chat probes (test/e2e-scenario/live/hermes-inference-switch-helpers.ts:388): The Vitest replacement sends one bare curl request and then parses stdout as JSON. The legacy shell test captured HTTP status, retried three times, and treated curl timeout 28 and live-provider HTTP 502/503/504 as transient after route/config assertions had already passed. Dropping that behavior makes this live third-party inference scenario more likely to fail on provider flake rather than on the migration behavior under test.
- Recommendation: Restore the legacy status marker, retry loop, and transient classification around both `inference.local` and Hermes API chat probes, or document why the Vitest replacement should fail hard on the first transient provider response.
- Evidence: `inferenceLocalCommand()` and `hermesApiCommand()` return plain `curl -sS` commands. The retained legacy `check_inference_local()` and `check_hermes_api_chat()` used `curl -w '%{http_code}'`, retry attempts, and transient classification for 502/503/504 and curl exit 28.
Preserve cleanup failure evidence for Hermes sandbox teardown (test/e2e-scenario/live/hermes-inference-switch-helpers.ts:51): Cleanup is intentionally best-effort, but the helper catches and discards all cleanup errors. It also does not mirror the legacy script's gateway destroy and post-destroy registry removal check, so stale sandbox/gateway state can be hidden and make later live runs harder to diagnose.
- Recommendation: Keep expected not-found cleanup tolerant, but record cleanup command failures in artifacts or return a structured cleanup result. Consider mirroring the legacy `openshell gateway destroy -g nemoclaw` and registry-removal check, or document why NemoClaw destroy plus sandbox delete is sufficient for this Vitest path.
- Evidence: `bestEffort()` has an empty `catch {}` and `cleanupHermesSwitch()` only runs `node ... destroy --yes` plus `openshell sandbox delete`. The legacy shell test also runs `openshell gateway destroy -g nemoclaw` and checks that the sandbox name is no longer present in `~/.nemoclaw/sandboxes.json`.
Treat the new secret-bearing workflow job as a trusted-code boundary (.github/workflows/e2e-vitest-scenarios.yaml:777): The new free-standing job checks out branch-controlled code, installs dependencies, builds the CLI, runs the branch-controlled OpenShell installer, and then executes branch-controlled Vitest code with `NVIDIA_INFERENCE_API_KEY` available. This follows the existing live E2E pattern and uses pinned actions with `persist-credentials: false`, but it is still a new credential-bearing dispatch path.
- Recommendation: Ensure this job remains restricted to trusted-code dispatches and keep secret use as late and narrow as possible. If the workflow may be run against untrusted PR code, split static/compile validation from secret-bearing live execution or add an explicit trusted-ref guard.
- Evidence: The job runs `npm run build:cli`, `bash scripts/install-openshell.sh`, and `npx vitest ... hermes-inference-switch.test.ts` with `NVIDIA_INFERENCE_API_KEY: ${{ secrets.NVIDIA_INFERENCE_API_KEY }}`.
Cover auth-negative behavior around generated Hermes and compatible-provider keys (test/e2e-scenario/live/hermes-inference-switch-helpers.ts:131): The new scenario proves positive chat paths, but it does not verify that the Hermes API rejects missing or wrong `API_SERVER_KEY`, and the mock compatible Anthropic provider accepts requests without checking the configured credential. That limits confidence that the migration preserves credential forwarding and local API authorization boundaries.
- Recommendation: Add negative-path assertions for missing/wrong Hermes API server keys and, for the compatible Anthropic mock mode, record and assert the expected authorization header or explicit credential behavior.
- Evidence: `hermesApiCommand()` always sends `Authorization: Bearer ${API_SERVER_KEY:-}` for the positive request, while `startMockAnthropicProvider()` returns responses for `/v1/messages` without validating authorization.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e-scenario/live/hermes-inference-switch.test.ts`:
- Around line 271-279: The inferenceLocalPayload and curl request are hardcoded
to use the OpenAI Chat Completions format with /v1/chat/completions endpoint,
but the test validates both anthropic-messages and openai-responses configs.
Build the probe request dynamically based on the SWITCH_API configuration being
tested instead of using a hardcoded OpenAI format, or restrict this probe to
only execute against chat-completions compatible configs to ensure the test
validates the correct endpoint and payload format for each supported switch
mode.
- Line 226: The assertion using toMatch() on model.api_key can expose the actual
API key value in test output when the assertion fails. Replace the toMatch()
assertion with a boolean predicate that checks if the API key matches the
pattern (such as using a regex test method) and assert the resulting boolean
value instead, so only true or false is printed in failed assertions rather than
the secret key itself.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ee870947-11fa-479a-bd1f-9aee2ee1dba5

📥 Commits

Reviewing files that changed from the base of the PR and between bea95ca and d965382.

📒 Files selected for processing (1)

test/e2e-scenario/live/hermes-inference-switch.test.ts

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

…es-inference-switch Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

…e-switch' into e2e-migrate/test-hermes-inference-switch Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

…es-inference-switch Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): migrate Hermes inference switch to vitest

bea95ca

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv self-assigned this Jun 19, 2026

cv mentioned this pull request Jun 19, 2026

Epic: Migrate legacy bash E2E into the Vitest E2E system #5098

Open

79 tasks

coderabbitai Bot reviewed Jun 19, 2026

View reviewed changes

test(e2e): restore Hermes switch assertions

d965382

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv added the v0.0.66 Release target label Jun 19, 2026

coderabbitai Bot reviewed Jun 19, 2026

View reviewed changes

Comment thread test/e2e-scenario/live/hermes-inference-switch.test.ts Outdated

Comment thread test/e2e-scenario/live/hermes-inference-switch.test.ts Outdated

cv linked an issue Jun 19, 2026 that may be closed by this pull request

Epic: Migrate legacy bash E2E into the Vitest E2E system #5098

Open

79 tasks

cv added 3 commits June 19, 2026 14:07

test(e2e): cover Hermes switch API modes

ed011cd

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): avoid Hermes api key redaction false negative

3f43d8d

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Merge remote-tracking branch 'origin/main' into e2e-migrate/test-herm…

6b6d1a9

…es-inference-switch Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv removed a link to an issue Jun 19, 2026

Epic: Migrate legacy bash E2E into the Vitest E2E system #5098

Open

79 tasks

cv added 7 commits June 19, 2026 14:44

Merge remote-tracking branch 'origin/main' into e2e-migrate/test-herm…

9bb6ab7

…es-inference-switch Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): relax workflow inventory timeout

fcedfdb

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Merge branch 'main' into e2e-migrate/test-hermes-inference-switch

8ef4371

test(e2e): move scenario logic out of test wrapper

969a853

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Merge remote-tracking branch 'origin/e2e-migrate/test-hermes-inferenc…

df7f8d6

…e-switch' into e2e-migrate/test-hermes-inference-switch Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): restore scenario logic to test file

c57bee3

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): move Hermes switch branches to helpers

32ce1db

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

github-code-quality Bot found potential problems Jun 20, 2026

View reviewed changes

Comment thread test/e2e-scenario/live/hermes-inference-switch-helpers.ts Fixed

github-advanced-security AI found potential problems Jun 20, 2026

View reviewed changes

Comment thread test/e2e-scenario/live/hermes-inference-switch-helpers.ts Fixed

cv and others added 2 commits June 20, 2026 11:07

Apply suggestions from code review

09b04df

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into e2e-migrate/test-herm…

c4e8b2d

…es-inference-switch Signed-off-by: Carlos Villela <cvillela@nvidia.com>

github-actions Bot mentioned this pull request Jun 20, 2026

test(e2e): migrate GPU Ollama flow to vitest #5556

Merged

14 tasks

test(e2e): add compatible Anthropic Hermes switch setup

6258816

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Merge branch 'main' into e2e-migrate/test-hermes-inference-switch

15c05a3

cv merged commit 7bcd559 into main Jun 20, 2026
40 checks passed

cv deleted the e2e-migrate/test-hermes-inference-switch branch June 20, 2026 22:14

Conversation

cv commented Jun 19, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Poem

❌ Failed checks (1 warning)

Uh oh!

github-code-quality Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Coverage Overview

TypeScript / code-coverage/plugin

TypeScript / code-coverage/cli

Uh oh!

github-actions Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Vitest E2E Scenario Recommendation

Vitest E2E Scenario Advisor

Required Vitest E2E scenarios

Optional Vitest E2E scenarios

Relevant changed files

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

cv commented Jun 19, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 19, 2026 •

edited

Loading

github-code-quality Bot commented Jun 19, 2026 •

edited

Loading

github-actions Bot commented Jun 19, 2026 •

edited

Loading

github-actions Bot commented Jun 19, 2026 •

edited

Loading

github-actions Bot commented Jun 19, 2026 •

edited

Loading