test(e2e): migrate agent turn latency to vitest by cv · Pull Request #5554 · NVIDIA/NemoClaw

cv · 2026-06-19T19:40:47Z

Summary

Migrates the real agent turn latency E2E into a typed live Vitest scenario. The new test onboards OpenClaw and Hermes against hosted inference, verifies their managed inference.local configuration, and times one real model-backed turn through each runtime.

Related Issue

Refs #5098

Changes

Add a typed live Vitest replacement for test/e2e/test-agent-turn-latency-e2e.sh.
Wire a free-standing dispatchable Vitest job into .github/workflows/e2e-vitest-scenarios.yaml.
Preserve legacy shell deletion and any legacy shell workflow cleanup for Phase 11 per Epic: Migrate legacy bash E2E into the Vitest E2E system #5098 migration governance.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

Targeted commands run:

npx biome check --write test/e2e-scenario/live/agent-turn-latency.test.ts
NEMOCLAW_RUN_E2E_SCENARIOS=1 npx vitest run --project e2e-scenarios-live test/e2e-scenario/live/agent-turn-latency.test.ts -t __compile_only_nomatch__ --silent=false --reporter=default --passWithNoTests
npx vitest run --project e2e-vitest-support test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts
npx tsx scripts/check-test-file-size-budget.ts test/e2e-scenario/live/agent-turn-latency.test.ts
npx tsc --noEmit --strict --moduleResolution bundler --module preserve --target ES2022 --types node --allowImportingTsExtensions test/e2e-scenario/live/agent-turn-latency.test.ts
git diff --check

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

Tests
- Replaced the prior shell-based latency check with a gated Live E2E Vitest test that runs two sequential hosted-inference turns (OpenClaw, then Hermes).
- Validates routing/config, checks sandbox readiness, measures wall-clock latency against the configured limit, asserts expected model output, writes run/results artifacts, and performs best-effort sandbox cleanup.
CI/CD
- Added a dedicated free-standing Vitest CI job for the latency scenario (conditional on selected scenarios/jobs), uploads artifacts, and surfaces results in the PR’s Vitest scenario table.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

coderabbitai · 2026-06-19T19:41:02Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a new live Vitest e2e test (agent-turn-latency.test.ts) that measures agent turn latency for OpenClaw and Hermes sandboxes, including retry logic, latency assertions, and JSON artifact output. A corresponding CI workflow job (agent-turn-latency-vitest) is added and wired into the report-to-pr aggregation.

Changes

Agent Turn Latency E2E Scenario

Layer / File(s)	Summary
Test configuration and environment setup `test/e2e-scenario/live/agent-turn-latency.test.ts`	Defines repo/CLI paths, sandbox/model/provider/timeout constants from environment, and an `env()` helper that builds per-sandbox process environments with agent selection and inference API key injection.
Test utilities and validation helpers `test/e2e-scenario/live/agent-turn-latency.test.ts`	Implements test utilities (`bestEffort()`, `responseBodyAndStatus()`, `chatContent()`, `msSince()` for latency measurement), configuration validators (`assertOpenClawConfig()`, `assertHermesConfig()`), and `installSandbox()` with bounded retries and transient-failure detection.
Gated live scenario: install, execute, assert, and record latency `test/e2e-scenario/live/agent-turn-latency.test.ts`	Implements the full gated live test: orchestrates cleanup, docker check, sequential OpenClaw and Hermes sandbox install/run, per-turn latency measurement via `process.hrtime.bigint()`, integer-42 and timeout assertions, and writes `turn-latency-results.json` and legacy-path artifacts.
CI workflow job and PR report wiring `.github/workflows/e2e-vitest-scenarios.yaml`	Adds the `agent-turn-latency-vitest` job (OpenShell install, `NVIDIA_INFERENCE_API_KEY` injection, Vitest execution, artifact upload) and registers it in the `report-to-pr` `needs` array.

Sequence Diagram(s)

sequenceDiagram
  participant CI as GitHub Actions
  participant Test as agent-turn-latency.test.ts
  participant OpenClaw as OpenClaw Sandbox
  participant Hermes as Hermes Sandbox
  participant Artifacts as e2e-artifacts/

  CI->>Test: vitest run (with NVIDIA_INFERENCE_API_KEY)
  Test->>Test: shouldRunLiveE2EScenarios() gate
  Test->>OpenClaw: installSandbox() — install.sh with retries
  OpenClaw-->>Test: install success
  Test->>OpenClaw: run agent via sandbox shell (record bigint start)
  OpenClaw-->>Test: response text
  Test->>Test: assert contains 42 and within latency max
  Test->>OpenClaw: destroy sandbox
  Test->>Hermes: installSandbox() — install.sh with retries
  Hermes-->>Test: install success
  Test->>Hermes: curl /v1/chat/completions (record bigint start)
  Hermes-->>Test: JSON response
  Test->>Test: chatContent() extract, assert 42 and latency
  Test->>Artifacts: write turn-latency-results.json
  CI->>CI: upload artifacts from e2e-artifacts/vitest/agent-turn-latency/
  CI->>CI: report-to-pr includes agent-turn-latency-vitest result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

NVIDIA/NemoClaw#5349: Adds another free-standing live Vitest job to the same e2e-vitest-scenarios.yaml workflow and extends report-to-pr.needs with the same pattern.
NVIDIA/NemoClaw#5493: Follows the same pattern — new free-standing live Vitest job in e2e-vitest-scenarios.yaml plus report-to-pr needs update — for a different scenario.
NVIDIA/NemoClaw#5494: Adds a standalone live Vitest job to the same workflow and wires it into report-to-pr for PR result aggregation.

Suggested labels

area: e2e, chore

Poem

🐇 Hop hop, the latency clock ticks away,
OpenClaw and Hermes both tested today.
Install, retry, assert — does 42 appear?
The bigint timer marks each millisecond clear.
CI uploads the proof, the PR table glows,
A bunny checks the workflow — and off it goes! 🚀

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'test(e2e): migrate agent turn latency to vitest' directly and concisely describes the main change: migrating an E2E test from shell script to Vitest.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch e2e-migrate/test-agent-turn-latency

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-code-quality · 2026-06-19T19:41:31Z

Code Coverage Overview

Languages: TypeScript

TypeScript / code-coverage/plugin

The overall coverage in the e2e-migrate/test-age... branch is 96%. Coverage data for the main branch is not yet available.

Show a code coverage summary of the most covered files.

File	main	e2e-migrate/test-age... `9d3e26e`	+/-
`nemoclaw/src/se...cret-scanner.ts`	—	100%	—
`nemoclaw/src/commands/slash.ts`	—	100%	—
`nemoclaw/src/li...bprocess-env.ts`	—	100%	—
`nemoclaw/src/bl...eprint/state.ts`	—	98%	—
`nemoclaw/src/onboard/config.ts`	—	98%	—
`nemoclaw/src/bl...int/snapshot.ts`	—	97%	—
`nemoclaw/src/bl...print/runner.ts`	—	95%	—
`nemoclaw/src/co...ration-state.ts`	—	94%	—
`nemoclaw/src/bl...ate-networks.ts`	—	94%	—
`nemoclaw/src/index.ts`	—	94%	—

TypeScript / code-coverage/cli

The overall coverage in the e2e-migrate/test-age... branch is 46%. Coverage data for the main branch is not yet available.

Show a code coverage summary of the most covered files.

File	main	e2e-migrate/test-age... `9d3e26e`	+/-
`src/lib/state/o...oard-session.ts`	—	90%	—
`src/lib/inference/local.ts`	—	76%	—
`src/lib/sandbox/config.ts`	—	72%	—
`src/lib/actions...dbox/rebuild.ts`	—	67%	—
`src/lib/onboard/preflight.ts`	—	64%	—
`src/lib/actions...licy-channel.ts`	—	56%	—
`src/lib/state/sandbox.ts`	—	55%	—
`src/lib/policy/index.ts`	—	49%	—
`src/lib/onboard...er-gpu-patch.ts`	—	44%	—
`src/lib/onboard.ts`	—	18%	—

_{Updated June 20, 2026 18:51 UTC
Code Coverage is in Public Preview. Learn more and provide us with your feedback.}

github-actions · 2026-06-19T19:41:47Z

E2E Advisor Recommendation

Required E2E: agent-turn-latency
Optional E2E: inference-routing, hermes-e2e

Dispatch hint: agent-turn-latency-vitest

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

agent-turn-latency (high; live hosted inference with two sandbox installs, timeout 110 minutes): Directly validates the newly added workflow job and live scenario covering two real sandboxes, hosted inference routing, OpenClaw agent turn latency, and Hermes API turn latency.

Optional E2E

inference-routing (medium): Useful adjacent confidence for provider route classification and inference.local behavior used by the new latency scenario.
hermes-e2e (high): Adjacent confidence for the Hermes onboarding and live inference path also exercised by the new agent-turn-latency scenario.

New E2E recommendations

None.

Dispatch hint

Workflow: .github/workflows/e2e-vitest-scenarios.yaml
jobs input: agent-turn-latency-vitest

github-actions · 2026-06-19T19:41:48Z

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: agent-turn-latency-vitest
Optional Vitest E2E scenarios: None

Dispatch required Vitest E2E scenarios:

gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field jobs=agent-turn-latency-vitest

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required Vitest E2E scenarios

agent-turn-latency-vitest: Focused free-standing Vitest job wired for changed live test test/e2e-scenario/live/agent-turn-latency.test.ts.
- Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field jobs=agent-turn-latency-vitest

Optional Vitest E2E scenarios

None.

Relevant changed files

.github/workflows/e2e-vitest-scenarios.yaml
test/e2e-scenario/live/agent-turn-latency-helpers.ts
test/e2e-scenario/live/agent-turn-latency.test.ts

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e-scenario/live/agent-turn-latency.test.ts`:
- Around line 157-165: The openclawConfig variable captures raw config output
without redacting sensitive credentials, creating a potential API key leakage
risk into artifacts. The Hermes config path already implements redaction of
apiKey values, but openclawConfig does not. Add similar redaction logic to
openclawConfig to remove or mask sensitive fields like apiKey before the output
is captured in the artifact, following the same redaction pattern that is
already applied to the Hermes config to prevent credentials from leaking into
logs and artifacts.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e9e10c9c-b55e-4d56-96da-a59a80764c50

📥 Commits

Reviewing files that changed from the base of the PR and between 9d4d1f5 and 681deec.

📒 Files selected for processing (2)

.github/workflows/e2e-vitest-scenarios.yaml
test/e2e-scenario/live/agent-turn-latency.test.ts

github-actions · 2026-06-19T19:45:31Z

PR Review Advisor

Findings: 0 needs attention, 13 worth checking, 0 nice ideas
Since last review: 1 prior item resolved, 8 still apply, 0 new items found

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Source-of-truth review needed: Hosted inference compatibility setup: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: MODEL, PROVIDER, EXPECTED_ROUTE_PROVIDER, and env() do not propagate ci-compatible-inference helper outputs.
Source-of-truth review needed: OpenClaw JSON assistant-text extraction: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: collectAssistantText() omits retained keys such as reasoning_content, payload, response, data, output, outputs, items, and segments.
Source-of-truth review needed: Hermes readiness recovery: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: New hermesHealth is a single curl -sf --max-time 10 with only exit-code assertion.
Source-of-truth review needed: Transient install retry cleanup: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: installSandbox() retries after isTransientProviderValidationFailure() without invoking cleanupTurnSandboxes() or retained forward/gateway cleanup.
Source-of-truth review needed: bestEffort cleanup swallowing: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: bestEffort() catches all errors and cleanupTurnSandboxes() uses it for both destroy and sandbox delete.
Port the retained hosted-inference trust-boundary setup (test/e2e-scenario/live/agent-turn-latency-helpers.ts:24): The migrated helper derives model, provider, and expected route directly and only maps NVIDIA key aliases into the install environment. The retained shell source configures compatible hosted inference before deriving these values, including endpoint URL, compatible credential variable, compatible model, preferred API, provider key, and expected route provider. This can make the new secret-bearing live test validate a different hosted-inference boundary than the shell scenario it claims to replace.
- Recommendation: Reuse or port the retained compatible-inference decision logic so NEMOCLAW_ENDPOINT_URL, COMPATIBLE_API_KEY, NEMOCLAW_COMPAT_MODEL, NEMOCLAW_PREFERRED_API, model, provider, and expected route provider are propagated consistently. Add focused coverage for compatible-hosted mode and default-provider/model selection.
- Evidence: Retained test/e2e/test-agent-turn-latency-e2e.sh calls nemoclaw_e2e_configure_compatible_inference before deriving TURN_MODEL, TURN_PROVIDER_KEY, and EXPECTED_ROUTE_PROVIDER. New MODEL, PROVIDER, EXPECTED_ROUTE_PROVIDER, and env() do not use ci-compatible-inference helper outputs.
Harden Hermes local API token handling in artifacts (test/e2e-scenario/live/agent-turn-latency-helpers.ts:280): The Hermes command sources /sandbox/.hermes/.env and sends API_SERVER_KEY as a bearer token, but the probe explicitly redacts only the hosted inference key. The centralized redaction regex should catch common Bearer-token echoes, but a sandbox-local token echoed in an unexpected or short shape could still be preserved in uploaded artifacts. The migrated command also sends an Authorization header even when API_SERVER_KEY is empty, unlike the retained shell.
- Recommendation: Preserve the retained conditional Authorization header behavior. If Hermes response bodies remain artifacted, capture the sandbox-local API_SERVER_KEY through a redaction-safe path and add it to explicit redactionValues. Add a regression case where Hermes or middleware echoes the local token.
- Evidence: hermesTurnCommand() builds -H "Authorization: Bearer ${API_SERVER_KEY:-}" unconditionally. The hermes-api-turn invocation passes redactionValues: [apiKey], not the sandbox-local API_SERVER_KEY.
Restore clean sandbox, gateway, and forwarding cleanup before install retries (test/e2e-scenario/live/agent-turn-latency-helpers.ts:187): The retained shell scenario pre-cleans sandbox and gateway state before the first install and before retrying transient provider validation failures. The migrated installSandbox() sleeps and retries without running the retained cleanup sequence, and cleanupTurnSandboxes() does not stop forwarding or destroy the OpenShell gateway. In a credential-bearing live path, this weakens confidence that retries start from a clean sandbox, gateway, forwarding, and credential-artifact state.
- Recommendation: Run the full retained cleanup sequence before each retry, or document and test that NEMOCLAW_RECREATE_SANDBOX=1 fully clears failed-install sandbox, gateway, forwarding, and credential artifacts. Include forward-stop and gateway-destroy behavior where needed, and avoid swallowing cleanup failures without evidence.
- Evidence: Retained destroy_sandbox stops openshell forward 8642, deletes the sandbox, destroys the nemoclaw gateway, and invokes both normal and Hermes destroy paths. New installSandbox() retries after isTransientProviderValidationFailure() without cleanup, and cleanupTurnSandboxes() only calls node CLI destroy and openshell sandbox delete inside bestEffort().
Restore OpenClaw provider, transport, and policy-error rejection (test/e2e-scenario/live/agent-turn-latency.test.ts:83): The retained OpenClaw turn fails if combined stdout/stderr contains provider, transport, or policy-error signatures. The migrated test accepts a zero exit code and a parsed 42 reply without checking those signatures, so a successful-looking turn accompanied by SSRF/network-policy/provider errors could pass.
- Recommendation: Carry over the retained provider/transport error signature check against the captured OpenClaw turn output before accepting the 42 reply. Add a focused negative test where output includes 42 plus a retained error signature.
- Evidence: Retained run_openclaw_turn greps combined stdout/stderr for SsrFBlockedError, transport error, ECONNREFUSED, EAI_AGAIN, gateway unavailable, and network connection error. New assertions check only exitCode, extractOpenClawAgentText(stdout), and elapsed time.
Restore Hermes readiness retry and ok-body validation (test/e2e-scenario/live/agent-turn-latency.test.ts:104): The retained shell scenario polls Hermes health up to ten times and requires the response body to contain ok before posting the chat completion request. The migrated test performs a single curl -sf and checks only the exit code, which can make the live scenario flaky after install or proceed on a non-ok 2xx health response.
- Recommendation: Restore a bounded readiness loop that retries /health, sleeps between attempts, and requires an ok response body before the Hermes chat request. Cover initial failures and non-ok 2xx responses followed by a successful ok response.
- Evidence: Retained assert_hermes_health loops over attempts 1..10 with sleep 5 and grep -qi '"ok"'. New hermesHealth is one curl -sf --max-time 10 followed by an exit-code assertion.
Port the retained OpenClaw JSON assistant-text parser (test/e2e-scenario/live/agent-turn-latency-helpers.ts:119): The migrated OpenClaw parser accepts fewer retained envelope shapes and has no focused tests proving it avoids metadata-only false positives. This can fail valid OpenClaw output accepted by the shell source, or miss visible assistant text when OpenClaw emits a supported compatibility envelope.
- Recommendation: Port the retained openclaw-json.sh behavior into extractOpenClawAgentText(), or document why omitted envelope shapes are no longer valid. Add focused tests for all accepted envelope shapes and for metadata-only 42 not satisfying the reply assertion.
- Evidence: Retained openclaw-json.sh accepts text, content, and reasoning_content and recurses through result, payloads, payload, messages, choices, response, data, output, outputs, items, segments, and delta. New collectAssistantText() walks result, payloads, messages, choices, message, delta, content, and text.
Add a dedicated workflow-boundary check for the new secret-bearing job (.github/workflows/e2e-vitest-scenarios.yaml:665): The new agent-turn-latency-vitest job is secret-bearing and runs installer, OpenShell, sandbox lifecycle, and hosted-inference code. It follows several good workflow patterns, but the workflow-boundary source only applies generic free-standing checks and does not validate this job's exact secret scope, checkout credential persistence, dependency install command, OpenShell install boundary, artifact settings, timeout, stable artifact path, and report coverage.
- Recommendation: Add a validateAgentTurnLatencyVitestJob-style boundary check and support-test coverage mirroring comparable high-risk free-standing jobs. Assert the expected env, pinned actions, persist-credentials=false, npm ci --ignore-scripts, no secrets outside the Vitest step, artifact upload settings, and report-to-pr needs entry.
- Evidence: tools/e2e-scenarios/workflow-boundary.mts has dedicated validators for many comparable jobs, but grep found no agent-turn-latency-specific validator. The workflow diff adds NVIDIA_INFERENCE_API_KEY to the Vitest step.
Add focused support coverage for migration fidelity and secret-bearing branches (test/e2e-scenario/live/agent-turn-latency.test.ts:1): This PR adds the live scenario and workflow job, but not focused support tests for the helper branches and retained-shell guardrails that changed during migration. Static review can compare the code, but behavior depends on Docker, OpenShell, installer state, hosted inference, daemon readiness, artifact redaction, and sandbox/gateway cleanup.
- Recommendation: Add behavior-specific support or integration tests for hosted-compatible inference env mapping, OpenClaw JSON envelope compatibility, Hermes health retry/body validation, transient retry cleanup, Hermes local API token redaction/conditional header behavior, OpenClaw transport-error rejection, and agent-turn-latency workflow-boundary invariants.
- Evidence: New tolerant/recovery paths include positiveInt(), firstJsonObject(), collectAssistantText(), bestEffort(), transient install retry handling, and hermesTurnCommand(), while the PR only adds the live scenario and workflow job.

🌱 Nice ideas

None.

Consider writing more tests for

**Runtime validation** — Hosted-compatible inference env mapping uses NEMOCLAW_ENDPOINT_URL, COMPATIBLE_API_KEY, NEMOCLAW_COMPAT_MODEL, NEMOCLAW_PREFERRED_API, model, provider, and expected route provider.. The PR changes a secret-bearing live workflow and two real sandbox hosted-inference paths. Static review can assess structure and migration fidelity, but behavior depends on Docker, OpenShell, installer state, hosted inference, daemon readiness, artifact redaction, sandbox/gateway cleanup, and network-policy error surfaces.
**Runtime validation** — OpenClaw agent output containing 42 plus SsrFBlockedError or a retained transport error signature is rejected.. The PR changes a secret-bearing live workflow and two real sandbox hosted-inference paths. Static review can assess structure and migration fidelity, but behavior depends on Docker, OpenShell, installer state, hosted inference, daemon readiness, artifact redaction, sandbox/gateway cleanup, and network-policy error surfaces.
**Runtime validation** — Hermes health readiness retries initial failures and rejects non-ok 2xx responses before accepting ok.. The PR changes a secret-bearing live workflow and two real sandbox hosted-inference paths. Static review can assess structure and migration fidelity, but behavior depends on Docker, OpenShell, installer state, hosted inference, daemon readiness, artifact redaction, sandbox/gateway cleanup, and network-policy error surfaces.
**Runtime validation** — Transient install retry runs full sandbox, forward, and gateway cleanup before retry.. The PR changes a secret-bearing live workflow and two real sandbox hosted-inference paths. Static review can assess structure and migration fidelity, but behavior depends on Docker, OpenShell, installer state, hosted inference, daemon readiness, artifact redaction, sandbox/gateway cleanup, and network-policy error surfaces.
**Runtime validation** — Hermes turn omits Authorization when API_SERVER_KEY is empty and redacts echoed API_SERVER_KEY when present.. The PR changes a secret-bearing live workflow and two real sandbox hosted-inference paths. Static review can assess structure and migration fidelity, but behavior depends on Docker, OpenShell, installer state, hosted inference, daemon readiness, artifact redaction, sandbox/gateway cleanup, and network-policy error surfaces.
**Add focused support coverage for migration fidelity and secret-bearing branches** — Add behavior-specific support or integration tests for hosted-compatible inference env mapping, OpenClaw JSON envelope compatibility, Hermes health retry/body validation, transient retry cleanup, Hermes local API token redaction/conditional header behavior, OpenClaw transport-error rejection, and agent-turn-latency workflow-boundary invariants.
**Hosted inference compatibility setup** — Missing coverage for compatible-hosted mode env mapping and expected route provider.. MODEL, PROVIDER, EXPECTED_ROUTE_PROVIDER, and env() do not propagate ci-compatible-inference helper outputs.
**OpenClaw JSON assistant-text extraction** — Missing focused tests for retained text/content/reasoning_content fields, all retained container keys, and metadata-only 42 rejection.. collectAssistantText() omits retained keys such as reasoning_content, payload, response, data, output, outputs, items, and segments.

Since last review details

Current findings:

Source-of-truth review needed: Hosted inference compatibility setup: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: MODEL, PROVIDER, EXPECTED_ROUTE_PROVIDER, and env() do not propagate ci-compatible-inference helper outputs.
Source-of-truth review needed: OpenClaw JSON assistant-text extraction: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: collectAssistantText() omits retained keys such as reasoning_content, payload, response, data, output, outputs, items, and segments.
Source-of-truth review needed: Hermes readiness recovery: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: New hermesHealth is a single curl -sf --max-time 10 with only exit-code assertion.
Source-of-truth review needed: Transient install retry cleanup: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: installSandbox() retries after isTransientProviderValidationFailure() without invoking cleanupTurnSandboxes() or retained forward/gateway cleanup.
Source-of-truth review needed: bestEffort cleanup swallowing: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: bestEffort() catches all errors and cleanupTurnSandboxes() uses it for both destroy and sandbox delete.
Port the retained hosted-inference trust-boundary setup (test/e2e-scenario/live/agent-turn-latency-helpers.ts:24): The migrated helper derives model, provider, and expected route directly and only maps NVIDIA key aliases into the install environment. The retained shell source configures compatible hosted inference before deriving these values, including endpoint URL, compatible credential variable, compatible model, preferred API, provider key, and expected route provider. This can make the new secret-bearing live test validate a different hosted-inference boundary than the shell scenario it claims to replace.
- Recommendation: Reuse or port the retained compatible-inference decision logic so NEMOCLAW_ENDPOINT_URL, COMPATIBLE_API_KEY, NEMOCLAW_COMPAT_MODEL, NEMOCLAW_PREFERRED_API, model, provider, and expected route provider are propagated consistently. Add focused coverage for compatible-hosted mode and default-provider/model selection.
- Evidence: Retained test/e2e/test-agent-turn-latency-e2e.sh calls nemoclaw_e2e_configure_compatible_inference before deriving TURN_MODEL, TURN_PROVIDER_KEY, and EXPECTED_ROUTE_PROVIDER. New MODEL, PROVIDER, EXPECTED_ROUTE_PROVIDER, and env() do not use ci-compatible-inference helper outputs.
Harden Hermes local API token handling in artifacts (test/e2e-scenario/live/agent-turn-latency-helpers.ts:280): The Hermes command sources /sandbox/.hermes/.env and sends API_SERVER_KEY as a bearer token, but the probe explicitly redacts only the hosted inference key. The centralized redaction regex should catch common Bearer-token echoes, but a sandbox-local token echoed in an unexpected or short shape could still be preserved in uploaded artifacts. The migrated command also sends an Authorization header even when API_SERVER_KEY is empty, unlike the retained shell.
- Recommendation: Preserve the retained conditional Authorization header behavior. If Hermes response bodies remain artifacted, capture the sandbox-local API_SERVER_KEY through a redaction-safe path and add it to explicit redactionValues. Add a regression case where Hermes or middleware echoes the local token.
- Evidence: hermesTurnCommand() builds -H "Authorization: Bearer ${API_SERVER_KEY:-}" unconditionally. The hermes-api-turn invocation passes redactionValues: [apiKey], not the sandbox-local API_SERVER_KEY.
Restore clean sandbox, gateway, and forwarding cleanup before install retries (test/e2e-scenario/live/agent-turn-latency-helpers.ts:187): The retained shell scenario pre-cleans sandbox and gateway state before the first install and before retrying transient provider validation failures. The migrated installSandbox() sleeps and retries without running the retained cleanup sequence, and cleanupTurnSandboxes() does not stop forwarding or destroy the OpenShell gateway. In a credential-bearing live path, this weakens confidence that retries start from a clean sandbox, gateway, forwarding, and credential-artifact state.
- Recommendation: Run the full retained cleanup sequence before each retry, or document and test that NEMOCLAW_RECREATE_SANDBOX=1 fully clears failed-install sandbox, gateway, forwarding, and credential artifacts. Include forward-stop and gateway-destroy behavior where needed, and avoid swallowing cleanup failures without evidence.
- Evidence: Retained destroy_sandbox stops openshell forward 8642, deletes the sandbox, destroys the nemoclaw gateway, and invokes both normal and Hermes destroy paths. New installSandbox() retries after isTransientProviderValidationFailure() without cleanup, and cleanupTurnSandboxes() only calls node CLI destroy and openshell sandbox delete inside bestEffort().
Restore OpenClaw provider, transport, and policy-error rejection (test/e2e-scenario/live/agent-turn-latency.test.ts:83): The retained OpenClaw turn fails if combined stdout/stderr contains provider, transport, or policy-error signatures. The migrated test accepts a zero exit code and a parsed 42 reply without checking those signatures, so a successful-looking turn accompanied by SSRF/network-policy/provider errors could pass.
- Recommendation: Carry over the retained provider/transport error signature check against the captured OpenClaw turn output before accepting the 42 reply. Add a focused negative test where output includes 42 plus a retained error signature.
- Evidence: Retained run_openclaw_turn greps combined stdout/stderr for SsrFBlockedError, transport error, ECONNREFUSED, EAI_AGAIN, gateway unavailable, and network connection error. New assertions check only exitCode, extractOpenClawAgentText(stdout), and elapsed time.
Restore Hermes readiness retry and ok-body validation (test/e2e-scenario/live/agent-turn-latency.test.ts:104): The retained shell scenario polls Hermes health up to ten times and requires the response body to contain ok before posting the chat completion request. The migrated test performs a single curl -sf and checks only the exit code, which can make the live scenario flaky after install or proceed on a non-ok 2xx health response.
- Recommendation: Restore a bounded readiness loop that retries /health, sleeps between attempts, and requires an ok response body before the Hermes chat request. Cover initial failures and non-ok 2xx responses followed by a successful ok response.
- Evidence: Retained assert_hermes_health loops over attempts 1..10 with sleep 5 and grep -qi '"ok"'. New hermesHealth is one curl -sf --max-time 10 followed by an exit-code assertion.
Port the retained OpenClaw JSON assistant-text parser (test/e2e-scenario/live/agent-turn-latency-helpers.ts:119): The migrated OpenClaw parser accepts fewer retained envelope shapes and has no focused tests proving it avoids metadata-only false positives. This can fail valid OpenClaw output accepted by the shell source, or miss visible assistant text when OpenClaw emits a supported compatibility envelope.
- Recommendation: Port the retained openclaw-json.sh behavior into extractOpenClawAgentText(), or document why omitted envelope shapes are no longer valid. Add focused tests for all accepted envelope shapes and for metadata-only 42 not satisfying the reply assertion.
- Evidence: Retained openclaw-json.sh accepts text, content, and reasoning_content and recurses through result, payloads, payload, messages, choices, response, data, output, outputs, items, segments, and delta. New collectAssistantText() walks result, payloads, messages, choices, message, delta, content, and text.
Add a dedicated workflow-boundary check for the new secret-bearing job (.github/workflows/e2e-vitest-scenarios.yaml:665): The new agent-turn-latency-vitest job is secret-bearing and runs installer, OpenShell, sandbox lifecycle, and hosted-inference code. It follows several good workflow patterns, but the workflow-boundary source only applies generic free-standing checks and does not validate this job's exact secret scope, checkout credential persistence, dependency install command, OpenShell install boundary, artifact settings, timeout, stable artifact path, and report coverage.
- Recommendation: Add a validateAgentTurnLatencyVitestJob-style boundary check and support-test coverage mirroring comparable high-risk free-standing jobs. Assert the expected env, pinned actions, persist-credentials=false, npm ci --ignore-scripts, no secrets outside the Vitest step, artifact upload settings, and report-to-pr needs entry.
- Evidence: tools/e2e-scenarios/workflow-boundary.mts has dedicated validators for many comparable jobs, but grep found no agent-turn-latency-specific validator. The workflow diff adds NVIDIA_INFERENCE_API_KEY to the Vitest step.
Add focused support coverage for migration fidelity and secret-bearing branches (test/e2e-scenario/live/agent-turn-latency.test.ts:1): This PR adds the live scenario and workflow job, but not focused support tests for the helper branches and retained-shell guardrails that changed during migration. Static review can compare the code, but behavior depends on Docker, OpenShell, installer state, hosted inference, daemon readiness, artifact redaction, and sandbox/gateway cleanup.
- Recommendation: Add behavior-specific support or integration tests for hosted-compatible inference env mapping, OpenClaw JSON envelope compatibility, Hermes health retry/body validation, transient retry cleanup, Hermes local API token redaction/conditional header behavior, OpenClaw transport-error rejection, and agent-turn-latency workflow-boundary invariants.
- Evidence: New tolerant/recovery paths include positiveInt(), firstJsonObject(), collectAssistantText(), bestEffort(), transient install retry handling, and hermesTurnCommand(), while the PR only adds the live scenario and workflow job.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

…t-turn-latency Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

…ncy' into e2e-migrate/test-agent-turn-latency Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

…t-turn-latency Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): migrate agent turn latency to vitest

681deec

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv self-assigned this Jun 19, 2026

cv mentioned this pull request Jun 19, 2026

Epic: Migrate legacy bash E2E into the Vitest E2E system #5098

Open

79 tasks

github-advanced-security AI found potential problems Jun 19, 2026

View reviewed changes

Comment thread test/e2e-scenario/live/agent-turn-latency.test.ts Fixed

coderabbitai Bot reviewed Jun 19, 2026

View reviewed changes

Comment thread test/e2e-scenario/live/agent-turn-latency.test.ts

test(e2e): restore turn latency assertions

e38b9bf

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv added the v0.0.66 Release target label Jun 19, 2026

cv linked an issue Jun 19, 2026 that may be closed by this pull request

Epic: Migrate legacy bash E2E into the Vitest E2E system #5098

Open

79 tasks

cv added 3 commits June 19, 2026 14:07

test(e2e): tighten turn latency route checks

5d834a9

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): mirror OpenClaw turn parsing

a65cbbd

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Merge remote-tracking branch 'origin/main' into e2e-migrate/test-agen…

7bc1484

…t-turn-latency Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv removed a link to an issue Jun 19, 2026

Epic: Migrate legacy bash E2E into the Vitest E2E system #5098

Open

79 tasks

cv added 7 commits June 19, 2026 14:44

Merge remote-tracking branch 'origin/main' into e2e-migrate/test-agen…

afe6ea3

…t-turn-latency Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): relax workflow inventory timeout

2d97c65

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Merge branch 'main' into e2e-migrate/test-agent-turn-latency

48208e9

test(e2e): move scenario logic out of test wrapper

99e524f

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Merge remote-tracking branch 'origin/e2e-migrate/test-agent-turn-late…

b976ac6

…ncy' into e2e-migrate/test-agent-turn-latency Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): restore scenario logic to test file

81df3b4

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): move turn latency branches to helpers

31edaa7

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

github-code-quality Bot found potential problems Jun 20, 2026

View reviewed changes

Comment thread test/e2e-scenario/live/agent-turn-latency-helpers.ts Fixed

github-advanced-security AI found potential problems Jun 20, 2026

View reviewed changes

Comment thread test/e2e-scenario/live/agent-turn-latency-helpers.ts Fixed

cv added 2 commits June 20, 2026 11:34

test(e2e): remove unused turn latency import

29cd6d9

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Merge remote-tracking branch 'origin/main' into e2e-migrate/test-agen…

9d3e26e

…t-turn-latency Signed-off-by: Carlos Villela <cvillela@nvidia.com>

github-actions Bot mentioned this pull request Jun 20, 2026

test(e2e): migrate GPU Ollama flow to vitest #5556

Open

14 tasks

Conversation

cv commented Jun 19, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Poem

❌ Failed checks (1 warning)

Uh oh!

github-code-quality Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Coverage Overview

TypeScript / code-coverage/plugin

TypeScript / code-coverage/cli

Uh oh!

github-actions Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Vitest E2E Scenario Recommendation

Vitest E2E Scenario Advisor

Required Vitest E2E scenarios

Optional Vitest E2E scenarios

Relevant changed files

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cv commented Jun 19, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 19, 2026 •

edited

Loading

github-code-quality Bot commented Jun 19, 2026 •

edited

Loading

github-actions Bot commented Jun 19, 2026 •

edited

Loading

github-actions Bot commented Jun 19, 2026 •

edited

Loading

github-actions Bot commented Jun 19, 2026 •

edited

Loading