diff --git a/.agents/skills/onboarding-test/SKILL.md b/.agents/skills/onboarding-test/SKILL.md index dc18828c1..8aae0ddec 100644 --- a/.agents/skills/onboarding-test/SKILL.md +++ b/.agents/skills/onboarding-test/SKILL.md @@ -1,9 +1,22 @@ --- name: onboarding-test -description: Pre-release onboarding test via Chrome browser automation. Tests the full new-user flow — provider selection, workflow picker, and streaming all three starter workflows. Use when asked to test onboarding, first-run experience, or starter workflows. +description: Plain-English browser walkthrough for pre-release onboarding verification. Drive a real Chrome via MCP, eyeball the starter workflows, confirm the product *feels* right. Use for human-in-the-loop sanity passes before a tag. NOT a substitute for the automated product-tests suite — that's the CI gate. --- -# Onboarding Browser Test +# Onboarding Browser Test (human verification) + +## When to use this skill vs the automated suite + +This repo has two test surfaces with different jobs: + +| You want to answer... | Use... | +|---|---| +| "Is this correct and is it regressing product quality?" (every PR, machine-readable) | `product-tests/` — pytest + Playwright + retry-counter gates. Run it in CI and locally. See `product-tests/WRITING_TESTS.md`. | +| "Does it *feel* right?" (pre-release, eyeballs-on, capture institutional knowledge in plain English) | This skill. | + +Keep both. They complement each other: the automated suite catches regressions the moment they land; this skill catches the "looked green in CI, still feels broken when a human uses it" class — and the plain-English walkthrough here is the documentation new team members read to understand the product's shape. + +If you're here to **add a regression test for a past bug**, you want the `product-test-writer` skill, not this one. ## Prerequisites @@ -35,13 +48,13 @@ Navigate to `http://localhost:8080`. 
The onboarding screens appear in this order - **Mythical Creature** (Style LoRA) - **Dissolving Sunflower** (Depth Map) - **LTX 2.3** (Text to Video) - + Select one, click **Get Started**. 5. **Graph editor with onboarding tooltips** — Two tooltip popups appear sequentially over the Sink/Run area: - Tooltip 1: "Click Play to start generation" (1 of 2) — click **Next** - Tooltip 2: "Explore Workflows" (2 of 2) — click **Done** - + **IMPORTANT:** These tooltips intercept clicks on the Run button. You MUST dismiss both tooltips (using `read_page` to find the Next/Done button refs) BEFORE clicking Run. 6. **Click Run** — use `read_page(filter="interactive")` to find the Run button ref and click it. Do NOT click by coordinates near the tooltip area. @@ -65,9 +78,26 @@ Click **Workflows** in the top nav bar to reopen the workflow panel. The "Gettin | Dissolving Sunflower | Source, video-depth-anything, VACE, LoRA, longlive, rife, Sink | Depth map, video input | | LTX 2.3 | Primitive (String), ltx2, Sink | Text-to-video, no Source node | +## What to look for (eyeballs, not selectors) + +These are the things the automated suite cannot catch: + +- Do the loading states feel responsive, or do they just... sit there? +- Are error messages legible when things fail? Does the user know what to do next? +- When the first frame lands, does it look *right* for the workflow? (The automated test confirms "a frame rendered"; you confirm "it looks like a mythical creature, not noise.") +- Does switching workflows feel snappy or does the UI hang visibly? +- Tooltip ordering, z-index quirks, focus rings — anything that a human would call out in a design review. + +Write up a short note per run: what you tested, what felt off, what matches expectations. This is the plain-English institutional knowledge the automated suite cannot replace. 
+ ## Cleanup ```bash lsof -ti:8080 | xargs kill -9 2>/dev/null rm -rf /tmp/scope-onboarding-test ``` + +## See also + +- `product-tests/WRITING_TESTS.md` — how to encode what you observed into a runnable regression test. +- `.agents/skills/product-test-writer/` — Claude skill that writes those regression tests from a plain-English bug description. diff --git a/.agents/skills/product-test-writer/SKILL.md b/.agents/skills/product-test-writer/SKILL.md new file mode 100644 index 000000000..cc3a77bfd --- /dev/null +++ b/.agents/skills/product-test-writer/SKILL.md @@ -0,0 +1,264 @@ +--- +name: product-test-writer +description: Turn a plain-English bug description (or PR URL) into a runnable regression test under product-tests/regression/. Use when asked to write a regression, add a test for a past bug, reproduce an issue, or "add a product-test for #NNN". +--- + +# Product Test Writer + +## What this skill does + +You are given a plain-English description of a past bug (often: a PR number, a Linear issue, a Slack message). You produce **one file** at `product-tests/regression/test_pr__.py` that: + +1. Documents the bug in its docstring (what the user did, what should have happened, what did, root cause, fix). +2. Uses the `@scenario` decorator — never raw fixtures. +3. Drives the reproduction via the `ctx` API. +4. Relies on the decorator's automatic gates for assertion (retries, unexpected closes, UI errors). + +If the bug needs a different mode, different workflow, or a non-default timeout, say so in the code — not in a separate doc. + +## Before writing anything + +1. **Read the bug context.** If the user gave a PR number, `gh pr view ` it. If they gave a Linear ticket, ask them to paste the description. If they gave a brief sentence, ask 1–2 clarifying questions only if the mode/workflow/repro would be genuinely ambiguous. +2. **Read `product-tests/WRITING_TESTS.md`.** That's the source-of-truth for the `ctx` surface, testid map, and gotchas. 
It may have been updated since this skill was written. +3. **Grep for a similar existing test.** `product-tests/regression/` probably has one; `product-tests/scenarios/` might. If one already covers this failure mode, extend or dedupe — don't duplicate. + +## The decision tree + +| Question | If yes | If no | +|---|---|---| +| Does the bug only repro in cloud mode? | `mode="cloud"` | `mode="local"` (default; keeps PR ring fast) | +| Is it workflow-specific (a particular pipeline)? | `workflow="starter-..."` | `workflow="local-passthrough"` | +| Does it need chaotic timing to trigger? | Add `pytest.mark.chaos` and use `ctx.chaos()` | Linear reproduction in the body | +| Was the symptom a 5xx / crash? | Default gates catch it | Default gates catch it | +| Was the symptom silently-wrong output (no crash)? | Add an explicit assertion (e.g. compare `ctx.metrics()` or read a frame) | — | +| Is the symptom about **how it looks** — UI layout, cut-off element, tooltip mispositioned, error toast copy, workflow card missing, stream output showing black/frozen/pixelated frames, recorded MP4 showing visible artifacts? | Add `@pytest.mark.multimodal` **and** `feature="ui"` (or `"recording"`), capture artifacts via `ctx.screenshot_testid()` / `ctx.capture_live_frame()` / `harness.media.sample_frames()`, then assert with `ctx.multimodal_check(imgs, question=...)`. See the **Multimodal patterns** section below. | Regular testid / metric asserts are enough | + +### Pick the right feature tag + +Every `@scenario` test should carry a `feature=` kwarg so `pytest -m ` +slices the right tests in the feature index. 
Canonical set: + +| Feature | When to use | +|---|---| +| `onboarding` | Provider pick, telemetry, workflow picker, tour, state persistence | +| `recording` | Record-node start/stop, download, timestamp/FPS correctness | +| `params` | Parameter updates — HTTP API, schema, round-trip, spam | +| `lifecycle` | Stream start/stop/restart, session teardown, cycle tests | +| `networking` | Cloud connectivity, offline cycles, retry-counter behavior | +| `input` | Camera / video-file / NDI source switching, device-lost | +| `graph` | Graph editor, node mutation, workflow switching | +| `ui` | UI chrome — toolbars, modals, tooltips, error toasts, visuals | + +Pass multiple when applicable: `feature=("ui", "onboarding")`. + +## The template (copy this, then fill in) + +```python +"""Regression for #: . + +- What the user did: +- What should happen: +- What did happen: +- Root cause: +- Fix: +""" + +from __future__ import annotations + +from harness.scenario import scenario + + +@scenario(mode="local", workflow="local-passthrough") +def test_pr__(ctx): + ctx.complete_onboarding() + ctx.run_and_wait_first_frame() + + # -- reproduction -- + # Replace with the precise actions that reproduced the bug, using + # ctx helpers (not raw page/driver) so stops are properly attributed. 
+ pass +``` + +## `ctx` surface you can use (memorize these, don't invent new ones) + +| Action | Call | +|---|---| +| Onboard to graph view | `ctx.complete_onboarding()` | +| Run + wait first frame (records `first_frame_time_ms`) | `ctx.run_and_wait_first_frame(timeout_ms=60_000)` | +| Stop cleanly (marks + clicks, idempotent) | `ctx.stop_stream()` | +| Toggle Run/Stop without waiting | `ctx.toggle_run()` | +| Set a parameter over HTTP (returns status) | `ctx.set_parameter("name", value)` | +| Read current parameters | `ctx.get_parameters()` | +| Fetch session metrics | `ctx.metrics()` | +| Click/wait a `data-testid` | `ctx.click("testid")`, `ctx.wait("testid")` | +| Browser sleep (avoid unless you must) | `ctx.sleep(ms)` | +| Seeded chaos driver | `ctx.chaos()` | +| Record a dimension | `ctx.measure("name", value)` | +| **Start headless recording** | `ctx.start_recording(node_id="record")` | +| **Stop + download recording** (returns `Path`) | `ctx.stop_and_download_recording(node_id="record")` | +| **Snapshot live sink frame** (returns `Path`) | `ctx.capture_live_frame(sink_node_id=None)` | +| **Grab short MP4 slice of live output** | `ctx.capture_sink_video_slice(seconds=3)` | +| **Full-page browser screenshot** | `ctx.screenshot("name.png")` | +| **Element-scoped screenshot** | `ctx.screenshot_testid("stream-run-stop")` | +| **Multimodal visual assertion** | `ctx.multimodal_check(imgs, question=..., must_contain=[...])` | +| Raw access when you must | `ctx.driver`, `ctx.page`, `ctx.base_url`, `ctx.retry_probe`, `ctx.failure_watcher`, `ctx.report` | + +## Testid anchors (stable set; if you need one not listed, grep `frontend/src` for `data-testid`) + +- `inference-mode-local`, `inference-mode-cloud`, `inference-mode-continue` +- `telemetry-accept`, `telemetry-decline` +- `workflow-card-`, `workflow-get-started`, `workflow-import-load` +- `tour-next`, `tour-skip` +- `stream-run-stop` (attr `data-streaming="true"` when active) +- `sink-video` +- `cloud-toggle` + 
+Workflow IDs: `local-passthrough` (CPU / PR-gate-safe), `starter-mythical-creature`, `starter-ref-image`, `starter-ltx-text-to-video` (GPU / nightly). + +## Gotchas — do NOT violate these + +1. **Never apply `@pytest.mark.cloud` manually.** Pass `mode="cloud"` to `@scenario`. The decorator applies the marker AND makes `ctx.complete_onboarding()` dispatch cloud. +2. **Never call `failure_watcher.mark_initiated_stop()` directly.** Use `ctx.stop_stream()` or `ctx.toggle_run()` — they handle it. +3. **Never call `gates.enforce_all_gates()` manually.** The decorator's teardown does it. Calling it twice is safe but signals you don't trust the contract — fix the root issue instead. +4. **Do not import raw fixtures (`scope_harness`, `driver`, `retry_probe`, etc.) in a new test.** If you think you need one, ask: can this use `ctx.` instead? Almost always yes. +5. **Do not reset retry counters mid-test** unless you're also going to write a comment explaining exactly why the warmup legitimately ticks them. Otherwise you're hiding evidence. +6. **File name must start with `test_`.** pytest collection rule. +7. **If the PR ring is CPU-only, the test must be too.** Use `local-passthrough` or a different PR-ring-safe workflow. GPU-specific bugs → nightly ring. + +## Worked example + +**Input:** "Add a regression for PR #1234 — users spamming the prompt slider during a cloud stream could crash the session. Fix was to debounce parameter updates." + +**Output file:** `product-tests/regression/test_pr_1234_prompt_spam_during_cloud_stream.py` + +```python +"""Regression for #1234: prompt spam during cloud stream crashed the session. + +- What the user did: On a running cloud stream, dragged the prompt slider + back and forth for ~10s (roughly 30–50 updates/sec). +- What should happen: Each parameter update is accepted or coalesced; the + stream continues rendering. 
+- What did happen: WebRTC data channel overflowed, session closed with + 'forcibly closed' in scope.log, UI showed an error toast. +- Root cause: Unbounded HTTP → data-channel fan-out in the parameter + broadcast path; backpressure was not enforced. +- Fix: Debounce + rate-limit parameter updates before + broadcasting (webrtc.py::broadcast_parameter_update). +""" + +from __future__ import annotations + +from harness.scenario import scenario + + +@scenario(mode="cloud", workflow="starter-mythical-creature") +def test_pr_1234_prompt_spam_during_cloud_stream(ctx): + """Spam 200 parameter updates over HTTP; cloud session must survive.""" + ctx.complete_onboarding() + ctx.run_and_wait_first_frame(timeout_ms=90_000) + + for i in range(200): + ctx.set_parameter("__prompt", f"variant-{i}") + + # Give the pipeline a moment to process the tail of the spam. + ctx.sleep(2000) + + # No explicit assertion needed. Decorator teardown will fail this test + # if any retry fired, the session closed unexpectedly, or a UI error + # toast appeared — which is exactly what happened pre-fix. +``` + +Notice what's NOT there: no fixture imports, no `failure_watcher.mark_initiated_stop()`, no `gates.enforce_all_gates()`, no `assert report.passed`. The decorator owns all of that. + +## After writing + +1. Run it: `uv run pytest product-tests/regression/test_pr__.py -v`. Report to the user whether it passed. +2. If the bug was not yet fixed on the current branch, expect it to **red**. That's correct — it proves the test actually reproduces the bug. Mention this to the user; they may want to gate the merge on this test. +3. If the test greens on an unfixed branch, the repro isn't tight enough — tighten it before landing. +4. Do NOT run `gh pr create` unless the user explicitly asks you to ship it. 
+ +## Multimodal patterns (use when the bug is about "how it looks") + +The multimodal pathway is the bridge between the Chrome-MCP `onboarding-test` +skill (a human / Claude looking at the UI) and automated CI coverage. Use it +when a testid assertion can't capture the symptom: + +- "the third workflow card is clipped on a 1440px viewport" +- "the tour popover is pointing at empty space instead of the Run button" +- "the recorded MP4 shows visible pixelation" +- "the sink is rendering all-black frames" + +Four reference tests ship in the repo — **copy from the closest match**, don't +reinvent: + +| Pattern | Reference | +|---|---| +| UI element absent/clipped/mislaid | `scenarios/test_ui_workflow_picker_visual.py` | +| UI tooltip/modal/button-state positioning | `scenarios/test_ui_tooltip_placement.py` | +| Stream output frames look wrong (black / frozen / artifacted) | `scenarios/test_stream_output_looks_right.py` | +| Recorded MP4 timestamps/visual quality | `regression/test_recording_timestamp_drift.py` | + +### The multimodal test shape + +```python +@scenario( + mode="local", + workflow="local-passthrough", + feature="ui", + marks=(pytest.mark.multimodal,), +) +def test_pr_NNN_workflow_card_clipped(ctx): + # 1. Drive the UI to the state where the bug is visible. + ctx.complete_onboarding() + # (or: ctx.wait(testids.WORKFLOW_GET_STARTED) if you need to stop mid-flow) + + # 2. Capture evidence. Prefer element-scoped over full-page when a single + # component is the subject — it gives the reviewer more signal per token. + shot = ctx.screenshot_testid(testids.workflow_card("local-passthrough")) + full = ctx.screenshot(name="workflow_picker_full.png") + + # 3. Ask. Phrase the question with a clear pass bar and must_contain items. + verdict = ctx.multimodal_check( + [full, shot], + question="Are all three workflow cards fully visible and un-clipped?", + must_contain=[ + "three workflow cards in a row", + "no card is clipped at the viewport edge", + ], + ) + + # 4. 
Branch on the three-valued verdict. + if verdict.status == "fail": + ctx.report.fail( + f"multimodal UI check failed: {verdict.reasoning}" + ) + # "uncertain" is silent — usually means SCOPE_MULTIMODAL_EVAL=0 locally. + # "pass" falls through; the auto-teardown gates still run. +``` + +### Gates + gotchas for multimodal + +- **Always add `pytest.mark.multimodal`** via the decorator's `marks=` kwarg. + That's what makes CI's nightly ring pick it up and PR ring skip it by default. +- **Assets go into `ctx.test_report_dir`** automatically when you use the ctx + helpers. Don't write to `tmp_path` for images you want a human to see after + a failure — the report dir is what CI uploads. +- **Prefer element-scoped over full-page** for layout/positioning questions — + less noise for the model, more signal. +- **Combine with cheap machine checks when you can.** `harness.media.looks_black` + and `looks_monochrome` catch the obvious cases for free; only reach for the + API when the signal isn't in the pixels alone. +- **Write a `must_contain` list when possible.** It forces the model into a + structured `missing_required` list on failure, which produces actionable + triage output. +- **Never block on multimodal in the PR ring.** If the bug CAN be caught by a + testid assertion or a cheap pixel stat, use that path for the PR ring and + reserve multimodal for nightly. + +## If the bug cannot be expressed in `ctx` + +It's rare but real. Examples: the bug is in raw WebRTC negotiation (not covered by `ctx`); the bug only fires on a specific graph topology (needs a custom HTTP `session/start` body). In those cases: + +1. Use `ctx.base_url` + raw `requests` for HTTP control-plane operations. +2. Use `ctx.page` for raw Playwright when a testid doesn't exist. +3. If you find yourself reaching for `ctx.failure_watcher` / `ctx.retry_probe` directly — stop. That's the decorator's job. 
If the decorator is in the way, the escape hatch is *not* to write a raw-fixture test; it's to improve `ctx` and re-target. File a note and ask. diff --git a/.agents/skills/testing-livepeer-fal-deploy/SKILL.md b/.agents/skills/testing-livepeer-fal-deploy/SKILL.md new file mode 100644 index 000000000..6350cba46 --- /dev/null +++ b/.agents/skills/testing-livepeer-fal-deploy/SKILL.md @@ -0,0 +1,266 @@ +--- +name: testing-livepeer-fal-deploy +description: End-to-end test harness for Scope's Livepeer cloud path against a deployed fal.ai app — the only supported cloud path going forward (the old cloud-relay / direct mode using `fal_app.py` + `CloudConnectionManager` is being deprecated). Primary path is a Playwright browser test that drives the full UI flow (camera → local scope WebRTC → livepeer trickle → fal runner → back), producing every session-lifecycle Kafka event. Secondary path is `test-cloud-connect.sh` — a bash/curl smoke test for the `/api/v1/cloud/connect` path only. TRIGGER any time a user says "test cloud", "test the fal deploy", "test cloud streaming", "run the e2e test", "run playwright", "verify cloud connect", "verify kafka events", "diagnose fal", "debug fal deploy", "did my stream work", "deploy-staging.sh", OR pastes any of these errors — "All orchestrators failed (N tried)", "ACCESS_DENIED", "did not receive ready message from websocket", "discover_orchestrators requires discovery_url", "cold start" — OR has just changed `src/scope/cloud/livepeer_fal_app.py` / `src/scope/cloud/livepeer_app.py` / `src/scope/server/livepeer.py` / `src/scope/server/livepeer_client.py`. Use `testing-livepeer` instead for a fully-local livepeer stack (prebuilt go-livepeer binary, no fal involvement). +--- + +# Testing Livepeer fal Deploy + +## When to use + +Use when testing the **deployed** livepeer path end-to-end — local Scope +client → daydream orchestrator → deployed fal app. 
This exercises: + +- The wrapper in `src/scope/cloud/livepeer_fal_app.py` that fal runs +- The runner in `src/scope/cloud/livepeer_app.py` that spawns inside the + fal container +- The orchestrator → fal handshake (headers, auth, cold start) +- Kafka event publishing across wrapper + runner (full lifecycle) + +**Two paths, pick the right one:** + +- **Playwright (primary)** — real browser drives the Perform-mode UI + with a synthetic camera, streams through, verifies the output video + comes back from the cloud. This is the only path that exercises the + full livepeer trickle round-trip and produces every lifecycle Kafka + event (`pipeline_loaded`, `session_created`, `stream_started`, + `stream_heartbeat`, `session_closed`). Takes 2–5 minutes. +- **`test-cloud-connect.sh` (secondary, HTTP-only)** — bash script that + POSTs `/api/v1/cloud/connect` and polls `/api/v1/cloud/status`. Only + verifies the `websocket_connected` / `websocket_disconnected` pair at + the wrapper layer. Useful as a fast smoke test ("did the container + come up?") or in `git bisect run` against cloud-connect regressions. + Does not produce pipeline/session/stream events. + +Do **not** use this skill for local-only livepeer testing — that's +`testing-livepeer` (prebuilt go-livepeer + local runner, no fal). + +## One-time setup + +1. **`.env.local`**: copy `.env.example` to `.env.local` (gitignored) + and fill in real values: + - `SCOPE_CLOUD_APP_ID` — your fal app URL. For the default `main` + env, the URL does **not** include a `--main` suffix (e.g. + `daydream/scope-livepeer-emran/ws`). Non-default envs do include + the suffix (e.g. `--preview/ws`). + - `SCOPE_CLOUD_API_KEY` — daydream cloud API key (sk_...). Without + this the scope client can't hit `signer.daydream.live` and fails + with `discover_orchestrators requires discovery_url or signer_url`. + - `SCOPE_USER_ID` — daydream user id. The runner's + `validate_user_access` rejects with `ACCESS_DENIED` when missing. 
+ Find it in `~/.daydream-scope/logs/scope-logs-*.log` after a + successful UI connect, or in devtools Network on + `/api/v1/cloud/connect`. + - (Optional) `LIVEPEER_DEBUG=1` — surfaces per-orchestrator + rejection reasons in scope.log; essential for diagnosing + `All orchestrators failed (N tried)`. +2. **product-tests setup** (once per machine): + ```bash + uv sync --group product-tests + uv run playwright install --with-deps chromium + ``` + This installs pytest, Playwright, and Chromium with the right + system deps. Without `--with-deps`, the browser fails to launch + with `error while loading shared libraries: libnspr4.so`. + + The `@scenario(mode="cloud")` decorator on the test handles auth + via a localStorage bypass (see ``harness/cloud_auth.py``), so no + frontend rebuild with `VITE_DAYDREAM_API_KEY` is needed. + +## Running the Playwright test (primary) + +When the user says "test cloud" (or any trigger in the description), +**always deploy their current working tree before running Playwright**. +Otherwise the test runs against whatever stale code was last deployed +and can false-positive on their change. + +### Step 0 — Ask the user where to deploy + +Before anything else, confirm the deploy target. Use AskUserQuestion +(or plain text prompts) and persist answers for the session: + +1. **Fal app name** — required. If `SCOPE_FAL_APP_NAME` is set in + `.env.local`, show that value and ask the user to confirm or + override. Otherwise ask outright (e.g. `scope-livepeer-`). +2. **Fal env** — defaults to `main`. If `SCOPE_FAL_ENV` is set in + `.env.local`, show and offer to override. Non-default envs (e.g. + `preview`) change the URL suffix in `SCOPE_CLOUD_APP_ID` — see + below. 
+ +Once confirmed, export both for the current shell, and derive / +overwrite `SCOPE_CLOUD_APP_ID`: + +| Env | `SCOPE_CLOUD_APP_ID` | +|---|---| +| `main` | `daydream//ws` (no suffix) | +| anything else | `daydream/--/ws` (with suffix) | + +This is a fal convention — the default `main` env is exposed without +a suffix; all other envs include `--` in the URL. Getting this +wrong produces `did not receive ready message from websocket`. + +### Step 1 — Sanity-check `.env.local` + +- `SCOPE_CLOUD_API_KEY` must be set (otherwise: + `discover_orchestrators requires discovery_url or signer_url`) +- `SCOPE_USER_ID` must be set (otherwise the runner's + `validate_user_access` rejects with `ACCESS_DENIED`) + +If either is missing, stop and ask the user before deploying. + +### Step 2 — Kill any scope already on :8000 + +If another scope process is bound to the port, stop it (or ask the +user) before continuing. The run-app.sh the script starts must be the +one under test. + +### Step 3 — Deploy + +```bash +SCOPE_FAL_APP_NAME= SCOPE_FAL_ENV= ./deploy-staging.sh +``` + +Abort with a clear error if this fails — don't run Playwright against +stale deployed code. Common failure: the `{git-short-sha}-cloud` +Docker base image isn't built yet (CI for the current commit is still +running). If that's the case, either wait for CI or have the user +confirm they want to deploy against an older base image. + +### Step 4 — Run the cloud-streaming test + +The test spins up its own fresh scope subprocess per test (via the +``scope_harness`` fixture), so you don't run ``./run-app.sh`` first — +just point ``SCOPE_CLOUD_APP_ID`` at the deploy and let pytest do it. + +```bash +SCOPE_CLOUD_APP_ID= \ + uv run pytest product-tests/release/test_cloud_streaming.py -v -m cloud +``` + +Reports land in ``product-tests/reports//`` (per-test +``report.json``, ``trace.zip``, video, ``scope.log``, plus a +top-level ``summary.md``). 
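The `SCOPE_CLOUD_APP_ID` suffix rule from Step 0 can be sketched as a small helper. This is only an illustration, assuming the `daydream` owner prefix seen in the examples in this skill; the function name `cloud_app_id` and its `owner` parameter are hypothetical and are not part of the repo or of `deploy-staging.sh`.

```python
def cloud_app_id(app_name: str, env: str = "main", owner: str = "daydream") -> str:
    """Build a SCOPE_CLOUD_APP_ID value following the Step 0 suffix rule.

    Hypothetical helper for illustration only; not part of the harness.
    """
    if not app_name:
        raise ValueError("app_name is required")
    # The default `main` env is exposed without a suffix;
    # every other env appends `--<env>` to the app name.
    suffix = "" if env == "main" else f"--{env}"
    return f"{owner}/{app_name}{suffix}/ws"


if __name__ == "__main__":
    # App name taken from the example in the one-time setup section.
    print(cloud_app_id("scope-livepeer-emran"))
    print(cloud_app_id("scope-livepeer-emran", "preview"))
```

Computing the value in one place rather than typing it by hand may help avoid the `did not receive ready message from websocket` failure that a wrong suffix produces.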
+ +Expected on success (≤5 min cold, ~20 s warm): + +``` +product-tests/release/test_cloud_streaming.py::test_cloud_streaming_perform_mode_passthrough PASSED +============ 1 passed in ============ +``` + +The summary.md at ``product-tests/reports//summary.md`` +records ``retry_count``, ``unexpected_close_count``, and +``ui_error_events`` — all should be zero for a clean run. + +**What the test does in livepeer terms:** + +1. Navigates to `localhost:8000`, switches the UI to Perform mode. +2. Opens settings, flips Remote Inference on, waits for Connection ID + (proves the fal WebSocket handshake completed and + `websocket_connected` fired in Kafka). +3. Selects the `passthrough` pipeline — triggers `pipeline/load`, which + runs on the fal runner and emits `pipeline_load_start` + + `pipeline_loaded`. +4. Switches the input source to Camera — Playwright's launch args + `--use-fake-device-for-media-stream` and + `--use-fake-ui-for-media-stream` (configured in the ``driver`` + fixture in ``product-tests/conftest.py``) give ``getUserMedia()`` + a synthetic feed. + This is essential: without a real MediaStream, the browser↔local + scope WebRTC ICE never completes, `CloudTrack._start()` is never + called, and the runner never gets `start_stream`. +5. Clicks the play overlay (`[data-testid="start-stream-button"]`). + Frames flow via livepeer trickle through the orchestrator to the + fal runner; the runner emits `session_created` and `stream_started`. +6. Waits 15 s so at least one `stream_heartbeat` fires on the runner. +7. Asserts the **output** `