DGX re-diarization + resilience, LanceDB search parity, viewer Tier-3 validation + test push#969
Merged
Merged
Conversation
…he (#947) + DGX whisper resilience Three related pieces for the #876 prod corpus re-diarization, plus a DGX investigation writeup. All unit tests pass; the only open item is an operational DGX-box CUDA issue (see #948), NOT a code defect. #946 — strict existing-only migration mode (reprocess_existing_only + GUID filter in scraping.py + extract_item_guid + migrate-diarization target). Also fixes a per-feed _build_config bug that dropped ADR-096/#814/#926 DGX routing fields (transcription_fallback_provider/diarization_provider/dgx_diarize_*). #947 — durable GUID-keyed raw-audio cache (utils/audio_cache.py) + cache-aware _download_or_reuse_media + pipeline_stage=download_only + download-audio target. Validated: 10 cached, reprocess = 10 cache HITs / 0 feed fetches. DGX whisper resilience (whisper_provider.py): duration-scaled timeout + single-flight lock + smart retry (conn-blip retries w/ backoff; timeout falls back without re-queuing). New dgx_* config knobs. DGX compute-type investigation (#948): fp16/bf16 error on GB10 build, int8 slow, default(->fp32) is the working path; nvidia-smi unreliable on GB10. Tests + test-ordering flake fix; docs/wip/REHEARSAL-876-findings-20260609.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Source-builds ctranslate2 4.8.0 against CUDA 12.8 with sm_120 PTX so
the GB10 (compute_cap=12.1 / sm_121) driver can JIT-compile forward at
first kernel launch. Replaces the upstream :latest-cuda tag's bundled
CPU-only ctranslate2 4.5.0 (the upstream tag is misleading — the .so
has no CUDA symbols, faster_whisper raises "not compiled with CUDA"
when device=cuda).
Validated: 5-min clip transcribes in 8.25s wall-time (36.4× realtime)
vs ~103s on CPU (2.9× realtime).
New: infra/dgx/speaches-gb10/{Dockerfile,README.md}
- Two-stage Dockerfile (nvidia/cuda:12.8.0-devel builder → speaches
runtime). Builds in ~3 min on DGX once apt/git cache populated.
- README explains the incident timeline, build flags, validation
commands, operational notes.
deploy.py: pyinfra recipe now ships the Dockerfile, runs docker build
locally (idempotent via layer cache), regenerates /etc/cdi/nvidia.yaml
(nvidia-container-toolkit 1.19.1 needs CDI spec to inject GPU correctly
in mode=auto), and references the local-built image tag. Drops the
upstream `docker compose pull` step since we build locally now.
verify.py: adds two assertions that would catch the silent CPU
fallback that triggered #948:
- /etc/cdi/nvidia.yaml present + nvidia-ctk lists nvidia.com/gpu=all
- ctranslate2 inside faster-whisper sees the GB10 on cuda
(assert get_cuda_device_count() >= 1)
Idempotency tested: `make dgx-deploy` re-run is a no-op via Docker
layer cache. `make dgx-verify` passes all 11 assertions.
Why this lands on this branch: the #946 work (audio cache + duration-
scaled timeouts + single-flight provider) was the consumer-side
resilience that made the silent CPU fallback survivable. This is the
producer-side fix that closes the loop.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pyannote diarize client hung ~2h during a co-tenant vLLM crash-loop on the shared GB10: httpx's read/write timeout never fired because the 60MB multipart upload trickled and kept resetting the per-write timeout. Rather than patch the diarize client alone, extract a shared DGX resilience layer both it and the #946 Whisper client consume, so they can't drift. New `providers/tailnet_dgx/resilience.py`: - `run_with_watchdog()` — hard wall-clock deadline in a daemon thread. The actual hang-fix: guarantees fail-over even when httpx's own timeout doesn't fire. Orphaned worker is a daemon (holds one connection, never blocks exit). - `CircuitBreaker` — trimmed-down port of `rss/http_policy.CircuitBreaker` (rolling-window -> open(cooldown) -> half-open probe), with a `hard` immediate-trip for definitive timeouts so a wedged batch doesn't pay the full timeout on every episode. Named + logs open/close for ops visibility. - `effective_timeout_sec()` / `probe_audio_duration_sec()` — duration-scaled budget (soundfile-based; graceful None without the [ml] extra). - `TimeoutLike` — shared timeout-class tuple. Both providers now: duration-scaled timeout + single-flight + bounded retries (timeout -> fail-over without re-queue; connection blip -> backoff) + watchdog + per-endpoint breaker. Whisper gets the same treatment for parity (same GPU, same hang vector). Diarization-specific tighter timeout via new profile-only config `dgx_diarize_request_timeout_sec` (180) / `dgx_diarize_timeout_per_audio_minute_sec` (6) — pyannote is far faster than Whisper so the budget (and a half-open probe) stays cheap. Tests: `test_tailnet_dgx_resilience.py` (20: breaker FSM, watchdog real-thread, timeout math, probe-None) + diarize/whisper provider suites extended with breaker-open / watchdog-hang / timeout-no-requeue cases. Fixed a pre-existing 15s whisper test (unmocked backoff sleep) -> 0.7s. flake8/mypy/test-policy clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…953) The GB10 Speaches/ctranslate2 path is being sorted out, so the DGX Whisper client is temporarily pointed at the openai-whisper service on :8002 (model id `large-v3`) instead of Speaches on :8000 (`Systran/faster-whisper-large-v3`). Validated end-to-end: a 91.6s clip transcribes in ~9.8s; health green. Revert tracked in #955 (restore :8000 / Systran model when Speaches is back). A dated comment block in the profile flags exactly which two lines to revert. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Complements the mocked-transport unit suites: runs the real httpx path over a loopback socket against a throwaway stub mimicking the faster-whisper + pyannote services. Proves the production failure end-to-end — a *hanging* socket (httpx's own timeout never fires because the upload trickles) is abandoned by the hard watchdog and fails over — plus happy-path round-trip, HTTP 503 fail-over, and the breaker tripping so the next call skips the socket. Self-contained (own http.server, not the shared e2e server) so it can't perturb the e2e suite. Component-level (provider vs local server) → integration tier. 7 tests, ~5s. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Closes the last gap: no e2e test drove the tailnet DGX providers at all (the
existing whisper/diarization e2e tests use the local/cloud paths). Adds the DGX
endpoints to the shared e2e mock server and a provider e2e suite that runs the
real TailnetDgx{Whisper,Diarization}Provider over real httpx against it with a
real audio payload.
e2e_http_server.py (additive + one correctness fix):
- GET /v1/models + /health — faster-whisper + pyannote health/model probe.
- POST /v1/diarize — canned two-speaker pyannote result.
- /v1/audio/transcriptions now honours the requested response_format: json /
verbose_json get a proper JSON body (what the OpenAI SDK *and* the DGX client
both request and parse) instead of text/plain; text/unspecified unchanged.
Verified the openai/basic/capabilities transcription e2e suites still pass.
- dgx_host_port() URL helper.
test_tailnet_dgx_e2e.py: happy round-trip (transcribe verbose_json + diarize)
plus the production failure modes via the server's set_error_behavior injection
— a hanging socket (delay) is abandoned by the watchdog and a 5xx fails over,
asserting the breaker trips. 6 tests.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…913) OpenAI `verbose_json` returns timestamped Whisper-format segments and the audio is downloaded locally (for the API upload), so the existing local pyannote second pass — `apply_diarization_to_result(result, media_for_transcription, ...)`, which is provider-agnostic — can diarize + align it, exactly like whisper / tailnet_dgx_whisper. Previously `diarize`/`screenplay` were silently coerced off for openai by design; this makes them work. - `_DIARIZATION_ELIGIBLE_TRANSCRIPTION_PROVIDERS` gains `openai` (Gemini/Mistral stay out — they emit plain text, no segments; deepgram self-diarizes). - **Opt-in, not default-on.** Since `diarize` defaults True, simply making openai eligible would flip diarization ON for *every* openai run (21 e2e/integration files + the production cloud profiles) — a broad, surprising change needing an HF token. New `_DIARIZATION_DEFAULT_ON_TRANSCRIPTION_PROVIDERS = {whisper, tailnet_dgx_whisper}`: eligible-but-not-default-on providers (openai) keep diarize OFF unless explicitly set. So existing openai behavior is unchanged; cloud_balanced / cloud_thin opt in with `diarize: true` (documented in both). - Coercion log messages now reference the eligible set dynamically (no drift). - Tests: new test_diarize_openai_eligibility.py (eligible + opt-in default-off + explicit-true respected + whisper still default-on + gemini/mistral coerced). Repointed 3 #562 screenplay-coercion tests from openai → gemini (openai no longer coerces). flake8/mypy/openai-integration(96) green; openai e2e unchanged. Distinct from the #876/#946/#954 work also on this branch — self-contained, cherry-pickable to its own PR. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Unit layer of a graph-UI test push (follows the #914 coverage gate). Adds/extends Vitest tests for the lowest-covered graph modules and fixes a latent bug surfaced along the way: - NEW store tests: graphNavigation (25%→~100%), graphExpansion (40%→100%), graphExplorer (52%→100%). - EXTENDED: graphHandoff store (57%→100% stmt), cyGraphLabelTier (33%→100%), graphLensLabels (37%→100%), graphEpisodeSelection (58%→97%), graphEpisodeMetadata (65%→100% stmt). - +241 tests (847→1088); overall viewer coverage 77→82.5% stmt / 68→72.9% br / 76.5→85% fn / 79→84.3% ln — comfortably above the #914 gate. Bug fix (graphEpisodeMetadata.ts): `resolveEpisodeMetadataFromLoadedArtifacts` threw a TypeError (`(... ).trim()` on undefined) when an artifact had no `sourceCorpusRelPath` and the parallel `selectedRelPaths[i]` was missing. Both operands are already trimmed, so the outer `.trim()` was redundant; coalesce to '' and skip gracefully. Test updated to assert the graceful null. Logic-only layer (the repo tests UI behavior via Playwright e2e, not @vue/test-utils mounting). Component + e2e layers to follow. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… tests Component layer of the graph-UI test push. Introduces the repo's first real @vue/test-utils mount harness (the prior "component test" only read .vue source as text) and mount-tests the cheap/high-value graph components; the cytoscape-heavy containers (GraphCanvas, GraphTabPanel, GraphNodeRailPanel) are left to the e2e layer. - New dev dep: @vue/test-utils. Pattern: happy-dom + setActivePinia + mount with data-testid queries + real stores; heavy children/cytoscape stubbed via global.stubs (see GraphDegreeChip.test.ts as the reference). - ~146 component tests across: chips (Degree/Edges/Types/Feed/Sources), GraphStatusLine (both variants), HandoffErrorStrip, GraphBottomBar, GraphConnectionsSection, NodeDetail, GraphFilterBar, GraphGestureOverlay. Bug fix (GraphGestureOverlay.vue): the overlay root <div> never bound `ref="overlayRootRef"`, yet the Escape-key handler reads `overlayRootRef.value` and bails (`if (!root) return`) — so Escape-to-dismiss silently never worked in production. Bound the ref; the new test asserts Escape now dismisses when focus is outside the dialog. Note: mounting components pulls the .vue files (and transitive imports) into the v8 coverage denominator (5229→7004 statements), so overall % shifts to a more honest, lower baseline that exposes undertested components — still above the #914 gate (stmt 77.9/br 68.3/fn 82.1/ln 79.5 vs 75/65/73/76). 1234 tests green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
First slice of the app-wide test push (after the graph-UI push). The api/ layer was the biggest gap; all eight wrappers are now fully covered: - exploreApi 0→100%, artifactsApi 10→100%, corpusMetricsApi 32→100%, operatorConfigApi 40→100%, searchApi 58→100%, corpusLibraryApi 71→100%, cilApi 73→100%, relationalApi 82→100%. - +~123 tests (1234→1357). Each covers request URL/method/body building, query-param trimming/clamping/omission, response parsing/normalization, pagination/cursor, and every error branch (non-ok + text, HTTP-status fallback, network throw, malformed JSON). Overall viewer coverage: stmt 77.9→79.6 / br 68.3→69.7 / fn 82.1→83.7 / ln 79.5→81.1 — above the #914 gate. No bugs found; no source changed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Second app-wide slice. Lifts the lowest-covered Pinia stores: - corpusLens 44→100%, explore 60→100%, subject 76→100% (stmt + functions), artifacts 33→95% stmt / 81% br / 98% fn (the large central store — load paths, display-artifact build, selection, sibling-merge, topic-cluster overlay; api modules mocked, local-file paths via constructed File objects). - +~143 tests (1357→1500). Overall viewer coverage: stmt 79.6→83.7 / br 69.7→72.5 / fn 83.7→87.7 / ln 81.1→85.2. No bugs found; no source changed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Third app-wide slice. Lifts the lowest-covered util/helper modules: - formatDuration 47→100%, corpusFeedRowDisplay 73→100%, readApiErrorMessage 75→100%, feedRunLinking 77→100%, cyCoseLayoutOptions 76→100%, pipelineJobLogSummary 78→98%, humanizeJsonDocument 79→99%. - +~175 tests (1500→1675). Pure-function coverage: every branch + edge case (zero/negative/huge/fractional/non-finite, null/undefined/empty, malformed JSON, circular refs, all formatting/parse branches). Overall viewer coverage: stmt 83.7→84.9 / br 72.5→74.3 / fn 87.7→89.0 / ln 85.2→86.5. No bugs found; no source changed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The graph-UI + app-wide test push lifted viewer coverage well past the original #914 floor (75/65/73/76). Ratchet the gate to a floor a few points below the new baseline (stmt 84.9 / br 74.3 / fn 89.0 / ln 86.5) so the gains can't silently regress: statements 75→82 · branches 65→71 · functions 73→86 · lines 76→84 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fourth app-wide slice: mount-tests the cheap/high-value non-graph .vue components via the @vue/test-utils harness (chips, filter bars, feature panels, shared widgets). Deferred: the dashboard chart-lib wrappers (low-value mount tests; their data transforms live in tested utils) and the big container views (DashboardView/DigestView/LibraryView/SearchPanel — already e2e-covered). - +266 tests across 22 components: explore (More/Text chips + FilterBar), library (Clustered/Feed chips + FilterBar), search (DocTypes/More/TopK chips + FilterBar + ResultCard + SemanticSearchTip), shared (DateChip, CollapsibleSection, DiagnosticRow, CilTopicPillsRow, PodcastCover 0→covered, HelpTip 4→covered), episode (BridgePartition, DetailPanel), subject (TopicEntityView). +252 net (1675→1927). - api modules mocked, charts/cytoscape/heavy children stubbed, real stores. Mounting these pulls more .vue + transitive imports into the v8 denominator (7004→8088 statements), so overall % holds while honestly covering far more of the component tree — still above the ratcheted #914 gate (stmt 84.6/br 74.2/ fn 89.2/ln 86.2 vs 82/71/86/84). 1927 tests green, typecheck clean. No bugs found. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Final component slice. Mount-tests the central shell orchestrators + the remaining shared dialogs/widgets: - shell: StatusBar (corpus-path input, health/index dots, sources+config+feeds dialogs, version-warning), LeftPanel (search/explore surface switch, focusQuery), SubjectRail (kind-routing to episode/graph-node/topic/person panels + event re-emission + main-tab neighbourhood wiring). - shared: TranscriptViewerDialog (4→covered: load/highlight/error/audio/segments, dismiss paths), TopicTimelineDialog (open/sort/states/dismiss), HoverRichTip (hover/focus show-hide timers, Esc, teleport teardown). - +135 tests across 6 components. api/cil/feeds modules mocked, heavy children stubbed, real stores. No bugs found; no source changed. Gate green at the ratcheted floor. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The earlier ratchet (82/71/86/84) was set against the logic-layer baseline, before the component mount-test waves. Mounting container components pulls their whole transitive .vue import tree into v8's (no-`all`) denominator, so the headline % settled lower at the full-tree scope (stmt 81.1 / br 69.4 / fn 80.5 / ln 82.8) even though absolute coverage grew. Set the floor a few points below that honest baseline so CI's viewer-unit coverage gate passes and still guards against regression: statements 82→78 · branches 71→66 · functions 86→77 · lines 84→80 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The viewer's test tiers (Vitest unit/component + #914 coverage gate, mocked Playwright e2e, Tier-3 real-corpus validation walk) and corpora were scattered across ADR-095 / RFC-086 / e2e/validation/README — hard to answer "what graph tests exist and how do I run them?" quickly. - New web/gi-kg-viewer/TESTING.md: operator quick-ref — every tier, command, graph-subset filters, corpora (synthetic validation vs BYOC), the coverage gate, and the RFC-086 matrix rule. Links the existing E2E_SURFACE_MAP + validation README (no duplication). - TESTING_STRATEGY.md (Browser UI E2E) + E2E_TESTING_GUIDE.md: add the missing Vitest coverage gate + Tier-3 validation walk and point to TESTING.md. - viewer README: add a Testing section pointing to TESTING.md. make docs passes (strict). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The committed synthetic validation corpus ships only the pre-built top-level API JSONs and is missing the raw feeds/*/metadata/*.gi.json artifacts that the live serve-api computes episodes from. The real-corpus walk against it returns an empty Library and fails ~30 handoff specs. Documented as a known gap (with the regenerate-and-commit fix) so operators run the walk against a BYOC/prod corpus until the fixture is regenerated. Tracked separately. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The synthetic-corpus real-backend walk failed 30/34 specs with an empty
Library. Root cause: the Makefile + TESTING.md pointed CORPUS_PATH at the
version-LESS parent `tests/fixtures/viewer-validation-corpus`, but the raw
`feeds/<feed>/metadata/*.{metadata,gi,kg,bridge}.json` artifacts that serve-api
computes episodes from live under the FIXTURES_VERSION subdir (`.../v2`).
`discover_metadata_files()` returns 0 at the parent and 23 at `.../v2`, so the
Library rendered no episodes and every handoff spec failed on the first
row-click. The corpus itself was always valid — nothing was missing.
- Makefile: new `VIEWER_VALIDATION_CORPUS` var derived from FIXTURES_VERSION,
wired into ci-ui-validation / serve-for-validation / build-validation-index
so the documented path always includes the version dir.
- TESTING.md: corrected the run command (derives the version), the corpus-table
row (it's two layers — raw feeds/ + pre-built corpus/*.json), and replaced the
earlier incorrect "missing artifacts" note with the version-dir caveat.
Verified: `make ci-ui-validation CORPUS=.../v2` → 30 passed / 1 skipped (was 30
failed). After `make build-validation-index`, the two index-dependent specs
(P1.3 digest topic-band, P4.2 digest band) also pass → 32/33. The remaining V4
(dashboard topic-cluster chip) fails identically on the real prod-v2 corpus too
— a pre-existing handoff gap, tracked separately, not a corpus issue.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ema-version self-heal Two coupled fixes so the LanceDB two-tier hybrid path behaves like FAISS across every surface that applies a date bound. **publish_date parity (the digest topic-band regression).** The shared `_hit_passes_cli_filters` drops any hit lacking `publish_date` when `since` is set. FAISS rows carry it; lance rows did not — so every hybrid hit was dropped whenever a `since` bound was passed. The digest topic-band search ALWAYS passes one (window=all → since=1970-01-01), so a corpus served via lance returned 0 topic-bands where FAISS returned them (Tier-3 P1.3/P4.2). Masked in prod only because prod still ships the legacy `lance_native/` dir → silent FAISS fallback. Fix: store `publish_date` on the segment/insight/aux docs (schema + dataclass), populate it in both index paths (native `index-two-tier` + the FAISS→lance migration), and surface it from the row payload into hit metadata. **Schema-version self-heal.** Adding a column makes pre-existing lance indexes incompatible, and there was no version on the index to detect that. Add `LANCE_SCHEMA_VERSION` (now 2) stamped into `index_meta.json`, plus `stored_schema_version()` / `lance_index_is_stale()`. Wire the staleness check so old indexes self-heal: - read path (`hybrid_candidates`) skips a stale index → FAISS fallback (never serves results from an incompatible schema); - (re)index moments rebuild rather than upsert into incompatible tables: `build_two_tier_index` + `migrate_faiss_to_lance` wipe-if-stale, and migration 0002 rebuilds a stale index instead of no-op. Staleness requires positive evidence (meta present, version < code); a missing meta is treated as not-stale so the read path's own try/except governs. Tests: publish_date carried through hybrid hits; stale index detected + read falls back to FAISS; 0002 rebuilds a schema-stale index. Full search + upgrade suites green (240 passed); mypy clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…index The Tier-3 validation harness was FAISS-only; reflect the current search layer so the walk exercises the real hybrid (BM25 + dense + RRF) path and matches prod's two-tier layout. - Makefile build-validation-index: add `cli index-two-tier` (lance_index) between the FAISS build and topic_clusters. Now builds all three, documented inline with what each unlocks (V3 semantic search; V2/V4 topic clusters; hybrid serving). - TESTING.md: document the one-time `make build-validation-index` step, the three search artifacts + what each unlocks, and that the "1 skip" is V3 when no vector index is present. Corpus table now notes the raw `feeds/` + pre-built layers. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…r-3 V3)
Second FAISS-parity gap on the hybrid path, same shape as the publish_date one.
The viewer's "Show on graph" affordance (ResultCard) is gated on
`graphNodeIdFromSearchHit`, which reads `metadata.source_id` — the canonical graph
node id (`topic:…` / `entity:…` / `insight:…` / `quote:…`) — for the focusable
tiers (insight / quote / kg_topic / kg_entity). FAISS rows carry it; lance rows did
not, so a search served via LanceDB rendered no graph handoff and Tier-3 V3
("Search → Show on graph") timed out waiting for the button.
Fix: store `source_id` on the insight + aux docs (schema + dataclass), populate it
in both index paths (native `index-two-tier` + the FAISS→lance migration), and
surface it from the row payload into hit metadata. Folded into LANCE_SCHEMA_VERSION
2 (the unshipped parity bump) alongside publish_date.
Verified: `source_id` flows through hybrid hits (topic:technology, entity:…,
insight:…); Tier-3 V3 passes (handoff status "applied"). Search + upgrade suites
green (238 passed); parity regression test asserts both publish_date + source_id.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… FAISS
Test-evolution for "less FAISS, more LanceDB": the Tier-3 walk exercised the search
path but never asserted WHICH backend served it, so a stale/broken lance_index that
silently degraded to FAISS would pass unnoticed. V6 hits /api/search directly and
proves the LanceDB two-tier hybrid (BM25 + dense + RRF) is live:
- RRF score signature: max(score) < 0.1. FAISS returns a cosine similarity (top hit
≈ 1.0); LanceDB returns fused RRF scores 1/(60+rank). This is the definitive
discriminator.
- two-tier provenance: every hit carries source_tier ∈ {insight, segment, aux}.
- hybrid-only response fields present: lift_stats, query_type.
Verified both ways: passes on the lance index (maxScore 0.031); FAILS when
lance_index is removed (FAISS fallback → maxScore 1.0), so it's a real provenance
guard, not a no-op. Full Tier-3 walk now 34/34 green (V1–V6 + handoff matrix).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… provenance Fold this session's lance/search learnings into their existing owner docs (no new files): - RFC-090 §10 (new): the metadata-parity contract (any consumer field must be on both backends — publish_date for date filters, source_id for graph handoff), LANCE_SCHEMA_VERSION self-heal (stale → read skips to FAISS, reindex rebuilds), and the V6 provenance guard. - SEMANTIC_SEARCH_GUIDE: operator subsections "Telling which backend served a query" (RRF score < 0.1 vs FAISS cosine ≈ 1.0; source_tier/lift_stats/query_type) and "Metadata parity + schema versioning", incl. the legacy lance_native/ → FAISS-fallback gotcha and the reindex fix. - e2e/validation/README: add the V6 scenario row. make docs (mkdocs strict) passes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The default corpus-graph auto-load cap bounds usable graph size because the `cose` layout is ~O(n²) (stress test: 100 episodes ≈ 2861 nodes ≈ 134s layout). Bump the interim ceiling to 25 (~930 nodes, ~6-8s) and document the tradeoff + the stress-test numbers inline. Raising it meaningfully is gated on the large-graph layout (cose→fcose) + selection-persistence work tracked in #967. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…close (#956) #956's core symptom (a DGX request hanging after the server returned 200) is already handled by the #954 `run_with_watchdog` hard wall-clock deadline. This adds the two transport-level defences from #956 that actually FIT a long-blocking upload, via a shared `dgx_http_client()` factory both providers now use: - **TCP keepalive** (`keepalive_socket_options`): SO_KEEPALIVE + a ~30s/15s/4-probe schedule so a socket whose underlying path died mid-request (e.g. a Tailscale path switch) is reaped in ~90s instead of the OS default (2h on macOS) — turning an indefinite hang into a prompt connection error the provider fails over on. Built defensively (Linux TCP_KEEPIDLE vs macOS TCP_KEEPALIVE, each hasattr-guarded). - **Connection: close** so the server tears the socket down after the response. Deliberately NOT added (don't fit long-blocking DGX calls): - per-read timeout — these POSTs stream zero bytes during the multi-minute GPU run, so any read deadline shorter than processing false-aborts a healthy call; the duration-scaled watchdog is the correct backstop. - urllib3 Retry adapter — we're on httpx, and retries are already handled by the provider loops + circuit breaker. - Connection-reuse concerns — moot; each request uses a fresh client. Both providers migrated to `dgx_http_client`; whisper's now-redundant httpx import-guard removed. #956 Tier-1 (async job submission) is server-side, deferred to the DGX chapter. 38 DGX unit/integration/e2e tests green; mypy clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
) Follow-up to the cap bump: graphEpisodeSelection.test.ts pinned the constant at 15. Update to 25 (the interim ceiling pending the large-graph layout work #967). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Tier-3 validation walk writes generated Playwright artifacts (incl. error-context.md) to web/gi-kg-viewer/validation-results/ — the same class as test-results/ and playwright-report/, which lint-markdown already ignores. It was missing from the ignore globs, so a left-over artifact (e.g. from a deliberately failing run) failed `make lint-markdown`/`make ci`. Already gitignored; align the markdownlint ignores to match. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ilure interrogate (the `make docstrings` 100% gate) flagged these two methods from the #954 resilience layer — they predate the first full `make ci` on this branch (targeted test runs don't run interrogate). Document the state transitions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Removing whisper_provider's local httpx import-guard (now that the client is built in dgx_http_client) lost the friendly "httpx required" RuntimeError that test_transcribe_dgx_requires_httpx asserts — the raw ImportError surfaced instead. Move the guard into the shared factory so both providers get the actionable error. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…obal pollution) The #947 GUID-keyed audio cache defaults to a repo-relative global dir (`.cache/audio`, enabled by default). With no test isolation, tests both pollute that shared cache and read each other's downloads — and a stale GUID hit silently masks failure-injection tests: `test_chaos_run_index_records_failed_episode` 404s e03's audio expecting the episode to fail, but a previously-cached e03 (from a sibling test / prior run) produced a cache HIT, so e03 transcribed and was never recorded as failed (`episodes_failed == 0`). Passed on main (no cache existed), broke here — caught by the first full `make ci` on the branch. Add an autouse conftest fixture redirecting `DEFAULT_AUDIO_CACHE_DIR` to a per-test `tmp_path/audio` (basename kept `audio` so `test_default_dir` holds). Tests passing an explicit `audio_cache_dir` are unaffected. Verified: full e2e error-handling class + both #947 cache suites green (27 passed), and tests no longer write to the global `.cache/audio`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…lears-focus) The viewer test push added `ref="overlayRootRef"` to activate the overlay's capture-phase Escape listener (Escape-to-dismiss). That was untested-in-e2e and conflicts with the core "Escape clears focus" contract: on cold-start the overlay is up, so its Escape handler intercepts the key (preventDefault + dismiss) before `graphHandoff.focusCleared()` runs — H1.12 (Escape bumps the handoff generation) failed on firefox (post-handoff activeElement lets the overlay claim the Escape). Revert the one-line ref add so GraphGestureOverlay.vue matches main exactly (overlay still dismissed via its button / backdrop click), and drop the unit test that asserted the reverted Escape-dismiss path. H1.12 passes on firefox again; the overlay's other 14 unit tests stay green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ction) CodeQL flagged the os.path.isfile / open sinks in the new stored_schema_version helper: meta_path derived from a corpus path (user-provided via the API) without the sanitiser chain the rest of this file uses. Confine the CONSTANT "index_meta.json" subpath under the resolved root via safe_relpath_under_corpus_root → normpath_if_under_root (same Type-1 pattern as read_index_meta), so CodeQL sees the path as sanitised. Behaviour unchanged (v2→not-stale, v1→stale, absent→None); 18 search/upgrade tests + mypy green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ersion (#969) CodeQL #360/#361/#362 — the os.path.isfile / open sinks in stored_schema_version. Identical Type-1 cross-function pattern to read_index_meta (#338/#342): meta_path sanitised via safe_resolve_directory -> safe_relpath_under_corpus_root (constant index_meta.json) -> normpath_if_under_root, corpus root route-confined by resolve_corpus_path_param. CodeQL can't model helper-based sanitisers. Dismissed via gh api per the registry policy; code already uses the prescribed chain. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ift FP, #969) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Bundles several related workstreams. Full
make ciis green locally (fast gates → full pytest 4444 + integration + e2e → test-ui + test-ui-e2e → build-viewer → coverage-enforce → docs → build → Docker stack-test).DGX re-diarization + resilience
:8002.resilience.py:run_with_watchdog(hard wall-clock deadline — the real fix for hangs-after-200, since httpx's own timeout never fires under a co-tenant GPU stall),CircuitBreaker, duration-scaled timeouts; e2e (mock-server) + real-socket integration tests.dgx_http_clientfactory: TCP keepalive (~90s dead-socket detection vs the 2h OS default) +Connection: close. Reconciled against Diarization client needs duration-scaled timeout + single-flight (parallel to #946 whisper) #954 (per-read timeout rejected — it false-aborts long zero-byte POSTs; Tier-1 async-jobs deferred to the DGX chapter).LanceDB search parity + self-heal
publish_date+source_idFAISS-parity on the two-tier hybrid index — fixes digest topic-bands returning 0 under lance (the sharedsincefilter dropped hits lackingpublish_date) and the viewer "Show on graph" affordance (readsmetadata.source_id).LANCE_SCHEMA_VERSIONself-heal: stamped inindex_meta.json; stale index → read path skips to FAISS, (re)index moments rebuild instead of upserting into incompatible tables.source_tier/lift_stats/query_type).Viewer Tier-3 validation + test push (#914)
build-validation-indexbuilds FAISS + lance two-tier + topic_clusters; new V6 spec asserts lance (not FAISS) is serving. Tier-3 walk 34/34.Notable regressions caught + fixed by the first full
make ci.cache/audiomasked the chaos failure-injection e2e test; isolated per-test in conftest.Follow-up chapters (NOT this PR)
🤖 Generated with Claude Code