DGX re-diarization + resilience, LanceDB search parity, viewer Tier-3 validation + test push by chipi · Pull Request #969 · chipi/podcast_scraper

chipi · 2026-06-11T16:14:26Z

Bundles several related workstreams. Full make ci is green locally (fast gates → full pytest 4444 + integration + e2e → test-ui + test-ui-e2e → build-viewer → coverage-enforce → docs → build → Docker stack-test).

DGX re-diarization + resilience

Reprocess whisper_transcription episodes for corpus-wide SPOKEN_BY once new diarization lands #876/Strict existing-only re-diarization migration mode (#876 enabler) #946/Durable audio cache for reprocessing (re-diarization without feed dependency) #947 existing-only re-diarization migration + GUID-keyed durable audio cache + DGX whisper resilience.
DGX GB10: faster-whisper compute-type / GPU-acceleration investigation (int8 crawls, fp16/bf16 unsupported, nvidia-smi unreliable) #948/Deploy openai-whisper service on DGX (parallel to speaches/faster-whisper) #953 speaches-gb10 build + point the whisper client at openai-whisper :8002.
Diarization client needs duration-scaled timeout + single-flight (parallel to #946 whisper) #954 shared resilience.py: run_with_watchdog (hard wall-clock deadline — the real fix for hangs-after-200, since httpx's own timeout never fires under a co-tenant GPU stall), CircuitBreaker, duration-scaled timeouts; e2e (mock-server) + real-socket integration tests.
DGX-over-Tailscale client resilience: long-blocking HTTP hangs after server returns 200 #956 shared dgx_http_client factory: TCP keepalive (~90s dead-socket detection vs the 2h OS default) + Connection: close. Reconciled against Diarization client needs duration-scaled timeout + single-flight (parallel to #946 whisper) #954 (per-read timeout rejected — it false-aborts long zero-byte POSTs; Tier-1 async-jobs deferred to the DGX chapter).
Allow pyannote diarization on the OpenAI Whisper transcription path #913 allow pyannote diarization on the OpenAI-Whisper path.

LanceDB search parity + self-heal

publish_date + source_id FAISS-parity on the two-tier hybrid index — fixes digest topic-bands returning 0 under lance (the shared since filter dropped hits lacking publish_date) and the viewer "Show on graph" affordance (reads metadata.source_id).
LANCE_SCHEMA_VERSION self-heal: stamped in index_meta.json; stale index → read path skips to FAISS, (re)index moments rebuild instead of upserting into incompatible tables.
Docs: RFC-090 §10 + SEMANTIC_SEARCH_GUIDE operator diagnostics (RRF vs cosine, source_tier/lift_stats/query_type).

Viewer Tier-3 validation + test push (#914)

Tier-3 walk fixed (CORPUS must be the FIXTURES_VERSION dir); build-validation-index builds FAISS + lance two-tier + topic_clusters; new V6 spec asserts lance (not FAISS) is serving. Tier-3 walk 34/34.
~+1200 viewer unit/component tests; coverage gate enforced.

Notable regressions caught + fixed by the first full `make ci`

Audio-cache (Durable audio cache for reprocessing (re-diarization without feed dependency) #947) test isolation — the global .cache/audio masked the chaos failure-injection e2e test; isolated per-test in conftest.
Gesture-overlay Escape — a viewer-push ref-add hijacked Escape from the focus-clear contract (firefox H1.12); reverted to main's behavior.

Follow-up chapters (NOT this PR)

Large-graph viewer: replace cose→fcose layout + persist selection across ego→full reload (raise graph cap) #967 — large-graph viewer: cose (O(n²), 134s @ 2861 nodes) → fcose + persist selection across the ego→full reload. Episode cap interim 25.
DGX-over-Tailscale client resilience: long-blocking HTTP hangs after server returns 200 #956 (remaining) — DGX Tier-1 async job submission.
Reprocess whisper_transcription episodes for corpus-wide SPOKEN_BY once new diarization lands #876 re-diarization batch — paused on free GPU.

🤖 Generated with Claude Code

…he (#947) + DGX whisper resilience Three related pieces for the #876 prod corpus re-diarization, plus a DGX investigation writeup. All unit tests pass; the only open item is an operational DGX-box CUDA issue (see #948), NOT a code defect. #946 — strict existing-only migration mode (reprocess_existing_only + GUID filter in scraping.py + extract_item_guid + migrate-diarization target). Also fixes a per-feed _build_config bug that dropped ADR-096/#814/#926 DGX routing fields (transcription_fallback_provider/diarization_provider/dgx_diarize_*). #947 — durable GUID-keyed raw-audio cache (utils/audio_cache.py) + cache-aware _download_or_reuse_media + pipeline_stage=download_only + download-audio target. Validated: 10 cached, reprocess = 10 cache HITs / 0 feed fetches. DGX whisper resilience (whisper_provider.py): duration-scaled timeout + single-flight lock + smart retry (conn-blip retries w/ backoff; timeout falls back without re-queuing). New dgx_* config knobs. DGX compute-type investigation (#948): fp16/bf16 error on GB10 build, int8 slow, default(->fp32) is the working path; nvidia-smi unreliable on GB10. Tests + test-ordering flake fix; docs/wip/REHEARSAL-876-findings-20260609.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Source-builds ctranslate2 4.8.0 against CUDA 12.8 with sm_120 PTX so the GB10 (compute_cap=12.1 / sm_121) driver can JIT-compile forward at first kernel launch. Replaces the upstream :latest-cuda tag's bundled CPU-only ctranslate2 4.5.0 (the upstream tag is misleading — the .so has no CUDA symbols, faster_whisper raises "not compiled with CUDA" when device=cuda). Validated: 5-min clip transcribes in 8.25s wall-time (36.4× realtime) vs ~103s on CPU (2.9× realtime). New: infra/dgx/speaches-gb10/{Dockerfile,README.md} - Two-stage Dockerfile (nvidia/cuda:12.8.0-devel builder → speaches runtime). Builds in ~3 min on DGX once apt/git cache populated. - README explains the incident timeline, build flags, validation commands, operational notes. deploy.py: pyinfra recipe now ships the Dockerfile, runs docker build locally (idempotent via layer cache), regenerates /etc/cdi/nvidia.yaml (nvidia-container-toolkit 1.19.1 needs CDI spec to inject GPU correctly in mode=auto), and references the local-built image tag. Drops the upstream `docker compose pull` step since we build locally now. verify.py: adds two assertions that would catch the silent CPU fallback that triggered #948: - /etc/cdi/nvidia.yaml present + nvidia-ctk lists nvidia.com/gpu=all - ctranslate2 inside faster-whisper sees the GB10 on cuda (assert get_cuda_device_count() >= 1) Idempotency tested: `make dgx-deploy` re-run is a no-op via Docker layer cache. `make dgx-verify` passes all 11 assertions. Why this lands on this branch: the #946 work (audio cache + duration- scaled timeouts + single-flight provider) was the consumer-side resilience that made the silent CPU fallback survivable. This is the producer-side fix that closes the loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The pyannote diarize client hung ~2h during a co-tenant vLLM crash-loop on the shared GB10: httpx's read/write timeout never fired because the 60MB multipart upload trickled and kept resetting the per-write timeout. Rather than patch the diarize client alone, extract a shared DGX resilience layer both it and the #946 Whisper client consume, so they can't drift. New `providers/tailnet_dgx/resilience.py`: - `run_with_watchdog()` — hard wall-clock deadline in a daemon thread. The actual hang-fix: guarantees fail-over even when httpx's own timeout doesn't fire. Orphaned worker is a daemon (holds one connection, never blocks exit). - `CircuitBreaker` — trimmed-down port of `rss/http_policy.CircuitBreaker` (rolling-window -> open(cooldown) -> half-open probe), with a `hard` immediate-trip for definitive timeouts so a wedged batch doesn't pay the full timeout on every episode. Named + logs open/close for ops visibility. - `effective_timeout_sec()` / `probe_audio_duration_sec()` — duration-scaled budget (soundfile-based; graceful None without the [ml] extra). - `TimeoutLike` — shared timeout-class tuple. Both providers now: duration-scaled timeout + single-flight + bounded retries (timeout -> fail-over without re-queue; connection blip -> backoff) + watchdog + per-endpoint breaker. Whisper gets the same treatment for parity (same GPU, same hang vector). Diarization-specific tighter timeout via new profile-only config `dgx_diarize_request_timeout_sec` (180) / `dgx_diarize_timeout_per_audio_minute_sec` (6) — pyannote is far faster than Whisper so the budget (and a half-open probe) stays cheap. Tests: `test_tailnet_dgx_resilience.py` (20: breaker FSM, watchdog real-thread, timeout math, probe-None) + diarize/whisper provider suites extended with breaker-open / watchdog-hang / timeout-no-requeue cases. Fixed a pre-existing 15s whisper test (unmocked backoff sleep) -> 0.7s. flake8/mypy/test-policy clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…953) The GB10 Speaches/ctranslate2 path is being sorted out, so the DGX Whisper client is temporarily pointed at the openai-whisper service on :8002 (model id `large-v3`) instead of Speaches on :8000 (`Systran/faster-whisper-large-v3`). Validated end-to-end: a 91.6s clip transcribes in ~9.8s; health green. Revert tracked in #955 (restore :8000 / Systran model when Speaches is back). A dated comment block in the profile flags exactly which two lines to revert. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Complements the mocked-transport unit suites: runs the real httpx path over a loopback socket against a throwaway stub mimicking the faster-whisper + pyannote services. Proves the production failure end-to-end — a *hanging* socket (httpx's own timeout never fires because the upload trickles) is abandoned by the hard watchdog and fails over — plus happy-path round-trip, HTTP 503 fail-over, and the breaker tripping so the next call skips the socket. Self-contained (own http.server, not the shared e2e server) so it can't perturb the e2e suite. Component-level (provider vs local server) → integration tier. 7 tests, ~5s. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Closes the last gap: no e2e test drove the tailnet DGX providers at all (the existing whisper/diarization e2e tests use the local/cloud paths). Adds the DGX endpoints to the shared e2e mock server and a provider e2e suite that runs the real TailnetDgx{Whisper,Diarization}Provider over real httpx against it with a real audio payload. e2e_http_server.py (additive + one correctness fix): - GET /v1/models + /health — faster-whisper + pyannote health/model probe. - POST /v1/diarize — canned two-speaker pyannote result. - /v1/audio/transcriptions now honours the requested response_format: json / verbose_json get a proper JSON body (what the OpenAI SDK *and* the DGX client both request and parse) instead of text/plain; text/unspecified unchanged. Verified the openai/basic/capabilities transcription e2e suites still pass. - dgx_host_port() URL helper. test_tailnet_dgx_e2e.py: happy round-trip (transcribe verbose_json + diarize) plus the production failure modes via the server's set_error_behavior injection — a hanging socket (delay) is abandoned by the watchdog and a 5xx fails over, asserting the breaker trips. 6 tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…913) OpenAI `verbose_json` returns timestamped Whisper-format segments and the audio is downloaded locally (for the API upload), so the existing local pyannote second pass — `apply_diarization_to_result(result, media_for_transcription, ...)`, which is provider-agnostic — can diarize + align it, exactly like whisper / tailnet_dgx_whisper. Previously `diarize`/`screenplay` were silently coerced off for openai by design; this makes them work. - `_DIARIZATION_ELIGIBLE_TRANSCRIPTION_PROVIDERS` gains `openai` (Gemini/Mistral stay out — they emit plain text, no segments; deepgram self-diarizes). - **Opt-in, not default-on.** Since `diarize` defaults True, simply making openai eligible would flip diarization ON for *every* openai run (21 e2e/integration files + the production cloud profiles) — a broad, surprising change needing an HF token. New `_DIARIZATION_DEFAULT_ON_TRANSCRIPTION_PROVIDERS = {whisper, tailnet_dgx_whisper}`: eligible-but-not-default-on providers (openai) keep diarize OFF unless explicitly set. So existing openai behavior is unchanged; cloud_balanced / cloud_thin opt in with `diarize: true` (documented in both). - Coercion log messages now reference the eligible set dynamically (no drift). - Tests: new test_diarize_openai_eligibility.py (eligible + opt-in default-off + explicit-true respected + whisper still default-on + gemini/mistral coerced). Repointed 3 #562 screenplay-coercion tests from openai → gemini (openai no longer coerces). flake8/mypy/openai-integration(96) green; openai e2e unchanged. Distinct from the #876/#946/#954 work also on this branch — self-contained, cherry-pickable to its own PR. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Unit layer of a graph-UI test push (follows the #914 coverage gate). Adds/extends Vitest tests for the lowest-covered graph modules and fixes a latent bug surfaced along the way: - NEW store tests: graphNavigation (25%→~100%), graphExpansion (40%→100%), graphExplorer (52%→100%). - EXTENDED: graphHandoff store (57%→100% stmt), cyGraphLabelTier (33%→100%), graphLensLabels (37%→100%), graphEpisodeSelection (58%→97%), graphEpisodeMetadata (65%→100% stmt). - +241 tests (847→1088); overall viewer coverage 77→82.5% stmt / 68→72.9% br / 76.5→85% fn / 79→84.3% ln — comfortably above the #914 gate. Bug fix (graphEpisodeMetadata.ts): `resolveEpisodeMetadataFromLoadedArtifacts` threw a TypeError (`(... ).trim()` on undefined) when an artifact had no `sourceCorpusRelPath` and the parallel `selectedRelPaths[i]` was missing. Both operands are already trimmed, so the outer `.trim()` was redundant; coalesce to '' and skip gracefully. Test updated to assert the graceful null. Logic-only layer (the repo tests UI behavior via Playwright e2e, not @vue/test-utils mounting). Component + e2e layers to follow. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… tests Component layer of the graph-UI test push. Introduces the repo's first real @vue/test-utils mount harness (the prior "component test" only read .vue source as text) and mount-tests the cheap/high-value graph components; the cytoscape-heavy containers (GraphCanvas, GraphTabPanel, GraphNodeRailPanel) are left to the e2e layer. - New dev dep: @vue/test-utils. Pattern: happy-dom + setActivePinia + mount with data-testid queries + real stores; heavy children/cytoscape stubbed via global.stubs (see GraphDegreeChip.test.ts as the reference). - ~146 component tests across: chips (Degree/Edges/Types/Feed/Sources), GraphStatusLine (both variants), HandoffErrorStrip, GraphBottomBar, GraphConnectionsSection, NodeDetail, GraphFilterBar, GraphGestureOverlay. Bug fix (GraphGestureOverlay.vue): the overlay root <div> never bound `ref="overlayRootRef"`, yet the Escape-key handler reads `overlayRootRef.value` and bails (`if (!root) return`) — so Escape-to-dismiss silently never worked in production. Bound the ref; the new test asserts Escape now dismisses when focus is outside the dialog. Note: mounting components pulls the .vue files (and transitive imports) into the v8 coverage denominator (5229→7004 statements), so overall % shifts to a more honest, lower baseline that exposes undertested components — still above the #914 gate (stmt 77.9/br 68.3/fn 82.1/ln 79.5 vs 75/65/73/76). 1234 tests green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

First slice of the app-wide test push (after the graph-UI push). The api/ layer was the biggest gap; all eight wrappers are now fully covered: - exploreApi 0→100%, artifactsApi 10→100%, corpusMetricsApi 32→100%, operatorConfigApi 40→100%, searchApi 58→100%, corpusLibraryApi 71→100%, cilApi 73→100%, relationalApi 82→100%. - +~123 tests (1234→1357). Each covers request URL/method/body building, query-param trimming/clamping/omission, response parsing/normalization, pagination/cursor, and every error branch (non-ok + text, HTTP-status fallback, network throw, malformed JSON). Overall viewer coverage: stmt 77.9→79.6 / br 68.3→69.7 / fn 82.1→83.7 / ln 79.5→81.1 — above the #914 gate. No bugs found; no source changed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Second app-wide slice. Lifts the lowest-covered Pinia stores: - corpusLens 44→100%, explore 60→100%, subject 76→100% (stmt + functions), artifacts 33→95% stmt / 81% br / 98% fn (the large central store — load paths, display-artifact build, selection, sibling-merge, topic-cluster overlay; api modules mocked, local-file paths via constructed File objects). - +~143 tests (1357→1500). Overall viewer coverage: stmt 79.6→83.7 / br 69.7→72.5 / fn 83.7→87.7 / ln 81.1→85.2. No bugs found; no source changed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Third app-wide slice. Lifts the lowest-covered util/helper modules: - formatDuration 47→100%, corpusFeedRowDisplay 73→100%, readApiErrorMessage 75→100%, feedRunLinking 77→100%, cyCoseLayoutOptions 76→100%, pipelineJobLogSummary 78→98%, humanizeJsonDocument 79→99%. - +~175 tests (1500→1675). Pure-function coverage: every branch + edge case (zero/negative/huge/fractional/non-finite, null/undefined/empty, malformed JSON, circular refs, all formatting/parse branches). Overall viewer coverage: stmt 83.7→84.9 / br 72.5→74.3 / fn 87.7→89.0 / ln 85.2→86.5. No bugs found; no source changed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The graph-UI + app-wide test push lifted viewer coverage well past the original #914 floor (75/65/73/76). Ratchet the gate to a floor a few points below the new baseline (stmt 84.9 / br 74.3 / fn 89.0 / ln 86.5) so the gains can't silently regress: statements 75→82 · branches 65→71 · functions 73→86 · lines 76→84 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Fourth app-wide slice: mount-tests the cheap/high-value non-graph .vue components via the @vue/test-utils harness (chips, filter bars, feature panels, shared widgets). Deferred: the dashboard chart-lib wrappers (low-value mount tests; their data transforms live in tested utils) and the big container views (DashboardView/DigestView/LibraryView/SearchPanel — already e2e-covered). - +266 tests across 22 components: explore (More/Text chips + FilterBar), library (Clustered/Feed chips + FilterBar), search (DocTypes/More/TopK chips + FilterBar + ResultCard + SemanticSearchTip), shared (DateChip, CollapsibleSection, DiagnosticRow, CilTopicPillsRow, PodcastCover 0→covered, HelpTip 4→covered), episode (BridgePartition, DetailPanel), subject (TopicEntityView). +252 net (1675→1927). - api modules mocked, charts/cytoscape/heavy children stubbed, real stores. Mounting these pulls more .vue + transitive imports into the v8 denominator (7004→8088 statements), so overall % holds while honestly covering far more of the component tree — still above the ratcheted #914 gate (stmt 84.6/br 74.2/ fn 89.2/ln 86.2 vs 82/71/86/84). 1927 tests green, typecheck clean. No bugs found. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Final component slice. Mount-tests the central shell orchestrators + the remaining shared dialogs/widgets: - shell: StatusBar (corpus-path input, health/index dots, sources+config+feeds dialogs, version-warning), LeftPanel (search/explore surface switch, focusQuery), SubjectRail (kind-routing to episode/graph-node/topic/person panels + event re-emission + main-tab neighbourhood wiring). - shared: TranscriptViewerDialog (4→covered: load/highlight/error/audio/segments, dismiss paths), TopicTimelineDialog (open/sort/states/dismiss), HoverRichTip (hover/focus show-hide timers, Esc, teleport teardown). - +135 tests across 6 components. api/cil/feeds modules mocked, heavy children stubbed, real stores. No bugs found; no source changed. Gate green at the ratcheted floor. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The earlier ratchet (82/71/86/84) was set against the logic-layer baseline, before the component mount-test waves. Mounting container components pulls their whole transitive .vue import tree into v8's (no-`all`) denominator, so the headline % settled lower at the full-tree scope (stmt 81.1 / br 69.4 / fn 80.5 / ln 82.8) even though absolute coverage grew. Set the floor a few points below that honest baseline so CI's viewer-unit coverage gate passes and still guards against regression: statements 82→78 · branches 71→66 · functions 86→77 · lines 84→80 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The viewer's test tiers (Vitest unit/component + #914 coverage gate, mocked Playwright e2e, Tier-3 real-corpus validation walk) and corpora were scattered across ADR-095 / RFC-086 / e2e/validation/README — hard to answer "what graph tests exist and how do I run them?" quickly. - New web/gi-kg-viewer/TESTING.md: operator quick-ref — every tier, command, graph-subset filters, corpora (synthetic validation vs BYOC), the coverage gate, and the RFC-086 matrix rule. Links the existing E2E_SURFACE_MAP + validation README (no duplication). - TESTING_STRATEGY.md (Browser UI E2E) + E2E_TESTING_GUIDE.md: add the missing Vitest coverage gate + Tier-3 validation walk and point to TESTING.md. - viewer README: add a Testing section pointing to TESTING.md. make docs passes (strict). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The committed synthetic validation corpus ships only the pre-built top-level API JSONs and is missing the raw feeds/*/metadata/*.gi.json artifacts that the live serve-api computes episodes from. The real-corpus walk against it returns an empty Library and fails ~30 handoff specs. Documented as a known gap (with the regenerate-and-commit fix) so operators run the walk against a BYOC/prod corpus until the fixture is regenerated. Tracked separately. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The synthetic-corpus real-backend walk failed 30/34 specs with an empty Library. Root cause: the Makefile + TESTING.md pointed CORPUS_PATH at the version-LESS parent `tests/fixtures/viewer-validation-corpus`, but the raw `feeds/<feed>/metadata/*.{metadata,gi,kg,bridge}.json` artifacts that serve-api computes episodes from live under the FIXTURES_VERSION subdir (`.../v2`). `discover_metadata_files()` returns 0 at the parent and 23 at `.../v2`, so the Library rendered no episodes and every handoff spec failed on the first row-click. The corpus itself was always valid — nothing was missing. - Makefile: new `VIEWER_VALIDATION_CORPUS` var derived from FIXTURES_VERSION, wired into ci-ui-validation / serve-for-validation / build-validation-index so the documented path always includes the version dir. - TESTING.md: corrected the run command (derives the version), the corpus-table row (it's two layers — raw feeds/ + pre-built corpus/*.json), and replaced the earlier incorrect "missing artifacts" note with the version-dir caveat. Verified: `make ci-ui-validation CORPUS=.../v2` → 30 passed / 1 skipped (was 30 failed). After `make build-validation-index`, the two index-dependent specs (P1.3 digest topic-band, P4.2 digest band) also pass → 32/33. The remaining V4 (dashboard topic-cluster chip) fails identically on the real prod-v2 corpus too — a pre-existing handoff gap, tracked separately, not a corpus issue. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ema-version self-heal Two coupled fixes so the LanceDB two-tier hybrid path behaves like FAISS across every surface that applies a date bound. **publish_date parity (the digest topic-band regression).** The shared `_hit_passes_cli_filters` drops any hit lacking `publish_date` when `since` is set. FAISS rows carry it; lance rows did not — so every hybrid hit was dropped whenever a `since` bound was passed. The digest topic-band search ALWAYS passes one (window=all → since=1970-01-01), so a corpus served via lance returned 0 topic-bands where FAISS returned them (Tier-3 P1.3/P4.2). Masked in prod only because prod still ships the legacy `lance_native/` dir → silent FAISS fallback. Fix: store `publish_date` on the segment/insight/aux docs (schema + dataclass), populate it in both index paths (native `index-two-tier` + the FAISS→lance migration), and surface it from the row payload into hit metadata. **Schema-version self-heal.** Adding a column makes pre-existing lance indexes incompatible, and there was no version on the index to detect that. Add `LANCE_SCHEMA_VERSION` (now 2) stamped into `index_meta.json`, plus `stored_schema_version()` / `lance_index_is_stale()`. Wire the staleness check so old indexes self-heal: - read path (`hybrid_candidates`) skips a stale index → FAISS fallback (never serves results from an incompatible schema); - (re)index moments rebuild rather than upsert into incompatible tables: `build_two_tier_index` + `migrate_faiss_to_lance` wipe-if-stale, and migration 0002 rebuilds a stale index instead of no-op. Staleness requires positive evidence (meta present, version < code); a missing meta is treated as not-stale so the read path's own try/except governs. Tests: publish_date carried through hybrid hits; stale index detected + read falls back to FAISS; 0002 rebuilds a schema-stale index. Full search + upgrade suites green (240 passed); mypy clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…index The Tier-3 validation harness was FAISS-only; reflect the current search layer so the walk exercises the real hybrid (BM25 + dense + RRF) path and matches prod's two-tier layout. - Makefile build-validation-index: add `cli index-two-tier` (lance_index) between the FAISS build and topic_clusters. Now builds all three, documented inline with what each unlocks (V3 semantic search; V2/V4 topic clusters; hybrid serving). - TESTING.md: document the one-time `make build-validation-index` step, the three search artifacts + what each unlocks, and that the "1 skip" is V3 when no vector index is present. Corpus table now notes the raw `feeds/` + pre-built layers. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…r-3 V3) Second FAISS-parity gap on the hybrid path, same shape as the publish_date one. The viewer's "Show on graph" affordance (ResultCard) is gated on `graphNodeIdFromSearchHit`, which reads `metadata.source_id` — the canonical graph node id (`topic:…` / `entity:…` / `insight:…` / `quote:…`) — for the focusable tiers (insight / quote / kg_topic / kg_entity). FAISS rows carry it; lance rows did not, so a search served via LanceDB rendered no graph handoff and Tier-3 V3 ("Search → Show on graph") timed out waiting for the button. Fix: store `source_id` on the insight + aux docs (schema + dataclass), populate it in both index paths (native `index-two-tier` + the FAISS→lance migration), and surface it from the row payload into hit metadata. Folded into LANCE_SCHEMA_VERSION 2 (the unshipped parity bump) alongside publish_date. Verified: `source_id` flows through hybrid hits (topic:technology, entity:…, insight:…); Tier-3 V3 passes (handoff status "applied"). Search + upgrade suites green (238 passed); parity regression test asserts both publish_date + source_id. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… FAISS Test-evolution for "less FAISS, more LanceDB": the Tier-3 walk exercised the search path but never asserted WHICH backend served it, so a stale/broken lance_index that silently degraded to FAISS would pass unnoticed. V6 hits /api/search directly and proves the LanceDB two-tier hybrid (BM25 + dense + RRF) is live: - RRF score signature: max(score) < 0.1. FAISS returns a cosine similarity (top hit ≈ 1.0); LanceDB returns fused RRF scores 1/(60+rank). This is the definitive discriminator. - two-tier provenance: every hit carries source_tier ∈ {insight, segment, aux}. - hybrid-only response fields present: lift_stats, query_type. Verified both ways: passes on the lance index (maxScore 0.031); FAILS when lance_index is removed (FAISS fallback → maxScore 1.0), so it's a real provenance guard, not a no-op. Full Tier-3 walk now 34/34 green (V1–V6 + handoff matrix). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… provenance Fold this session's lance/search learnings into their existing owner docs (no new files): - RFC-090 §10 (new): the metadata-parity contract (any consumer field must be on both backends — publish_date for date filters, source_id for graph handoff), LANCE_SCHEMA_VERSION self-heal (stale → read skips to FAISS, reindex rebuilds), and the V6 provenance guard. - SEMANTIC_SEARCH_GUIDE: operator subsections "Telling which backend served a query" (RRF score < 0.1 vs FAISS cosine ≈ 1.0; source_tier/lift_stats/query_type) and "Metadata parity + schema versioning", incl. the legacy lance_native/ → FAISS-fallback gotcha and the reindex fix. - e2e/validation/README: add the V6 scenario row. make docs (mkdocs strict) passes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The default corpus-graph auto-load cap bounds usable graph size because the `cose` layout is ~O(n²) (stress test: 100 episodes ≈ 2861 nodes ≈ 134s layout). Bump the interim ceiling to 25 (~930 nodes, ~6-8s) and document the tradeoff + the stress-test numbers inline. Raising it meaningfully is gated on the large-graph layout (cose→fcose) + selection-persistence work tracked in #967. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…close (#956) #956's core symptom (a DGX request hanging after the server returned 200) is already handled by the #954 `run_with_watchdog` hard wall-clock deadline. This adds the two transport-level defences from #956 that actually FIT a long-blocking upload, via a shared `dgx_http_client()` factory both providers now use: - **TCP keepalive** (`keepalive_socket_options`): SO_KEEPALIVE + a ~30s/15s/4-probe schedule so a socket whose underlying path died mid-request (e.g. a Tailscale path switch) is reaped in ~90s instead of the OS default (2h on macOS) — turning an indefinite hang into a prompt connection error the provider fails over on. Built defensively (Linux TCP_KEEPIDLE vs macOS TCP_KEEPALIVE, each hasattr-guarded). - **Connection: close** so the server tears the socket down after the response. Deliberately NOT added (don't fit long-blocking DGX calls): - per-read timeout — these POSTs stream zero bytes during the multi-minute GPU run, so any read deadline shorter than processing false-aborts a healthy call; the duration-scaled watchdog is the correct backstop. - urllib3 Retry adapter — we're on httpx, and retries are already handled by the provider loops + circuit breaker. - Connection-reuse concerns — moot; each request uses a fresh client. Both providers migrated to `dgx_http_client`; whisper's now-redundant httpx import-guard removed. #956 Tier-1 (async job submission) is server-side, deferred to the DGX chapter. 38 DGX unit/integration/e2e tests green; mypy clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

) Follow-up to the cap bump: graphEpisodeSelection.test.ts pinned the constant at 15. Update to 25 (the interim ceiling pending the large-graph layout work #967). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The Tier-3 validation walk writes generated Playwright artifacts (incl. error-context.md) to web/gi-kg-viewer/validation-results/ — the same class as test-results/ and playwright-report/, which lint-markdown already ignores. It was missing from the ignore globs, so a left-over artifact (e.g. from a deliberately failing run) failed `make lint-markdown`/`make ci`. Already gitignored; align the markdownlint ignores to match. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ilure interrogate (the `make docstrings` 100% gate) flagged these two methods from the #954 resilience layer — they predate the first full `make ci` on this branch (targeted test runs don't run interrogate). Document the state transitions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Removing whisper_provider's local httpx import-guard (now that the client is built in dgx_http_client) lost the friendly "httpx required" RuntimeError that test_transcribe_dgx_requires_httpx asserts — the raw ImportError surfaced instead. Move the guard into the shared factory so both providers get the actionable error. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Branch state, the immediate push+PR step (awaiting operator go), what's on the branch, and the future chapters (#967 large-graph, #956 Tier-1, #876 batch) so a fresh session can pick up cleanly. docs/wip is excluded from mkdocs + markdownlint. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…obal pollution) The #947 GUID-keyed audio cache defaults to a repo-relative global dir (`.cache/audio`, enabled by default). With no test isolation, tests both pollute that shared cache and read each other's downloads — and a stale GUID hit silently masks failure-injection tests: `test_chaos_run_index_records_failed_episode` 404s e03's audio expecting the episode to fail, but a previously-cached e03 (from a sibling test / prior run) produced a cache HIT, so e03 transcribed and was never recorded as failed (`episodes_failed == 0`). Passed on main (no cache existed), broke here — caught by the first full `make ci` on the branch. Add an autouse conftest fixture redirecting `DEFAULT_AUDIO_CACHE_DIR` to a per-test `tmp_path/audio` (basename kept `audio` so `test_default_dir` holds). Tests passing an explicit `audio_cache_dir` are unaffected. Verified: full e2e error-handling class + both #947 cache suites green (27 passed), and tests no longer write to the global `.cache/audio`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…lears-focus) The viewer test push added `ref="overlayRootRef"` to activate the overlay's capture-phase Escape listener (Escape-to-dismiss). That was untested-in-e2e and conflicts with the core "Escape clears focus" contract: on cold-start the overlay is up, so its Escape handler intercepts the key (preventDefault + dismiss) before `graphHandoff.focusCleared()` runs — H1.12 (Escape bumps the handoff generation) failed on firefox (post-handoff activeElement lets the overlay claim the Escape). Revert the one-line ref add so GraphGestureOverlay.vue matches main exactly (overlay still dismissed via its button / backdrop click), and drop the unit test that asserted the reverted Escape-dismiss path. H1.12 passes on firefox again; the overlay's other 14 unit tests stay green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ction) CodeQL flagged the os.path.isfile / open sinks in the new stored_schema_version helper: meta_path derived from a corpus path (user-provided via the API) without the sanitiser chain the rest of this file uses. Confine the CONSTANT "index_meta.json" subpath under the resolved root via safe_relpath_under_corpus_root → normpath_if_under_root (same Type-1 pattern as read_index_meta), so CodeQL sees the path as sanitised. Behaviour unchanged (v2→not-stale, v1→stale, absent→None); 18 search/upgrade tests + mypy green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ersion (#969) CodeQL #360/#361/#362 — the os.path.isfile / open sinks in stored_schema_version. Identical Type-1 cross-function pattern to read_index_meta (#338/#342): meta_path sanitised via safe_resolve_directory -> safe_relpath_under_corpus_root (constant index_meta.json) -> normpath_if_under_root, corpus root route-confined by resolve_corpus_path_param. CodeQL can't model helper-based sanitisers. Dismissed via gh api per the registry policy; code already uses the prescribed chain. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ift FP, #969) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

codecov · 2026-06-11T17:30:12Z

Codecov Report

❌ Patch coverage is 84.83685% with 79 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/podcast_scraper/utils/audio_cache.py	78.16%	15 Missing and 4 partials ⚠️
src/podcast_scraper/workflow/episode_processor.py	61.76%	11 Missing and 2 partials ⚠️
...odcast_scraper/providers/tailnet_dgx/resilience.py	90.26%	5 Missing and 6 partials ⚠️
...podcast_scraper/search/backends/lancedb_backend.py	68.75%	6 Missing and 4 partials ⚠️
src/podcast_scraper/search/hybrid_search.py	12.50%	4 Missing and 3 partials ⚠️
src/podcast_scraper/config.py	92.45%	1 Missing and 3 partials ⚠️
src/podcast_scraper/rss/parser.py	73.33%	3 Missing and 1 partial ⚠️
...aper/providers/tailnet_dgx/diarization_provider.py	92.50%	3 Missing ⚠️
src/podcast_scraper/search/migration.py	25.00%	2 Missing and 1 partial ⚠️
src/podcast_scraper/search/two_tier_indexer.py	25.00%	2 Missing and 1 partial ⚠️
... and 1 more

📢 Thoughts on this report? Let us know!

chipi and others added 30 commits June 11, 2026 14:32

chipi and others added 3 commits June 11, 2026 16:28

github-advanced-security AI found potential problems Jun 11, 2026

View reviewed changes

Comment thread src/podcast_scraper/search/backends/lancedb_backend.py Fixed

Comment thread src/podcast_scraper/search/backends/lancedb_backend.py Dismissed

Comment thread src/podcast_scraper/search/backends/lancedb_backend.py Dismissed

github-advanced-security AI found potential problems Jun 11, 2026

View reviewed changes

Comment thread src/podcast_scraper/search/backends/lancedb_backend.py Dismissed

chipi and others added 2 commits June 11, 2026 18:32

docs(ci): log dismissal of CodeQL #363 (stored_schema_version line-sh…

21b7b3b

…ift FP, #969) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chipi merged commit 217e9dd into main Jun 11, 2026
29 checks passed

chipi deleted the feat/946-existing-only-rediarization branch June 11, 2026 17:31

chipi mentioned this pull request Jun 12, 2026

Build patched ctranslate2 from source with sm_121 gencode (unblocks bf16 on GB10) #973

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DGX re-diarization + resilience, LanceDB search parity, viewer Tier-3 validation + test push#969

DGX re-diarization + resilience, LanceDB search parity, viewer Tier-3 validation + test push#969
chipi merged 36 commits into
mainfrom
feat/946-existing-only-rediarization

chipi commented Jun 11, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

chipi commented Jun 11, 2026