
DGX re-diarization + resilience, LanceDB search parity, viewer Tier-3 validation + test push #969

Merged
chipi merged 36 commits into main from feat/946-existing-only-rediarization
Jun 11, 2026


@chipi chipi commented Jun 11, 2026

Owner

Bundles several related workstreams. Full make ci is green locally (fast gates → full pytest 4444 + integration + e2e → test-ui + test-ui-e2e → build-viewer → coverage-enforce → docs → build → Docker stack-test).

DGX re-diarization + resilience

LanceDB search parity + self-heal

  • publish_date + source_id FAISS-parity on the two-tier hybrid index — fixes digest topic-bands returning 0 under lance (the shared since filter dropped hits lacking publish_date) and the viewer "Show on graph" affordance (reads metadata.source_id).
  • LANCE_SCHEMA_VERSION self-heal: stamped in index_meta.json; stale index → read path skips to FAISS, (re)index moments rebuild instead of upserting into incompatible tables.
  • Docs: RFC-090 §10 + SEMANTIC_SEARCH_GUIDE operator diagnostics (RRF vs cosine, source_tier/lift_stats/query_type).

Viewer Tier-3 validation + test push (#914)

  • Tier-3 walk fixed (CORPUS must be the FIXTURES_VERSION dir); build-validation-index builds FAISS + lance two-tier + topic_clusters; new V6 spec asserts lance (not FAISS) is serving. Tier-3 walk 34/34.
  • ~+1200 viewer unit/component tests; coverage gate enforced.

Notable regressions caught + fixed by the first full make ci

Follow-up chapters (NOT this PR)

🤖 Generated with Claude Code

chipi and others added 30 commits June 11, 2026 14:32
…he (#947) + DGX whisper resilience

Three related pieces for the #876 prod corpus re-diarization, plus a DGX
investigation writeup. All unit tests pass; the only open item is an
operational DGX-box CUDA issue (see #948), NOT a code defect.

#946 — strict existing-only migration mode (reprocess_existing_only + GUID
filter in scraping.py + extract_item_guid + migrate-diarization target). Also
fixes a per-feed _build_config bug that dropped ADR-096/#814/#926 DGX routing
fields (transcription_fallback_provider/diarization_provider/dgx_diarize_*).

#947 — durable GUID-keyed raw-audio cache (utils/audio_cache.py) + cache-aware
_download_or_reuse_media + pipeline_stage=download_only + download-audio target.
Validated: 10 cached, reprocess = 10 cache HITs / 0 feed fetches.
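
A minimal sketch of how a GUID-keyed cache lookup could work. The helper names, the SHA-256 key, and the `.mp3` suffix are assumptions for illustration and are not taken from `utils/audio_cache.py`:

```python
import hashlib
from pathlib import Path
from typing import Optional


def cache_path_for_guid(cache_dir: Path, guid: str, suffix: str = ".mp3") -> Path:
    """Map an episode GUID to a stable, filesystem-safe cache path (hypothetical layout)."""
    digest = hashlib.sha256(guid.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}{suffix}"


def lookup_cached_audio(cache_dir: Path, guid: str) -> Optional[Path]:
    """Return the cached file for this GUID, or None on a cache miss."""
    path = cache_path_for_guid(cache_dir, guid)
    return path if path.is_file() else None
```

Hashing the GUID keeps arbitrary feed-provided strings out of file names; a HIT means the feed does not have to be fetched again.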

DGX whisper resilience (whisper_provider.py): duration-scaled timeout +
single-flight lock + smart retry (conn-blip retries w/ backoff; timeout falls
back without re-queuing). New dgx_* config knobs.
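
The retry policy above can be sketched roughly as follows. `call_with_retry`, its parameters, and the exponential backoff schedule are hypothetical; the real provider loop in `whisper_provider.py` may differ:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    request: Callable[[], T],
    fallback: Callable[[], T],
    max_attempts: int = 3,
    backoff_sec: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry connection blips with backoff; fall back at once on a timeout."""
    for attempt in range(max_attempts):
        try:
            return request()
        except TimeoutError:
            # Definitive timeout: do not re-queue, fail over immediately.
            return fallback()
        except ConnectionError:
            # Transient blip: back off (assumed exponential), then retry.
            if attempt + 1 < max_attempts:
                sleep(backoff_sec * (2 ** attempt))
    # Retries exhausted: fail over.
    return fallback()
```

Treating the two failure classes differently avoids paying a long timeout several times on a wedged endpoint while still riding out short network interruptions.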

DGX compute-type investigation (#948): fp16/bf16 error on GB10 build, int8 slow,
default(->fp32) is the working path; nvidia-smi unreliable on GB10.

Tests + test-ordering flake fix; docs/wip/REHEARSAL-876-findings-20260609.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Source-builds ctranslate2 4.8.0 against CUDA 12.8 with sm_120 PTX so
the GB10 (compute_cap=12.1 / sm_121) driver can JIT-compile forward at
first kernel launch. Replaces the upstream :latest-cuda tag's bundled
CPU-only ctranslate2 4.5.0 (the upstream tag is misleading — the .so
has no CUDA symbols, faster_whisper raises "not compiled with CUDA"
when device=cuda).

Validated: 5-min clip transcribes in 8.25s wall-time (36.4× realtime)
vs ~103s on CPU (2.9× realtime).

New: infra/dgx/speaches-gb10/{Dockerfile,README.md}
  - Two-stage Dockerfile (nvidia/cuda:12.8.0-devel builder → speaches
    runtime). Builds in ~3 min on DGX once apt/git cache populated.
  - README explains the incident timeline, build flags, validation
    commands, operational notes.

deploy.py: pyinfra recipe now ships the Dockerfile, runs docker build
locally (idempotent via layer cache), regenerates /etc/cdi/nvidia.yaml
(nvidia-container-toolkit 1.19.1 needs CDI spec to inject GPU correctly
in mode=auto), and references the local-built image tag. Drops the
upstream `docker compose pull` step since we build locally now.

verify.py: adds two assertions that would catch the silent CPU
fallback that triggered #948:
  - /etc/cdi/nvidia.yaml present + nvidia-ctk lists nvidia.com/gpu=all
  - ctranslate2 inside faster-whisper sees the GB10 on cuda
    (assert get_cuda_device_count() >= 1)

Idempotency tested: `make dgx-deploy` re-run is a no-op via Docker
layer cache. `make dgx-verify` passes all 11 assertions.

Why this lands on this branch: the #946 work (audio cache + duration-
scaled timeouts + single-flight provider) was the consumer-side
resilience that made the silent CPU fallback survivable. This is the
producer-side fix that closes the loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pyannote diarize client hung ~2h during a co-tenant vLLM crash-loop on the
shared GB10: httpx's read/write timeout never fired because the 60MB multipart
upload trickled and kept resetting the per-write timeout. Rather than patch the
diarize client alone, extract a shared DGX resilience layer both it and the
#946 Whisper client consume, so they can't drift.

New `providers/tailnet_dgx/resilience.py`:
- `run_with_watchdog()` — hard wall-clock deadline in a daemon thread. The
  actual hang-fix: guarantees fail-over even when httpx's own timeout doesn't
  fire. Orphaned worker is a daemon (holds one connection, never blocks exit).
- `CircuitBreaker` — trimmed-down port of `rss/http_policy.CircuitBreaker`
  (rolling-window -> open(cooldown) -> half-open probe), with a `hard`
  immediate-trip for definitive timeouts so a wedged batch doesn't pay the full
  timeout on every episode. Named + logs open/close for ops visibility.
- `effective_timeout_sec()` / `probe_audio_duration_sec()` — duration-scaled
  budget (soundfile-based; graceful None without the [ml] extra).
- `TimeoutLike` — shared timeout-class tuple.
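
A sketch of the hard wall-clock deadline idea, assuming nothing about the real `run_with_watchdog()` signature. `run_with_deadline` and `WatchdogTimeout` are illustrative names only:

```python
import threading
from typing import Any, Callable, Dict


class WatchdogTimeout(Exception):
    """Raised when the wrapped call exceeds its wall-clock deadline."""


def run_with_deadline(fn: Callable[[], Any], deadline_sec: float) -> Any:
    """Run fn in a daemon thread and give up after deadline_sec seconds."""
    outcome: Dict[str, Any] = {}

    def worker() -> None:
        try:
            outcome["value"] = fn()
        except BaseException as exc:  # propagate any failure to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(deadline_sec)
    if thread.is_alive():
        # Abandon the worker; as a daemon it cannot block interpreter exit.
        raise WatchdogTimeout(f"no result within {deadline_sec}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

The deadline is enforced by the caller's `join`, so it fires even when the HTTP library's own timeout keeps being reset by a trickling upload.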

Both providers now: duration-scaled timeout + single-flight + bounded retries
(timeout -> fail-over without re-queue; connection blip -> backoff) + watchdog
+ per-endpoint breaker. Whisper gets the same treatment for parity (same GPU,
same hang vector).
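
For orientation, a simplified breaker with the states described above (rolling window, open with cooldown, half-open probe, plus a `hard` immediate trip). `SimpleBreaker` and its thresholds are assumptions, not the ported `CircuitBreaker`:

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class SimpleBreaker:
    """Illustrative breaker: closed -> open (cooldown) -> half-open probe."""

    def __init__(
        self,
        threshold: int = 3,
        window_sec: float = 60.0,
        cooldown_sec: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.window_sec = window_sec
        self.cooldown_sec = cooldown_sec
        self._clock = clock
        self._failures: Deque[float] = deque()
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """True when closed, or when the cooldown elapsed (half-open probe)."""
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self.cooldown_sec

    def record_success(self) -> None:
        """A success (including a successful probe) closes the breaker."""
        self._failures.clear()
        self._opened_at = None

    def record_failure(self, hard: bool = False) -> None:
        """Count a failure; trip on threshold, on hard=True, or on a failed probe."""
        now = self._clock()
        self._failures.append(now)
        while self._failures and now - self._failures[0] > self.window_sec:
            self._failures.popleft()
        if hard or len(self._failures) >= self.threshold or self._opened_at is not None:
            self._opened_at = now
```

With `hard=True` a single definitive timeout opens the breaker, so the remaining episodes in a batch skip the endpoint instead of each waiting out the full timeout.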

Diarization-specific tighter timeout via new profile-only config
`dgx_diarize_request_timeout_sec` (180) / `dgx_diarize_timeout_per_audio_minute_sec`
(6) — pyannote is far faster than Whisper so the budget (and a half-open probe)
stays cheap.
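
One plausible reading of the duration-scaled budget, using the two diarization defaults quoted above (180 and 6). The `max(base, minutes * per_minute)` formula and the function name are assumptions; the actual `effective_timeout_sec()` may combine the values differently:

```python
from typing import Optional


def scaled_timeout_sec(
    duration_sec: Optional[float],
    base_timeout_sec: float = 180.0,
    per_audio_minute_sec: float = 6.0,
) -> float:
    """Return the larger of the base timeout and a per-audio-minute budget."""
    if duration_sec is None or duration_sec <= 0:
        # Duration unknown (for example, the probe returned None): use the base.
        return base_timeout_sec
    scaled = (duration_sec / 60.0) * per_audio_minute_sec
    return max(base_timeout_sec, scaled)
```

Under this assumed formula a 10-minute clip stays at the 180 s base, while a 60-minute episode gets 360 s.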

Tests: `test_tailnet_dgx_resilience.py` (20: breaker FSM, watchdog real-thread,
timeout math, probe-None) + diarize/whisper provider suites extended with
breaker-open / watchdog-hang / timeout-no-requeue cases. Fixed a pre-existing
15s whisper test (unmocked backoff sleep) -> 0.7s. flake8/mypy/test-policy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…953)

The GB10 Speaches/ctranslate2 path is being sorted out, so the DGX Whisper
client is temporarily pointed at the openai-whisper service on :8002 (model id
`large-v3`) instead of Speaches on :8000 (`Systran/faster-whisper-large-v3`).
Validated end-to-end: a 91.6s clip transcribes in ~9.8s; health green.

Revert tracked in #955 (restore :8000 / Systran model when Speaches is back).
A dated comment block in the profile flags exactly which two lines to revert.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Complements the mocked-transport unit suites: runs the real httpx path over a
loopback socket against a throwaway stub mimicking the faster-whisper + pyannote
services. Proves the production failure end-to-end — a *hanging* socket (httpx's
own timeout never fires because the upload trickles) is abandoned by the hard
watchdog and fails over — plus happy-path round-trip, HTTP 503 fail-over, and
the breaker tripping so the next call skips the socket. Self-contained (own
http.server, not the shared e2e server) so it can't perturb the e2e suite.
Component-level (provider vs local server) → integration tier. 7 tests, ~5s.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Closes the last gap: no e2e test drove the tailnet DGX providers at all (the
existing whisper/diarization e2e tests use the local/cloud paths). Adds the DGX
endpoints to the shared e2e mock server and a provider e2e suite that runs the
real TailnetDgx{Whisper,Diarization}Provider over real httpx against it with a
real audio payload.

e2e_http_server.py (additive + one correctness fix):
- GET /v1/models + /health — faster-whisper + pyannote health/model probe.
- POST /v1/diarize — canned two-speaker pyannote result.
- /v1/audio/transcriptions now honours the requested response_format: json /
  verbose_json get a proper JSON body (what the OpenAI SDK *and* the DGX client
  both request and parse) instead of text/plain; text/unspecified unchanged.
  Verified the openai/basic/capabilities transcription e2e suites still pass.
- dgx_host_port() URL helper.

test_tailnet_dgx_e2e.py: happy round-trip (transcribe verbose_json + diarize)
plus the production failure modes via the server's set_error_behavior injection
— a hanging socket (delay) is abandoned by the watchdog and a 5xx fails over,
asserting the breaker trips. 6 tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…913)

OpenAI `verbose_json` returns timestamped Whisper-format segments and the audio
is downloaded locally (for the API upload), so the existing local pyannote second
pass — `apply_diarization_to_result(result, media_for_transcription, ...)`, which
is provider-agnostic — can diarize + align it, exactly like whisper /
tailnet_dgx_whisper. Previously `diarize`/`screenplay` were silently coerced off
for openai by design; this makes them work.

- `_DIARIZATION_ELIGIBLE_TRANSCRIPTION_PROVIDERS` gains `openai` (Gemini/Mistral
  stay out — they emit plain text, no segments; deepgram self-diarizes).
- **Opt-in, not default-on.** Since `diarize` defaults True, simply making openai
  eligible would flip diarization ON for *every* openai run (21 e2e/integration
  files + the production cloud profiles) — a broad, surprising change needing an
  HF token. New `_DIARIZATION_DEFAULT_ON_TRANSCRIPTION_PROVIDERS = {whisper,
  tailnet_dgx_whisper}`: eligible-but-not-default-on providers (openai) keep
  diarize OFF unless explicitly set. So existing openai behavior is unchanged;
  cloud_balanced / cloud_thin opt in with `diarize: true` (documented in both).
- Coercion log messages now reference the eligible set dynamically (no drift).
- Tests: new test_diarize_openai_eligibility.py (eligible + opt-in default-off +
  explicit-true respected + whisper still default-on + gemini/mistral coerced).
  Repointed 3 #562 screenplay-coercion tests from openai → gemini (openai no
  longer coerces). flake8/mypy/openai-integration(96) green; openai e2e unchanged.
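
The eligible versus default-on split can be sketched like this. The set contents beyond the providers named above and the `resolve_diarize` helper are illustrative assumptions, not the project's config code:

```python
from typing import Optional

# Assumed membership, based only on the providers named in this message.
ELIGIBLE = frozenset({"whisper", "tailnet_dgx_whisper", "openai"})
DEFAULT_ON = frozenset({"whisper", "tailnet_dgx_whisper"})


def resolve_diarize(provider: str, explicit: Optional[bool]) -> bool:
    """Decide whether diarization runs for a transcription provider."""
    if provider not in ELIGIBLE:
        return False  # ineligible providers are coerced off
    if explicit is None:
        # Unset: only default-on providers diarize; openai stays opt-in.
        return provider in DEFAULT_ON
    return explicit
```

Here `resolve_diarize("openai", None)` stays off, while an explicit `diarize: true` turns it on.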

Distinct from the #876/#946/#954 work also on this branch — self-contained,
cherry-pickable to its own PR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Unit layer of a graph-UI test push (follows the #914 coverage gate). Adds/extends
Vitest tests for the lowest-covered graph modules and fixes a latent bug surfaced
along the way:

- NEW store tests: graphNavigation (25%→~100%), graphExpansion (40%→100%),
  graphExplorer (52%→100%).
- EXTENDED: graphHandoff store (57%→100% stmt), cyGraphLabelTier (33%→100%),
  graphLensLabels (37%→100%), graphEpisodeSelection (58%→97%),
  graphEpisodeMetadata (65%→100% stmt).
- +241 tests (847→1088); overall viewer coverage 77→82.5% stmt / 68→72.9% br /
  76.5→85% fn / 79→84.3% ln — comfortably above the #914 gate.

Bug fix (graphEpisodeMetadata.ts): `resolveEpisodeMetadataFromLoadedArtifacts`
threw a TypeError (`(... ).trim()` on undefined) when an artifact had no
`sourceCorpusRelPath` and the parallel `selectedRelPaths[i]` was missing. Both
operands are already trimmed, so the outer `.trim()` was redundant; coalesce to
'' and skip gracefully. Test updated to assert the graceful null.

Logic-only layer (the repo tests UI behavior via Playwright e2e, not @vue/test-utils
mounting). Component + e2e layers to follow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… tests

Component layer of the graph-UI test push. Introduces the repo's first real
@vue/test-utils mount harness (the prior "component test" only read .vue source
as text) and mount-tests the cheap/high-value graph components; the
cytoscape-heavy containers (GraphCanvas, GraphTabPanel, GraphNodeRailPanel) are
left to the e2e layer.

- New dev dep: @vue/test-utils. Pattern: happy-dom + setActivePinia + mount with
  data-testid queries + real stores; heavy children/cytoscape stubbed via
  global.stubs (see GraphDegreeChip.test.ts as the reference).
- ~146 component tests across: chips (Degree/Edges/Types/Feed/Sources),
  GraphStatusLine (both variants), HandoffErrorStrip, GraphBottomBar,
  GraphConnectionsSection, NodeDetail, GraphFilterBar, GraphGestureOverlay.

Bug fix (GraphGestureOverlay.vue): the overlay root <div> never bound
`ref="overlayRootRef"`, yet the Escape-key handler reads
`overlayRootRef.value` and bails (`if (!root) return`) — so Escape-to-dismiss
silently never worked in production. Bound the ref; the new test asserts Escape
now dismisses when focus is outside the dialog.

Note: mounting components pulls the .vue files (and transitive imports) into the
v8 coverage denominator (5229→7004 statements), so overall % shifts to a more
honest, lower baseline that exposes undertested components — still above the
#914 gate (stmt 77.9/br 68.3/fn 82.1/ln 79.5 vs 75/65/73/76). 1234 tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
First slice of the app-wide test push (after the graph-UI push). The api/ layer
was the biggest gap; all eight wrappers are now fully covered:

- exploreApi 0→100%, artifactsApi 10→100%, corpusMetricsApi 32→100%,
  operatorConfigApi 40→100%, searchApi 58→100%, corpusLibraryApi 71→100%,
  cilApi 73→100%, relationalApi 82→100%.
- +~123 tests (1234→1357). Each covers request URL/method/body building,
  query-param trimming/clamping/omission, response parsing/normalization,
  pagination/cursor, and every error branch (non-ok + text, HTTP-status
  fallback, network throw, malformed JSON).

Overall viewer coverage: stmt 77.9→79.6 / br 68.3→69.7 / fn 82.1→83.7 /
ln 79.5→81.1 — above the #914 gate. No bugs found; no source changed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Second app-wide slice. Lifts the lowest-covered Pinia stores:

- corpusLens 44→100%, explore 60→100%, subject 76→100% (stmt + functions),
  artifacts 33→95% stmt / 81% br / 98% fn (the large central store — load paths,
  display-artifact build, selection, sibling-merge, topic-cluster overlay; api
  modules mocked, local-file paths via constructed File objects).
- +~143 tests (1357→1500).

Overall viewer coverage: stmt 79.6→83.7 / br 69.7→72.5 / fn 83.7→87.7 /
ln 81.1→85.2. No bugs found; no source changed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Third app-wide slice. Lifts the lowest-covered util/helper modules:

- formatDuration 47→100%, corpusFeedRowDisplay 73→100%, readApiErrorMessage
  75→100%, feedRunLinking 77→100%, cyCoseLayoutOptions 76→100%,
  pipelineJobLogSummary 78→98%, humanizeJsonDocument 79→99%.
- +~175 tests (1500→1675). Pure-function coverage: every branch + edge case
  (zero/negative/huge/fractional/non-finite, null/undefined/empty, malformed
  JSON, circular refs, all formatting/parse branches).

Overall viewer coverage: stmt 83.7→84.9 / br 72.5→74.3 / fn 87.7→89.0 /
ln 85.2→86.5. No bugs found; no source changed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The graph-UI + app-wide test push lifted viewer coverage well past the original
#914 floor (75/65/73/76). Ratchet the gate to a floor a few points below the new
baseline (stmt 84.9 / br 74.3 / fn 89.0 / ln 86.5) so the gains can't silently
regress:

  statements 75→82 · branches 65→71 · functions 73→86 · lines 76→84

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fourth app-wide slice: mount-tests the cheap/high-value non-graph .vue
components via the @vue/test-utils harness (chips, filter bars, feature panels,
shared widgets). Deferred: the dashboard chart-lib wrappers (low-value mount
tests; their data transforms live in tested utils) and the big container views
(DashboardView/DigestView/LibraryView/SearchPanel — already e2e-covered).

- +266 tests across 22 components: explore (More/Text chips + FilterBar),
  library (Clustered/Feed chips + FilterBar), search (DocTypes/More/TopK chips +
  FilterBar + ResultCard + SemanticSearchTip), shared (DateChip,
  CollapsibleSection, DiagnosticRow, CilTopicPillsRow, PodcastCover 0→covered,
  HelpTip 4→covered), episode (BridgePartition, DetailPanel), subject
  (TopicEntityView). +252 net (1675→1927).
- api modules mocked, charts/cytoscape/heavy children stubbed, real stores.

Mounting these pulls more .vue + transitive imports into the v8 denominator
(7004→8088 statements), so overall % holds while honestly covering far more of
the component tree — still above the ratcheted #914 gate (stmt 84.6/br 74.2/
fn 89.2/ln 86.2 vs 82/71/86/84). 1927 tests green, typecheck clean. No bugs found.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Final component slice. Mount-tests the central shell orchestrators + the
remaining shared dialogs/widgets:

- shell: StatusBar (corpus-path input, health/index dots, sources+config+feeds
  dialogs, version-warning), LeftPanel (search/explore surface switch, focusQuery),
  SubjectRail (kind-routing to episode/graph-node/topic/person panels + event
  re-emission + main-tab neighbourhood wiring).
- shared: TranscriptViewerDialog (4→covered: load/highlight/error/audio/segments,
  dismiss paths), TopicTimelineDialog (open/sort/states/dismiss), HoverRichTip
  (hover/focus show-hide timers, Esc, teleport teardown).
- +135 tests across 6 components. api/cil/feeds modules mocked, heavy children
  stubbed, real stores.

No bugs found; no source changed. Gate green at the ratcheted floor.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The earlier ratchet (82/71/86/84) was set against the logic-layer baseline,
before the component mount-test waves. Mounting container components pulls their
whole transitive .vue import tree into v8's (no-`all`) denominator, so the
headline % settled lower at the full-tree scope (stmt 81.1 / br 69.4 / fn 80.5 /
ln 82.8) even though absolute coverage grew. Set the floor a few points below
that honest baseline so CI's viewer-unit coverage gate passes and still guards
against regression:

  statements 82→78 · branches 71→66 · functions 86→77 · lines 84→80

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The viewer's test tiers (Vitest unit/component + #914 coverage gate, mocked
Playwright e2e, Tier-3 real-corpus validation walk) and corpora were scattered
across ADR-095 / RFC-086 / e2e/validation/README — hard to answer "what graph
tests exist and how do I run them?" quickly.

- New web/gi-kg-viewer/TESTING.md: operator quick-ref — every tier, command,
  graph-subset filters, corpora (synthetic validation vs BYOC), the coverage
  gate, and the RFC-086 matrix rule. Links the existing E2E_SURFACE_MAP +
  validation README (no duplication).
- TESTING_STRATEGY.md (Browser UI E2E) + E2E_TESTING_GUIDE.md: add the missing
  Vitest coverage gate + Tier-3 validation walk and point to TESTING.md.
- viewer README: add a Testing section pointing to TESTING.md.

make docs passes (strict).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The committed synthetic validation corpus ships only the pre-built top-level
API JSONs and is missing the raw feeds/*/metadata/*.gi.json artifacts that the
live serve-api computes episodes from. The real-corpus walk against it returns
an empty Library and fails ~30 handoff specs. Documented as a known gap (with
the regenerate-and-commit fix) so operators run the walk against a BYOC/prod
corpus until the fixture is regenerated. Tracked separately.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The synthetic-corpus real-backend walk failed 30/34 specs with an empty
Library. Root cause: the Makefile + TESTING.md pointed CORPUS_PATH at the
version-LESS parent `tests/fixtures/viewer-validation-corpus`, but the raw
`feeds/<feed>/metadata/*.{metadata,gi,kg,bridge}.json` artifacts that serve-api
computes episodes from live under the FIXTURES_VERSION subdir (`.../v2`).
`discover_metadata_files()` returns 0 at the parent and 23 at `.../v2`, so the
Library rendered no episodes and every handoff spec failed on the first
row-click. The corpus itself was always valid — nothing was missing.

- Makefile: new `VIEWER_VALIDATION_CORPUS` var derived from FIXTURES_VERSION,
  wired into ci-ui-validation / serve-for-validation / build-validation-index
  so the documented path always includes the version dir.
- TESTING.md: corrected the run command (derives the version), the corpus-table
  row (it's two layers — raw feeds/ + pre-built corpus/*.json), and replaced the
  earlier incorrect "missing artifacts" note with the version-dir caveat.

Verified: `make ci-ui-validation CORPUS=.../v2` → 30 passed / 1 skipped (was 30
failed). After `make build-validation-index`, the two index-dependent specs
(P1.3 digest topic-band, P4.2 digest band) also pass → 32/33. The remaining V4
(dashboard topic-cluster chip) fails identically on the real prod-v2 corpus too
— a pre-existing handoff gap, tracked separately, not a corpus issue.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ema-version self-heal

Two coupled fixes so the LanceDB two-tier hybrid path behaves like FAISS across
every surface that applies a date bound.

**publish_date parity (the digest topic-band regression).** The shared
`_hit_passes_cli_filters` drops any hit lacking `publish_date` when `since` is
set. FAISS rows carry it; lance rows did not — so every hybrid hit was dropped
whenever a `since` bound was passed. The digest topic-band search ALWAYS passes
one (window=all → since=1970-01-01), so a corpus served via lance returned 0
topic-bands where FAISS returned them (Tier-3 P1.3/P4.2). Masked in prod only
because prod still ships the legacy `lance_native/` dir → silent FAISS fallback.
Fix: store `publish_date` on the segment/insight/aux docs (schema + dataclass),
populate it in both index paths (native `index-two-tier` + the FAISS→lance
migration), and surface it from the row payload into hit metadata.
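
To illustrate the failure mode, here is a hedged sketch of a `since` filter that drops hits without `publish_date`. `hit_passes_since` is a hypothetical stand-in for `_hit_passes_cli_filters`, whose real logic is not shown here:

```python
from datetime import date
from typing import Any, Dict, Optional


def hit_passes_since(hit_metadata: Dict[str, Any], since: Optional[date]) -> bool:
    """Keep a hit only if it has a publish_date on or after the bound."""
    if since is None:
        return True  # no bound set: nothing is filtered
    raw = hit_metadata.get("publish_date")
    if not raw:
        return False  # no date: dropped whenever a bound is set
    # Assumes an ISO-8601 string; only the date part is compared.
    return date.fromisoformat(str(raw)[:10]) >= since
```

With `since=1970-01-01` every dated hit passes, but a row that never stored `publish_date` is still rejected, which matches the zero-result symptom described above.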

**Schema-version self-heal.** Adding a column makes pre-existing lance indexes
incompatible, and there was no version on the index to detect that. Add
`LANCE_SCHEMA_VERSION` (now 2) stamped into `index_meta.json`, plus
`stored_schema_version()` / `lance_index_is_stale()`. Wire the staleness check so
old indexes self-heal:
- read path (`hybrid_candidates`) skips a stale index → FAISS fallback (never
  serves results from an incompatible schema);
- (re)index moments rebuild rather than upsert into incompatible tables:
  `build_two_tier_index` + `migrate_faiss_to_lance` wipe-if-stale, and migration
  0002 rebuilds a stale index instead of no-op.
Staleness requires positive evidence (meta present, version < code); a missing
meta is treated as not-stale so the read path's own try/except governs.
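
A minimal sketch of the staleness rule, assuming the version is stored under a `schema_version` key (the key name is a guess) and that a missing or unreadable meta yields `None`. It mirrors the described behaviour, not the actual helpers:

```python
import json
from pathlib import Path
from typing import Optional

LANCE_SCHEMA_VERSION = 2


def stored_schema_version(index_dir: Path) -> Optional[int]:
    """Read the stamped version from index_meta.json, or None if unavailable."""
    meta_path = index_dir / "index_meta.json"
    try:
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None  # missing or malformed meta: no evidence either way
    version = meta.get("schema_version") if isinstance(meta, dict) else None
    return version if isinstance(version, int) else None


def lance_index_is_stale(index_dir: Path) -> bool:
    """Stale only on positive evidence: a stored version older than the code's."""
    stored = stored_schema_version(index_dir)
    return stored is not None and stored < LANCE_SCHEMA_VERSION
```

Requiring positive evidence means a directory with no meta file is not wiped or skipped by this check alone.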

Tests: publish_date carried through hybrid hits; stale index detected + read
falls back to FAISS; 0002 rebuilds a schema-stale index. Full search + upgrade
suites green (240 passed); mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…index

The Tier-3 validation harness was FAISS-only; reflect the current search layer so
the walk exercises the real hybrid (BM25 + dense + RRF) path and matches prod's
two-tier layout.

- Makefile build-validation-index: add `cli index-two-tier` (lance_index) between
  the FAISS build and topic_clusters. Now builds all three, documented inline with
  what each unlocks (V3 semantic search; V2/V4 topic clusters; hybrid serving).
- TESTING.md: document the one-time `make build-validation-index` step, the three
  search artifacts + what each unlocks, and that the "1 skip" is V3 when no vector
  index is present. Corpus table now notes the raw `feeds/` + pre-built layers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…r-3 V3)

Second FAISS-parity gap on the hybrid path, same shape as the publish_date one.
The viewer's "Show on graph" affordance (ResultCard) is gated on
`graphNodeIdFromSearchHit`, which reads `metadata.source_id` — the canonical graph
node id (`topic:…` / `entity:…` / `insight:…` / `quote:…`) — for the focusable
tiers (insight / quote / kg_topic / kg_entity). FAISS rows carry it; lance rows did
not, so a search served via LanceDB rendered no graph handoff and Tier-3 V3
("Search → Show on graph") timed out waiting for the button.

Fix: store `source_id` on the insight + aux docs (schema + dataclass), populate it
in both index paths (native `index-two-tier` + the FAISS→lance migration), and
surface it from the row payload into hit metadata. Folded into LANCE_SCHEMA_VERSION
2 (the unshipped parity bump) alongside publish_date.

Verified: `source_id` flows through hybrid hits (topic:technology, entity:…,
insight:…); Tier-3 V3 passes (handoff status "applied"). Search + upgrade suites
green (238 passed); parity regression test asserts both publish_date + source_id.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… FAISS

Test-evolution for "less FAISS, more LanceDB": the Tier-3 walk exercised the search
path but never asserted WHICH backend served it, so a stale/broken lance_index that
silently degraded to FAISS would pass unnoticed. V6 hits /api/search directly and
proves the LanceDB two-tier hybrid (BM25 + dense + RRF) is live:

- RRF score signature: max(score) < 0.1. FAISS returns a cosine similarity (top hit
  ≈ 1.0); LanceDB returns fused RRF scores, each a sum of 1/(60+rank) terms over
  the fused rankings. This is the definitive discriminator.
- two-tier provenance: every hit carries source_tier ∈ {insight, segment, aux}.
- hybrid-only response fields present: lift_stats, query_type.

Verified both ways: passes on the lance index (maxScore 0.031); FAILS when
lance_index is removed (FAISS fallback → maxScore 1.0), so it's a real provenance
guard, not a no-op. Full Tier-3 walk now 34/34 green (V1–V6 + handoff matrix).
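
Why the `< 0.1` threshold separates the two backends can be seen from a small reciprocal rank fusion sketch. The 1-based ranks and the helper names are assumptions; LanceDB's own reranker is not reproduced here:

```python
from typing import Dict, List

RRF_K = 60  # the constant quoted above


def rrf_fuse(rankings: List[List[str]], k: int = RRF_K) -> Dict[str, float]:
    """Sum 1/(k + rank) over every ranking a document appears in."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


def looks_like_rrf(scores: List[float], threshold: float = 0.1) -> bool:
    """Heuristic from the V6 spec: fused RRF scores stay far below cosine ~1.0."""
    return bool(scores) and max(scores) < threshold
```

With two rankers (BM25 and dense) a document ranked first in both scores 2/61, about 0.033, so the maximum stays well under 0.1, whereas a cosine top hit sits near 1.0.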

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… provenance

Fold this session's lance/search learnings into their existing owner docs (no new
files):

- RFC-090 §10 (new): the metadata-parity contract (any consumer field must be on
  both backends — publish_date for date filters, source_id for graph handoff),
  LANCE_SCHEMA_VERSION self-heal (stale → read skips to FAISS, reindex rebuilds),
  and the V6 provenance guard.
- SEMANTIC_SEARCH_GUIDE: operator subsections "Telling which backend served a
  query" (RRF score < 0.1 vs FAISS cosine ≈ 1.0; source_tier/lift_stats/query_type)
  and "Metadata parity + schema versioning", incl. the legacy lance_native/ →
  FAISS-fallback gotcha and the reindex fix.
- e2e/validation/README: add the V6 scenario row.

make docs (mkdocs strict) passes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The default corpus-graph auto-load cap bounds usable graph size because the
`cose` layout is ~O(n²) (stress test: 100 episodes ≈ 2861 nodes ≈ 134s layout).
Bump the interim ceiling to 25 (~930 nodes, ~6-8s) and document the tradeoff +
the stress-test numbers inline. Raising it meaningfully is gated on the
large-graph layout (cose→fcose) + selection-persistence work tracked in #967.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…close (#956)

#956's core symptom (a DGX request hanging after the server returned 200) is already
handled by the #954 `run_with_watchdog` hard wall-clock deadline. This adds the two
transport-level defences from #956 that actually FIT a long-blocking upload, via a
shared `dgx_http_client()` factory both providers now use:

- **TCP keepalive** (`keepalive_socket_options`): SO_KEEPALIVE + a ~30s/15s/4-probe
  schedule so a socket whose underlying path died mid-request (e.g. a Tailscale path
  switch) is reaped in ~90s instead of the OS default (2h on macOS) — turning an
  indefinite hang into a prompt connection error the provider fails over on. Built
  defensively (Linux TCP_KEEPIDLE vs macOS TCP_KEEPALIVE, each hasattr-guarded).
- **Connection: close** so the server tears the socket down after the response.

Deliberately NOT added (don't fit long-blocking DGX calls):
- per-read timeout — these POSTs stream zero bytes during the multi-minute GPU run, so
  any read deadline shorter than processing false-aborts a healthy call; the
  duration-scaled watchdog is the correct backstop.
- urllib3 Retry adapter — we're on httpx, and retries are already handled by the
  provider loops + circuit breaker.
- Connection-reuse concerns — moot; each request uses a fresh client.

Both providers migrated to `dgx_http_client`; whisper's now-redundant httpx
import-guard removed. #956 Tier-1 (async job submission) is server-side, deferred to
the DGX chapter. 38 DGX unit/integration/e2e tests green; mypy clean.
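
A sketch of a defensively built keepalive option list along the lines described above. `keepalive_options` is a hypothetical helper, not the project's `keepalive_socket_options`; only standard `socket` constants are used, each guarded with `hasattr`:

```python
import socket
from typing import List, Tuple

SocketOption = Tuple[int, int, int]


def keepalive_options(
    idle_sec: int = 30, interval_sec: int = 15, probes: int = 4
) -> List[SocketOption]:
    """Build (level, option, value) tuples for TCP keepalive on this platform."""
    opts: List[SocketOption] = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]
    if hasattr(socket, "TCP_KEEPIDLE"):  # Linux name for the idle time
        opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_sec))
    elif hasattr(socket, "TCP_KEEPALIVE"):  # macOS name for the idle time
        opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle_sec))
    if hasattr(socket, "TCP_KEEPINTVL"):
        opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_sec))
    if hasattr(socket, "TCP_KEEPCNT"):
        opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes))
    return opts
```

With these defaults a dead path is detected after roughly 30 + 4 × 15 = 90 seconds. Each tuple has the shape `socket.setsockopt` expects; how the list is handed to the HTTP client is left out of this sketch.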

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
)

Follow-up to the cap bump: graphEpisodeSelection.test.ts pinned the constant at
15. Update to 25 (the interim ceiling pending the large-graph layout work #967).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Tier-3 validation walk writes generated Playwright artifacts (incl.
error-context.md) to web/gi-kg-viewer/validation-results/ — the same class as
test-results/ and playwright-report/, which lint-markdown already ignores. It was
missing from the ignore globs, so a left-over artifact (e.g. from a deliberately
failing run) failed `make lint-markdown`/`make ci`. Already gitignored; align the
markdownlint ignores to match.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ilure

interrogate (the `make docstrings` 100% gate) flagged these two methods from the
#954 resilience layer — they predate the first full `make ci` on this branch
(targeted test runs don't run interrogate). Document the state transitions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Removing whisper_provider's local httpx import-guard (now that the client is built
in dgx_http_client) lost the friendly "httpx required" RuntimeError that
test_transcribe_dgx_requires_httpx asserts — the raw ImportError surfaced instead.
Move the guard into the shared factory so both providers get the actionable error.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
chipi and others added 3 commits June 11, 2026 16:28
Branch state, the immediate push+PR step (awaiting operator go), what's on the
branch, and the future chapters (#967 large-graph, #956 Tier-1, #876 batch) so a
fresh session can pick up cleanly. docs/wip is excluded from mkdocs + markdownlint.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…obal pollution)

The #947 GUID-keyed audio cache defaults to a repo-relative global dir
(`.cache/audio`, enabled by default). With no test isolation, tests both pollute
that shared cache and read each other's downloads — and a stale GUID hit silently
masks failure-injection tests: `test_chaos_run_index_records_failed_episode` 404s
e03's audio expecting the episode to fail, but a previously-cached e03 (from a
sibling test / prior run) produced a cache HIT, so e03 transcribed and was never
recorded as failed (`episodes_failed == 0`). Passed on main (no cache existed),
broke here — caught by the first full `make ci` on the branch.

Add an autouse conftest fixture redirecting `DEFAULT_AUDIO_CACHE_DIR` to a per-test
`tmp_path/audio` (basename kept `audio` so `test_default_dir` holds). Tests passing
an explicit `audio_cache_dir` are unaffected.

Verified: full e2e error-handling class + both #947 cache suites green (27 passed),
and tests no longer write to the global `.cache/audio`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…lears-focus)

The viewer test push added `ref="overlayRootRef"` to activate the overlay's
capture-phase Escape listener (Escape-to-dismiss). That was untested-in-e2e and
conflicts with the core "Escape clears focus" contract: on cold-start the overlay
is up, so its Escape handler intercepts the key (preventDefault + dismiss) before
`graphHandoff.focusCleared()` runs — H1.12 (Escape bumps the handoff generation)
failed on firefox (post-handoff activeElement lets the overlay claim the Escape).

Revert the one-line ref add so GraphGestureOverlay.vue matches main exactly
(overlay still dismissed via its button / backdrop click), and drop the unit test
that asserted the reverted Escape-dismiss path. H1.12 passes on firefox again; the
overlay's other 14 unit tests stay green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread src/podcast_scraper/search/backends/lancedb_backend.py Fixed
Comment thread src/podcast_scraper/search/backends/lancedb_backend.py Dismissed
Comment thread src/podcast_scraper/search/backends/lancedb_backend.py Dismissed
…ction)

CodeQL flagged the os.path.isfile / open sinks in the new stored_schema_version
helper: meta_path derived from a corpus path (user-provided via the API) without
the sanitiser chain the rest of this file uses. Confine the CONSTANT
"index_meta.json" subpath under the resolved root via
safe_relpath_under_corpus_root → normpath_if_under_root (same Type-1 pattern as
read_index_meta), so CodeQL sees the path as sanitised. Behaviour unchanged
(v2→not-stale, v1→stale, absent→None); 18 search/upgrade tests + mypy green.
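
The general shape of confining a constant subpath under a resolved root can be sketched as below. `confined_path` is a hypothetical helper and does not reproduce `safe_relpath_under_corpus_root` or `normpath_if_under_root`:

```python
import os
from typing import Optional


def confined_path(root: str, relative: str) -> Optional[str]:
    """Resolve relative under root; return None if it would escape the root."""
    root_real = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root_real, relative))
    # commonpath equals the root only when candidate lies inside it.
    if os.path.commonpath([root_real, candidate]) != root_real:
        return None
    return candidate
```

Only a path that survives this check would then be passed to `os.path.isfile` or `open`.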

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread src/podcast_scraper/search/backends/lancedb_backend.py Dismissed
chipi and others added 2 commits June 11, 2026 18:32
…ersion (#969)

CodeQL #360/#361/#362 — the os.path.isfile / open sinks in stored_schema_version.
Identical Type-1 cross-function pattern to read_index_meta (#338/#342): meta_path
sanitised via safe_resolve_directory -> safe_relpath_under_corpus_root (constant
index_meta.json) -> normpath_if_under_root, corpus root route-confined by
resolve_corpus_path_param. CodeQL can't model helper-based sanitisers. Dismissed
via gh api per the registry policy; code already uses the prescribed chain.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ift FP, #969)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@chipi chipi merged commit 217e9dd into main Jun 11, 2026
29 checks passed
@chipi chipi deleted the feat/946-existing-only-rediarization branch June 11, 2026 17:31

2 participants