From 13bfa12e50121170289f99bd934188d89e4ba674 Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Fri, 22 May 2026 13:21:09 -0500
Subject: [PATCH 01/21] feat(multigraph): projections + schema-aware loader +
 stable edge identity + internal keyed build path
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Internal, opt-in MultiDiGraph foundation. No default user-visible behavior changes.

Part 1: Projection API (graphify/projections.py)

Explicit projection helpers so consumers can declare what graph semantics they want and
what they intentionally lose:

  - project_for_community (configurable weight mode: confidence or count)
  - project_for_path (simple undirected topology for shortest-path)
  - project_for_callflow (directed projection with optional relation filter)
  - project_for_context (filter by context value)
  - edge_records_between (raw edge record iteration)
  - edge_summary_between (bundle/summary formatting)
  - distinct_neighbor_degree (for god-node and hub thresholds)
  - normalize_to_multidigraph (simple-to-multidigraph lift)

Tests in tests/test_projections.py cover MultiDiGraph fixtures plus property-style
invariants (bundle counts equal total edge records; weighted projection weight equals
multiplicity).
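
The multiplicity invariant can be illustrated without NetworkX. This is a minimal sketch,
not the code in graphify/projections.py: it assumes edges arrive as plain
(source, target, key) tuples and that count mode collapses direction. The names
project_count_weights and EdgeRecord are hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for MultiDiGraph edges: (source, target, key) records.
EdgeRecord = tuple[str, str, str]


def project_count_weights(records: list[EdgeRecord]) -> dict[frozenset, int]:
    """Collapse parallel, directed edge records into undirected pair weights."""
    weights: Counter = Counter()
    for source, target, _key in records:
        # Direction and key are intentionally dropped; only multiplicity survives.
        weights[frozenset((source, target))] += 1
    return dict(weights)


records = [
    ("a", "b", "calls:x.py:L1"),
    ("a", "b", "calls:x.py:L9"),
    ("b", "a", "imports:y.py:L2"),
    ("b", "c", "calls:y.py:L4"),
]
weights = project_count_weights(records)

# Property-style invariant: projected weights sum to the number of edge records.
assert sum(weights.values()) == len(records)
```

What the projection loses here is explicit: edge direction and per-edge keys.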

Part 2: Schema-aware loader + stable edge identity

Centralizes graph loading and reserves the edge "key" field as schema, not as an ordinary
attribute.

  - graphify/edge_identity.py: SCHEMA_KEY_FIELD constant,
    make_stable_key(relation, source_file, source_location), strip_schema_key(attrs)
  - graphify/graph_loader.py: load_graph and load_graph_file handling
      legacy simple JSON with "links",
      legacy simple JSON with "edges",
      valid multigraph node-link JSON (multigraph: true) with keyed parallel edges,
      malformed multigraph JSON with missing/non-string keys (deterministic repair via
        full-attribute payload hash, never silent downgrade),
      conflicting schema markers.
  - Profile metadata stored in G.graph["graphify_profile"] (preserved by node-link
    serialization).
  - Multigraph loads gated behind require_multigraph_capabilities() from PR #956.

Tests in tests/test_graph_loader.py and tests/test_edge_identity.py cover all seven
loader scenarios plus the schema-key reservation contract.
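
The schema-key reservation can be sketched as follows. This is an illustration under
assumptions, not the contents of graphify/edge_identity.py: the value of SCHEMA_KEY_FIELD,
the hash-based key format, and the _sketch-suffixed function names are all hypothetical.
Only the argument order and the (raw_key, attrs) return shape follow the patch.

```python
import hashlib
import json

SCHEMA_KEY_FIELD = "key"  # assumed value; the real constant lives in edge_identity.py


def make_stable_key_sketch(relation, source_file, source_location) -> str:
    """Derive a deterministic key from the three identity fields (format is assumed)."""
    payload = json.dumps([relation, source_file, source_location], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def strip_schema_key_sketch(attrs: dict):
    """Split the reserved key field from ordinary attributes without mutating attrs."""
    rest = {k: v for k, v in attrs.items() if k != SCHEMA_KEY_FIELD}
    # .get returns None only when the field is absent, so an explicit "" is preserved.
    return attrs.get(SCHEMA_KEY_FIELD), rest


edge = {"relation": "calls", "source_file": "a.py", "source_location": "L10", "key": "k1"}
raw_key, attrs = strip_schema_key_sketch(edge)
derived = make_stable_key_sketch(
    attrs["relation"], attrs["source_file"], attrs["source_location"]
)
```

Keeping the key out of the attribute dict means it can be passed to add_edge exactly
once, as the key= argument.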

Part 3: Internal keyed MultiDiGraph build path

Opt-in nx.MultiDiGraph build support in build_from_json and build.

  - multigraph: bool = False parameter
  - Stable edge keys generated after dedup/remap and source normalization.
  - A "key" field in serialized edge attrs is stripped before insertion, so it can never
    reach G.add_edge as a duplicate key= kwarg.
  - Exact duplicates collapse, and each collapse is counted in diagnostics; non-exact key
    collisions trigger deterministic bounded repair (full-payload hash, not
    identity-field-only).
  - Node-link JSON written with explicit edges="links" compatibility.
  - Default simple-graph output is unchanged.
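
The grouping and repair steps above can be sketched in isolation. This is a simplified
illustration, assuming edges are already reduced to (src, tgt, base_key, attrs) tuples; it
omits the salting loop and explicit-key preservation that build_from_json performs, and
the name assign_keys is hypothetical.

```python
import hashlib
import json


def fingerprint(attrs: dict) -> str:
    """Canonical JSON form of the full attribute payload."""
    return json.dumps(attrs, sort_keys=True, ensure_ascii=False, default=str)


def assign_keys(edges):
    """Return (keyed_edges, diagnostics) for (src, tgt, base_key, attrs) tuples."""
    groups: dict = {}
    for src, tgt, base_key, attrs in edges:
        groups.setdefault((src, tgt, base_key), []).append(attrs)

    diagnostics = {"exact_duplicate_edges": 0, "key_collision_edges": 0}
    keyed = []
    for group_id in sorted(groups):  # sorted so output ignores input order
        src, tgt, base_key = group_id
        unique: dict = {}
        for attrs in groups[group_id]:
            fp = fingerprint(attrs)
            if fp in unique:
                diagnostics["exact_duplicate_edges"] += 1  # collapse, but count it
            else:
                unique[fp] = attrs
        if len(unique) == 1:
            keyed.append((src, tgt, base_key, next(iter(unique.values()))))
            continue
        # Non-exact collision: derive a repair key from the full payload.
        diagnostics["key_collision_edges"] += 1
        for fp in sorted(unique):
            digest = hashlib.sha256(f"{base_key}\n{fp}".encode("utf-8")).hexdigest()
            keyed.append((src, tgt, f"{base_key}:alt:{digest}", unique[fp]))
    return keyed, diagnostics


edges = [
    ("a", "b", "k", {"relation": "calls", "line": 1}),
    ("a", "b", "k", {"relation": "calls", "line": 1}),
    ("a", "b", "k", {"relation": "calls", "line": 2}),
    ("a", "c", "k", {"relation": "imports"}),
]
keyed, diagnostics = assign_keys(edges)
```

Hashing the whole payload rather than only the identity fields is what lets two edges
with the same relation, file, and location still receive distinct keys.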

Adversarial-input resilience (verified against malformed extraction inputs):

  - Hashable non-string node IDs and edge endpoints are preserved.
  - Unhashable node IDs and endpoints do not crash validation or build.
  - Non-dict node entries and nodes missing "id" are skipped safely after validation
    warns.
  - Non-dict edge entries are skipped safely after validation warns.
  - Explicit empty-string schema keys are preserved.
  - Collision-repair keys are deterministic and do not overwrite explicit keys.
  - Exact duplicate detection remains O(n) within a (source, target, key) group.
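
The skip-instead-of-crash policy for node entries can be sketched as below. This is an
illustration only: is_hashable_sketch is an assumed stand-in for the is_hashable helper
imported from validate, and usable_node_ids is a hypothetical name. The real build also
warns on stderr and applies JSON round-trip checks that this sketch leaves out.

```python
def is_hashable_sketch(value) -> bool:
    """Return True if value can be used as a dict key or graph node id."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


def usable_node_ids(nodes):
    """Collect ids from well-formed node entries, skipping malformed ones."""
    ids = []
    for node in nodes:
        if not isinstance(node, dict) or "id" not in node:
            continue  # non-dict entry or missing "id"
        if not is_hashable_sketch(node["id"]):
            continue  # e.g. a list or dict used as an id
        ids.append(node["id"])
    return ids


malformed = [{"id": "a"}, {"id": 7}, {"id": ["x"]}, "junk", {"label": "no id"}]
ids = usable_node_ids(malformed)
```

A hashable non-string id such as the integer 7 passes this filter, while the list-valued
id and the two malformed entries are dropped.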

Out of scope:

  - No public --multigraph CLI flag (planned for a later slice; only programmatic
    activation here).
  - No watch/cache/global-graph/MCP/export surface changes.
  - No producer widening.
  - No dedup/remap MultiDiGraph contract changes (separate concern, separate review).

Test coverage:

  pytest tests/test_projections.py tests/test_graph_loader.py tests/test_edge_identity.py \
         tests/test_multigraph_diagnostics.py tests/test_build.py tests/test_validate.py
  → 130 passed.

This commit squashes an earlier 12-commit stack on wave3-pr3-internal-build into a
single commit so that every commit in origin history passes Copilot review individually.
The pre-squash stack is preserved as the tag
archive/2026-05-22-wave3-pr3-internal-build.
---
 graphify/build.py                    | 428 ++++++++++++--
 graphify/edge_identity.py            |  58 ++
 graphify/graph_loader.py             | 301 ++++++++++
 graphify/projections.py              | 214 +++++++
 graphify/symbol_resolution.py        |   4 +-
 graphify/validate.py                 |  44 +-
 tests/test_build.py                  | 838 ++++++++++++++++++++++++++-
 tests/test_edge_identity.py          |  85 +++
 tests/test_graph_loader.py           | 557 ++++++++++++++++++
 tests/test_multigraph_diagnostics.py |   8 +-
 tests/test_projections.py            | 202 +++++++
 tests/test_validate.py               |  95 ++-
 12 files changed, 2727 insertions(+), 107 deletions(-)
 create mode 100644 graphify/edge_identity.py
 create mode 100644 graphify/graph_loader.py
 create mode 100644 graphify/projections.py
 create mode 100644 tests/test_edge_identity.py
 create mode 100644 tests/test_graph_loader.py
 create mode 100644 tests/test_projections.py

diff --git a/graphify/build.py b/graphify/build.py
index 07fbb0340..386560005 100644
--- a/graphify/build.py
+++ b/graphify/build.py
@@ -22,18 +22,32 @@
 #
 from __future__ import annotations
 import json
+import hashlib
 import os
 import re
 import sys
 import unicodedata
+from collections.abc import Hashable
 from pathlib import Path
 import networkx as nx
-from .validate import validate_extraction
+from .edge_identity import make_stable_key, strip_schema_key
+from .validate import is_hashable, validate_extraction
 
 
 # Synonym mapper for known invalid file_type values that LLM subagents commonly
 # emit. Keeps semantic intent close (markdown→document, tool→code) and falls
 # back to "concept" for any other invalid value (see #840).
+_LANG_FAMILY: dict[str, str] = {
+    ".py": "py", ".pyi": "py",
+    ".js": "js", ".mjs": "js", ".cjs": "js", ".jsx": "js",
+    ".ts": "js", ".tsx": "js",
+    ".go": "go", ".rs": "rs",
+    ".java": "jvm", ".kt": "jvm", ".scala": "jvm", ".groovy": "jvm",
+    ".c": "c", ".h": "c", ".cc": "cpp", ".cpp": "cpp", ".hpp": "cpp",
+    ".rb": "rb", ".php": "php", ".cs": "cs", ".swift": "swift", ".lua": "lua",
+}
+
+
 _FILE_TYPE_SYNONYMS = {
     "markdown": "document",
     "text": "document",
@@ -83,7 +97,56 @@ def _norm_source_file(p: str | None, root: str | None = None) -> str | None:
     return p
 
 
-def edge_data(G: nx.Graph, u: str, v: str) -> dict:
+def _stable_identity_component(value: object) -> str | None:
+    """Normalize malformed edge identity values before stable-key hashing."""
+    if value is None:
+        return None
+    if isinstance(value, str):
+        return value
+    if isinstance(value, os.PathLike):
+        # os.fspath can return bytes for bytes-flavored PathLike; coerce to str
+        # so downstream json.dumps / hashing always sees text.
+        fs_value = os.fspath(value)
+        return fs_value.decode("utf-8", errors="replace") if isinstance(fs_value, bytes) else fs_value
+    if isinstance(value, (set, frozenset)):
+        return json.dumps(sorted(str(item) for item in value), ensure_ascii=False)
+    try:
+        return json.dumps(value, sort_keys=True, ensure_ascii=False, default=str)
+    except (TypeError, ValueError):
+        return str(value)
+
+
+def _make_collision_key(base_key: str, attrs: dict, *, salt: int = 0) -> str:
+    payload = {
+        "base_key": base_key,
+        "attrs": attrs,
+    }
+    if salt:
+        payload["salt"] = salt
+    repair_payload = json.dumps(payload, sort_keys=True, ensure_ascii=False, default=str)
+    repair_digest = hashlib.sha256(repair_payload.encode()).hexdigest()
+    return f"{base_key}:alt:{repair_digest}"
+
+
+def _list_field(data: dict, key: str) -> list:
+    """Return ``data[key]`` if it is a list; otherwise warn to stderr and return ``[]``.
+
+    Extraction dicts come from LLM subagents and can contain malformed shapes;
+    matching the rest of build_from_json's skip+warn policy keeps a single bad
+    field from crashing the whole build.
+    """
+    value = data.get(key, [])
+    if isinstance(value, list):
+        return value
+    print(
+        f"[graphify] WARNING: extraction field '{key}' must be a list, "
+        f"got {type(value).__name__}; treating as empty.",
+        file=sys.stderr,
+    )
+    return []
+
+
+def edge_data(G: nx.Graph, u: Hashable, v: Hashable) -> dict:
     """Return one edge attribute dict for (u, v), tolerating MultiGraph.
 
     For MultiGraph/MultiDiGraph there can be multiple parallel edges;
@@ -96,7 +159,7 @@ def edge_data(G: nx.Graph, u: str, v: str) -> dict:
     return raw
 
 
-def edge_datas(G: nx.Graph, u: str, v: str) -> list[dict]:
+def edge_datas(G: nx.Graph, u: Hashable, v: Hashable) -> list[dict]:
     """Return every edge attribute dict for (u, v); always a list."""
     raw = G[u][v]
     if isinstance(G, (nx.MultiGraph, nx.MultiDiGraph)):
@@ -104,29 +167,47 @@ def edge_datas(G: nx.Graph, u: str, v: str) -> list[dict]:
     return [raw]
 
 
-def build_from_json(extraction: dict, *, directed: bool = False, root: str | Path | None = None) -> nx.Graph:
+def build_from_json(
+    extraction: dict,
+    *,
+    directed: bool = False,
+    root: str | Path | None = None,
+    multigraph: bool = False,
+) -> nx.Graph | nx.DiGraph | nx.MultiDiGraph:
     """Build a NetworkX graph from an extraction dict.
 
     directed=True produces a DiGraph that preserves edge direction (source→target).
     directed=False (default) produces an undirected Graph for backward compatibility.
+    multigraph=True produces a directed MultiDiGraph with keyed parallel edges for
+        internal tests/callers; public CLI exposure is intentionally deferred.
+        In this mode, directed is ignored because MultiDiGraph is always directed.
     root: if given, absolute source_file paths from semantic subagents are made
         relative to root so all nodes share a consistent path key (#932).
     """
+    if not isinstance(extraction, dict):
+        raise TypeError("extraction must be a JSON object")
+
     _root = str(Path(root).resolve()) if root else None
     # NetworkX <= 3.1 serialised edges as "links"; remap to "edges" for compatibility.
     if "edges" not in extraction and "links" in extraction:
         extraction = dict(extraction, edges=extraction["links"])
 
+    nodes = _list_field(extraction, "nodes")
+    edges = _list_field(extraction, "edges")
+    extraction = dict(extraction, nodes=nodes, edges=edges)
+
     # Canonicalize legacy node/edge schema before validation.
-    for node in extraction.get("nodes", []):
+    for node in nodes:
         if not isinstance(node, dict):
             continue
         if "source" in node and "source_file" not in node:
             # Count edges that reference this node so the warning is actionable (#479)
             node_id = node.get("id", "?")
             affected_edges = sum(
-                1 for e in extraction.get("edges", [])
-                if e.get("source") == node_id or e.get("target") == node_id
+                1
+                for e in edges
+                if isinstance(e, dict)
+                and (e.get("source") == node_id or e.get("target") == node_id)
             )
             print(
                 f"[graphify] WARNING: node '{node_id}' uses field 'source' instead of "
@@ -149,29 +230,78 @@ def build_from_json(extraction: dict, *, directed: bool = False, root: str | Pat
     # Dangling edges (stdlib/external imports) are expected - only warn about real schema errors.
     real_errors = [e for e in errors if "does not match any node id" not in e]
     if real_errors:
-        print(f"[graphify] Extraction warning ({len(real_errors)} issues): {real_errors[0]}", file=sys.stderr)
-    G: nx.Graph = nx.DiGraph() if directed else nx.Graph()
-    for node in extraction.get("nodes", []):
+        print(
+            f"[graphify] Extraction warning ({len(real_errors)} issues): {real_errors[0]}",
+            file=sys.stderr,
+        )
+    if multigraph:
+        from .multigraph_compat import require_multigraph_capabilities
+
+        require_multigraph_capabilities()
+    G: nx.Graph = nx.MultiDiGraph() if multigraph else nx.DiGraph() if directed else nx.Graph()
+    for node in nodes:
+        if not isinstance(node, dict) or "id" not in node:
+            continue
+        node_id = node["id"]
+        if not is_hashable(node_id):
+            continue
         if "source_file" in node:
-            node["source_file"] = _norm_source_file(node["source_file"], _root)
-        G.add_node(node["id"], **{k: v for k, v in node.items() if k != "id"})
+            node["source_file"] = _norm_source_file(
+                _stable_identity_component(node["source_file"]), _root
+            )
+        node_attrs = {k: v for k, v in node.items() if k != "id"}
+        # Reject node ids that JSON-serialize but won't round-trip to the same
+        # hashable type. Tuples serialize as JSON arrays and come back as lists
+        # (unhashable), so they cannot be used as NetworkX node ids after a
+        # save/load cycle even though json.dumps would accept them.
+        if isinstance(node_id, (list, tuple, set, frozenset, dict)):
+            print(
+                f"[graphify] WARNING: node id {node_id!r} ({type(node_id).__name__}) "
+                f"would not round-trip through JSON as the same hashable type; skipping.",
+                file=sys.stderr,
+            )
+            continue
+        # Check id AND attrs are JSON-serializable. NetworkX allows hashable but
+        # non-JSON-safe ids (e.g., custom objects); accepting them here would
+        # break later node_link_data + json.dump.
+        try:
+            json.dumps({"id": node_id, **node_attrs}, ensure_ascii=False)
+        except (TypeError, ValueError) as exc:
+            print(
+                f"[graphify] WARNING: node {node_id!r} has non-JSON-serializable "
+                f"id or attrs ({exc}); skipping.",
+                file=sys.stderr,
+            )
+            continue
+        G.add_node(node_id, **node_attrs)
     node_set = set(G.nodes())
     # Normalized ID map: lets edges survive when the LLM generates IDs with
     # slightly different casing or punctuation than the AST extractor.
     # e.g. "Session_ValidateToken" maps to "session_validatetoken".
-    norm_to_id: dict[str, str] = {_normalize_id(nid): nid for nid in node_set}
+    norm_to_id: dict[str, Hashable] = {
+        _normalize_id(nid): nid for nid in node_set if isinstance(nid, str)
+    }
+    multigraph_groups: dict[tuple[Hashable, Hashable, str], list[dict]] = {}
+    multigraph_explicit_keys: set[tuple[Hashable, Hashable, str]] = set()
+    multigraph_diagnostics = {"exact_duplicate_edges": 0, "key_collision_edges": 0}
     # Iterate edges in a deterministic order. The graph is undirected and stores
     # direction in _src/_tgt; when two edges collapse onto the same node pair the
     # last write wins, so an unstable iteration order flips _src/_tgt run-to-run
-    # and makes the serialized graph churn. Sorting fixes the last-write outcome.
-    for edge in sorted(
-        extraction.get("edges", []),
-        key=lambda e: (
-            str(e.get("source", e.get("from", ""))),
-            str(e.get("target", e.get("to", ""))),
-            str(e.get("relation", "")),
-        ),
-    ):
+    # and makes the serialized graph churn. Sorting also stabilizes multigraph
+    # key-collision grouping before keyed emission.
+    def _edge_sort_key(edge: object) -> tuple[str, str, str, str]:
+        if not isinstance(edge, dict):
+            return ("", "", "", repr(edge))
+        return (
+            str(edge.get("source", edge.get("from", ""))),
+            str(edge.get("target", edge.get("to", ""))),
+            str(edge.get("relation", "")),
+            json.dumps(edge, sort_keys=True, ensure_ascii=False, default=str),
+        )
+
+    for edge in sorted(edges, key=_edge_sort_key):
+        if not isinstance(edge, dict):
+            continue
         if "source" not in edge and "from" in edge:
             edge["source"] = edge["from"]
         if "target" not in edge and "to" in edge:
@@ -179,29 +309,38 @@ def build_from_json(extraction: dict, *, directed: bool = False, root: str | Pat
         if "source" not in edge or "target" not in edge:
             continue
         src, tgt = edge["source"], edge["target"]
+        src_is_hashable = is_hashable(src)
+        tgt_is_hashable = is_hashable(tgt)
+        if not src_is_hashable or not tgt_is_hashable:
+            endpoint = "source" if not src_is_hashable else "target"
+            endpoint_value = src if not src_is_hashable else tgt
+            print(
+                "[graphify] WARNING: skipped edge with unhashable "
+                f"{endpoint} endpoint ({type(endpoint_value).__name__})",
+                file=sys.stderr,
+            )
+            continue
         # Remap mismatched IDs via normalization before dropping the edge.
-        if src not in node_set:
+        if isinstance(src, str) and src not in node_set:
             src = norm_to_id.get(_normalize_id(src), src)
-        if tgt not in node_set:
+        if isinstance(tgt, str) and tgt not in node_set:
             tgt = norm_to_id.get(_normalize_id(tgt), tgt)
         if src not in node_set or tgt not in node_set:
             continue  # skip edges to external/stdlib nodes - expected, not an error
-        attrs = {k: v for k, v in edge.items() if k not in ("source", "target")}
+        # Exclude legacy from/to alongside source/target so they don't survive
+        # as ordinary edge attrs after legacy-shape remap above.
+        base_attrs = {
+            k: v for k, v in edge.items() if k not in ("source", "target", "from", "to")
+        }
+        raw_key, attrs = strip_schema_key(base_attrs)
         if "source_file" in attrs:
-            attrs["source_file"] = _norm_source_file(attrs["source_file"], _root)
+            attrs["source_file"] = _norm_source_file(
+                _stable_identity_component(attrs["source_file"]), _root
+            )
         # Drop cross-language INFERRED `calls` edges — same short names (render,
         # parse, etc.) appear across language boundaries in multi-language chunks,
         # producing phantom edges that don't represent real call relationships.
         if attrs.get("relation") == "calls" and attrs.get("confidence") == "INFERRED":
-            _LANG_FAMILY: dict[str, str] = {
-                ".py": "py", ".pyi": "py",
-                ".js": "js", ".mjs": "js", ".cjs": "js", ".jsx": "js",
-                ".ts": "js", ".tsx": "js",
-                ".go": "go", ".rs": "rs",
-                ".java": "jvm", ".kt": "jvm", ".scala": "jvm", ".groovy": "jvm",
-                ".c": "c", ".h": "c", ".cc": "cpp", ".cpp": "cpp", ".hpp": "cpp",
-                ".rb": "rb", ".php": "php", ".cs": "cs", ".swift": "swift", ".lua": "lua",
-            }
             src_ext = Path(G.nodes[src].get("source_file") or "").suffix.lower()
             tgt_ext = Path(G.nodes[tgt].get("source_file") or "").suffix.lower()
             if src_ext and tgt_ext and _LANG_FAMILY.get(src_ext) != _LANG_FAMILY.get(tgt_ext):
@@ -210,23 +349,131 @@ def build_from_json(extraction: dict, *, directed: bool = False, root: str | Pat
         # causing display functions to show edges backwards.
         attrs["_src"] = src
         attrs["_tgt"] = tgt
-        # When the graph is undirected and the same node pair appears twice with
-        # the same relation but opposite directions (e.g. a `calls` b and b `calls` a),
-        # nx.Graph collapses them into one edge. The deterministic sort above means
-        # the lexicographically-later direction would systematically overwrite the
-        # earlier one's _src/_tgt, silently flipping the surviving edge's caller
-        # and callee. First-seen direction wins instead — drop the redundant
-        # reverse-direction duplicate so the original direction is preserved (#1061).
-        if not G.is_directed() and G.has_edge(src, tgt):
-            existing = edge_data(G, src, tgt)
-            if existing.get("relation") == attrs.get("relation") and (
-                existing.get("_src") == tgt and existing.get("_tgt") == src
-            ):
-                continue
-        G.add_edge(src, tgt, **attrs)
+        # Refuse to store any edge whose attrs cannot round-trip through JSON.
+        # Mutating attrs in place would silently change the user's stored value;
+        # skipping with a warning matches the rest of the build's defensive policy
+        # and prevents later json.dump crashes during export, identically in
+        # simple-graph and multigraph modes.
+        try:
+            json.dumps(attrs, ensure_ascii=False)
+        except (TypeError, ValueError) as exc:
+            print(
+                f"[graphify] WARNING: edge ({src}->{tgt}) has non-JSON-serializable "
+                f"attrs ({exc}); skipping.",
+                file=sys.stderr,
+            )
+            continue
+        if multigraph:
+            if raw_key is not None and not isinstance(raw_key, str):
+                raise TypeError(
+                    f"multigraph edge 'key' must be a string, got "
+                    f"{type(raw_key).__name__} ({raw_key!r})"
+                )
+            base_key = (
+                raw_key
+                if raw_key is not None
+                else make_stable_key(
+                    _stable_identity_component(attrs.get("relation")),
+                    _stable_identity_component(attrs.get("source_file")),
+                    _stable_identity_component(attrs.get("source_location")),
+                )
+            )
+            if raw_key is not None:
+                multigraph_explicit_keys.add((src, tgt, base_key))
+            multigraph_groups.setdefault((src, tgt, base_key), []).append(dict(attrs))
+        else:
+            # When the graph is undirected and the same node pair appears twice with
+            # the same relation but opposite directions (e.g. a `calls` b and b `calls` a),
+            # nx.Graph collapses them into one edge. The deterministic sort above means
+            # the lexicographically-later direction would systematically overwrite the
+            # earlier one's _src/_tgt, silently flipping the surviving edge's caller
+            # and callee. First-seen direction wins instead — drop the redundant
+            # reverse-direction duplicate so the original direction is preserved (#1061).
+            if not G.is_directed() and G.has_edge(src, tgt):
+                existing = edge_data(G, src, tgt)
+                if existing.get("relation") == attrs.get("relation") and (
+                    existing.get("_src") == tgt and existing.get("_tgt") == src
+                ):
+                    continue
+            G.add_edge(src, tgt, **attrs)
+    if multigraph:
+        singleton_groups: list[tuple[Hashable, Hashable, str, dict]] = []
+        multi_groups: list[tuple[Hashable, Hashable, str, list[dict]]] = []
+        used_keys_by_pair: dict[tuple[Hashable, Hashable], set[str]] = {}
+        for (src, tgt, base_key), group_attrs in multigraph_groups.items():
+            unique_attrs: list[dict] = []
+            seen_attr_fingerprints: set[str] = set()
+            for attrs in group_attrs:
+                attr_fingerprint = json.dumps(
+                    attrs, sort_keys=True, ensure_ascii=False, default=str
+                )
+                if attr_fingerprint in seen_attr_fingerprints:
+                    multigraph_diagnostics["exact_duplicate_edges"] += 1
+                else:
+                    seen_attr_fingerprints.add(attr_fingerprint)
+                    unique_attrs.append(attrs)
+            if len(unique_attrs) > 1:
+                multigraph_diagnostics["key_collision_edges"] += 1
+                unique_attrs.sort(
+                    key=lambda attrs: json.dumps(
+                        attrs, sort_keys=True, ensure_ascii=False, default=str
+                    )
+                )
+                multi_groups.append((src, tgt, base_key, unique_attrs))
+            elif unique_attrs:
+                # Reserve the singleton's base_key so any later multi-attr
+                # collision-repair on the same (src, tgt) avoids it.
+                used_keys_by_pair.setdefault((src, tgt), set()).add(base_key)
+                singleton_groups.append((src, tgt, base_key, unique_attrs[0]))
+        # Sort both lists deterministically.
+        singleton_groups.sort(
+            key=lambda item: (
+                repr(item[0]),
+                repr(item[1]),
+                item[2],
+                json.dumps(item[3], sort_keys=True, ensure_ascii=False, default=str),
+            )
+        )
+        multi_groups.sort(
+            key=lambda item: (
+                repr(item[0]),
+                repr(item[1]),
+                item[2],
+                json.dumps(item[3], sort_keys=True, ensure_ascii=False, default=str),
+            )
+        )
+        # Emit singletons first: they use base_key directly and were reserved
+        # in the pre-loop above, so collision-repair from multi groups will
+        # see those reservations and salt around them.
+        for src, tgt, base_key, attrs in singleton_groups:
+            G.add_edge(src, tgt, key=base_key, **attrs)
+        # Then emit multi-attr groups with collision-repair salting against
+        # both reserved singleton base_keys and earlier multi-group repair
+        # keys on the same (src, tgt) pair.
+        for src, tgt, base_key, unique_attrs in multi_groups:
+            used_keys = used_keys_by_pair.setdefault((src, tgt), set())
+            preserve_explicit = (src, tgt, base_key) in multigraph_explicit_keys
+            for index, attrs in enumerate(unique_attrs):
+                # When the user passed an explicit `key` shared across multiple
+                # distinct edges, preserve it on the first emit so at least one
+                # edge per group keeps the canonical user-supplied key.
+                # Derived base_keys (from make_stable_key) always go through
+                # collision-repair so emission stays order-independent.
+                if preserve_explicit and index == 0 and base_key not in used_keys:
+                    key = base_key
+                else:
+                    key = _make_collision_key(base_key, attrs)
+                    salt = 0
+                    while key in used_keys:
+                        salt += 1
+                        key = _make_collision_key(base_key, attrs, salt=salt)
+                used_keys.add(key)
+                G.add_edge(src, tgt, key=key, **attrs)
     hyperedges = extraction.get("hyperedges", [])
     if hyperedges:
         G.graph["hyperedges"] = hyperedges
+    if multigraph:
+        G.graph["graphify_multigraph_diagnostics"] = multigraph_diagnostics
     return G
 
 
@@ -237,7 +484,8 @@ def build(
     dedup: bool = True,
     dedup_llm_backend: str | None = None,
     root: str | Path | None = None,
-) -> nx.Graph:
+    multigraph: bool = False,
+) -> nx.Graph | nx.DiGraph | nx.MultiDiGraph:
     """Merge multiple extraction results into one graph.
 
     directed=True produces a DiGraph that preserves edge direction (source→target).
@@ -253,7 +501,14 @@ def build(
     reverse the order if you prefer AST source_location precision to win.
     """
     from graphify.dedup import deduplicate_entities
-    combined: dict = {"nodes": [], "edges": [], "hyperedges": [], "input_tokens": 0, "output_tokens": 0}
+
+    combined: dict = {
+        "nodes": [],
+        "edges": [],
+        "hyperedges": [],
+        "input_tokens": 0,
+        "output_tokens": 0,
+    }
     for ext in extractions:
         combined["nodes"].extend(ext.get("nodes", []))
         combined["edges"].extend(ext.get("edges", []))
@@ -262,10 +517,12 @@ def build(
         combined["output_tokens"] += ext.get("output_tokens", 0)
     if dedup and combined["nodes"]:
         combined["nodes"], combined["edges"] = deduplicate_entities(
-            combined["nodes"], combined["edges"], communities={},
+            combined["nodes"],
+            combined["edges"],
+            communities={},
             dedup_llm_backend=dedup_llm_backend,
         )
-    return build_from_json(combined, directed=directed, root=root)
+    return build_from_json(combined, directed=directed, root=root, multigraph=multigraph)
 
 
 def _norm_label(label: str) -> str:
@@ -282,7 +539,7 @@ def deduplicate_by_label(nodes: list[dict], edges: list[dict]) -> tuple[list[dic
     """
     _CHUNK_SUFFIX = re.compile(r"_c\d+$")
     canonical: dict[str, dict] = {}  # norm_label -> surviving node
-    remap: dict[str, str] = {}       # old_id -> surviving_id
+    remap: dict[str, str] = {}  # old_id -> surviving_id
 
     for node in nodes:
         key = _norm_label(node.get("label", node.get("id", "")))
@@ -325,16 +582,23 @@ def build_merge(
     graph_path: str | Path = "graphify-out/graph.json",
     prune_sources: list[str] | None = None,
     *,
-    directed: bool = False,
+    directed: bool | None = None,
     dedup: bool = True,
     dedup_llm_backend: str | None = None,
     root: str | Path | None = None,
-) -> nx.Graph:
-    """Load existing graph.json, merge new chunks into it, and save back.
+) -> nx.Graph | nx.DiGraph:
+    """Load existing graph.json, merge new chunks into it, and return the merged graph.
+
+    Persistence is the caller's responsibility (e.g., via ``export.to_json``);
+    this function does not write back to disk.
 
     Never replaces - only grows (or prunes deleted-file nodes via prune_sources).
     Safe to call repeatedly: existing nodes and edges are preserved.
     root: if given, absolute source_file paths in new_chunks are made relative (#932).
+
+    ``directed`` defaults to inheriting the saved graph's flag when an
+    existing graph.json is present, so updating a directed simple graph with
+    default args no longer silently downgrades it to undirected.
     """
     graph_path = Path(graph_path)
     if graph_path.exists():
@@ -346,18 +610,62 @@ def build_merge(
         # attrs are popped before saving in export.py, so going through the
         # NetworkX round-trip loses direction permanently (#760).
         from graphify.security import check_graph_file_size_cap
+
         check_graph_file_size_cap(graph_path)
         data = json.loads(graph_path.read_text(encoding="utf-8"))
+        if not isinstance(data, dict):
+            raise TypeError(
+                f"saved graph.json at {graph_path} must be a JSON object, "
+                f"got {type(data).__name__}"
+            )
+        # Refuse to silently collapse a saved multigraph. build() runs in
+        # simple mode here, which would drop parallel edges; stateful
+        # multigraph update paths are out of scope for the internal keyed
+        # build path (watch/cache/global-graph land in later slices).
+        saved_multigraph = data.get("multigraph", False)
+        if saved_multigraph is True:
+            raise NotImplementedError(
+                f"build_merge cannot update a multigraph graph.json. "
+                f"Found multigraph=true in {graph_path}. Rebuild from extraction "
+                f"or use a simple-graph build target."
+            )
+        if saved_multigraph is not False:
+            raise TypeError(
+                f"'multigraph' in {graph_path} must be a boolean, "
+                f"got {type(saved_multigraph).__name__} ({saved_multigraph!r})"
+            )
+        # Honor the saved graph's `directed` flag unless the caller explicitly
+        # overrides. Without this, an update with default args on a directed
+        # graph silently downgrades it and loses edge direction on next export.
+        saved_directed_raw = data.get("directed", False)
+        if saved_directed_raw is not True and saved_directed_raw is not False:
+            raise TypeError(
+                f"'directed' in {graph_path} must be a boolean, "
+                f"got {type(saved_directed_raw).__name__} ({saved_directed_raw!r})"
+            )
+        saved_directed = saved_directed_raw
+        if directed is None:
+            directed = saved_directed
+        elif directed != saved_directed:
+            print(
+                f"[graphify] WARNING: build_merge directed={directed} overrides "
+                f"saved graph.json directed={saved_directed}",
+                file=sys.stderr,
+            )
         links_key = "links" if "links" in data else "edges"
         existing_nodes = list(data.get("nodes", []))
         existing_edges = list(data.get(links_key, []))
         base = [{"nodes": existing_nodes, "edges": existing_edges}]
     else:
+        if directed is None:
+            directed = False
         existing_nodes = []
         base = []
 
     all_chunks = base + list(new_chunks)
-    G = build(all_chunks, directed=directed, dedup=dedup, dedup_llm_backend=dedup_llm_backend, root=root)
+    G = build(
+        all_chunks, directed=directed, dedup=dedup, dedup_llm_backend=dedup_llm_backend, root=root
+    )
 
     # Prune nodes and edges from deleted source files
     if prune_sources:
@@ -390,8 +698,7 @@ def build_merge(
             )
 
         edges_to_remove = [
-            (u, v) for u, v, d in G.edges(data=True)
-            if d.get("source_file") in prune_set
+            (u, v) for u, v, d in G.edges(data=True) if d.get("source_file") in prune_set
         ]
         if edges_to_remove:
             G.remove_edges_from(edges_to_remove)
@@ -418,6 +725,7 @@ def build_merge(
                 f"Pass prune_sources explicitly if you intend to remove nodes."
             )
 
+    # No write to graph_path here; persistence is the caller's responsibility.
     return G
 
 
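Reviewer note (not part of the patch): the `directed` handling in build_merge above can be sketched in isolation. This is a minimal illustration, assuming a hypothetical helper name `resolve_directed`; the real code is inline in build_merge and also prints a warning to stderr when the caller overrides the saved flag, which is omitted here.

```python
from __future__ import annotations


def resolve_directed(saved_raw: object, requested: bool | None) -> bool:
    """Hypothetical helper mirroring the flag handling in build_merge."""
    # Identity checks, not truthiness: 1, "false", and None are all rejected,
    # so a corrupted graph.json cannot be misread as directed or undirected.
    if saved_raw is not True and saved_raw is not False:
        raise TypeError(
            f"'directed' must be a boolean, got {type(saved_raw).__name__} ({saved_raw!r})"
        )
    # None means "inherit the saved flag"; an explicit value wins.
    return saved_raw if requested is None else requested


inherited = resolve_directed(True, None)    # default args keep the saved direction
overridden = resolve_directed(True, False)  # explicit override is honored
```

The identity comparison is the relevant design choice: `1 == True` holds in Python, so an equality or truthiness check would accept values that are not JSON booleans.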
diff --git a/graphify/edge_identity.py b/graphify/edge_identity.py
new file mode 100644
index 000000000..f1802bca4
--- /dev/null
+++ b/graphify/edge_identity.py
@@ -0,0 +1,58 @@
+"""Stable edge identity helpers and schema constants for MultiDiGraph support.
+
+The node-link ``"key"`` field is reserved schema — it identifies a parallel edge
+and must never be stored as an ordinary edge attribute.  All callers that build or
+load graphs should use :func:`strip_schema_key` before passing attrs to
+``G.add_edge`` so the ``key`` kwarg is never duplicated.
+"""
+
+from __future__ import annotations
+
+import hashlib
+import json as _json
+
+SCHEMA_KEY_FIELD = "key"
+
+
+def make_stable_key(
+    relation: str | None,
+    source_file: str | None,
+    source_location: str | None,
+) -> str:
+    """Return a collision-safe deterministic edge key from semantic identity fields.
+
+    Uses SHA-256 over a canonical JSON payload with explicit field names so that
+    delimiter characters in field values cannot produce false collisions.  The key
+    format is ``"edge:v1:<sha256hex>"``.
+
+    Two edges with the same relation, file, and location always produce the same
+    key; a difference in any of the three produces a different key (barring a SHA-256 collision).
+    """
+    payload = _json.dumps(
+        {
+            "relation": relation,
+            "source_file": source_file,
+            "source_location": source_location,
+        },
+        sort_keys=True,
+    )
+    digest = hashlib.sha256(payload.encode()).hexdigest()
+    return f"edge:v1:{digest}"
+
+
+def strip_schema_key(attrs: dict) -> tuple[object | None, dict]:
+    """Separate the ``"key"`` schema field from edge attribute kwargs.
+
+    Returns ``(key_value, cleaned_attrs)`` where ``cleaned_attrs`` is a new dict
+    with ``SCHEMA_KEY_FIELD`` removed.  The original *attrs* dict is not mutated.
+
+    The key is typed ``object | None`` rather than ``str | None`` because the
+    field may carry any JSON-decodable value at this layer; callers narrow to
+    ``str`` after explicit validation (see the multigraph loader/build paths).
+
+    Use before ``G.add_edge(u, v, key=key_value, **cleaned_attrs)`` to avoid
+    passing ``key`` twice (once as the explicit ``key=`` keyword and once inside attrs).
+    """
+    key_val = attrs.get(SCHEMA_KEY_FIELD)
+    cleaned = {k: v for k, v in attrs.items() if k != SCHEMA_KEY_FIELD}
+    return key_val, cleaned
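Reviewer note (not part of the patch): the docstring's point about delimiter characters can be checked with a small standalone sketch. `make_stable_key` below repeats the construction from edge_identity.py; `naive_key` is a hypothetical delimiter-joined alternative that exists only as a counterexample.

```python
from __future__ import annotations

import hashlib
import json


def naive_key(relation, source_file, source_location) -> str:
    """Hypothetical delimiter-joined key, shown only as a counterexample."""
    return "|".join(str(part) for part in (relation, source_file, source_location))


def make_stable_key(relation, source_file, source_location) -> str:
    """Same construction as the helper added in this patch."""
    payload = json.dumps(
        {
            "relation": relation,
            "source_file": source_file,
            "source_location": source_location,
        },
        sort_keys=True,
    )
    return f"edge:v1:{hashlib.sha256(payload.encode()).hexdigest()}"


# Two different identities whose fields contain the delimiter.
first = ("calls", "a|b.py", "L1")
second = ("calls", "a", "b.py|L1")

naive_collides = naive_key(*first) == naive_key(*second)
stable_collides = make_stable_key(*first) == make_stable_key(*second)
```

Because the JSON payload names each field explicitly, moving a `|` from one field to another changes the payload, whereas the joined string is identical for both inputs.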
diff --git a/graphify/graph_loader.py b/graphify/graph_loader.py
new file mode 100644
index 000000000..437c4024a
--- /dev/null
+++ b/graphify/graph_loader.py
@@ -0,0 +1,301 @@
+"""Schema-aware graph loader for saved graphify node-link JSON.
+
+This module loads *serialized graph files* (graph.json / node-link format).
+It is distinct from :func:`graphify.build.build_from_json`, which assembles
+graphs from raw extraction dicts produced by AST and semantic passes.
+
+The two are complementary:
+  - extraction dict  →  ``build_from_json``
+  - saved graph.json →  ``load_graph`` / ``load_graph_file``
+"""
+
+from __future__ import annotations
+
+import hashlib
+import json
+import sys
+from pathlib import Path
+
+import networkx as nx
+
+from .edge_identity import strip_schema_key
+from .multigraph_compat import require_multigraph_capabilities
+from .validate import is_hashable
+
+GRAPHIFY_PROFILE_KEY = "graphify_profile"
+
+
+def load_graph(
+    data: object,
+    *,
+    require_capabilities: bool = True,
+) -> nx.Graph | nx.DiGraph | nx.MultiDiGraph:
+    """Load a serialized node-link graph dict into the appropriate NetworkX type.
+
+    Detects graph type from ``multigraph`` and ``directed`` flags in *data*:
+
+    - ``multigraph: true``               → :class:`nx.MultiDiGraph`
+    - ``multigraph: false, directed: true``  → :class:`nx.DiGraph`
+    - ``multigraph: false, directed: false`` → :class:`nx.Graph`
+
+    All paths set ``G.graph[GRAPHIFY_PROFILE_KEY]`` with at minimum
+    ``{"graph_type": "simple" | "digraph" | "multidigraph"}``.
+
+    ``require_capabilities`` (default ``True``) gates multigraph loading behind
+    :func:`~graphify.multigraph_compat.require_multigraph_capabilities`.  Pass
+    ``False`` to skip the probe entirely — used in unit tests and when the
+    caller has already verified capabilities externally.
+    """
+    if not isinstance(data, dict):
+        raise TypeError("serialized graph data must be a JSON object")
+
+    multigraph_flag = _require_bool_field(data, "multigraph", default=False)
+    directed_flag = _require_bool_field(data, "directed", default=False, allow_none=True)
+    directed_present = "directed" in data
+
+    if multigraph_flag is True:
+        # Only warn when ``directed`` was *explicitly* set to false; an omitted
+        # flag does not contradict ``multigraph: true``.
+        if directed_present and directed_flag is False:
+            print(
+                "[graphify] WARNING: multigraph=true but directed=false; "
+                "normalizing to MultiDiGraph (graphify uses directed graphs).",
+                file=sys.stderr,
+            )
+        if require_capabilities:
+            require_multigraph_capabilities()
+        return _load_multigraph(data)
+    if directed_flag is True:
+        return _load_directed_simple(data)
+    return _load_simple(data)
+
+
+def _require_bool_field(
+    data: dict, field: str, *, default: bool, allow_none: bool = False
+) -> bool | None:
+    """Read a strict-boolean field from serialized graph JSON.
+
+    Rejects non-boolean values (e.g., the string ``"false"``) so corrupted JSON
+    cannot be misclassified by Python's truthiness rules.
+    """
+    if field not in data:
+        return default
+    value = data[field]
+    if value is True or value is False:
+        return value
+    if allow_none and value is None:
+        return None
+    raise TypeError(
+        f"'{field}' must be a boolean, got {type(value).__name__} ({value!r})"
+    )
+
+
+def load_graph_file(
+    path: str | Path,
+    *,
+    require_capabilities: bool = True,
+) -> nx.Graph | nx.DiGraph | nx.MultiDiGraph:
+    """Load a graph.json file produced by graphify.
+
+    Applies the 512 MiB size cap before parsing.
+    """
+    from .security import check_graph_file_size_cap
+
+    path = Path(path)
+    check_graph_file_size_cap(path)
+    data = json.loads(path.read_text(encoding="utf-8"))
+    return load_graph(data, require_capabilities=require_capabilities)
+
+
+# ---------------------------------------------------------------------------
+# Internal helpers
+# ---------------------------------------------------------------------------
+
+
+def _get_edges(data: dict) -> list[dict]:
+    """Return the edge list, accepting both ``"edges"`` and legacy ``"links"``."""
+    for key in ("edges", "links"):
+        if key in data:
+            val = data[key]
+            if not isinstance(val, list):
+                raise TypeError(f"'{key}' must be a list, got {type(val).__name__}")
+            return [e for e in val if isinstance(e, dict)]
+    return []
+
+
+def _set_graph_profile(G: nx.Graph, data: dict, *, graph_type: str) -> None:
+    """Store Graphify profile metadata in ``G.graph[GRAPHIFY_PROFILE_KEY]``.
+
+    NetworkX ``node_link_data`` serializes ``G.graph[...]`` attributes under
+    ``data["graph"]``; some graphify writers also promote ``graphify_profile``
+    to the top level. Read both so round-trips do not silently drop metadata.
+    """
+    nested = data.get("graph", {})
+    if isinstance(nested, dict):
+        for key, value in nested.items():
+            G.graph[key] = value
+    # Prefer the top-level profile when it is a usable dict; fall through to
+    # the nested copy when the top-level value is absent OR malformed.
+    raw = data.get(GRAPHIFY_PROFILE_KEY)
+    if not isinstance(raw, dict) and isinstance(nested, dict):
+        raw = nested.get(GRAPHIFY_PROFILE_KEY)
+    profile = dict(raw) if isinstance(raw, dict) else {}
+    # Overwrite graph_type with the value derived from the multigraph/directed
+    # flags on this load; a stale graph_type in a serialized profile must not
+    # mislabel the actual NetworkX type we just constructed.
+    profile["graph_type"] = graph_type
+    G.graph[GRAPHIFY_PROFILE_KEY] = profile
+
+
+def _add_nodes(G: nx.Graph, data: dict) -> set:
+    """Add valid nodes from *data* to *G*; return the resulting node ID set."""
+    nodes = data.get("nodes", [])
+    if not isinstance(nodes, list):
+        raise TypeError(f"'nodes' must be a list, got {type(nodes).__name__}")
+
+    skipped_unhashable = 0
+    for node in nodes:
+        if not isinstance(node, dict) or "id" not in node:
+            continue
+        node_id = node["id"]
+        if not is_hashable(node_id):
+            skipped_unhashable += 1
+            continue
+        G.add_node(node_id, **{k: v for k, v in node.items() if k != "id"})
+    if skipped_unhashable:
+        print(
+            f"[graphify] WARNING: skipped {skipped_unhashable} node(s) with unhashable id",
+            file=sys.stderr,
+        )
+    return set(G.nodes())
+
+
+def _load_simple(data: dict) -> nx.Graph:
+    """Build an undirected :class:`nx.Graph` from node-link data."""
+    G = nx.Graph()
+    _set_graph_profile(G, data, graph_type="simple")
+    node_set = _add_nodes(G, data)
+    for edge in _get_edges(data):
+        if not isinstance(edge, dict):
+            continue
+        src = edge["source"] if "source" in edge else edge.get("from")
+        tgt = edge["target"] if "target" in edge else edge.get("to")
+        # `is None` (not falsy) so valid hashable IDs like 0 or False survive;
+        # the unhashable guard prevents `in node_set` from raising TypeError on
+        # corrupt input like {"source": ["bad"]}.
+        if src is None or tgt is None:
+            continue
+        if not is_hashable(src) or not is_hashable(tgt):
+            continue
+        if src not in node_set or tgt not in node_set:
+            continue
+        attrs = {k: v for k, v in edge.items() if k not in ("source", "target", "from", "to")}
+        _, attrs = strip_schema_key(attrs)
+        G.add_edge(src, tgt, **attrs)
+    return G
+
+
+def _load_directed_simple(data: dict) -> nx.DiGraph:
+    """Build a directed :class:`nx.DiGraph` from node-link data."""
+    G = nx.DiGraph()
+    _set_graph_profile(G, data, graph_type="digraph")
+    node_set = _add_nodes(G, data)
+    for edge in _get_edges(data):
+        if not isinstance(edge, dict):
+            continue
+        src = edge["source"] if "source" in edge else edge.get("from")
+        tgt = edge["target"] if "target" in edge else edge.get("to")
+        # `is None` (not falsy) so valid hashable IDs like 0 or False survive;
+        # the unhashable guard prevents `in node_set` from raising TypeError on
+        # corrupt input like {"source": ["bad"]}.
+        if src is None or tgt is None:
+            continue
+        if not is_hashable(src) or not is_hashable(tgt):
+            continue
+        if src not in node_set or tgt not in node_set:
+            continue
+        attrs = {k: v for k, v in edge.items() if k not in ("source", "target", "from", "to")}
+        _, attrs = strip_schema_key(attrs)
+        G.add_edge(src, tgt, **attrs)
+    return G
+
+
+def _load_multigraph(data: dict) -> nx.MultiDiGraph:
+    """Build a :class:`nx.MultiDiGraph` with preserved edge keys.
+
+    Missing-key repair: when a serialized edge has no ``"key"`` field, a
+    deterministic repair key is generated from the full edge attribute payload
+    (not just the 3 identity fields) so parallel edges with different metadata
+    are never silently overwritten.
+    """
+    G = nx.MultiDiGraph()
+    _set_graph_profile(G, data, graph_type="multidigraph")
+    node_set = _add_nodes(G, data)
+    missing_key_count = 0
+    duplicate_key_count = 0
+    used_keys_by_pair: dict[tuple[object, object], set[str]] = {}
+    # Sort edges by a stable fingerprint so duplicate-key repair is
+    # input-order-independent: the same malformed graph.json with edges in any
+    # order produces the same final (src, tgt, key) layout.
+    sorted_edges = sorted(
+        _get_edges(data),
+        key=lambda e: json.dumps(e, sort_keys=True, default=str),
+    )
+    for edge in sorted_edges:
+        if not isinstance(edge, dict):
+            continue
+        src = edge["source"] if "source" in edge else edge.get("from")
+        tgt = edge["target"] if "target" in edge else edge.get("to")
+        # `is None` (not falsy) so valid hashable IDs like 0 or False survive;
+        # the unhashable guard prevents `in node_set` from raising TypeError on
+        # corrupt input like {"source": ["bad"]}.
+        if src is None or tgt is None:
+            continue
+        if not is_hashable(src) or not is_hashable(tgt):
+            continue
+        if src not in node_set or tgt not in node_set:
+            continue
+        attrs = {k: v for k, v in edge.items() if k not in ("source", "target", "from", "to")}
+        key, attrs = strip_schema_key(attrs)
+        if key is not None and not isinstance(key, str):
+            raise TypeError(
+                f"multigraph edge 'key' must be a string, got "
+                f"{type(key).__name__} ({key!r})"
+            )
+        if key is None:
+            missing_key_count += 1
+            # Hash the full payload so edges with different metadata get different
+            # keys and both survive (identity-field-only hashing collapses distinct
+            # parallel edges that share relation/source_file/source_location).
+            repair_payload = json.dumps(attrs, sort_keys=True, default=str)
+            repair_digest = hashlib.sha256(repair_payload.encode()).hexdigest()
+            key = f"edge:v1:{repair_digest}"
+        # Detect duplicate (src, tgt, key) tuples. add_edge would otherwise
+        # silently overwrite a previously loaded parallel edge.
+        used = used_keys_by_pair.setdefault((src, tgt), set())
+        if key in used:
+            duplicate_key_count += 1
+            repair_payload = json.dumps(attrs, sort_keys=True, default=str)
+            salt = 0
+            candidate = f"{key}:dup:{hashlib.sha256(repair_payload.encode()).hexdigest()}"
+            while candidate in used:
+                salt += 1
+                candidate = (
+                    f"{key}:dup:{hashlib.sha256((repair_payload + str(salt)).encode()).hexdigest()}"
+                )
+            key = candidate
+        used.add(key)
+        G.add_edge(src, tgt, key=key, **attrs)
+    if missing_key_count:
+        print(
+            f"[graphify] WARNING: {missing_key_count} multigraph edge(s) were missing "
+            "'key' — generated repair keys from full edge payload.",
+            file=sys.stderr,
+        )
+    if duplicate_key_count:
+        print(
+            f"[graphify] WARNING: {duplicate_key_count} multigraph edge(s) had duplicate "
+            "(source, target, key) tuples — generated repair keys to preserve all edges.",
+            file=sys.stderr,
+        )
+    return G
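Reviewer note (not part of the patch): the missing-key and duplicate-key repair in `_load_multigraph` can be exercised without NetworkX. The sketch below assumes a hypothetical function `assign_keys` that keeps only the key-assignment logic; node filtering, the string-type check on `key`, and the stderr warnings are left out.

```python
from __future__ import annotations

import hashlib
import json


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def assign_keys(edges: list[dict]) -> set[tuple[str, str, str]]:
    """Hypothetical stand-in for the key assignment in _load_multigraph."""
    used_by_pair: dict[tuple[str, str], set[str]] = {}
    layout: set[tuple[str, str, str]] = set()
    # Sort by a canonical JSON fingerprint so repair does not depend on input order.
    for edge in sorted(edges, key=lambda e: json.dumps(e, sort_keys=True, default=str)):
        src, tgt = edge["source"], edge["target"]
        attrs = {k: v for k, v in edge.items() if k not in ("source", "target", "key")}
        payload = json.dumps(attrs, sort_keys=True, default=str)
        key = edge.get("key")
        if key is None:
            # Missing key: derive one from the full attribute payload.
            key = f"edge:v1:{_digest(payload)}"
        used = used_by_pair.setdefault((src, tgt), set())
        if key in used:
            # Duplicate (src, tgt, key): suffix with a payload hash, salting on repeats.
            salt = 0
            candidate = f"{key}:dup:{_digest(payload)}"
            while candidate in used:
                salt += 1
                candidate = f"{key}:dup:{_digest(payload + str(salt))}"
            key = candidate
        used.add(key)
        layout.add((src, tgt, key))
    return layout


edges = [
    {"source": "a", "target": "b", "key": "k", "relation": "calls"},
    {"source": "a", "target": "b", "key": "k", "relation": "imports"},
    {"source": "a", "target": "b", "relation": "uses"},
]
forward = assign_keys(edges)
backward = assign_keys(list(reversed(edges)))
```

All three edges survive, and reversing the input produces the same set of `(source, target, key)` triples, which is the property the sort is there to guarantee.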
diff --git a/graphify/projections.py b/graphify/projections.py
new file mode 100644
index 000000000..a591538bd
--- /dev/null
+++ b/graphify/projections.py
@@ -0,0 +1,214 @@
+"""Projection helpers for graph consumers that need explicit edge semantics."""
+
+from __future__ import annotations
+
+from collections.abc import Hashable, Iterable
+from typing import Any, Literal, cast
+
+import networkx as nx
+
+WeightMode = Literal["confidence", "count", "sum"]
+
+_CONFIDENCE_SCORE = {
+    "EXTRACTED": 1.0,
+    "INFERRED": 0.5,
+    "AMBIGUOUS": 0.2,
+}
+
+
+def _confidence_score(data: dict[str, Any]) -> float:
+    raw_score = data.get("confidence_score")
+    if isinstance(raw_score, int | float) and not isinstance(raw_score, bool):  # Python 3.10+
+        return float(raw_score)
+    raw_confidence = data.get("confidence")
+    if isinstance(raw_confidence, str):
+        return _CONFIDENCE_SCORE.get(raw_confidence.upper(), 0.0)
+    return 0.0
+
+
+def _edge_sort_key(data: dict[str, Any]) -> tuple:
+    return (
+        -_confidence_score(data),
+        str(data.get("relation", "")),
+        str(data.get("source_file", "")),
+        str(data.get("source_location", "")),
+        str(data.get("context", "")),
+        repr(sorted((str(key), repr(value)) for key, value in data.items())),
+    )
+
+
+def _iter_edge_data(G: nx.Graph) -> Iterable[tuple[Any, Any, Any, dict[str, Any]]]:
+    if isinstance(G, nx.MultiGraph | nx.MultiDiGraph):  # Python 3.10+
+        yield from G.edges(keys=True, data=True)
+        return
+    for u, v, data in G.edges(data=True):
+        yield u, v, None, data
+
+
+def _copy_graph_skeleton(G: nx.Graph, graph_type: type[nx.Graph]) -> nx.Graph:
+    H = graph_type()
+    H.graph.update(G.graph)
+    H.add_nodes_from((node, attrs.copy()) for node, attrs in G.nodes(data=True))
+    return H
+
+
+def _unordered_pair(u: Any, v: Any) -> tuple[Any, Any]:
+    if repr(u) <= repr(v):
+        return u, v
+    return v, u
+
+
+def _merged_edge_attrs(records: list[dict[str, Any]], weight_mode: WeightMode) -> dict[str, Any]:
+    if weight_mode not in ("confidence", "count", "sum"):
+        raise ValueError("weight_mode must be one of: confidence, count, sum")
+    sorted_records = sorted(records, key=_edge_sort_key)
+    representative = sorted_records[0].copy()
+    scores = [_confidence_score(record) for record in records]
+    if weight_mode == "confidence":
+        weight = max(scores, default=0.0)
+    elif weight_mode == "count":
+        weight = float(len(records))
+    else:
+        weight = float(sum(scores))
+    representative["weight"] = weight
+    representative["parallel_edge_count"] = len(records)
+    return representative
+
+
+def project_for_community(G: nx.Graph, *, weight_mode: WeightMode = "confidence") -> nx.Graph:
+    """Return a simple undirected projection for clustering and community metrics."""
+    groups: dict[tuple[Any, Any], list[dict[str, Any]]] = {}
+    for u, v, _key, data in _iter_edge_data(G):
+        if u == v:
+            continue
+        pair = _unordered_pair(u, v)
+        groups.setdefault(pair, []).append(dict(data))
+
+    H = _copy_graph_skeleton(G, nx.Graph)
+    for (u, v), records in sorted(
+        groups.items(), key=lambda item: (repr(item[0][0]), repr(item[0][1]))
+    ):
+        H.add_edge(u, v, **_merged_edge_attrs(records, weight_mode))
+    return H
+
+
+def project_for_path(G: nx.Graph) -> nx.Graph:
+    """Return a simple undirected topology projection for path search."""
+    return project_for_community(G, weight_mode="count")
+
+
+def project_for_callflow(
+    G: nx.Graph,
+    *,
+    relations: frozenset[str] | set[str] | None = None,
+) -> nx.DiGraph:
+    """Return a simple directed projection for callflow-style consumers."""
+    relation_filter = set(relations) if relations is not None else None
+    groups: dict[tuple[Any, Any], list[dict[str, Any]]] = {}
+    for u, v, _key, data in _iter_edge_data(G):
+        relation = data.get("relation")
+        # Guard against non-string `relation`; relation_filter is set[str], and
+        # an unhashable relation would raise TypeError in the `in` membership test.
+        if relation_filter is not None and (
+            not isinstance(relation, str) or relation not in relation_filter
+        ):
+            continue
+        src = data.get("_src", u)
+        tgt = data.get("_tgt", v)
+        if src == tgt:
+            continue
+        groups.setdefault((src, tgt), []).append(dict(data))
+
+    H = cast(nx.DiGraph, _copy_graph_skeleton(G, nx.DiGraph))
+    for (src, tgt), records in sorted(
+        groups.items(), key=lambda item: (repr(item[0][0]), repr(item[0][1]))
+    ):
+        if src not in H:
+            H.add_node(src)
+        if tgt not in H:
+            H.add_node(tgt)
+        H.add_edge(src, tgt, **_merged_edge_attrs(records, "confidence"))
+    return H
+
+
+def _normalize_contexts(contexts: Iterable[str] | str | None) -> set[str] | None:
+    if contexts is None:
+        return None
+    raw_contexts = [contexts] if isinstance(contexts, str) else contexts
+    normalized = {str(context).strip().lower() for context in raw_contexts if str(context).strip()}
+    return normalized or None
+
+
+def project_for_context(G: nx.Graph, *, contexts: Iterable[str] | str | None = None) -> nx.Graph:
+    """Return a graph copy containing only edges whose context matches the filter."""
+    filters = _normalize_contexts(contexts)
+    H = _copy_graph_skeleton(G, G.__class__)
+    for u, v, key, data in _iter_edge_data(G):
+        if filters is not None and str(data.get("context", "")).strip().lower() not in filters:
+            continue
+        if isinstance(H, nx.MultiGraph | nx.MultiDiGraph):  # Python 3.10+
+            H.add_edge(u, v, key=key, **data)
+        else:
+            H.add_edge(u, v, **data)
+    return H
+
+
+def edge_records_between(G: nx.Graph, u: Hashable, v: Hashable) -> list[dict[str, Any]]:
+    """Return shallow copies of all edge records connecting two nodes."""
+    records: list[dict[str, Any]] = []
+
+    def collect(src: Hashable, tgt: Hashable) -> None:
+        if not G.has_edge(src, tgt):
+            return
+        raw = G.get_edge_data(src, tgt)
+        if not isinstance(raw, dict):
+            return
+        if isinstance(G, nx.MultiGraph | nx.MultiDiGraph):  # Python 3.10+
+            records.extend(dict(data) for data in raw.values() if isinstance(data, dict))
+        else:
+            records.append(dict(raw))
+
+    collect(u, v)
+    if G.is_directed() and u != v:
+        collect(v, u)
+    return sorted(records, key=_edge_sort_key)
+
+
+def edge_summary_between(G: nx.Graph, u: Hashable, v: Hashable) -> dict[str, Any]:
+    """Summarize all relationships between two nodes for display consumers."""
+    records = edge_records_between(G, u, v)
+    representative = records[0].copy() if records else {}
+    return {
+        "count": len(records),
+        "relations": sorted(
+            {str(record.get("relation")) for record in records if record.get("relation")}
+        ),
+        "confidences": sorted(
+            {str(record.get("confidence")) for record in records if record.get("confidence")}
+        ),
+        "representative": representative,
+    }
+
+
+def distinct_neighbor_degree(G: nx.Graph, node: Hashable) -> int:
+    """Count unique adjacent nodes without inflating parallel edges."""
+    if node not in G:
+        return 0
+    if G.is_directed():
+        directed = cast(nx.DiGraph, G)
+        return len(set(directed.predecessors(node)) | set(directed.successors(node)))
+    return len(set(G.neighbors(node)))
+
+
+def normalize_to_multidigraph(G: nx.Graph) -> nx.MultiDiGraph:
+    """Return a MultiDiGraph copy, preserving parallel keys when present."""
+    H = nx.MultiDiGraph()
+    H.graph.update(G.graph)
+    H.add_nodes_from((node, attrs.copy()) for node, attrs in G.nodes(data=True))
+    if isinstance(G, nx.MultiGraph | nx.MultiDiGraph):  # Python 3.10+
+        for u, v, key, data in G.edges(keys=True, data=True):
+            H.add_edge(u, v, key=key, **data)
+    else:
+        for u, v, data in G.edges(data=True):
+            H.add_edge(u, v, **data)
+    return H
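Reviewer note (not part of the patch): the three weight modes used when parallel edges are merged can be compared on a small example. This is a sketch under stated assumptions: `confidence_score` and `merged_weight` are hypothetical standalone names that follow the logic of `_confidence_score` and `_merged_edge_attrs`, without the representative-record selection.

```python
from __future__ import annotations

from typing import Any

CONFIDENCE_SCORE = {"EXTRACTED": 1.0, "INFERRED": 0.5, "AMBIGUOUS": 0.2}


def confidence_score(record: dict[str, Any]) -> float:
    """Numeric confidence_score wins; otherwise map the confidence label."""
    raw = record.get("confidence_score")
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(raw, (int, float)) and not isinstance(raw, bool):
        return float(raw)
    label = record.get("confidence")
    if isinstance(label, str):
        return CONFIDENCE_SCORE.get(label.upper(), 0.0)
    return 0.0


def merged_weight(records: list[dict[str, Any]], weight_mode: str) -> float:
    """Collapse parallel edge records into one weight."""
    scores = [confidence_score(record) for record in records]
    if weight_mode == "confidence":
        return max(scores, default=0.0)
    if weight_mode == "count":
        return float(len(records))
    if weight_mode == "sum":
        return float(sum(scores))
    raise ValueError("weight_mode must be one of: confidence, count, sum")


records = [
    {"relation": "calls", "confidence": "EXTRACTED"},
    {"relation": "imports", "confidence": "inferred"},
    {"relation": "uses", "confidence_score": 0.25},
]
weights = {mode: merged_weight(records, mode) for mode in ("confidence", "count", "sum")}
```

With these records, `confidence` keeps the strongest single edge, `count` reflects multiplicity, and `sum` rewards both.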
diff --git a/graphify/symbol_resolution.py b/graphify/symbol_resolution.py
index 7bc68093a..5cb0dad15 100644
--- a/graphify/symbol_resolution.py
+++ b/graphify/symbol_resolution.py
@@ -243,7 +243,7 @@ def resolve_python_import_guided_calls(
         if path.suffix != ".py":
             continue
         slot: Any = per_file[index] if index < len(per_file) else None
-        result_by_file[str(path)] = slot if isinstance(slot, dict) else {"nodes": [], "edges": []}
+        result_by_file[str(path)] = slot if isinstance(slot, dict) else {"nodes": [], "edges": []}  # empty fragment for missing/non-dict slots
     resolved_edges: list[dict[str, Any]] = []
 
     for path in paths:
@@ -256,7 +256,7 @@ def resolve_python_import_guided_calls(
         file_result = result_by_file.get(source_file, {"raw_calls": []})
         raw_calls = file_result.get("raw_calls", [])
         if not isinstance(raw_calls, list):
-            continue
+            continue  # raw_calls must be a list; skip malformed fragments
         for raw_call in raw_calls:
             if not isinstance(raw_call, dict):
                 continue
diff --git a/graphify/validate.py b/graphify/validate.py
index 5f6bad364..4b63d1af4 100644
--- a/graphify/validate.py
+++ b/graphify/validate.py
@@ -7,6 +7,14 @@
 REQUIRED_EDGE_FIELDS = {"source", "target", "relation", "confidence", "source_file"}
 
 
+def is_hashable(value: object) -> bool:
+    try:
+        hash(value)
+    except TypeError:
+        return False
+    return True
+
+
 def validate_extraction(data: dict) -> list[str]:
     """
     Validate an extraction JSON dict against the graphify schema.
@@ -29,7 +37,11 @@ def validate_extraction(data: dict) -> list[str]:
                 continue
             for field in REQUIRED_NODE_FIELDS:
                 if field not in node:
-                    errors.append(f"Node {i} (id={node.get('id', '?')!r}) missing required field '{field}'")
+                    errors.append(
+                        f"Node {i} (id={node.get('id', '?')!r}) missing required field '{field}'"
+                    )
+            if "id" in node and not is_hashable(node["id"]):
+                errors.append(f"Node {i} id is unhashable and cannot be used as a node id")
             if "file_type" in node and node["file_type"] not in VALID_FILE_TYPES:
                 errors.append(
                     f"Node {i} (id={node.get('id', '?')!r}) has invalid file_type "
@@ -43,7 +55,17 @@ def validate_extraction(data: dict) -> list[str]:
     elif not isinstance(edge_list, list):
         errors.append("'edges' must be a list")
     else:
-        node_ids = {n["id"] for n in data.get("nodes", []) if isinstance(n, dict) and "id" in n}
+        # Guard against non-list `nodes` (the earlier branch only records the
+        # error and falls through to here); iterating a non-list would otherwise
+        # raise an incidental TypeError instead of yielding a clean validation
+        # message.
+        raw_nodes = data.get("nodes", [])
+        nodes_iter = raw_nodes if isinstance(raw_nodes, list) else []
+        node_ids = {
+            n["id"]
+            for n in nodes_iter
+            if isinstance(n, dict) and "id" in n and is_hashable(n["id"])
+        }
         for i, edge in enumerate(edge_list):
             if not isinstance(edge, dict):
                 errors.append(f"Edge {i} must be an object")
@@ -56,10 +78,16 @@ def validate_extraction(data: dict) -> list[str]:
                     f"Edge {i} has invalid confidence '{edge['confidence']}' "
                     f"- must be one of {sorted(VALID_CONFIDENCES)}"
                 )
-            if "source" in edge and node_ids and edge["source"] not in node_ids:
-                errors.append(f"Edge {i} source '{edge['source']}' does not match any node id")
-            if "target" in edge and node_ids and edge["target"] not in node_ids:
-                errors.append(f"Edge {i} target '{edge['target']}' does not match any node id")
+            if "source" in edge:
+                if not is_hashable(edge["source"]):
+                    errors.append(f"Edge {i} source is unhashable and cannot match any node id")
+                elif node_ids and edge["source"] not in node_ids:
+                    errors.append(f"Edge {i} source '{edge['source']}' does not match any node id")
+            if "target" in edge:
+                if not is_hashable(edge["target"]):
+                    errors.append(f"Edge {i} target is unhashable and cannot match any node id")
+                elif node_ids and edge["target"] not in node_ids:
+                    errors.append(f"Edge {i} target '{edge['target']}' does not match any node id")
 
     return errors
 
@@ -68,5 +96,7 @@ def assert_valid(data: dict) -> None:
     """Raise ValueError with all errors if extraction is invalid."""
     errors = validate_extraction(data)
     if errors:
-        msg = f"Extraction JSON has {len(errors)} error(s):\n" + "\n".join(f"  • {e}" for e in errors)
+        msg = f"Extraction JSON has {len(errors)} error(s):\n" + "\n".join(
+            f"  • {e}" for e in errors
+        )
         raise ValueError(msg)
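Reviewer note (not part of the patch): the reason for the `is_hashable` guard is that a set membership test raises on unhashable operands instead of returning False. A minimal sketch, with made-up sample ids:

```python
def is_hashable(value: object) -> bool:
    """Same check as the helper added to validate.py."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


node_ids = {"n1", "n2", 0}
# Sample edge endpoints, including corrupt values a malformed JSON file could carry.
candidates = ["n1", 0, ["bad"], {"id": "n1"}, ("n1", ["nested"])]

# Guarded lookup: unhashable endpoints are skipped instead of raising.
matches = [c for c in candidates if is_hashable(c) and c in node_ids]
```

A tuple is only hashable if all of its elements are, which is why `("n1", ["nested"])` is rejected along with the list and the dict.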
diff --git a/tests/test_build.py b/tests/test_build.py
index 9be6c1289..19bd65199 100644
--- a/tests/test_build.py
+++ b/tests/test_build.py
@@ -1,40 +1,61 @@
 import json
 from pathlib import Path
+from typing import cast
 import networkx as nx
+import pytest
 from networkx.readwrite import json_graph
-from graphify.build import build_from_json, build, build_merge, edge_data, edge_datas
+from graphify.build import (
+    _make_collision_key,
+    build_from_json,
+    build,
+    build_merge,
+    edge_data,
+    edge_datas,
+)
+from graphify.edge_identity import make_stable_key
 
 FIXTURES = Path(__file__).parent / "fixtures"
 
+
 def load_extraction():
     return json.loads((FIXTURES / "extraction.json").read_text())
 
+
 def test_build_from_json_node_count():
     G = build_from_json(load_extraction())
     assert G.number_of_nodes() == 4
 
+
 def test_build_from_json_edge_count():
     G = build_from_json(load_extraction())
     assert G.number_of_edges() == 4
 
+
 def test_nodes_have_label():
     G = build_from_json(load_extraction())
     assert G.nodes["n_transformer"]["label"] == "Transformer"
 
+
 def test_edges_have_confidence():
-    G = build_from_json(load_extraction())
+    G = cast(nx.Graph, build_from_json(load_extraction()))
     data = G.edges["n_attention", "n_concept_attn"]
     assert data["confidence"] == "INFERRED"
 
+
 def test_ambiguous_edge_preserved():
-    G = build_from_json(load_extraction())
+    G = cast(nx.Graph, build_from_json(load_extraction()))
     data = G.edges["n_layernorm", "n_concept_attn"]
     assert data["confidence"] == "AMBIGUOUS"
 
+
 def test_legacy_node_source_canonicalized():
     """Legacy 'source' key on nodes is renamed to 'source_file' before graph build."""
-    ext = {"nodes": [{"id": "n1", "label": "A", "file_type": "code", "source": "a.py"}],
-           "edges": [], "input_tokens": 0, "output_tokens": 0}
+    ext = {
+        "nodes": [{"id": "n1", "label": "A", "file_type": "code", "source": "a.py"}],
+        "edges": [],
+        "input_tokens": 0,
+        "output_tokens": 0,
+    }
     G = build_from_json(ext)
     assert "source_file" in G.nodes["n1"]
     assert G.nodes["n1"]["source_file"] == "a.py"
@@ -43,11 +64,24 @@ def test_legacy_node_source_canonicalized():
 
 def test_legacy_edge_from_to_canonicalized():
     """Legacy 'from'/'to' keys on edges are accepted alongside 'source'/'target'."""
-    ext = {"nodes": [{"id": "n1", "label": "A", "file_type": "code", "source_file": "a.py"},
-                     {"id": "n2", "label": "B", "file_type": "code", "source_file": "b.py"}],
-           "edges": [{"from": "n1", "to": "n2", "relation": "calls",
-                      "confidence": "EXTRACTED", "source_file": "a.py", "weight": 1.0}],
-           "input_tokens": 0, "output_tokens": 0}
+    ext = {
+        "nodes": [
+            {"id": "n1", "label": "A", "file_type": "code", "source_file": "a.py"},
+            {"id": "n2", "label": "B", "file_type": "code", "source_file": "b.py"},
+        ],
+        "edges": [
+            {
+                "from": "n1",
+                "to": "n2",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "a.py",
+                "weight": 1.0,
+            }
+        ],
+        "input_tokens": 0,
+        "output_tokens": 0,
+    }
     G = build_from_json(ext)
     assert G.number_of_edges() == 1
 
@@ -56,11 +90,22 @@ def test_source_file_backslash_normalized():
     """Windows backslash paths and POSIX paths for the same file must produce one node."""
     extraction = {
         "nodes": [
-            {"id": "n1", "label": "A", "file_type": "code", "source_file": "src\\middleware\\auth.py"},
-            {"id": "n2", "label": "B", "file_type": "code", "source_file": "src/middleware/auth.py"},
+            {
+                "id": "n1",
+                "label": "A",
+                "file_type": "code",
+                "source_file": "src\\middleware\\auth.py",
+            },
+            {
+                "id": "n2",
+                "label": "B",
+                "file_type": "code",
+                "source_file": "src/middleware/auth.py",
+            },
         ],
         "edges": [],
-        "input_tokens": 0, "output_tokens": 0,
+        "input_tokens": 0,
+        "output_tokens": 0,
     }
     G = build_from_json(extraction)
     sources = {G.nodes[n]["source_file"] for n in G.nodes()}
@@ -68,12 +113,27 @@ def test_source_file_backslash_normalized():
 
 
 def test_build_merges_multiple_extractions():
-    ext1 = {"nodes": [{"id": "n1", "label": "A", "file_type": "code", "source_file": "a.py"}],
-            "edges": [], "input_tokens": 0, "output_tokens": 0}
-    ext2 = {"nodes": [{"id": "n2", "label": "B", "file_type": "document", "source_file": "b.md"}],
-            "edges": [{"source": "n1", "target": "n2", "relation": "references",
-                       "confidence": "INFERRED", "source_file": "b.md", "weight": 1.0}],
-            "input_tokens": 0, "output_tokens": 0}
+    ext1 = {
+        "nodes": [{"id": "n1", "label": "A", "file_type": "code", "source_file": "a.py"}],
+        "edges": [],
+        "input_tokens": 0,
+        "output_tokens": 0,
+    }
+    ext2 = {
+        "nodes": [{"id": "n2", "label": "B", "file_type": "document", "source_file": "b.md"}],
+        "edges": [
+            {
+                "source": "n1",
+                "target": "n2",
+                "relation": "references",
+                "confidence": "INFERRED",
+                "source_file": "b.md",
+                "weight": 1.0,
+            }
+        ],
+        "input_tokens": 0,
+        "output_tokens": 0,
+    }
     G = build([ext1, ext2])
     assert G.number_of_nodes() == 2
     assert G.number_of_edges() == 1
@@ -191,8 +251,9 @@ def test_build_merge_preserves_call_edge_direction(tmp_path):
 
     # Verify direction is correct in the freshly written JSON.
     saved = json.loads(graph_path.read_text())
-    saved_calls = [e for e in saved.get("links", saved.get("edges", []))
-                   if e.get("relation") == "calls"]
+    saved_calls = [
+        e for e in saved.get("links", saved.get("edges", [])) if e.get("relation") == "calls"
+    ]
     assert len(saved_calls) == 1
     assert saved_calls[0]["source"] == truth_src
     assert saved_calls[0]["target"] == truth_tgt
@@ -203,8 +264,9 @@ def test_build_merge_preserves_call_edge_direction(tmp_path):
 
     # The calls edge must still go a -> b, not b -> a.
     reloaded = json.loads(graph_path.read_text())
-    reloaded_calls = [e for e in reloaded.get("links", reloaded.get("edges", []))
-                      if e.get("relation") == "calls"]
+    reloaded_calls = [
+        e for e in reloaded.get("links", reloaded.get("edges", [])) if e.get("relation") == "calls"
+    ]
     assert len(reloaded_calls) == 1
     assert reloaded_calls[0]["source"] == truth_src, (
         f"calls edge source flipped after build_merge round-trip: "
@@ -280,6 +342,7 @@ def test_build_from_json_preserves_first_direction_on_bidirectional_pair(tmp_pat
 # whenever the loaded JSON has multigraph: true. Plain G.edges[u, v] crashes
 # on those with `ValueError: not enough values to unpack (expected 3, got 2)`.
 
+
 def test_edge_data_simple_graph():
     G = nx.Graph()
     G.add_edge("a", "b", relation="calls", confidence="EXTRACTED")
@@ -367,12 +430,22 @@ def test_build_from_json_relativizes_absolute_source_file(tmp_path):
     abs_path = str(root / "docs" / "overview.md")
     extraction = {
         "nodes": [
-            {"id": "overview_intro", "label": "Intro", "source_file": abs_path, "file_type": "document"},
+            {
+                "id": "overview_intro",
+                "label": "Intro",
+                "source_file": abs_path,
+                "file_type": "document",
+            },
         ],
         "edges": [
-            {"source": "overview_intro", "target": "overview_intro",
-             "relation": "self", "confidence": "EXTRACTED", "confidence_score": 1.0,
-             "source_file": abs_path},
+            {
+                "source": "overview_intro",
+                "target": "overview_intro",
+                "relation": "self",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": abs_path,
+            },
         ],
     }
     G = build_from_json(extraction, root=root)
@@ -398,7 +471,9 @@ def test_build_relativizes_absolute_source_file(tmp_path):
 def test_build_from_json_relative_source_file_unchanged(tmp_path):
     """Already-relative source_file paths must not be modified."""
     extraction = {
-        "nodes": [{"id": "foo_bar", "label": "bar", "source_file": "src/foo.py", "file_type": "code"}],
+        "nodes": [
+            {"id": "foo_bar", "label": "bar", "source_file": "src/foo.py", "file_type": "code"}
+        ],
         "edges": [],
     }
     G = build_from_json(extraction, root=tmp_path)
@@ -468,3 +543,710 @@ def test_build_merge_rejects_oversized_existing_graph(monkeypatch, tmp_path):
     monkeypatch.setattr("graphify.security._MAX_GRAPH_FILE_BYTES", 8)
     with pytest.raises(ValueError, match="exceeds"):
         build_merge([], graph_path, dedup=False)
+
+
+def _parallel_edge_extraction() -> dict:
+    return {
+        "nodes": [
+            {"id": "a", "label": "A", "file_type": "code", "source_file": "src/a.py"},
+            {"id": "b", "label": "B", "file_type": "code", "source_file": "src/b.py"},
+        ],
+        "edges": [
+            {
+                "source": "a",
+                "target": "b",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "src/a.py",
+                "source_location": "L10",
+            },
+            {
+                "source": "a",
+                "target": "b",
+                "relation": "imports",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "src/a.py",
+                "source_location": "L2",
+            },
+        ],
+    }
+
+
+def test_default_build_stays_simple_when_parallel_edges_exist():
+    G = build_from_json(_parallel_edge_extraction())
+
+    assert type(G) is nx.Graph
+    assert not G.is_multigraph()
+    assert G.number_of_edges("a", "b") == 1
+
+
+def test_multigraph_build_preserves_same_endpoint_different_relations():
+    G = build_from_json(_parallel_edge_extraction(), multigraph=True)
+
+    assert type(G) is nx.MultiDiGraph
+    assert G.number_of_edges("a", "b") == 2
+    edge_records = list(G["a"]["b"].items())
+    assert {data["relation"] for _key, data in edge_records} == {"calls", "imports"}
+    assert all(str(key).startswith("edge:v1:") for key, _data in edge_records)
+    assert all("key" not in data for _key, data in edge_records)
+
+
+def test_multigraph_build_preserves_same_identity_except_source_location():
+    extraction = _parallel_edge_extraction()
+    extraction["edges"][1].update(
+        {
+            "relation": "calls",
+            "source_location": "L20",
+        }
+    )
+
+    G = build_from_json(extraction, multigraph=True)
+
+    assert G.number_of_edges("a", "b") == 2
+    assert {data["source_location"] for data in G["a"]["b"].values()} == {"L10", "L20"}
+
+
+def test_multigraph_build_collapses_exact_duplicates_with_diagnostic():
+    extraction = _parallel_edge_extraction()
+    extraction["edges"].append(dict(extraction["edges"][0]))
+
+    G = build_from_json(extraction, multigraph=True)
+
+    assert G.number_of_edges("a", "b") == 2
+    assert G.graph["graphify_multigraph_diagnostics"]["exact_duplicate_edges"] == 1
+
+
+def test_multigraph_build_preserves_non_exact_key_collisions_with_diagnostic():
+    extraction = {
+        "nodes": [
+            {"id": "a", "label": "A", "file_type": "code", "source_file": "src/a.py"},
+            {"id": "b", "label": "B", "file_type": "code", "source_file": "src/b.py"},
+        ],
+        "edges": [
+            {
+                "source": "a",
+                "target": "b",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "src/a.py",
+                "source_location": "L10",
+                "context": "static",
+            },
+            {
+                "source": "a",
+                "target": "b",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "src/a.py",
+                "source_location": "L10",
+                "context": "runtime",
+            },
+        ],
+    }
+
+    G = build_from_json(extraction, multigraph=True)
+
+    assert G.number_of_edges("a", "b") == 2
+    assert {data["context"] for data in G["a"]["b"].values()} == {
+        "static",
+        "runtime",
+    }
+    assert G.graph["graphify_multigraph_diagnostics"]["key_collision_edges"] == 1
+
+
+def test_multigraph_build_collapses_duplicates_after_collision_repair():
+    extraction = {
+        "nodes": [
+            {"id": "a", "label": "A", "file_type": "code", "source_file": "src/a.py"},
+            {"id": "b", "label": "B", "file_type": "code", "source_file": "src/b.py"},
+        ],
+        "edges": [
+            {
+                "source": "a",
+                "target": "b",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "src/a.py",
+                "source_location": "L10",
+                "context": "static",
+            },
+            {
+                "source": "a",
+                "target": "b",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "src/a.py",
+                "source_location": "L10",
+                "context": "runtime",
+            },
+            {
+                "source": "a",
+                "target": "b",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "src/a.py",
+                "source_location": "L10",
+                "context": "runtime",
+            },
+        ],
+    }
+
+    G = build_from_json(extraction, multigraph=True)
+
+    assert G.number_of_edges("a", "b") == 2
+    assert {data["context"] for data in G["a"]["b"].values()} == {
+        "static",
+        "runtime",
+    }
+    assert G.graph["graphify_multigraph_diagnostics"] == {
+        "exact_duplicate_edges": 1,
+        "key_collision_edges": 1,
+    }
+
+
+def test_multigraph_build_preserves_empty_string_schema_key():
+    extraction = _parallel_edge_extraction()
+    extraction["edges"] = [dict(extraction["edges"][0], key="")]
+
+    G = build_from_json(extraction, multigraph=True)
+
+    assert list(G["a"]["b"].keys()) == [""]
+
+
+def test_multigraph_build_normalizes_path_identity_fields_for_stable_key(tmp_path):
+    """Path-valued identity fields are stringified by json.dumps(default=str) for stable keys."""
+    extraction = _parallel_edge_extraction()
+    absolute_source = tmp_path / "src" / "a.py"
+    extraction["edges"] = [
+        {
+            **extraction["edges"][0],
+            "source_file": absolute_source,
+            "source_location": {"line": 10},
+        }
+    ]
+
+    G = build_from_json(extraction, root=tmp_path, multigraph=True)
+
+    assert G.number_of_edges("a", "b") == 1
+    assert next(iter(G["a"]["b"].keys())).startswith("edge:v1:")
+    assert next(iter(G["a"]["b"].values()))["source_file"] == "src/a.py"
+
+
+def test_multigraph_build_skips_edge_with_non_json_serializable_attrs(capsys):
+    """Edges whose attrs cannot round-trip through JSON are skipped with a warning.
+
+    Mutating attrs in place would silently change the user's stored value;
+    skipping with a warning preserves data integrity for surviving edges and
+    prevents later json.dump crashes during export.
+    """
+    extraction = _parallel_edge_extraction()
+    extraction["edges"] = [
+        {
+            **extraction["edges"][0],
+            "relation": {"calls", "uses"},
+        }
+    ]
+
+    G = build_from_json(extraction, multigraph=True)
+
+    assert G.number_of_edges("a", "b") == 0
+    captured = capsys.readouterr()
+    assert "non-JSON-serializable" in captured.err
+
+
+@pytest.mark.parametrize("field", ["nodes", "edges"])
+def test_build_from_json_treats_non_list_node_or_edge_field_as_empty(field, capsys):
+    extraction = _parallel_edge_extraction()
+    extraction[field] = 123
+
+    G = build_from_json(extraction, multigraph=True)
+
+    assert G.number_of_edges() == 0
+    captured = capsys.readouterr()
+    assert f"extraction field '{field}' must be a list" in captured.err
+
+
+def test_multigraph_collision_repair_keys_do_not_depend_on_edge_order():
+    base_edges = [
+        {
+            "source": "a",
+            "target": "b",
+            "relation": "calls",
+            "confidence": "EXTRACTED",
+            "confidence_score": 1.0,
+            "source_file": "src/a.py",
+            "source_location": "L10",
+            "context": "static",
+        },
+        {
+            "source": "a",
+            "target": "b",
+            "relation": "calls",
+            "confidence": "EXTRACTED",
+            "confidence_score": 1.0,
+            "source_file": "src/a.py",
+            "source_location": "L10",
+            "context": "runtime",
+        },
+    ]
+
+    def keys_by_context(edges: list[dict]) -> dict[str, str]:
+        extraction = {
+            "nodes": [
+                {"id": "a", "label": "A", "file_type": "code", "source_file": "src/a.py"},
+                {"id": "b", "label": "B", "file_type": "code", "source_file": "src/b.py"},
+            ],
+            "edges": edges,
+        }
+        G = build_from_json(extraction, multigraph=True)
+        return {data["context"]: key for key, data in G["a"]["b"].items()}
+
+    forward = keys_by_context(base_edges)
+    reverse = keys_by_context(list(reversed(base_edges)))
+
+    assert forward == reverse
+    assert all(":alt:" in key for key in forward.values())
+
+
+def test_multigraph_collision_repair_does_not_overwrite_explicit_key():
+    runtime_attrs = {
+        "relation": "calls",
+        "confidence": "EXTRACTED",
+        "confidence_score": 1.0,
+        "source_file": "src/a.py",
+        "source_location": "L10",
+        "context": "runtime",
+        "_src": "a",
+        "_tgt": "b",
+    }
+    base_key = make_stable_key("calls", "src/a.py", "L10")
+    explicit_conflict_key = _make_collision_key(base_key, runtime_attrs)
+    edges = [
+        {
+            "source": "a",
+            "target": "b",
+            "key": explicit_conflict_key,
+            "relation": "imports",
+            "confidence": "EXTRACTED",
+            "confidence_score": 1.0,
+            "source_file": "src/a.py",
+            "source_location": "L2",
+            "context": "explicit",
+        },
+        {
+            "source": "a",
+            "target": "b",
+            "relation": "calls",
+            "confidence": "EXTRACTED",
+            "confidence_score": 1.0,
+            "source_file": "src/a.py",
+            "source_location": "L10",
+            "context": "static",
+        },
+        {
+            "source": "a",
+            "target": "b",
+            "relation": "calls",
+            "confidence": "EXTRACTED",
+            "confidence_score": 1.0,
+            "source_file": "src/a.py",
+            "source_location": "L10",
+            "context": "runtime",
+        },
+    ]
+
+    def contexts_by_key(edge_order: list[dict]) -> dict[str, str]:
+        extraction = {
+            "nodes": [
+                {"id": "a", "label": "A", "file_type": "code", "source_file": "src/a.py"},
+                {"id": "b", "label": "B", "file_type": "code", "source_file": "src/b.py"},
+            ],
+            "edges": edge_order,
+        }
+        G = build_from_json(extraction, multigraph=True)
+        assert G.number_of_edges("a", "b") == 3
+        return {key: data["context"] for key, data in G["a"]["b"].items()}
+
+    forward = contexts_by_key(edges)
+    reverse = contexts_by_key(list(reversed(edges)))
+
+    assert forward == reverse
+    assert forward[explicit_conflict_key] == "explicit"
+    runtime_keys = [key for key, context in forward.items() if context == "runtime"]
+    assert len(runtime_keys) == 1
+    assert runtime_keys[0] != explicit_conflict_key
+
+
+def test_multigraph_build_roundtrips_through_json_loader(tmp_path):
+    from graphify.export import to_json
+    from graphify.graph_loader import load_graph_file
+
+    G = build_from_json(_parallel_edge_extraction(), multigraph=True)
+    graph_path = tmp_path / "graph.json"
+
+    assert to_json(G, {}, str(graph_path), force=True)
+    data = json.loads(graph_path.read_text())
+    loaded = load_graph_file(graph_path)
+
+    assert data["multigraph"] is True
+    assert data["directed"] is True
+    assert len(data["links"]) == 2
+    assert all("key" in link for link in data["links"])
+    assert type(loaded) is nx.MultiDiGraph
+    assert loaded.number_of_edges("a", "b") == 2
+    assert set(loaded["a"]["b"]) == {link["key"] for link in data["links"]}
+
+
+def test_build_multigraph_merges_extractions_without_collapsing_parallel_edges():
+    extraction = _parallel_edge_extraction()
+
+    G = build(
+        [
+            {"nodes": extraction["nodes"], "edges": [extraction["edges"][0]]},
+            {"nodes": [], "edges": [extraction["edges"][1]]},
+        ],
+        dedup=False,
+        multigraph=True,
+    )
+
+    assert type(G) is nx.MultiDiGraph
+    assert G.number_of_edges("a", "b") == 2
+
+
+def test_build_preserves_hashable_non_string_edge_endpoints():
+    extraction = {
+        "nodes": [
+            {"id": 1, "label": "A", "file_type": "code", "source_file": "src/a.py"},
+            {"id": 2, "label": "B", "file_type": "code", "source_file": "src/b.py"},
+        ],
+        "edges": [
+            {
+                "source": 1,
+                "target": 2,
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "src/a.py",
+                "source_location": "L10",
+            },
+        ],
+    }
+
+    G = build_from_json(extraction)
+
+    assert G.has_edge(1, 2)
+
+
+def test_build_skips_unhashable_edge_endpoints_without_crashing(capsys):
+    extraction = {
+        "nodes": [
+            {"id": "a", "label": "A", "file_type": "code", "source_file": "src/a.py"},
+            {"id": "b", "label": "B", "file_type": "code", "source_file": "src/b.py"},
+        ],
+        "edges": [
+            {
+                "source": "a",
+                "target": {"bad": "target"},
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "src/a.py",
+                "source_location": "L10",
+            },
+        ],
+    }
+
+    G = build_from_json(extraction)
+    captured = capsys.readouterr()
+
+    assert G.number_of_edges() == 0
+    assert "unhashable" in captured.err
+
+
+def test_build_skips_unhashable_node_ids_without_crashing(capsys):
+    extraction = {
+        "nodes": [
+            {"id": ["bad"], "label": "Bad", "file_type": "code", "source_file": "src/bad.py"},
+            {"id": "ok", "label": "OK", "file_type": "code", "source_file": "src/ok.py"},
+        ],
+        "edges": [],
+    }
+
+    G = build_from_json(extraction)
+    captured = capsys.readouterr()
+
+    assert list(G.nodes()) == ["ok"]
+    assert "Node 0 id is unhashable" in captured.err
+
+
+def test_build_skips_malformed_nodes_without_crashing(capsys):
+    extraction = {
+        "nodes": [
+            "bad-node",
+            {"label": "Missing ID", "file_type": "code", "source_file": "src/missing.py"},
+            {"id": "ok", "label": "OK", "file_type": "code", "source_file": "src/ok.py"},
+        ],
+        "edges": [],
+    }
+
+    G = build_from_json(extraction)
+    captured = capsys.readouterr()
+
+    assert list(G.nodes()) == ["ok"]
+    assert "Node 0 must be an object" in captured.err
+
+
+def test_build_warns_when_skipping_unhashable_endpoint_without_node_ids(capsys):
+    extraction = {
+        "nodes": [],
+        "edges": [
+            {
+                "source": ["bad"],
+                "target": "b",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "src/a.py",
+                "source_location": "L10",
+            },
+        ],
+    }
+
+    G = build_from_json(extraction)
+    captured = capsys.readouterr()
+
+    assert G.number_of_edges() == 0
+    assert "skipped edge with unhashable source endpoint" in captured.err
+
+
+def test_build_skips_malformed_edges_without_crashing(capsys):
+    extraction = {
+        "nodes": [
+            {"id": "a", "label": "A", "file_type": "code", "source_file": "src/a.py"},
+            {"id": "b", "label": "B", "file_type": "code", "source_file": "src/b.py"},
+        ],
+        "edges": [
+            7,
+            {
+                "source": "a",
+                "target": "b",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "src/a.py",
+                "source_location": "L10",
+            },
+        ],
+    }
+
+    G = build_from_json(extraction)
+    captured = capsys.readouterr()
+
+    assert G.number_of_edges() == 1
+    assert "Edge 0 must be an object" in captured.err
+
+
+def test_build_merge_rejects_multigraph_graph_json(tmp_path):
+    """build_merge must refuse a multigraph input rather than silently collapse parallel edges."""
+    import json as _json
+
+    graph_path = tmp_path / "graph.json"
+    graph_path.write_text(
+        _json.dumps(
+            {
+                "directed": True,
+                "multigraph": True,
+                "nodes": [{"id": "a"}, {"id": "b"}],
+                "links": [
+                    {"source": "a", "target": "b", "key": "k1", "relation": "calls"},
+                    {"source": "a", "target": "b", "key": "k2", "relation": "imports"},
+                ],
+            }
+        )
+    )
+
+    with pytest.raises(NotImplementedError, match="multigraph"):
+        build_merge([], graph_path=graph_path)
+
+
+def test_build_merge_inherits_directed_from_saved_graph_json(tmp_path):
+    """build_merge with default args must preserve direction of a directed saved graph."""
+    import json as _json
+
+    graph_path = tmp_path / "graph.json"
+    graph_path.write_text(
+        _json.dumps(
+            {
+                "directed": True,
+                "multigraph": False,
+                "nodes": [
+                    {"id": "caller", "file_type": "code", "source_file": "a.py"},
+                    {"id": "callee", "file_type": "code", "source_file": "b.py"},
+                ],
+                "links": [
+                    {
+                        "source": "caller",
+                        "target": "callee",
+                        "relation": "calls",
+                        "source_file": "a.py",
+                        "_src": "caller",
+                        "_tgt": "callee",
+                    }
+                ],
+            }
+        )
+    )
+
+    # No `directed=` arg passed — must inherit True from the saved file.
+    G = build_merge([], graph_path=graph_path)
+    assert G.is_directed(), "build_merge default-args must inherit directed=True from saved graph"
+    assert G.has_edge("caller", "callee")
+    assert not G.has_edge("callee", "caller")
+
+
+def test_build_merge_directed_override_warns(tmp_path, capsys):
+    """Explicit directed=False against a directed saved graph emits a warning."""
+    import json as _json
+
+    graph_path = tmp_path / "graph.json"
+    graph_path.write_text(
+        _json.dumps(
+            {
+                "directed": True,
+                "multigraph": False,
+                "nodes": [{"id": "a"}, {"id": "b"}],
+                "links": [{"source": "a", "target": "b", "relation": "calls"}],
+            }
+        )
+    )
+
+    G = build_merge([], graph_path=graph_path, directed=False)
+    captured = capsys.readouterr()
+    assert "overrides saved" in captured.err.lower()
+    assert not G.is_directed()
+
+
+def test_build_merge_rejects_non_bool_multigraph_in_saved_graph(tmp_path):
+    """A saved graph.json with a non-bool 'multigraph' value must be rejected."""
+    import json as _json
+    graph_path = tmp_path / "graph.json"
+    graph_path.write_text(_json.dumps({
+        "directed": True, "multigraph": "false",
+        "nodes": [{"id": "a"}, {"id": "b"}],
+        "links": [{"source": "a", "target": "b", "relation": "calls"}],
+    }))
+    with pytest.raises(TypeError, match="'multigraph' in .* must be a boolean"):
+        build_merge([], graph_path=graph_path)
+
+
+def test_build_merge_rejects_non_bool_directed_in_saved_graph(tmp_path):
+    """A saved graph.json with a non-bool 'directed' value must be rejected."""
+    graph_path = tmp_path / "graph.json"
+    graph_path.write_text(json.dumps({
+        "directed": "true", "multigraph": False,
+        "nodes": [{"id": "a"}, {"id": "b"}],
+        "links": [{"source": "a", "target": "b", "relation": "calls"}],
+    }))
+    with pytest.raises(TypeError, match="'directed' in .* must be a boolean"):
+        build_merge([], graph_path=graph_path)
+
+
+def test_simple_build_skips_edge_with_non_json_serializable_attrs(capsys):
+    """Same skip-and-warn policy applies to simple-graph builds."""
+    extraction = _parallel_edge_extraction()
+    extraction["edges"] = [
+        {
+            **extraction["edges"][0],
+            "relation": {"calls", "uses"},
+        }
+    ]
+    G = build_from_json(extraction, multigraph=False)
+    assert G.number_of_edges("a", "b") == 0
+    captured = capsys.readouterr()
+    assert "non-JSON-serializable" in captured.err
+
+
+def test_build_skips_node_with_non_json_serializable_attrs(capsys):
+    """Nodes with non-JSON-serializable attrs are skipped with a warning."""
+    extraction = {
+        "nodes": [
+            {"id": "ok", "label": "OK", "file_type": "code", "source_file": "a.py"},
+            {
+                "id": "bad",
+                "label": "Bad",
+                "file_type": "code",
+                "source_file": "b.py",
+                "tags": {"unhashable", "set"},
+            },
+        ],
+        "edges": [],
+        "input_tokens": 0,
+        "output_tokens": 0,
+    }
+    G = build_from_json(extraction)
+    assert "ok" in G.nodes
+    assert "bad" not in G.nodes
+    captured = capsys.readouterr()
+    assert "non-JSON-serializable" in captured.err
+
+
+def test_build_strips_legacy_from_to_from_edge_attrs():
+    """Legacy from/to keys must not survive into stored edge attrs after remap."""
+    ext = {
+        "nodes": [
+            {"id": "a", "label": "A", "file_type": "code", "source_file": "a.py"},
+            {"id": "b", "label": "B", "file_type": "code", "source_file": "b.py"},
+        ],
+        "edges": [
+            {
+                "from": "a",
+                "to": "b",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "a.py",
+                "weight": 1.0,
+            }
+        ],
+        "input_tokens": 0,
+        "output_tokens": 0,
+    }
+    G = cast(nx.Graph, build_from_json(ext))
+    data = G.edges["a", "b"]
+    assert "from" not in data
+    assert "to" not in data
+
+
+def test_multigraph_preserves_first_explicit_key_in_collision_group():
+    """When multiple edges share an explicit user key, the first one preserves it."""
+    extraction = {
+        "nodes": [
+            {"id": "a", "label": "A", "file_type": "code", "source_file": "a.py"},
+            {"id": "b", "label": "B", "file_type": "code", "source_file": "b.py"},
+        ],
+        "edges": [
+            {
+                "source": "a", "target": "b",
+                "key": "user-key",
+                "relation": "calls", "confidence": "EXTRACTED",
+                "source_file": "a.py", "context": "first",
+            },
+            {
+                "source": "a", "target": "b",
+                "key": "user-key",
+                "relation": "calls", "confidence": "EXTRACTED",
+                "source_file": "a.py", "context": "second",
+            },
+        ],
+        "input_tokens": 0, "output_tokens": 0,
+    }
+    G = build_from_json(extraction, multigraph=True)
+    keys = set(G["a"]["b"].keys())
+    assert "user-key" in keys, "First edge must retain the explicit user-supplied key"
+    assert len(keys) == 2, "Both edges must survive; second gets a repair key"
diff --git a/tests/test_edge_identity.py b/tests/test_edge_identity.py
new file mode 100644
index 000000000..f80efb345
--- /dev/null
+++ b/tests/test_edge_identity.py
@@ -0,0 +1,85 @@
+"""Tests for graphify.edge_identity — schema constants and stable key helpers."""
+
+from __future__ import annotations
+
+from graphify.edge_identity import SCHEMA_KEY_FIELD, make_stable_key, strip_schema_key
+
+
+def test_schema_key_field_constant():
+    assert SCHEMA_KEY_FIELD == "key"
+
+
+def test_make_stable_key_deterministic():
+    k1 = make_stable_key("calls", "src/a.py", "L10")
+    k2 = make_stable_key("calls", "src/a.py", "L10")
+    assert k1 == k2
+    assert isinstance(k1, str)
+    assert k1  # non-empty
+
+
+def test_make_stable_key_all_none():
+    k = make_stable_key(None, None, None)
+    assert isinstance(k, str)
+    assert k  # non-empty — never crashes or returns empty string
+
+
+def test_make_stable_key_differs_by_source_location():
+    k1 = make_stable_key("calls", "src/a.py", "L10")
+    k2 = make_stable_key("calls", "src/a.py", "L20")
+    assert k1 != k2
+
+
+def test_make_stable_key_identical_fields_match():
+    k1 = make_stable_key("imports", "graphify/build.py", "L42")
+    k2 = make_stable_key("imports", "graphify/build.py", "L42")
+    assert k1 == k2
+
+
+def test_strip_schema_key_removes_key_field():
+    attrs = {"key": "calls:a.py:L1", "relation": "calls", "confidence": "EXTRACTED"}
+    key_val, cleaned = strip_schema_key(attrs)
+    assert key_val == "calls:a.py:L1"
+    assert "key" not in cleaned
+    assert cleaned["relation"] == "calls"
+    assert cleaned["confidence"] == "EXTRACTED"
+
+
+def test_strip_schema_key_no_key_present():
+    attrs = {"relation": "imports", "confidence_score": 1.0}
+    key_val, cleaned = strip_schema_key(attrs)
+    assert key_val is None
+    assert cleaned == attrs
+    assert "key" not in cleaned
+
+
+def test_strip_schema_key_does_not_mutate_input():
+    attrs = {"key": "k1", "relation": "calls"}
+    original = dict(attrs)
+    strip_schema_key(attrs)
+    assert attrs == original
+
+
+# ---------------------------------------------------------------------------
+# Blocker 1: delimiter-collision safety
+# ---------------------------------------------------------------------------
+
+
+def test_make_stable_key_no_delimiter_collision():
+    # "a:b","c","d" must not hash the same as "a","b:c","d"
+    k1 = make_stable_key("a:b", "c", "d")
+    k2 = make_stable_key("a", "b:c", "d")
+    assert k1 != k2
+
+
+def test_make_stable_key_format_is_versioned():
+    k = make_stable_key("calls", "a.py", "L1")
+    assert k.startswith("edge:v1:")
+
+
+def test_make_stable_key_none_differs_from_empty_and_unknown():
+    # make_stable_key(None, None, None) must not collide with
+    # make_stable_key("unknown", "", "") — None must serialize as JSON null,
+    # not be normalised to "unknown"/"" before hashing.
+    k_none = make_stable_key(None, None, None)
+    k_defaults = make_stable_key("unknown", "", "")
+    assert k_none != k_defaults
diff --git a/tests/test_graph_loader.py b/tests/test_graph_loader.py
new file mode 100644
index 000000000..4fcd3ce45
--- /dev/null
+++ b/tests/test_graph_loader.py
@@ -0,0 +1,557 @@
+"""Tests for graphify.graph_loader — schema-aware graph loading.
+
+Seven required PR 2 scenarios from the Wave 3 handoff guardrails.
+"""
+
+from __future__ import annotations
+
+from unittest.mock import patch
+
+import networkx as nx
+import pytest
+
+from graphify.graph_loader import GRAPHIFY_PROFILE_KEY, load_graph
+
+# ---------------------------------------------------------------------------
+# Fixtures
+# ---------------------------------------------------------------------------
+
+_NODES = [
+    {"id": "a", "label": "A", "file_type": "code", "source_file": "a.py"},
+    {"id": "b", "label": "B", "file_type": "code", "source_file": "b.py"},
+]
+
+_SIMPLE_EDGE = {
+    "source": "a",
+    "target": "b",
+    "relation": "calls",
+    "confidence": "EXTRACTED",
+    "confidence_score": 1.0,
+    "source_file": "a.py",
+    "weight": 1.0,
+}
+
+_KEYED_EDGE = {**_SIMPLE_EDGE, "key": "calls:a.py:L1"}
+
+_KEYED_EDGE_2 = {
+    "source": "a",
+    "target": "b",
+    "relation": "imports",
+    "confidence": "EXTRACTED",
+    "confidence_score": 1.0,
+    "source_file": "a.py",
+    "key": "imports:a.py:L5",
+    "weight": 1.0,
+}
+
+
+def _simple_links() -> dict:
+    """Legacy simple JSON using 'links' key."""
+    return {"nodes": _NODES, "links": [_SIMPLE_EDGE]}
+
+
+def _simple_edges() -> dict:
+    """Modern simple JSON using 'edges' key."""
+    return {"nodes": _NODES, "edges": [_SIMPLE_EDGE]}
+
+
+def _multigraph_data() -> dict:
+    """Valid multigraph node-link JSON with two keyed parallel edges."""
+    return {
+        "multigraph": True,
+        "nodes": _NODES,
+        "links": [_KEYED_EDGE, _KEYED_EDGE_2],
+    }
+
+
+def _multigraph_missing_keys() -> dict:
+    """Multigraph JSON where edges lack 'key' fields."""
+    edge_no_key = {k: v for k, v in _SIMPLE_EDGE.items() if k != "key"}
+    return {"multigraph": True, "nodes": _NODES, "links": [edge_no_key]}
+
+
+# ---------------------------------------------------------------------------
+# Input validation, then Scenario 1: legacy 'links' loads as nx.Graph
+# ---------------------------------------------------------------------------
+
+
+def test_load_graph_rejects_non_object_payload():
+    with pytest.raises(TypeError, match="serialized graph data must be a JSON object"):
+        load_graph([])
+
+
+def test_load_graph_rejects_non_list_nodes():
+    data = {**_simple_links(), "nodes": 123}
+
+    with pytest.raises(TypeError, match="'nodes' must be a list, got int"):
+        load_graph(data)
+
+
+def test_legacy_links_loads_as_simple_graph():
+    G = load_graph(_simple_links())
+    assert type(G) is nx.Graph
+    assert not G.is_multigraph()
+    assert G.number_of_nodes() == 2
+    assert G.number_of_edges() == 1
+
+
+# ---------------------------------------------------------------------------
+# Scenario 2: modern 'edges' loads as nx.Graph
+# ---------------------------------------------------------------------------
+
+
+def test_modern_edges_loads_as_simple_graph():
+    G = load_graph(_simple_edges())
+    assert type(G) is nx.Graph
+    assert not G.is_multigraph()
+    assert G.number_of_edges() == 1
+
+
+# ---------------------------------------------------------------------------
+# Scenario 3: valid multigraph JSON with keyed parallel edges → nx.MultiDiGraph
+# ---------------------------------------------------------------------------
+
+
+def test_valid_multigraph_loads_as_multidigraph():
+    G = load_graph(_multigraph_data())
+    assert type(G) is nx.MultiDiGraph
+    assert G.is_multigraph()
+    assert G.number_of_nodes() == 2
+    assert G.number_of_edges() == 2  # both parallel edges preserved
+
+
+# ---------------------------------------------------------------------------
+# Scenario 4: malformed multigraph (missing keys) repairs explicitly, not silently
+# ---------------------------------------------------------------------------
+
+
+def test_malformed_multigraph_missing_keys_repairs_explicitly(capsys):
+    G = load_graph(_multigraph_missing_keys())
+    # Must produce a MultiDiGraph (not silently fall back to simple)
+    assert type(G) is nx.MultiDiGraph
+    assert G.number_of_edges() == 1
+    # Must warn to stderr
+    captured = capsys.readouterr()
+    assert "missing" in captured.err.lower() or "key" in captured.err.lower()
+
+
+# ---------------------------------------------------------------------------
+# Scenario 5: edge 'key' is stripped from attrs — not stored as an edge attribute
+# ---------------------------------------------------------------------------
+
+
+def test_schema_key_stripped_from_edge_attrs():
+    G = load_graph(_multigraph_data())
+    assert isinstance(G, nx.MultiDiGraph)
+    for u, v, k, data in G.edges(keys=True, data=True):
+        assert "key" not in data, (
+            f"Edge ({u},{v},key={k!r}) must not store 'key' inside its attrs dict"
+        )
+
+
+# ---------------------------------------------------------------------------
+# Scenario 6: G.graph["graphify_profile"] is present after load
+# ---------------------------------------------------------------------------
+
+
+def test_graph_profile_metadata_round_trips():
+    G = load_graph(_simple_links())
+    assert GRAPHIFY_PROFILE_KEY in G.graph
+    profile = G.graph[GRAPHIFY_PROFILE_KEY]
+    assert isinstance(profile, dict)
+    assert "graph_type" in profile
+
+
+def test_graph_profile_type_for_multidigraph():
+    G = load_graph(_multigraph_data())
+    assert G.graph[GRAPHIFY_PROFILE_KEY]["graph_type"] == "multidigraph"
+
+
+def test_graph_profile_type_for_simple():
+    G = load_graph(_simple_links())
+    assert G.graph[GRAPHIFY_PROFILE_KEY]["graph_type"] == "simple"
+
+
+# ---------------------------------------------------------------------------
+# Scenario 7: capability probe failure raises clearly; simple loading unaffected
+# ---------------------------------------------------------------------------
+
+
+def test_capability_probe_failure_raises_clear_error():
+    with patch(
+        "graphify.graph_loader.require_multigraph_capabilities",
+        side_effect=RuntimeError("MultiDiGraph not supported: simulated failure"),
+    ):
+        with pytest.raises(RuntimeError, match="MultiDiGraph not supported"):
+            load_graph(_multigraph_data(), require_capabilities=True)
+
+
+def test_capability_probe_failure_does_not_affect_simple_load():
+    with patch(
+        "graphify.graph_loader.require_multigraph_capabilities",
+        side_effect=RuntimeError("should not be called"),
+    ):
+        # Simple JSON must not trigger the capability probe at all
+        G = load_graph(_simple_links(), require_capabilities=True)
+    assert type(G) is nx.Graph
+
+
+# ---------------------------------------------------------------------------
+# Blocker 2: missing-key repair must preserve distinct parallel edges
+# ---------------------------------------------------------------------------
+
+
+def _two_missing_key_parallel_edges() -> dict:
+    """Multigraph with two missing-key edges sharing relation/file but different attrs."""
+    return {
+        "multigraph": True,
+        "nodes": _NODES,
+        "links": [
+            {
+                "source": "a",
+                "target": "b",
+                "relation": "calls",
+                "source_file": "a.py",
+                "confidence": "EXTRACTED",
+                "weight": 1.0,
+                "context": "one",
+            },
+            {
+                "source": "a",
+                "target": "b",
+                "relation": "calls",
+                "source_file": "a.py",
+                "confidence": "EXTRACTED",
+                "weight": 1.0,
+                "context": "two",
+            },
+        ],
+    }
+
+
+def test_missing_key_repair_preserves_distinct_parallel_edges(capsys):
+    G = load_graph(_two_missing_key_parallel_edges())
+    assert type(G) is nx.MultiDiGraph
+    assert G.number_of_edges() == 2, (
+        f"Both missing-key parallel edges must survive repair; got {G.number_of_edges()}"
+    )
+    captured = capsys.readouterr()
+    assert "missing" in captured.err.lower() or "key" in captured.err.lower()
+
+
+# ---------------------------------------------------------------------------
+# Blocker 3: simple loader must respect serialized directedness
+# ---------------------------------------------------------------------------
+
+
+def test_directed_true_loads_as_digraph():
+    data = {
+        "directed": True,
+        "multigraph": False,
+        "nodes": _NODES,
+        "edges": [_SIMPLE_EDGE],
+    }
+    G = load_graph(data)
+    assert type(G) is nx.DiGraph
+
+
+def test_directed_false_explicitly_loads_as_graph():
+    data = {
+        "directed": False,
+        "multigraph": False,
+        "nodes": _NODES,
+        "edges": [_SIMPLE_EDGE],
+    }
+    G = load_graph(data)
+    assert type(G) is nx.Graph
+
+
+def test_directed_true_profile_graph_type():
+    data = {
+        "directed": True,
+        "multigraph": False,
+        "nodes": _NODES,
+        "edges": [_SIMPLE_EDGE],
+    }
+    G = load_graph(data)
+    assert G.graph[GRAPHIFY_PROFILE_KEY]["graph_type"] == "digraph"
+
+
+# ---------------------------------------------------------------------------
+# Blocker 4: malformed JSON must fail cleanly or skip under documented policy
+# ---------------------------------------------------------------------------
+
+
+def test_non_dict_edge_entries_are_skipped():
+    data = {"nodes": _NODES, "edges": ["not-a-dict", None, 42]}
+    G = load_graph(data)
+    assert G.number_of_edges() == 0
+
+
+def test_edges_value_not_a_list_raises():
+    data = {"nodes": _NODES, "edges": "not-a-list"}
+    with pytest.raises((TypeError, ValueError)):
+        load_graph(data)
+
+
+def test_non_dict_graphify_profile_is_ignored():
+    data = {
+        "nodes": _NODES,
+        "edges": [_SIMPLE_EDGE],
+        GRAPHIFY_PROFILE_KEY: "bad-profile",
+    }
+    G = load_graph(data)
+    assert isinstance(G.graph[GRAPHIFY_PROFILE_KEY], dict)
+    assert "graph_type" in G.graph[GRAPHIFY_PROFILE_KEY]
+
+
+def test_edge_missing_source_or_target_skipped():
+    data = {
+        "nodes": _NODES,
+        "edges": [
+            {"target": "b", "relation": "calls"},
+            {"source": "a", "relation": "calls"},
+        ],
+    }
+    G = load_graph(data)
+    assert G.number_of_edges() == 0
+
+
+# ---------------------------------------------------------------------------
+# Non-string multigraph key values must raise before NetworkX sees them
+# ---------------------------------------------------------------------------
+
+
+def _multigraph_with_key(key_value: object) -> dict:
+    return {
+        "multigraph": True,
+        "nodes": _NODES,
+        "links": [{**_SIMPLE_EDGE, "key": key_value}],
+    }
+
+
+def test_multigraph_list_key_raises():
+    with pytest.raises((TypeError, ValueError)):
+        load_graph(_multigraph_with_key(["bad"]))
+
+
+def test_multigraph_dict_key_raises():
+    with pytest.raises((TypeError, ValueError)):
+        load_graph(_multigraph_with_key({"bad": 1}))
+
+
+def test_multigraph_int_key_raises():
+    with pytest.raises((TypeError, ValueError)):
+        load_graph(_multigraph_with_key(123))
+
+
+def test_load_simple_edge_with_empty_string_source_not_shadowed_by_from():
+    # An edge with source="" AND from="a" must not silently fall back to "from".
+    # The explicit source wins, and "" is not a node here, so the edge is skipped.
+    data = {
+        "nodes": _NODES,
+        "links": [{"source": "", "from": "a", "target": "b", "relation": "calls"}],
+    }
+    G = load_graph(data)
+    assert G.number_of_edges() == 0
+
+
+def test_load_simple_edge_with_from_key_loaded():
+    # Edges using legacy "from"/"to" keys should load correctly as long as
+    # the IDs are non-empty and present in the node set.
+    data = {
+        "nodes": _NODES,
+        "links": [{"from": "a", "to": "b", "relation": "calls"}],
+    }
+    G = load_graph(data)
+    assert G.number_of_edges() == 1
+
+
+def test_load_simple_preserves_falsy_hashable_ids():
+    """Falsy-but-hashable node IDs like 0 or '' must survive the loader."""
+    data = {
+        "directed": False,
+        "multigraph": False,
+        "nodes": [{"id": 0}, {"id": ""}, {"id": "x"}],
+        "links": [
+            {"source": 0, "target": "", "relation": "calls"},
+            {"source": 0, "target": "x", "relation": "imports"},
+        ],
+    }
+    G = load_graph(data)
+    assert G.number_of_nodes() == 3
+    assert G.number_of_edges() == 2
+    assert G.has_edge(0, "")
+    assert G.has_edge(0, "x")
+
+
+def test_load_directed_preserves_falsy_hashable_ids():
+    data = {
+        "directed": True,
+        "multigraph": False,
+        "nodes": [{"id": 0}, {"id": "y"}],
+        "links": [{"source": 0, "target": "y", "relation": "calls"}],
+    }
+    G = load_graph(data)
+    assert G.number_of_edges() == 1
+    assert G.has_edge(0, "y")
+
+
+def test_load_multigraph_preserves_falsy_hashable_ids():
+    data = {
+        "directed": True,
+        "multigraph": True,
+        "nodes": [{"id": 0}, {"id": 1}],
+        "links": [
+            {"source": 0, "target": 1, "key": "k1", "relation": "calls"},
+            {"source": 0, "target": 1, "key": "k2", "relation": "imports"},
+        ],
+    }
+    G = load_graph(data)
+    assert G.number_of_edges() == 2
+    assert G.has_edge(0, 1)
+
+
+def test_graph_attributes_round_trip_through_node_link_data():
+    """G.graph[...] attrs must survive node_link_data → load_graph round-trip.
+
+    NetworkX serializes graph-level metadata under data["graph"]; the loader
+    must read from there, not only from top-level keys.
+    """
+    import networkx as nx
+    from networkx.readwrite import json_graph
+
+    G_out = nx.DiGraph()
+    G_out.add_node("a")
+    G_out.add_node("b")
+    G_out.add_edge("a", "b", relation="calls")
+    G_out.graph["graphify_profile"] = {"graph_type": "digraph", "extra": "value"}
+    G_out.graph["hyperedges"] = [{"members": ["a", "b"]}]
+    G_out.graph["graphify_multigraph_diagnostics"] = {"collapsed": 0}
+
+    data = json_graph.node_link_data(G_out, edges="links")
+    G_in = load_graph(data)
+
+    assert G_in.graph["graphify_profile"]["extra"] == "value"
+    assert G_in.graph["hyperedges"] == [{"members": ["a", "b"]}]
+    assert G_in.graph["graphify_multigraph_diagnostics"] == {"collapsed": 0}
+
+
+def test_graph_attributes_round_trip_through_multigraph_node_link_data():
+    """Same round-trip guarantee for multigraph exports."""
+    import networkx as nx
+    from networkx.readwrite import json_graph
+
+    G_out = nx.MultiDiGraph()
+    G_out.add_node("a")
+    G_out.add_node("b")
+    G_out.add_edge("a", "b", key="k1", relation="calls")
+    G_out.add_edge("a", "b", key="k2", relation="imports")
+    G_out.graph["graphify_profile"] = {"graph_type": "multidigraph"}
+    G_out.graph["graphify_multigraph_diagnostics"] = {"exact_duplicates": 0}
+
+    data = json_graph.node_link_data(G_out, edges="links")
+    G_in = load_graph(data, require_capabilities=False)
+
+    assert G_in.graph["graphify_profile"]["graph_type"] == "multidigraph"
+    assert G_in.graph["graphify_multigraph_diagnostics"] == {"exact_duplicates": 0}
+    assert G_in.number_of_edges() == 2
+
+
+def test_load_skips_unhashable_node_ids(capsys):
+    """Corrupted graph.json with unhashable node ids must not crash; skip + warn."""
+    data = {
+        "directed": True,
+        "multigraph": False,
+        "nodes": [{"id": "ok"}, {"id": ["unhashable"]}, {"id": {"also": "unhashable"}}],
+        "links": [{"source": "ok", "target": "ok", "relation": "self"}],
+    }
+    G = load_graph(data)
+    captured = capsys.readouterr()
+    assert G.number_of_nodes() == 1
+    assert "unhashable" in captured.err.lower()
+
+
+def test_load_skips_edges_with_unhashable_endpoints():
+    """Edges with unhashable source/target must be skipped, not raise TypeError."""
+    data = {
+        "directed": True,
+        "multigraph": False,
+        "nodes": [{"id": "a"}, {"id": "b"}],
+        "links": [
+            {"source": "a", "target": "b", "relation": "calls"},
+            {"source": ["unhashable"], "target": "b", "relation": "bogus"},
+            {"source": "a", "target": {"also": "unhashable"}, "relation": "bogus"},
+        ],
+    }
+    G = load_graph(data)
+    assert G.number_of_edges() == 1
+
+
+def test_load_multigraph_skips_unhashable_endpoints():
+    """Same protection in the multigraph loader."""
+    data = {
+        "directed": True,
+        "multigraph": True,
+        "nodes": [{"id": "a"}, {"id": "b"}],
+        "links": [
+            {"source": "a", "target": "b", "key": "k1", "relation": "calls"},
+            {"source": ["bad"], "target": "b", "key": "k2", "relation": "calls"},
+        ],
+    }
+    G = load_graph(data, require_capabilities=False)
+    assert G.number_of_edges() == 1
+
+
+def test_load_multigraph_duplicate_keys_repaired_not_overwritten(capsys):
+    """Two parallel edges with same (src, tgt, key) but different attrs must both survive."""
+    data = {
+        "directed": True,
+        "multigraph": True,
+        "nodes": [{"id": "a"}, {"id": "b"}],
+        "links": [
+            {"source": "a", "target": "b", "key": "same", "relation": "calls", "context": "one"},
+            {"source": "a", "target": "b", "key": "same", "relation": "calls", "context": "two"},
+        ],
+    }
+    G = load_graph(data, require_capabilities=False)
+    assert G.number_of_edges() == 2, "duplicate-key edges must both be preserved via repair keys"
+    captured = capsys.readouterr()
+    assert "duplicate" in captured.err.lower()
+
+
+def test_load_graph_rejects_non_bool_multigraph_field():
+    """String 'false' or other non-bool 'multigraph' must be rejected, not coerced."""
+    data = {**_simple_links(), "multigraph": "false"}
+    with pytest.raises(TypeError, match="'multigraph' must be a boolean"):
+        load_graph(data)
+
+
+def test_load_graph_rejects_non_bool_directed_field():
+    data = {**_simple_links(), "directed": "true"}
+    with pytest.raises(TypeError, match="'directed' must be a boolean"):
+        load_graph(data)
+
+
+def test_load_multigraph_with_omitted_directed_does_not_warn(capsys):
+    """Omitting 'directed' with 'multigraph: true' must not emit a spurious warning."""
+    data = {
+        "multigraph": True,
+        "nodes": [{"id": "a"}, {"id": "b"}],
+        "links": [{"source": "a", "target": "b", "key": "k", "relation": "calls"}],
+    }
+    load_graph(data, require_capabilities=False)
+    captured = capsys.readouterr()
+    assert "multigraph=true but directed=false" not in captured.err
+
+
+def test_load_graph_overwrites_stale_graph_type_in_profile():
+    """Stale graph_type from serialized profile must not survive when loading."""
+    data = {
+        "multigraph": True,
+        "nodes": [{"id": "a"}, {"id": "b"}],
+        "links": [{"source": "a", "target": "b", "key": "k", "relation": "calls"}],
+        "graph": {"graphify_profile": {"graph_type": "simple"}},
+    }
+    G = load_graph(data, require_capabilities=False)
+    assert G.graph[GRAPHIFY_PROFILE_KEY]["graph_type"] == "multidigraph"
diff --git a/tests/test_multigraph_diagnostics.py b/tests/test_multigraph_diagnostics.py
index 8c39b8e23..6c49c58cf 100644
--- a/tests/test_multigraph_diagnostics.py
+++ b/tests/test_multigraph_diagnostics.py
@@ -147,7 +147,10 @@ def test_diagnose_extraction_handles_malformed_shapes_without_crashing() -> None
     assert summary["missing_endpoint_edges"] == 1
     assert summary["dangling_endpoint_edges"] == 2
     assert summary["valid_candidate_edges"] == 1
-    assert summary["post_build_error"].startswith("TypeError:")
+    assert summary["post_build_graph_type"] == "DiGraph"
+    assert summary["post_build_node_count"] == 2
+    assert summary["post_build_edge_count"] == 1
+    assert summary["post_build_error"] == ""
 
 
 def test_diagnose_extraction_handles_non_list_nodes_and_edges() -> None:
@@ -228,7 +231,8 @@ def test_format_diagnostic_report_includes_build_and_suppression_errors(
 
     report = format_diagnostic_report(summary)
 
-    assert "post_build_error: TypeError:" in report
+    assert "post_build_error:" not in report
+    assert "post_build_graph_type: DiGraph" in report
     assert "producer_suppression_error: file not found" in report
 
 
diff --git a/tests/test_projections.py b/tests/test_projections.py
new file mode 100644
index 000000000..140274eb5
--- /dev/null
+++ b/tests/test_projections.py
@@ -0,0 +1,202 @@
+from __future__ import annotations
+
+import networkx as nx
+import pytest
+from typing import Any, cast
+
+from graphify.projections import (
+    distinct_neighbor_degree,
+    edge_records_between,
+    edge_summary_between,
+    normalize_to_multidigraph,
+    project_for_callflow,
+    project_for_community,
+    project_for_context,
+    project_for_path,
+)
+
+
+def _parallel_graph() -> nx.MultiDiGraph:
+    graph = nx.MultiDiGraph()
+    graph.graph["graphify_profile"] = "test-profile"
+    graph.add_node("a", label="A")
+    graph.add_node("b", label="B")
+    graph.add_node("c", label="C")
+    graph.add_edge(
+        "a",
+        "b",
+        key="calls-low",
+        relation="calls",
+        confidence="INFERRED",
+        confidence_score=0.4,
+        source_file="src/a.py",
+        source_location="L10",
+        context="code",
+    )
+    graph.add_edge(
+        "a",
+        "b",
+        key="imports-high",
+        relation="imports",
+        confidence="EXTRACTED",
+        confidence_score=0.9,
+        source_file="src/a.py",
+        source_location="L2",
+        context="code",
+    )
+    graph.add_edge(
+        "b",
+        "a",
+        key="returns",
+        relation="returns",
+        confidence="AMBIGUOUS",
+        confidence_score=0.2,
+        source_file="src/b.py",
+        source_location="L5",
+        context="runtime",
+    )
+    graph.add_edge(
+        "b",
+        "c",
+        key="calls-c",
+        relation="calls",
+        confidence="EXTRACTED",
+        confidence_score=1.0,
+        source_file="src/b.py",
+        source_location="L7",
+        context="code",
+    )
+    graph.add_edge("c", "c", key="self", relation="calls", confidence="EXTRACTED")
+    return graph
+
+
+def test_project_for_community_returns_simple_weighted_copy() -> None:
+    projected = project_for_community(_parallel_graph(), weight_mode="count")
+
+    assert type(projected) is nx.Graph
+    assert projected.graph["graphify_profile"] == "test-profile"
+    assert set(projected.nodes) == {"a", "b", "c"}
+    assert not projected.has_edge("c", "c")
+    assert projected["a"]["b"]["weight"] == 3.0
+    assert projected["a"]["b"]["parallel_edge_count"] == 3
+    assert projected["b"]["c"]["weight"] == 1.0
+
+
+def test_project_for_community_supports_confidence_and_sum_weight_modes() -> None:
+    graph = _parallel_graph()
+
+    by_confidence = project_for_community(graph, weight_mode="confidence")
+    by_sum = project_for_community(graph, weight_mode="sum")
+
+    assert by_confidence["a"]["b"]["weight"] == 0.9
+    assert by_confidence["a"]["b"]["relation"] == "imports"
+    assert by_sum["a"]["b"]["weight"] == pytest.approx(1.5)
+    with pytest.raises(ValueError, match="weight_mode"):
+        project_for_community(graph, weight_mode=cast(Any, "unknown"))
+
+
+def test_project_for_path_uses_simple_graph_not_multigraph_view() -> None:
+    projected = project_for_path(_parallel_graph())
+
+    assert type(projected) is nx.Graph
+    assert not projected.is_multigraph()
+    assert projected.number_of_edges("a", "b") == 1
+    assert nx.shortest_path(projected, "a", "c") == ["a", "b", "c"]
+
+
+def test_project_for_callflow_preserves_direction_and_filters_relations() -> None:
+    projected = project_for_callflow(_parallel_graph(), relations=frozenset({"calls"}))
+
+    assert type(projected) is nx.DiGraph
+    assert set(projected.edges()) == {("a", "b"), ("b", "c")}
+    assert projected["a"]["b"]["relation"] == "calls"
+
+
+def test_project_for_callflow_recovers_src_tgt_from_undirected_edges() -> None:
+    graph = nx.Graph()
+    graph.add_node("display_a")
+    graph.add_node("display_b")
+    graph.add_edge("display_a", "display_b", _src="real_src", _tgt="real_tgt", relation="calls")
+
+    projected = project_for_callflow(graph)
+
+    assert set(projected.edges()) == {("real_src", "real_tgt")}
+    assert projected["real_src"]["real_tgt"]["relation"] == "calls"
+
+
+def test_project_for_context_preserves_multigraph_type_keys_and_metadata() -> None:
+    projected = project_for_context(_parallel_graph(), contexts=["code"])
+
+    assert isinstance(projected, nx.MultiDiGraph)
+    assert projected.graph["graphify_profile"] == "test-profile"
+    assert set(projected["a"]["b"]) == {"calls-low", "imports-high"}
+    assert "returns" not in projected.get_edge_data("b", "a", default={})
+
+
+def test_project_for_context_none_returns_copy_not_original() -> None:
+    graph = _parallel_graph()
+
+    projected = project_for_context(graph)
+
+    assert projected is not graph
+    assert isinstance(projected, nx.MultiDiGraph)
+    assert projected.number_of_edges() == graph.number_of_edges()
+
+
+def test_project_for_context_empty_filter_is_noop_copy() -> None:
+    graph = _parallel_graph()
+
+    projected = project_for_context(graph, contexts=[])
+
+    assert projected is not graph
+    assert projected.number_of_edges() == graph.number_of_edges()
+
+
+def test_edge_records_between_returns_copies_from_both_directions() -> None:
+    graph = _parallel_graph()
+
+    records = edge_records_between(graph, "a", "b")
+
+    assert [record["relation"] for record in records] == ["imports", "calls", "returns"]
+    records[0]["relation"] = "mutated"
+    assert graph["a"]["b"]["imports-high"]["relation"] == "imports"
+
+
+def test_edge_summary_between_counts_and_picks_representative() -> None:
+    summary = edge_summary_between(_parallel_graph(), "a", "b")
+
+    assert summary["count"] == 3
+    assert summary["relations"] == ["calls", "imports", "returns"]
+    assert summary["confidences"] == ["AMBIGUOUS", "EXTRACTED", "INFERRED"]
+    assert summary["representative"]["relation"] == "imports"
+
+
+def test_distinct_neighbor_degree_does_not_count_parallel_edges() -> None:
+    graph = _parallel_graph()
+
+    assert graph.degree("a") == 3
+    assert distinct_neighbor_degree(graph, "a") == 1
+    assert distinct_neighbor_degree(graph, "missing") == 0
+
+
+def test_normalize_to_multidigraph_preserves_parallel_keys_and_simple_edges() -> None:
+    graph = nx.MultiGraph()
+    graph.graph["name"] = "mixed"
+    graph.add_node("a", label="A")
+    graph.add_node("b", label="B")
+    graph.add_edge("a", "b", key="one", relation="calls")
+    graph.add_edge("a", "b", key="two", relation="imports")
+
+    normalized = normalize_to_multidigraph(graph)
+
+    assert isinstance(normalized, nx.MultiDiGraph)
+    assert normalized.graph["name"] == "mixed"
+    assert set(normalized["a"]["b"]) == {"one", "two"}
+
+    simple = nx.Graph()
+    simple.add_edge("x", "y", relation="uses")
+    simple_normalized = normalize_to_multidigraph(simple)
+
+    assert isinstance(simple_normalized, nx.MultiDiGraph)
+    assert simple_normalized.number_of_edges("x", "y") == 1
+    assert next(iter(simple_normalized["x"]["y"].values()))["relation"] == "uses"
diff --git a/tests/test_validate.py b/tests/test_validate.py
index 396e90c8c..e5f9cd50f 100644
--- a/tests/test_validate.py
+++ b/tests/test_validate.py
@@ -7,26 +7,37 @@
         {"id": "n2", "label": "Bar", "file_type": "document", "source_file": "bar.md"},
     ],
     "edges": [
-        {"source": "n1", "target": "n2", "relation": "references",
-         "confidence": "EXTRACTED", "source_file": "foo.py", "weight": 1.0},
+        {
+            "source": "n1",
+            "target": "n2",
+            "relation": "references",
+            "confidence": "EXTRACTED",
+            "source_file": "foo.py",
+            "weight": 1.0,
+        },
     ],
 }
 
+
 def test_valid_passes():
     assert validate_extraction(VALID) == []
 
+
 def test_missing_nodes_key():
     errors = validate_extraction({"edges": []})
     assert any("nodes" in e for e in errors)
 
+
 def test_missing_edges_key():
     errors = validate_extraction({"nodes": []})
     assert any("edges" in e for e in errors)
 
+
 def test_not_a_dict():
     errors = validate_extraction([])
     assert len(errors) == 1
 
+
 def test_invalid_file_type():
     data = {
         "nodes": [{"id": "n1", "label": "X", "file_type": "video", "source_file": "x.mp4"}],
@@ -35,6 +46,7 @@ def test_invalid_file_type():
     errors = validate_extraction(data)
     assert any("file_type" in e for e in errors)
 
+
 def test_invalid_confidence():
     data = {
         "nodes": [
@@ -42,35 +54,87 @@ def test_invalid_confidence():
             {"id": "n2", "label": "B", "file_type": "code", "source_file": "b.py"},
         ],
         "edges": [
-            {"source": "n1", "target": "n2", "relation": "calls",
-             "confidence": "CERTAIN", "source_file": "a.py"},
+            {
+                "source": "n1",
+                "target": "n2",
+                "relation": "calls",
+                "confidence": "CERTAIN",
+                "source_file": "a.py",
+            },
         ],
     }
     errors = validate_extraction(data)
     assert any("confidence" in e for e in errors)
 
+
 def test_dangling_edge_source():
     data = {
         "nodes": [{"id": "n1", "label": "A", "file_type": "code", "source_file": "a.py"}],
         "edges": [
-            {"source": "missing_id", "target": "n1", "relation": "calls",
-             "confidence": "EXTRACTED", "source_file": "a.py"},
+            {
+                "source": "missing_id",
+                "target": "n1",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "a.py",
+            },
         ],
     }
     errors = validate_extraction(data)
     assert any("source" in e and "missing_id" in e for e in errors)
 
+
 def test_dangling_edge_target():
     data = {
         "nodes": [{"id": "n1", "label": "A", "file_type": "code", "source_file": "a.py"}],
         "edges": [
-            {"source": "n1", "target": "ghost", "relation": "calls",
-             "confidence": "EXTRACTED", "source_file": "a.py"},
+            {
+                "source": "n1",
+                "target": "ghost",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "a.py",
+            },
         ],
     }
     errors = validate_extraction(data)
     assert any("target" in e and "ghost" in e for e in errors)
 
+
+def test_unhashable_edge_source_reported_without_node_ids():
+    data = {
+        "nodes": [],
+        "edges": [
+            {
+                "source": ["bad"],
+                "target": "n1",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "a.py",
+            },
+        ],
+    }
+    errors = validate_extraction(data)
+    assert any("source is unhashable" in e for e in errors)
+
+
+def test_unhashable_edge_target_reported_without_node_ids():
+    data = {
+        "nodes": [],
+        "edges": [
+            {
+                "source": "n1",
+                "target": {"bad": "target"},
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "a.py",
+            },
+        ],
+    }
+    errors = validate_extraction(data)
+    assert any("target is unhashable" in e for e in errors)
+
+
 def test_missing_node_field():
     data = {
         "nodes": [{"id": "n1", "label": "A", "source_file": "a.py"}],  # missing file_type
@@ -79,9 +143,24 @@ def test_missing_node_field():
     errors = validate_extraction(data)
     assert any("file_type" in e for e in errors)
 
+
 def test_assert_valid_raises_on_errors():
     with pytest.raises(ValueError, match="error"):
         assert_valid({"nodes": [], "edges": [], "oops": True, **{"nodes": "bad"}})
 
+
 def test_assert_valid_passes_silently():
     assert_valid(VALID)  # should not raise
+
+
+def test_validate_extraction_does_not_typeerror_on_non_list_nodes():
+    """validate_extraction must report 'nodes must be a list' without raising TypeError."""
+    from graphify.validate import validate_extraction
+    errors = validate_extraction({"nodes": 123, "edges": []})
+    assert any("'nodes' must be a list" in e for e in errors)
+
+
+def test_validate_extraction_does_not_typeerror_on_non_list_edges():
+    from graphify.validate import validate_extraction
+    errors = validate_extraction({"nodes": [], "edges": 42})
+    assert any("'edges' must be a list" in e for e in errors)

From b7e05f75257b3178e9eeaf51e17cc6b50c0d08c1 Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Wed, 27 May 2026 15:33:28 -0500
Subject: [PATCH 02/21] chore: no-waiver lint/type/security cleanup + upstream
 v8 rebase

Rebased onto upstream/v8 (740382a). Conflict in graphify/extract.py
resolved preserving both upstream (TypeScript abstract class, C# base-list,
Java interface inheritance, defusedxml) and local multigraph behavior.

Full-repo ruff/pyright/security pass: 0 errors, 0 warnings, all .AUDIT
gates clean. Added --no-viz support to graphify update. Raised
AUDIT_COPILOT_MAX_DIFF_BYTES default from 120KB to 2MB. Updated AGENTS.md
(no-waiver rule, conflict rule, removed stale memory block) and added
CLAUDE.md with durable project policy.

1507 passed, ruff clean, pyright clean.

---
 .AUDIT/copilot-local-review.sh            |  252 ++
 CLAUDE.md                                 |   32 +
 graphify/__init__.py                      |    1 +
 graphify/__main__.py                      |  957 ++++--
 graphify/affected.py                      |    7 +-
 graphify/analyze.py                       |  297 +-
 graphify/benchmark.py                     |   30 +-
 graphify/cache.py                         |   10 +-
 graphify/callflow_html.py                 |  616 +++-
 graphify/cluster.py                       |   17 +-
 graphify/detect.py                        |  389 ++-
 graphify/export.py                        |  323 +-
 graphify/extract.py                       | 3591 ++++++++++++++-------
 graphify/global_graph.py                  |    7 +-
 graphify/google_workspace.py              |   27 +-
 graphify/graph_loader.py                  |    7 +-
 graphify/hooks.py                         |   32 +-
 graphify/ingest.py                        |   55 +-
 graphify/llm.py                           |   76 +-
 graphify/mcp_ingest.py                    |   63 +-
 graphify/prs.py                           |  271 +-
 graphify/report.py                        |   61 +-
 graphify/security.py                      |   48 +-
 graphify/semantic_cleanup.py              |    4 +-
 graphify/serve.py                         |  281 +-
 graphify/symbol_resolution.py             |   25 +-
 graphify/transcribe.py                    |   33 +-
 graphify/tree_html.py                     |   43 +-
 graphify/watch.py                         |  208 +-
 graphify/wiki.py                          |   26 +-
 pyproject.toml                            |    4 +-
 tests/bench_extract.py                    |   12 +-
 tests/test_affected_cli.py                |   16 +-
 tests/test_analyze.py                     |  473 ++-
 tests/test_astro_extraction.py            |    1 +
 tests/test_benchmark.py                   |   54 +-
 tests/test_build.py                       |   95 +-
 tests/test_cache.py                       |   11 +-
 tests/test_callflow_html.py               |  105 +-
 tests/test_charmap_encoding.py            |  122 +-
 tests/test_chunking.py                    |   46 +-
 tests/test_claude_cli_backend.py          |   63 +-
 tests/test_claude_md.py                   |   11 +-
 tests/test_cli_export.py                  |   41 +-
 tests/test_cluster.py                     |    9 +-
 tests/test_confidence.py                  |   69 +-
 tests/test_dedup.py                       |   19 +-
 tests/test_detect.py                      |  100 +-
 tests/test_devin.py                       |   30 +-
 tests/test_dotnet.py                      |   16 +-
 tests/test_explain_cli.py                 |   66 +-
 tests/test_export.py                      |   38 +-
 tests/test_extract.py                     |  237 +-
 tests/test_extract_cli.py                 |   26 +-
 tests/test_global_graph.py                |   89 +-
 tests/test_google_workspace.py            |    1 -
 tests/test_hooks.py                       |    7 +-
 tests/test_hypergraph.py                  |   68 +-
 tests/test_import_extension_resolution.py |  202 +-
 tests/test_incremental.py                 |    2 +-
 tests/test_ingest.py                      |    6 +-
 tests/test_install.py                     |   66 +-
 tests/test_install_strings.py             |    8 +-
 tests/test_install_upgrade.py             |   19 +-
 tests/test_js_import_resolution.py        |   35 +-
 tests/test_languages.py                   |  392 ++-
 tests/test_llm_backends.py                |  120 +-
 tests/test_llm_parser.py                  |    2 -
 tests/test_mcp_ingest.py                  |  136 +-
 tests/test_multilang.py                   |   84 +-
 tests/test_ollama.py                      |    6 +-
 tests/test_pascal.py                      |  104 +-
 tests/test_path_cli.py                    |   36 +-
 tests/test_pipeline.py                    |   22 +-
 tests/test_prs.py                         |   41 +-
 tests/test_python_import_resolution.py    |   19 +-
 tests/test_query_cli.py                   |    1 +
 tests/test_rationale.py                   |   86 +-
 tests/test_report.py                      |   50 +-
 tests/test_security.py                    |   34 +-
 tests/test_semantic_similarity.py         |   84 +-
 tests/test_serve.py                       |   74 +-
 tests/test_transcribe.py                  |   10 +-
 tests/test_validate.py                    |    4 +-
 tests/test_watch.py                       |   65 +-
 tests/test_wiki.py                        |   23 +-
 uv.lock                                   |   10 +-
 87 files changed, 8051 insertions(+), 3308 deletions(-)
 create mode 100755 .AUDIT/copilot-local-review.sh
 create mode 100644 CLAUDE.md

diff --git a/.AUDIT/copilot-local-review.sh b/.AUDIT/copilot-local-review.sh
new file mode 100755
index 000000000..46613fbac
--- /dev/null
+++ b/.AUDIT/copilot-local-review.sh
@@ -0,0 +1,252 @@
+#!/usr/bin/env bash
+set -uo pipefail
+# Owner: Codex
+#
+# Private local gate: run GitHub Copilot CLI against the staged diff before a
+# local commit. This is an early-warning review, not a replacement for the
+# origin/upstream PR review gate.
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
+cd "$ROOT" || exit 2
+
+if [[ -f "$ROOT/.venv/bin/activate" ]]; then
+  # shellcheck disable=SC1091
+  source "$ROOT/.venv/bin/activate"
+fi
+
+if [[ "${AUDIT_PRIVATE_GUARD:-1}" != "0" && -x "$SCRIPT_DIR/private-guard.sh" ]]; then
+  "$SCRIPT_DIR/private-guard.sh" --quiet || exit "$?"
+fi
+
+usage() {
+  cat <<'EOF'
+Usage: .AUDIT/copilot-local-review.sh [--cached|--worktree|--base <ref>] [--advisory] [--max-diff-bytes <n>]
+
+Runs GitHub Copilot CLI against a local diff and blocks when Copilot reports
+actionable findings.
+
+Modes:
+  --cached        review staged changes; default and intended for pre-commit
+  --worktree      review unstaged worktree changes
+  --base <ref>    review changes from merge-base(<ref>, HEAD) to HEAD
+  --advisory      exit 0 once a review has run, whatever Copilot decides
+
+Environment:
+  AUDIT_COPILOT_MAX_DIFF_BYTES  default 2000000; local review blocks above this (0 disables the limit)
+  AUDIT_COPILOT_REPORT_DIR      default .AUDIT/reports
+
+Output contract:
+  Copilot must emit exactly one decision line:
+    LOCAL_COPILOT_REVIEW_DECISION: PASS
+    LOCAL_COPILOT_REVIEW_DECISION: BLOCK
+
+PASS means no actionable correctness/security/regression/test issue was found.
+BLOCK or ambiguous output stops the commit gate.
+EOF
+}
+
+mode="cached"
+base_ref=""
+advisory=0
+max_diff_bytes="${AUDIT_COPILOT_MAX_DIFF_BYTES:-2000000}"
+
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --cached)
+      mode="cached"
+      ;;
+    --worktree)
+      mode="worktree"
+      ;;
+    --base)
+      if [[ $# -lt 2 ]]; then
+        echo "[copilot-local-review] USAGE_ERROR: --base requires a ref" >&2
+        exit 2
+      fi
+      mode="base"
+      base_ref="$2"
+      shift
+      ;;
+    --base=*)
+      mode="base"
+      base_ref="${1#--base=}"
+      ;;
+    --advisory)
+      advisory=1
+      ;;
+    --max-diff-bytes)
+      if [[ $# -lt 2 ]]; then
+        echo "[copilot-local-review] USAGE_ERROR: --max-diff-bytes requires a number" >&2
+        exit 2
+      fi
+      max_diff_bytes="$2"
+      shift
+      ;;
+    --max-diff-bytes=*)
+      max_diff_bytes="${1#--max-diff-bytes=}"
+      ;;
+    --help|-h)
+      usage
+      exit 0
+      ;;
+    *)
+      echo "[copilot-local-review] USAGE_ERROR: unknown option '$1'" >&2
+      usage >&2
+      exit 2
+      ;;
+  esac
+  shift
+done
+
+case "$max_diff_bytes" in
+  ''|*[!0-9]*)
+    echo "[copilot-local-review] USAGE_ERROR: max diff bytes must be a non-negative integer" >&2
+    exit 2
+    ;;
+esac
+
+if ! command -v copilot >/dev/null 2>&1; then
+  echo "[copilot-local-review] BLOCKED: GitHub Copilot CLI is not installed or not on PATH" >&2
+  echo "[copilot-local-review] Install/authenticate Copilot CLI or set AUDIT_SKIP_LOCAL_COPILOT=1 for an explicit bypass." >&2
+  exit 1
+fi
+
+diff_file="$(mktemp)"
+stat_file="$(mktemp)"
+trap 'rm -f "$diff_file" "$stat_file"' EXIT
+
+if [[ "$mode" == "cached" ]]; then
+  git diff --cached --stat >"$stat_file"
+  git diff --cached --no-ext-diff --binary --unified=80 >"$diff_file"
+elif [[ "$mode" == "worktree" ]]; then
+  git diff --stat >"$stat_file"
+  git diff --no-ext-diff --binary --unified=80 >"$diff_file"
+else
+  merge_base="$(git merge-base HEAD "$base_ref")" || {
+    echo "[copilot-local-review] GIT_CONTEXT_ERROR: could not merge-base HEAD and $base_ref" >&2
+    exit 2
+  }
+  git diff --stat "$merge_base..HEAD" >"$stat_file"
+  git diff --no-ext-diff --binary --unified=80 "$merge_base..HEAD" >"$diff_file"
+fi
+
+if [[ ! -s "$diff_file" ]]; then
+  echo "[copilot-local-review] clean: no diff to review for mode=$mode"
+  exit 0
+fi
+
+if grep -Eq '^(Binary files |GIT binary patch)' "$diff_file"; then
+  echo "[copilot-local-review] BLOCKED: binary diff present; Copilot local text review cannot inspect it reliably" >&2
+  exit 1
+fi
+
+diff_bytes="$(wc -c <"$diff_file" | tr -d '[:space:]')"
+if (( max_diff_bytes > 0 && diff_bytes > max_diff_bytes )); then
+  echo "[copilot-local-review] BLOCKED: diff is ${diff_bytes} bytes, above local review limit ${max_diff_bytes}" >&2
+  echo "[copilot-local-review] Split the commit or use origin PR review as the authoritative review surface." >&2
+  exit 1
+fi
+
+report_dir="${AUDIT_COPILOT_REPORT_DIR:-$SCRIPT_DIR/reports}"
+mkdir -p "$report_dir"
+report_file="$report_dir/$(date +%Y%m%d-%H%M%S)-copilot-local-review.md"
+
+prompt_file="$(mktemp)"
+trap 'rm -f "$diff_file" "$stat_file" "$prompt_file"' EXIT
+
+{
+  cat <<'EOF'
+You are GitHub Copilot reviewing a local diff before it is committed or pushed.
+
+Review only the supplied diff. Do not edit files. Do not run tools. Do not ask
+questions. Focus on correctness, security, data loss, regression risk, broken
+tests, missing tests for changed behavior, and user-visible behavior. Ignore
+pure style unless it can create functional risk.
+
+Your response MUST include exactly one decision line:
+
+LOCAL_COPILOT_REVIEW_DECISION: PASS
+
+or:
+
+LOCAL_COPILOT_REVIEW_DECISION: BLOCK
+
+Use PASS only if you find no actionable issue. Use BLOCK if there is any
+actionable issue or if the diff is too incomplete to review safely.
+
+After the decision line, provide concise findings with file/path references
+when blocking. If passing, provide a brief explanation of the risk areas you
+checked.
+
+Diff stat:
+EOF
+  cat "$stat_file"
+  printf '\nDiff:\n```diff\n'
+  cat "$diff_file"
+  printf '\n```\n'
+} >"$prompt_file"
+
+echo "[copilot-local-review] invoking Copilot CLI mode=$mode diff_bytes=$diff_bytes"
+set +e
+copilot_output="$(
+  copilot \
+    -p "$(cat "$prompt_file")" \
+    --disable-builtin-mcps \
+    --disallow-temp-dir \
+    --no-color \
+    --output-format text 2>&1
+)"
+copilot_rc=$?
+set -e
+
+{
+  echo "# Local Copilot Review"
+  echo
+  echo "- Mode: \`$mode\`"
+  [[ -n "$base_ref" ]] && echo "- Base: \`$base_ref\`"
+  echo "- Diff bytes: \`$diff_bytes\`"
+  echo "- Copilot exit code: \`$copilot_rc\`"
+  echo "- Generated: \`$(date -u +%Y-%m-%dT%H:%M:%SZ)\`"
+  echo
+  echo "## Diff Stat"
+  echo
+  echo '```text'
+  cat "$stat_file"
+  echo '```'
+  echo
+  echo "## Copilot Output"
+  echo
+  echo '```text'
+  printf '%s\n' "$copilot_output"
+  echo '```'
+} >"$report_file"
+
+printf '%s\n' "$copilot_output"
+echo "[copilot-local-review] report=$report_file"
+
+if (( advisory == 1 )); then
+  echo "[copilot-local-review] advisory mode: not blocking"
+  exit 0
+fi
+
+if (( copilot_rc != 0 )); then
+  echo "[copilot-local-review] BLOCKED: Copilot CLI exited $copilot_rc" >&2
+  exit 1
+fi
+
+pass_count="$(grep -c '^LOCAL_COPILOT_REVIEW_DECISION: PASS$' "$report_file" || true)"
+block_count="$(grep -c '^LOCAL_COPILOT_REVIEW_DECISION: BLOCK$' "$report_file" || true)"
+
+if [[ "$pass_count" == "1" && "$block_count" == "0" ]]; then
+  echo "[copilot-local-review] clean: Copilot returned PASS"
+  exit 0
+fi
+
+if [[ "$block_count" != "0" ]]; then
+  echo "[copilot-local-review] BLOCKED: Copilot returned BLOCK" >&2
+  exit 1
+fi
+
+echo "[copilot-local-review] BLOCKED: Copilot output did not contain the required PASS decision line" >&2
+exit 1
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 000000000..6f0ce17fe
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,32 @@
+## graphify
+
+This project has a graphify knowledge graph at graphify-out/.
+
+Rules:
+- Before answering architecture or codebase questions, read graphify-out/GRAPH_REPORT.md for god nodes and community structure
+- If graphify-out/wiki/index.md exists, navigate it instead of reading raw files
+- After modifying code files in this session, run `graphify update .` to keep the graph current (AST-only, no API cost)
+
+## Remote and Rebase Policy
+
+Rules:
+- `origin` is the active development target unless the user explicitly says otherwise.
+- `upstream` is read-only/reference by default. Do not open, update, or describe work as ready for upstream contribution unless the user explicitly reopens that path.
+- Still follow upstream Graphify closely: before planning implementation slices or rebases, fetch `origin` and `upstream`, verify current branch/HEAD/ahead-behind state, and compare live upstream changes that touch the same files.
+- Prefer rebasing/pulling useful upstream changes into the local/origin development branch when doing so preserves the Graphify direction and reduces future drift.
+- Upstream synchronization is preauthorized only for this Graphify/vampyre checkout (`/Users/jonathandirks/Development/vampyre`). Do not generalize this permission to any other repo or project. In this checkout, do not ask for human approval before stashing local work, fetching `origin`/`upstream`, comparing upstream changes, rebasing onto useful upstream updates, resolving conflicts by comparison, reapplying the stash, and rerunning verification.
+- If upstream changes are harmful, irrelevant, or incompatible with the local Graphify direction, skip them only after comparing the change and recording the reason in the handoff.
+- Helper scripts for stash/fetch/rebase workflows may be created only in local-only paths that are excluded from git, such as `.agent-local/`; do not add those helper scripts to GitHub-bound history.
+- Absolute conflict rule: never resolve merge, rebase, cherry-pick, or generated-artifact conflicts by blindly choosing ours/theirs or by assuming the local branch is always correct. Always inspect and compare both sides, identify the intended behavior on each side, preserve useful upstream behavior unless it is deliberately incompatible with the local plan, and document the resolution in the handoff or commit notes.
+- If conflict behavior is unclear after comparing both sides and the relevant tests/docs, stop and ask before resolving.
+- For this checkout only, publishing a verified local/origin development branch is part of closing an upstream sync. Before pushing, run the full local verification stack, including local Copilot review, tests, lint/type/security/warning gates, pre-commit/pre-push gates, and graph refresh when applicable.
+- After verification passes, fetch `origin` and compare. If local `HEAD` contains `origin` or deliberately supersedes it because origin-only commits are already integrated, duplicated by rebase, or intentionally rejected with the reason recorded, update `origin` with a normal push for fast-forwards or `--force-with-lease` for rewritten history.
+- Do not leave `origin` behind a verified local stack merely because the local branch was rebased. If the lease fails, or if origin contains new valuable or unclear work that is not in local, stop, fetch, compare, and integrate or ask before publishing.
+- This does not authorize publishing to `upstream`, changing remotes, or generalizing the Graphify/vampyre publication rule to any other repo.
+
+## Verification and Failure Policy
+
+Rules:
+- Do not waive test failures, skips, warnings, linter findings, gate failures, graphify update failures, or audit findings as "pre-existing." If an issue is reproducible in the current workspace, it becomes the current agent's active task until resolved or until the user explicitly redirects the work.
+- Do not mark a slice, PR, branch, or handoff as ready while any reproduced failure, skip, warning, or gate finding remains unresolved.
+- If another agent reports an issue as pre-existing, independently reproduce it, root-cause it, fix it when it is in scope, and record the invalid waiver in the handoff or audit notes.
diff --git a/graphify/__init__.py b/graphify/__init__.py
index e34c938ef..e0f698b6e 100644
--- a/graphify/__init__.py
+++ b/graphify/__init__.py
@@ -22,6 +22,7 @@ def __getattr__(name):
     }
     if name in _map:
         import importlib
+
         mod_name, attr = _map[name]
         mod = importlib.import_module(mod_name)
         return getattr(mod, attr)
diff --git a/graphify/__main__.py b/graphify/__main__.py
index 380572890..2d916b911 100644
--- a/graphify/__main__.py
+++ b/graphify/__main__.py
@@ -1,4 +1,5 @@
 """graphify CLI - `graphify install` sets up the Claude Code skill."""
+
 from __future__ import annotations
 import json
 import os
@@ -10,6 +11,7 @@
 
 try:
     from importlib.metadata import version as _pkg_version
+
     __version__ = _pkg_version("graphifyy")
 except Exception:
     __version__ = "unknown"
@@ -35,6 +37,7 @@ def _enforce_graph_size_cap_or_exit(gp: Path) -> None:
     and let the ``ValueError`` propagate.
     """
     from graphify.security import check_graph_file_size_cap
+
     try:
         check_graph_file_size_cap(gp)
     except ValueError as exc:
@@ -48,11 +51,16 @@ def _check_skill_version(skill_dst: Path) -> None:
     if not version_file.exists():
         return
     if not skill_dst.exists():
-        print("  warning: skill dir exists but SKILL.md is missing. Run 'graphify install' to repair.")
+        print(
+            "  warning: skill dir exists but SKILL.md is missing. Run 'graphify install' to repair."
+        )
         return
     installed = version_file.read_text(encoding="utf-8").strip()
     if installed != __version__:
-        print(f"  warning: skill is from graphify {installed}, package is {__version__}. Run 'graphify install' to update.", file=sys.stderr)
+        print(
+            f"  warning: skill is from graphify {installed}, package is {__version__}. Run 'graphify install' to update.",
+            file=sys.stderr,
+        )
 
 
 def _refresh_all_version_stamps() -> None:
@@ -68,7 +76,9 @@ def _refresh_all_version_stamps() -> None:
             vf.write_text(__version__, encoding="utf-8")
 
 
-def _platform_skill_destination(platform_name: str, *, project: bool = False, project_dir: Path | None = None) -> Path:
+def _platform_skill_destination(
+    platform_name: str, *, project: bool = False, project_dir: Path | None = None
+) -> Path:
     """Return the skill destination for a platform and scope."""
     if platform_name == "gemini":
         if project:
@@ -102,9 +112,13 @@ def _platform_skill_destination(platform_name: str, *, project: bool = False, pr
     return Path.home() / cfg["skill_dst"]
 
 
-def _copy_skill_file(platform_name: str, *, project: bool = False, project_dir: Path | None = None) -> Path:
+def _copy_skill_file(
+    platform_name: str, *, project: bool = False, project_dir: Path | None = None
+) -> Path:
     """Copy a packaged skill file and write its version stamp."""
-    skill_file = "skill.md" if platform_name == "gemini" else _PLATFORM_CONFIG[platform_name]["skill_file"]
+    skill_file = (
+        "skill.md" if platform_name == "gemini" else _PLATFORM_CONFIG[platform_name]["skill_file"]
+    )
     skill_src = Path(__file__).parent / skill_file
     if not skill_src.exists():
         print(f"error: {skill_file} not found in package - reinstall graphify", file=sys.stderr)
@@ -127,7 +141,9 @@ def _copy_skill_file(platform_name: str, *, project: bool = False, project_dir:
     return skill_dst
 
 
-def _remove_skill_file(platform_name: str, *, project: bool = False, project_dir: Path | None = None) -> bool:
+def _remove_skill_file(
+    platform_name: str, *, project: bool = False, project_dir: Path | None = None
+) -> bool:
     """Remove a platform skill file and its version stamp without touching other scopes."""
     skill_dst = _platform_skill_destination(platform_name, project=project, project_dir=project_dir)
     removed = False
@@ -187,6 +203,7 @@ def _print_project_git_add_hint(paths: list[Path]) -> None:
     print("Project-scoped install. Add to version control:")
     print(f"  git add {' '.join(unique)}")
 
+
 _SETTINGS_HOOK = {
     # Claude Code v2.1.117+ removed dedicated Grep/Glob tools; searches now go through Bash.
     # We match on Bash and inspect the command string to avoid firing on every shell call.
@@ -195,10 +212,10 @@ def _print_project_git_add_hint(paths: list[Path]) -> None:
         {
             "type": "command",
             "command": (
-                "CMD=$(python3 -c \""
+                'CMD=$(python3 -c "'
                 "import json,sys; d=json.load(sys.stdin); "
                 "print(d.get('tool_input',d).get('command',''))\" 2>/dev/null || true); "
-                "case \"$CMD\" in "
+                'case "$CMD" in '
                 r"*grep*|*rg\ *|*ripgrep*|*find\ *|*fd\ *|*ack\ *|*ag\ *) "
                 "  [ -f graphify-out/graph.json ] && "
                 r"""  echo '{"hookSpecificOutput":{"hookEventName":"PreToolUse","additionalContext":"graphify: knowledge graph at graphify-out/. For focused questions, run `graphify query \"<question>\"` (scoped subgraph, usually much smaller than GRAPH_REPORT.md) instead of grepping raw files. Read GRAPH_REPORT.md only for broad architecture context."}}' """
@@ -209,13 +226,14 @@ def _print_project_git_add_hint(paths: list[Path]) -> None:
     ],
 }
 
+
 def _skill_registration(skill_path: str = "~/.claude/skills/graphify/SKILL.md") -> str:
     return (
         "\n# graphify\n"
         f"- **graphify** (`{skill_path}`) "
         "- any input to knowledge graph. Trigger: `/graphify`\n"
         "When the user types `/graphify`, invoke the Skill tool "
-        "with `skill: \"graphify\"` before doing anything else.\n"
+        'with `skill: "graphify"` before doing anything else.\n'
     )
 
 
@@ -360,7 +378,9 @@ def _replace_or_append_section(content: str, marker: str, new_section: str) -> s
     return out
 
 
-def install(platform: str = "claude", *, project: bool = False, project_dir: Path | None = None) -> None:
+def install(
+    platform: str = "claude", *, project: bool = False, project_dir: Path | None = None
+) -> None:
     if platform == "gemini":
         gemini_install(project_dir=project_dir, project=project)
         return
@@ -383,12 +403,18 @@ def install(platform: str = "claude", *, project: bool = False, project_dir: Pat
 
     if cfg["claude_md"]:
         # Register in the matching Claude Code scope.
-        claude_md = (project_dir / ".claude" / "CLAUDE.md") if project else Path.home() / ".claude" / "CLAUDE.md"
-        registration = _skill_registration(".claude/skills/graphify/SKILL.md" if project else "~/.claude/skills/graphify/SKILL.md")
+        claude_md = (
+            (project_dir / ".claude" / "CLAUDE.md")
+            if project
+            else Path.home() / ".claude" / "CLAUDE.md"
+        )
+        registration = _skill_registration(
+            ".claude/skills/graphify/SKILL.md" if project else "~/.claude/skills/graphify/SKILL.md"
+        )
         if claude_md.exists():
             content = claude_md.read_text(encoding="utf-8")
             if "graphify" in content:
-                print(f"  CLAUDE.md        ->  already registered (no change)")
+                print("  CLAUDE.md        ->  already registered (no change)")
             else:
                 claude_md.write_text(content.rstrip() + registration, encoding="utf-8")
                 print(f"  CLAUDE.md        ->  skill registered in {claude_md}")
@@ -495,9 +521,7 @@ def gemini_install(project_dir: Path | None = None, *, project: bool = False) ->
 
     if target.exists():
         content = target.read_text(encoding="utf-8")
-        new_content = _replace_or_append_section(
-            content, _GEMINI_MD_MARKER, _GEMINI_MD_SECTION
-        )
+        new_content = _replace_or_append_section(content, _GEMINI_MD_MARKER, _GEMINI_MD_SECTION)
     else:
         new_content = _GEMINI_MD_SECTION
 
@@ -511,7 +535,13 @@ def gemini_install(project_dir: Path | None = None, *, project: bool = False) ->
     # wording) is replaced on upgrade.
     _install_gemini_hook(project_dir)
     if project:
-        _print_project_git_add_hint([_project_scope_root(skill_dst, project_dir), project_dir / "GEMINI.md", project_dir / ".gemini"])
+        _print_project_git_add_hint(
+            [
+                _project_scope_root(skill_dst, project_dir),
+                project_dir / "GEMINI.md",
+                project_dir / ".gemini",
+            ]
+        )
     print()
     print("Gemini CLI will now check the knowledge graph before answering")
     print("codebase questions and rebuild it after code changes.")
@@ -521,7 +551,9 @@ def _install_gemini_hook(project_dir: Path) -> None:
     settings_path = project_dir / ".gemini" / "settings.json"
     settings_path.parent.mkdir(parents=True, exist_ok=True)
     try:
-        settings = json.loads(settings_path.read_text(encoding="utf-8")) if settings_path.exists() else {}
+        settings = (
+            json.loads(settings_path.read_text(encoding="utf-8")) if settings_path.exists() else {}
+        )
     except json.JSONDecodeError:
         settings = {}
     before_tool = settings.setdefault("hooks", {}).setdefault("BeforeTool", [])
@@ -615,7 +647,9 @@ def vscode_install(project_dir: Path | None = None) -> None:
             print(f"  {instructions}  ->  already configured (no change)")
         else:
             instructions.write_text(new_content, encoding="utf-8")
-            print(f"  {instructions}  ->  graphify section {'updated' if _VSCODE_INSTRUCTIONS_MARKER in content else 'added'}")
+            print(
+                f"  {instructions}  ->  graphify section {'updated' if _VSCODE_INSTRUCTIONS_MARKER in content else 'added'}"
+            )
     else:
         instructions.write_text(_VSCODE_INSTRUCTIONS_SECTION, encoding="utf-8")
         print(f"  {instructions}  ->  created")
@@ -720,7 +754,7 @@ def _kiro_install(project_dir: Path) -> None:
     steering_dir.mkdir(parents=True, exist_ok=True)
     steering_dst = steering_dir / "graphify.md"
     if steering_dst.exists() and steering_dst.read_text(encoding="utf-8") == _KIRO_STEERING:
-        print(f"  .kiro/steering/graphify.md  ->  already configured (no change)")
+        print("  .kiro/steering/graphify.md  ->  already configured (no change)")
     else:
         # File is wholly graphify-owned. Overwrite on upgrade so older
         # report-first wording does not silently linger (issue #580).
@@ -801,11 +835,15 @@ def _antigravity_install(project_dir: Path) -> None:
     print("Antigravity will now check the knowledge graph before answering")
     print("codebase questions. Run /graphify first to build the graph.")
     print()
-    print("To enable full MCP architecture navigation, add this to ~/.gemini/antigravity/mcp_config.json:")
+    print(
+        "To enable full MCP architecture navigation, add this to ~/.gemini/antigravity/mcp_config.json:"
+    )
     print('  "graphify": {')
     print('    "command": "uv",')
-    print('    "args": ["run", "--with", "graphifyy", "--with", "mcp", "-m", "graphify.serve", "${workspace.path}/graphify-out/graph.json"]')
-    print('  }')
+    print(
+        '    "args": ["run", "--with", "graphifyy", "--with", "mcp", "-m", "graphify.serve", "${workspace.path}/graphify-out/graph.json"]'
+    )
+    print("  }")
 
 
 def _antigravity_uninstall(project_dir: Path, *, project: bool = False) -> None:
@@ -1029,6 +1067,7 @@ def _resolve_graphify_exe() -> str:
     not on PATH (e.g. VS Code Codex extension on Windows).
     """
     import shutil
+
     found = shutil.which("graphify")
     if found:
         return found
@@ -1086,7 +1125,7 @@ def _uninstall_codex_hook(project_dir: Path) -> None:
     filtered = [h for h in pre_tool if "graphify" not in str(h)]
     existing["hooks"]["PreToolUse"] = filtered
     hooks_path.write_text(json.dumps(existing, indent=2), encoding="utf-8")
-    print(f"  .codex/hooks.json  ->  PreToolUse hook removed")
+    print("  .codex/hooks.json  ->  PreToolUse hook removed")
 
 
 def _agents_install(project_dir: Path, platform: str) -> None:
@@ -1095,9 +1134,7 @@ def _agents_install(project_dir: Path, platform: str) -> None:
 
     if target.exists():
         content = target.read_text(encoding="utf-8")
-        new_content = _replace_or_append_section(
-            content, _AGENTS_MD_MARKER, _AGENTS_MD_SECTION
-        )
+        new_content = _replace_or_append_section(content, _AGENTS_MD_MARKER, _AGENTS_MD_SECTION)
     else:
         new_content = _AGENTS_MD_SECTION
 
@@ -1136,7 +1173,17 @@ def _project_install(platform_name: str, project_dir: Path | None = None) -> Non
     elif platform_name == "kiro":
         _kiro_install(project_dir)
         _print_project_git_add_hint([project_dir / ".kiro"])
-    elif platform_name in ("aider", "amp", "codex", "opencode", "claw", "droid", "trae", "trae-cn", "hermes"):
+    elif platform_name in (
+        "aider",
+        "amp",
+        "codex",
+        "opencode",
+        "claw",
+        "droid",
+        "trae",
+        "trae-cn",
+        "hermes",
+    ):
         skill_dst = _copy_skill_file(platform_name, project=True, project_dir=project_dir)
         _agents_install(project_dir, platform_name)
         hint_paths = [_project_scope_root(skill_dst, project_dir), project_dir / "AGENTS.md"]
@@ -1148,7 +1195,9 @@ def _project_install(platform_name: str, project_dir: Path | None = None) -> Non
     elif platform_name == "devin":
         skill_dst = _copy_skill_file("devin", project=True, project_dir=project_dir)
         _devin_rules_install(project_dir)
-        _print_project_git_add_hint([_project_scope_root(skill_dst, project_dir), project_dir / ".windsurf"])
+        _print_project_git_add_hint(
+            [_project_scope_root(skill_dst, project_dir), project_dir / ".windsurf"]
+        )
     elif platform_name in ("copilot", "pi", "antigravity", "kimi"):
         skill_dst = _copy_skill_file(platform_name, project=True, project_dir=project_dir)
         _print_project_git_add_hint([_project_scope_root(skill_dst, project_dir)])
@@ -1169,7 +1218,17 @@ def _project_uninstall(platform_name: str, project_dir: Path | None = None) -> N
         _cursor_uninstall(project_dir)
     elif platform_name == "kiro":
         _kiro_uninstall(project_dir)
-    elif platform_name in ("aider", "amp", "codex", "opencode", "claw", "droid", "trae", "trae-cn", "hermes"):
+    elif platform_name in (
+        "aider",
+        "amp",
+        "codex",
+        "opencode",
+        "claw",
+        "droid",
+        "trae",
+        "trae-cn",
+        "hermes",
+    ):
         _remove_skill_file(platform_name, project=True, project_dir=project_dir)
         _agents_uninstall(project_dir, platform=platform_name)
         if platform_name == "codex":
@@ -1236,9 +1295,7 @@ def claude_install(project_dir: Path | None = None) -> None:
 
     if target.exists():
         content = target.read_text(encoding="utf-8")
-        new_content = _replace_or_append_section(
-            content, _CLAUDE_MD_MARKER, _CLAUDE_MD_SECTION
-        )
+        new_content = _replace_or_append_section(content, _CLAUDE_MD_MARKER, _CLAUDE_MD_SECTION)
     else:
         new_content = _CLAUDE_MD_SECTION
 
@@ -1273,10 +1330,14 @@ def _install_claude_hook(project_dir: Path) -> None:
     hooks = settings.setdefault("hooks", {})
     pre_tool = hooks.setdefault("PreToolUse", [])
 
-    hooks["PreToolUse"] = [h for h in pre_tool if not (h.get("matcher") in ("Glob|Grep", "Bash") and "graphify" in str(h))]
+    hooks["PreToolUse"] = [
+        h
+        for h in pre_tool
+        if not (h.get("matcher") in ("Glob|Grep", "Bash") and "graphify" in str(h))
+    ]
     hooks["PreToolUse"].append(_SETTINGS_HOOK)
     settings_path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
-    print(f"  .claude/settings.json  ->  PreToolUse hook registered")
+    print("  .claude/settings.json  ->  PreToolUse hook registered")
 
 
 def _uninstall_claude_hook(project_dir: Path) -> None:
@@ -1289,12 +1350,16 @@ def _uninstall_claude_hook(project_dir: Path) -> None:
     except json.JSONDecodeError:
         return
     pre_tool = settings.get("hooks", {}).get("PreToolUse", [])
-    filtered = [h for h in pre_tool if not (h.get("matcher") in ("Glob|Grep", "Bash") and "graphify" in str(h))]
+    filtered = [
+        h
+        for h in pre_tool
+        if not (h.get("matcher") in ("Glob|Grep", "Bash") and "graphify" in str(h))
+    ]
     if len(filtered) == len(pre_tool):
         return
     settings["hooks"]["PreToolUse"] = filtered
     settings_path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
-    print(f"  .claude/settings.json  ->  PreToolUse hook removed")
+    print("  .claude/settings.json  ->  PreToolUse hook removed")
 
 
 def uninstall_all(project_dir: Path | None = None, purge: bool = False) -> None:
@@ -1317,18 +1382,20 @@ def uninstall_all(project_dir: Path | None = None, purge: bool = False) -> None:
     # Git hook
     try:
         from graphify.hooks import uninstall as hook_uninstall
+
         result = hook_uninstall(pd)
         if result:
             print(result)
-    except Exception:
-        pass
+    except Exception as exc:
+        print(f"[graphify] warning: could not uninstall git hook: {exc}", file=sys.stderr)
 
     if purge:
         import shutil as _shutil
+
         out = pd / "graphify-out"
         if out.exists():
             _shutil.rmtree(out)
-            print(f"\n  graphify-out/  ->  deleted (--purge)")
+            print("\n  graphify-out/  ->  deleted (--purge)")
         else:
             print("\n  graphify-out/  ->  not found (nothing to purge)")
 
@@ -1403,7 +1470,7 @@ def _clone_repo(url: str, branch: str | None = None, out_dir: Path | None = None
         cmd = ["git", "-C", str(dest), "pull"]
         if branch:
             cmd += ["origin", "--", branch]
-        result = _sp.run(cmd, capture_output=True, text=True)
+        result = _sp.run(cmd, capture_output=True, text=True)  # nosec B603
         if result.returncode != 0:
             print(f"warning: git pull failed:\n{result.stderr}", file=sys.stderr)
     else:
@@ -1413,7 +1480,7 @@ def _clone_repo(url: str, branch: str | None = None, out_dir: Path | None = None
         if branch:
             cmd += ["--branch", branch]
         cmd += ["--", git_url, str(dest)]
-        result = _sp.run(cmd, capture_output=True, text=True)
+        result = _sp.run(cmd, capture_output=True, text=True)  # nosec B603
         if result.returncode != 0:
             print(f"error: git clone failed:\n{result.stderr}", file=sys.stderr)
             sys.exit(1)
@@ -1446,12 +1513,14 @@ def main() -> None:
         print("Usage: graphify <command>")
         print()
         print("Commands:")
-        print("  install [--platform P]  copy skill to platform config dir (claude|windows|codex|opencode|aider|claw|droid|trae|trae-cn|gemini|cursor|antigravity|hermes|kiro|pi|devin)")
+        print(
+            "  install [--platform P]  copy skill to platform config dir (claude|windows|codex|opencode|aider|claw|droid|trae|trae-cn|gemini|cursor|antigravity|hermes|kiro|pi|devin)"
+        )
         print("  uninstall               remove graphify from all detected platforms in one shot")
         print("    --purge                 also delete graphify-out/ directory")
-        print("  path \"A\" \"B\"            shortest path between two nodes in graph.json")
+        print('  path "A" "B"            shortest path between two nodes in graph.json')
         print("    --graph <path>          path to graph.json (default graphify-out/graph.json)")
-        print("  explain \"X\"             plain-language explanation of a node and its neighbors")
+        print('  explain "X"             plain-language explanation of a node and its neighbors')
         print("    --graph <path>          path to graph.json (default graphify-out/graph.json)")
         print("  diagnose multigraph    report same-endpoint edge collapse risk in graph.json")
         print("    --graph <path>          path to graph/extraction JSON")
@@ -1463,40 +1532,64 @@ def main() -> None:
         print("                            (default follows JSON directed flag;")
         print("                             raw extraction with no flag defaults directed)")
         print("    --extract-path PATH     extractor source for suppression scan")
-        print("  clone <github-url>      clone a GitHub repo locally and print its path for /graphify")
-        print("  merge-driver <base> <current> <other>  git merge driver: union-merge two graph.json files (set up via hook install)")
-        print("  merge-graphs <g1> <g2>  merge two or more graph.json files into one cross-repo graph")
+        print(
+            "  clone <github-url>      clone a GitHub repo locally and print its path for /graphify"
+        )
+        print(
+            "  merge-driver <base> <current> <other>  git merge driver: union-merge two graph.json files (set up via hook install)"
+        )
+        print(
+            "  merge-graphs <g1> <g2>  merge two or more graph.json files into one cross-repo graph"
+        )
         print("    --out <path>            output path (default: graphify-out/merged-graph.json)")
         print("    --branch <branch>       checkout a specific branch (default: repo default)")
-        print("    --out <dir>             clone to a custom directory (default: ~/.graphify/repos/<owner>/<repo>)")
+        print(
+            "    --out <dir>             clone to a custom directory (default: ~/.graphify/repos/<owner>/<repo>)"
+        )
         print("  add <url>               fetch a URL and save it to ./raw, then update the graph")
-        print("    --author \"Name\"         tag the author of the content")
-        print("    --contributor \"Name\"    tag who added it to the corpus")
+        print('    --author "Name"         tag the author of the content')
+        print('    --contributor "Name"    tag who added it to the corpus')
         print("    --dir <path>            target directory (default: ./raw)")
         print("  watch <path>            watch a folder and rebuild the graph on code changes")
-        print("  update <path>           re-extract code files and update the graph (no LLM needed)")
-        print("    --force                 overwrite graph.json even if the rebuild has fewer nodes")
-        print("                            (also: GRAPHIFY_FORCE=1 env var; use after refactors that delete code)")
+        print(
+            "  update <path>           re-extract code files and update the graph (no LLM needed)"
+        )
+        print(
+            "    --force                 overwrite graph.json even if the rebuild has fewer nodes"
+        )
+        print(
+            "                            (also: GRAPHIFY_FORCE=1 env var; use after refactors that delete code)"
+        )
         print("    --no-cluster            skip clustering, write raw extraction only")
-        print("  cluster-only <path>     rerun clustering on an existing graph.json and regenerate report")
-        print("    --no-viz                skip graph.html generation (useful for >5000 node graphs / CI)")
-        print("    --graph <path>          path to graph.json (default <path>/graphify-out/graph.json)")
-        print("  query \"<question>\"       BFS traversal of graph.json for a question")
+        print(
+            "  cluster-only <path>     rerun clustering on an existing graph.json and regenerate report"
+        )
+        print(
+            "    --no-viz                skip graph.html generation (useful for >5000 node graphs / CI)"
+        )
+        print(
+            "    --graph <path>          path to graph.json (default <path>/graphify-out/graph.json)"
+        )
+        print('  query "<question>"       BFS traversal of graph.json for a question')
         print("    --dfs                   use depth-first instead of breadth-first")
         print("    --context C             explicit edge-context filter (repeatable)")
         print("    --budget N              cap output at N tokens (default 2000)")
         print("    --graph <path>          path to graph.json (default graphify-out/graph.json)")
-        print("  affected \"X\"             reverse traversal to find nodes impacted by X")
+        print('  affected "X"             reverse traversal to find nodes impacted by X')
         print("    --relation R            edge relation to traverse in reverse (repeatable)")
         print("    --depth N               reverse traversal depth (default 2)")
         print("    --graph <path>          path to graph.json (default graphify-out/graph.json)")
-        print("  save-result             save a Q&A result to graphify-out/memory/ for graph feedback loop")
+        print(
+            "  save-result             save a Q&A result to graphify-out/memory/ for graph feedback loop"
+        )
         print("    --question Q            the question asked")
         print("    --answer A              the answer to save")
         print("    --type T                query type: query|path_query|explain (default: query)")
         print("    --nodes N1 N2 ...       source node labels cited in the answer")
         print("    --memory-dir DIR        memory directory (default: graphify-out/memory)")
-        print("  check-update <path>     check needs_update flag and notify if semantic re-extraction is pending (cron-safe)")
+        print(
+            "  check-update <path>     check needs_update flag and notify if semantic re-extraction is pending (cron-safe)"
+        )
         print("  tree                    emit a D3 v7 collapsible-tree HTML for graph.json")
         print("    --graph PATH            path to graph.json (default graphify-out/graph.json)")
         print("    --output HTML           output path (default graphify-out/GRAPH_TREE.html)")
@@ -1504,44 +1597,70 @@ def main() -> None:
         print("    --max-children N        cap children per node (default 200)")
         print("    --top-k-edges N         per-symbol outbound edges in inspector (default 12)")
         print("    --label NAME            project label in header")
-        print("  extract <path>          headless full extraction (AST + semantic LLM) for CI/scripts")
-        print("    --backend B             gemini|kimi|claude|openai|deepseek|ollama (default: whichever API key is set)")
+        print(
+            "  extract <path>          headless full extraction (AST + semantic LLM) for CI/scripts"
+        )
+        print(
+            "    --backend B             gemini|kimi|claude|openai|deepseek|ollama (default: whichever API key is set)"
+        )
         print("    --model M               override backend default model")
         print("    --mode deep             aggressive INFERRED-edge semantic extraction")
         print("    --max-workers N         AST extraction subprocess count (default: cpu_count)")
-        print("    --token-budget N        per-chunk token cap for semantic extraction (default: 60000)")
-        print("    --max-concurrency N     parallel semantic chunks in flight (default: 4; set 1 for local LLMs)")
-        print("    --api-timeout S         per-request timeout in seconds for the LLM client (default: 600)")
-        print("    --out DIR               output dir (default: <path>); writes <DIR>/graphify-out/")
-        print("    --google-workspace      export .gdoc/.gsheet/.gslides shortcuts via gws before extraction")
+        print(
+            "    --token-budget N        per-chunk token cap for semantic extraction (default: 60000)"
+        )
+        print(
+            "    --max-concurrency N     parallel semantic chunks in flight (default: 4; set 1 for local LLMs)"
+        )
+        print(
+            "    --api-timeout S         per-request timeout in seconds for the LLM client (default: 600)"
+        )
+        print(
+            "    --out DIR               output dir (default: <path>); writes <DIR>/graphify-out/"
+        )
+        print(
+            "    --google-workspace      export .gdoc/.gsheet/.gslides shortcuts via gws before extraction"
+        )
         print("    --no-cluster            skip clustering, write raw extraction only")
         print("    --global                also merge the resulting graph into the global graph")
         print("    --as <tag>              repo tag for --global (default: target directory name)")
-        print("  global add <graph.json>  add/update a project graph in the global graph (~/.graphify/global-graph.json)")
+        print(
+            "  global add <graph.json>  add/update a project graph in the global graph (~/.graphify/global-graph.json)"
+        )
         print("    --as <tag>               repo tag (default: parent directory name)")
         print("  global remove <tag>      remove a repo's nodes from the global graph")
         print("  global list              list repos in the global graph")
         print("  global path              print path to the global graph file")
         print("  benchmark [graph.json]  measure token reduction vs naive full-corpus approach")
         print("  export callflow-html    emit Mermaid-based architecture/call-flow HTML")
-        print("  hook install            install post-commit/post-checkout git hooks (all platforms)")
+        print(
+            "  hook install            install post-commit/post-checkout git hooks (all platforms)"
+        )
         print("  hook uninstall          remove git hooks")
         print("  hook status             check if git hooks are installed")
         print("  gemini install          write GEMINI.md section + BeforeTool hook (Gemini CLI)")
         print("  gemini uninstall        remove GEMINI.md section + BeforeTool hook")
         print("  cursor install          write .cursor/rules/graphify.mdc (Cursor)")
         print("  cursor uninstall        remove .cursor/rules/graphify.mdc")
-        print("  claude install          write graphify section to CLAUDE.md + PreToolUse hook (Claude Code)")
+        print(
+            "  claude install          write graphify section to CLAUDE.md + PreToolUse hook (Claude Code)"
+        )
         print("  claude uninstall        remove graphify section from CLAUDE.md + PreToolUse hook")
         print("  codex install           write graphify section to AGENTS.md (Codex)")
         print("  codex uninstall         remove graphify section from AGENTS.md")
-        print("  opencode install        write graphify section to AGENTS.md + tool.execute.before plugin (OpenCode)")
+        print(
+            "  opencode install        write graphify section to AGENTS.md + tool.execute.before plugin (OpenCode)"
+        )
         print("  opencode uninstall      remove graphify section from AGENTS.md + plugin")
         print("  aider install           write graphify section to AGENTS.md (Aider)")
         print("  aider uninstall         remove graphify section from AGENTS.md")
-        print("  copilot install         copy graphify skill to ~/.copilot/skills (GitHub Copilot CLI)")
+        print(
+            "  copilot install         copy graphify skill to ~/.copilot/skills (GitHub Copilot CLI)"
+        )
         print("  copilot uninstall       remove graphify skill from ~/.copilot/skills")
-        print("  vscode install          configure VS Code Copilot Chat (skill + .github/copilot-instructions.md)")
+        print(
+            "  vscode install          configure VS Code Copilot Chat (skill + .github/copilot-instructions.md)"
+        )
         print("  vscode uninstall        remove VS Code Copilot Chat configuration")
         print("  claw install            write graphify section to AGENTS.md (OpenClaw)")
         print("  claw uninstall          remove graphify section from AGENTS.md")
@@ -1551,15 +1670,23 @@ def main() -> None:
         print("  trae uninstall         remove graphify section from AGENTS.md")
         print("  trae-cn install         write graphify section to AGENTS.md (Trae CN)")
         print("  trae-cn uninstall      remove graphify section from AGENTS.md")
-        print("  antigravity install     write .agents/rules + .agents/workflows + skill (Google Antigravity)")
+        print(
+            "  antigravity install     write .agents/rules + .agents/workflows + skill (Google Antigravity)"
+        )
         print("  antigravity uninstall   remove .agents/rules, .agents/workflows, and skill")
         print("  hermes install          write skill to ~/.hermes/skills/graphify/ (Hermes)")
         print("  hermes uninstall        remove skill from ~/.hermes/skills/graphify/")
-        print("  kiro install            write skill to .kiro/skills/graphify/ + steering file (Kiro IDE/CLI)")
+        print(
+            "  kiro install            write skill to .kiro/skills/graphify/ + steering file (Kiro IDE/CLI)"
+        )
         print("  kiro uninstall          remove skill + steering file")
-        print("  pi install              write skill to ~/.pi/agent/skills/graphify/ (Pi coding agent)")
+        print(
+            "  pi install              write skill to ~/.pi/agent/skills/graphify/ (Pi coding agent)"
+        )
         print("  pi uninstall            remove skill from ~/.pi/agent/skills/graphify/")
-        print("  devin install           write skill to ~/.config/devin/skills/graphify/ (Devin CLI)")
+        print(
+            "  devin install           write skill to ~/.config/devin/skills/graphify/ (Devin CLI)"
+        )
         print("  devin uninstall         remove skill from ~/.config/devin/skills/graphify/")
         print()
         return
@@ -1573,7 +1700,7 @@ def main() -> None:
     # "install"/"uninstall" which have their own per-subcommand help handlers.
     _FREE_TEXT_CMDS = {"query", "explain", "path", "save-result", "install", "uninstall"}
     if cmd not in _FREE_TEXT_CMDS and any(a in {"-h", "--help", "-?"} for a in sys.argv[2:]):
-        print(f"Run 'graphify --help' for full usage.")
+        print("Run 'graphify --help' for full usage.")
         return
 
     if cmd == "install":
@@ -1899,9 +2026,15 @@ def main() -> None:
                 sys.exit(1)
     elif cmd == "prs":
         from graphify.prs import cmd_prs
+
         cmd_prs(sys.argv[2:])
     elif cmd == "hook":
-        from graphify.hooks import install as hook_install, uninstall as hook_uninstall, status as hook_status
+        from graphify.hooks import (
+            install as hook_install,
+            uninstall as hook_uninstall,
+            status as hook_status,
+        )
+
         subcmd = sys.argv[2] if len(sys.argv) > 2 else ""
         if subcmd == "install":
             print(hook_install(Path(".")))
@@ -1914,11 +2047,14 @@ def main() -> None:
             sys.exit(1)
     elif cmd == "query":
         if len(sys.argv) < 3:
-            print("Usage: graphify query \"<question>\" [--dfs] [--context C] [--budget N] [--graph path]", file=sys.stderr)
+            print(
+                'Usage: graphify query "<question>" [--dfs] [--context C] [--budget N] [--graph path]',
+                file=sys.stderr,
+            )
             sys.exit(1)
         from graphify.serve import _query_graph_text
-        from graphify.security import sanitize_label
         from networkx.readwrite import json_graph
+
         question = sys.argv[2]
         use_dfs = "--dfs" in sys.argv
         budget = 2000
@@ -1931,14 +2067,14 @@ def main() -> None:
                 try:
                     budget = int(args[i + 1])
                 except ValueError:
-                    print(f"error: --budget must be an integer", file=sys.stderr)
+                    print("error: --budget must be an integer", file=sys.stderr)
                     sys.exit(1)
                 i += 2
             elif args[i].startswith("--budget="):
                 try:
                     budget = int(args[i].split("=", 1)[1])
                 except ValueError:
-                    print(f"error: --budget must be an integer", file=sys.stderr)
+                    print("error: --budget must be an integer", file=sys.stderr)
                     sys.exit(1)
                 i += 1
             elif args[i] == "--context" and i + 1 < len(args):
@@ -1948,7 +2084,8 @@ def main() -> None:
                 context_filters.append(args[i].split("=", 1)[1])
                 i += 1
             elif args[i] == "--graph" and i + 1 < len(args):
-                graph_path = args[i + 1]; i += 2
+                graph_path = args[i + 1]
+                i += 2
             else:
                 i += 1
         gp = Path(graph_path).resolve()
@@ -1956,12 +2093,13 @@ def main() -> None:
             print(f"error: graph file not found: {gp}", file=sys.stderr)
             sys.exit(1)
         if not gp.suffix == ".json":
-            print(f"error: graph file must be a .json file", file=sys.stderr)
+            print("error: graph file must be a .json file", file=sys.stderr)
             sys.exit(1)
         _enforce_graph_size_cap_or_exit(gp)
         try:
             import json as _json
             import networkx as _nx
+
             _raw = _json.loads(gp.read_text(encoding="utf-8"))
             if "links" not in _raw and "edges" in _raw:
                 _raw = dict(_raw, links=_raw["edges"])
@@ -1984,9 +2122,13 @@ def main() -> None:
         )
     elif cmd == "affected":
         if len(sys.argv) < 3:
-            print("Usage: graphify affected \"<node-or-label>\" [--relation R] [--depth N] [--graph path]", file=sys.stderr)
+            print(
+                'Usage: graphify affected "<node-or-label>" [--relation R] [--depth N] [--graph path]',
+                file=sys.stderr,
+            )
             sys.exit(1)
         from graphify.affected import DEFAULT_AFFECTED_RELATIONS, format_affected, load_graph
+
         query = sys.argv[2]
         graph_path = "graphify-out/graph.json"
         depth = 2
@@ -2045,6 +2187,7 @@ def main() -> None:
     elif cmd == "save-result":
         # graphify save-result --question Q --answer A --type T [--nodes N1 N2 ...]
         import argparse as _ap
+
         p = _ap.ArgumentParser(prog="graphify save-result")
         p.add_argument("--question", required=True)
         p.add_argument("--answer", required=True)
@@ -2053,6 +2196,7 @@ def main() -> None:
         p.add_argument("--memory-dir", default="graphify-out/memory")
         opts = p.parse_args(sys.argv[2:])
         from graphify.ingest import save_query_result as _sqr
+
         out = _sqr(
             question=opts.question,
             answer=opts.answer,
@@ -2063,11 +2207,12 @@ def main() -> None:
         print(f"Saved to {out}")
     elif cmd == "path":
         if len(sys.argv) < 4:
-            print("Usage: graphify path \"<source>\" \"<target>\" [--graph path]", file=sys.stderr)
+            print('Usage: graphify path "<source>" "<target>" [--graph path]', file=sys.stderr)
             sys.exit(1)
         from graphify.serve import _score_nodes
         from networkx.readwrite import json_graph
         import networkx as _nx
+
         source_label = sys.argv[2]
         target_label = sys.argv[3]
         graph_path = _default_graph_path()
@@ -2125,6 +2270,7 @@ def main() -> None:
         hops = len(path_nodes) - 1
         segments = []
         from graphify.build import edge_data
+
         for i in range(len(path_nodes) - 1):
             u, v = path_nodes[i], path_nodes[i + 1]
             # Check which direction the stored edge points.
@@ -2147,10 +2293,11 @@ def main() -> None:
 
     elif cmd == "explain":
         if len(sys.argv) < 3:
-            print("Usage: graphify explain \"<node>\" [--graph path]", file=sys.stderr)
+            print('Usage: graphify explain "<node>" [--graph path]', file=sys.stderr)
             sys.exit(1)
         from graphify.serve import _find_node
         from networkx.readwrite import json_graph
+
         label = sys.argv[2]
         graph_path = _default_graph_path()
         args = sys.argv[3:]
@@ -2184,6 +2331,7 @@ def main() -> None:
         print(f"  Community: {d.get('community', '')}")
         print(f"  Degree:    {G.degree(nid)}")
         from graphify.build import edge_data
+
         connections: list[tuple[str, str, dict]] = []  # (direction, neighbor_id, edge_data)
         for nb in G.successors(nid):
             connections.append(("out", nb, edge_data(G, nid, nb)))
@@ -2296,9 +2444,13 @@ def main() -> None:
 
     elif cmd == "add":
         if len(sys.argv) < 3:
-            print("Usage: graphify add <url> [--author Name] [--contributor Name] [--dir ./raw]", file=sys.stderr)
+            print(
+                "Usage: graphify add <url> [--author Name] [--contributor Name] [--dir ./raw]",
+                file=sys.stderr,
+            )
             sys.exit(1)
         from graphify.ingest import ingest as _ingest
+
         url = sys.argv[2]
         author: str | None = None
         contributor: str | None = None
@@ -2307,11 +2459,14 @@ def main() -> None:
         i = 0
         while i < len(args):
             if args[i] == "--author" and i + 1 < len(args):
-                author = args[i + 1]; i += 2
+                author = args[i + 1]
+                i += 2
             elif args[i] == "--contributor" and i + 1 < len(args):
-                contributor = args[i + 1]; i += 2
+                contributor = args[i + 1]
+                i += 2
             elif args[i] == "--dir" and i + 1 < len(args):
-                target_dir = Path(args[i + 1]); i += 2
+                target_dir = Path(args[i + 1])
+                i += 2
             else:
                 i += 1
         try:
@@ -2328,6 +2483,7 @@ def main() -> None:
             print(f"error: path not found: {watch_path}", file=sys.stderr)
             sys.exit(1)
         from graphify.watch import watch as _watch
+
         try:
             _watch(watch_path)
         except ImportError as exc:
@@ -2349,26 +2505,36 @@ def main() -> None:
         while i_arg < len(args):
             a = args[i_arg]
             if a == "--graph" and i_arg + 1 < len(args):
-                graph_override = Path(args[i_arg + 1]); i_arg += 2
+                graph_override = Path(args[i_arg + 1])
+                i_arg += 2
             elif a == "--resolution" and i_arg + 1 < len(args):
-                co_resolution = float(args[i_arg + 1]); i_arg += 2
+                co_resolution = float(args[i_arg + 1])
+                i_arg += 2
             elif a.startswith("--resolution="):
-                co_resolution = float(a.split("=", 1)[1]); i_arg += 1
+                co_resolution = float(a.split("=", 1)[1])
+                i_arg += 1
             elif a == "--exclude-hubs" and i_arg + 1 < len(args):
-                co_exclude_hubs = float(args[i_arg + 1]); i_arg += 2
+                co_exclude_hubs = float(args[i_arg + 1])
+                i_arg += 2
             elif a.startswith("--exclude-hubs="):
-                co_exclude_hubs = float(a.split("=", 1)[1]); i_arg += 1
+                co_exclude_hubs = float(a.split("=", 1)[1])
+                i_arg += 1
             elif a == "--no-viz" or a.startswith("--min-community-size="):
                 i_arg += 1
             elif a.startswith("--"):
                 i_arg += 1
             elif watch_path is None:
-                watch_path = Path(a); i_arg += 1
+                watch_path = Path(a)
+                i_arg += 1
             else:
                 i_arg += 1
         if watch_path is None:
             watch_path = Path(".")
-        graph_json = graph_override if graph_override is not None else watch_path / "graphify-out" / "graph.json"
+        graph_json = (
+            graph_override
+            if graph_override is not None
+            else watch_path / "graphify-out" / "graph.json"
+        )
         if not graph_json.exists():
             print(f"error: no graph found at {graph_json} - run /graphify first", file=sys.stderr)
             sys.exit(1)
@@ -2378,6 +2544,7 @@ def main() -> None:
         from graphify.analyze import god_nodes, surprising_connections, suggest_questions
         from graphify.report import generate
         from graphify.export import to_json, to_html
+
         print("Loading existing graph...")
         _enforce_graph_size_cap_or_exit(graph_json)
         _raw = json.loads(graph_json.read_text(encoding="utf-8"))
@@ -2406,7 +2573,10 @@ def main() -> None:
         labels_path = out / ".graphify_labels.json"
         if labels_path.exists():
             try:
-                labels = {int(k): v for k, v in json.loads(labels_path.read_text(encoding="utf-8")).items()}
+                labels = {
+                    int(k): v
+                    for k, v in json.loads(labels_path.read_text(encoding="utf-8")).items()
+                }
             except Exception:
                 labels = {cid: f"Community {cid}" for cid in communities}
         else:
@@ -2414,16 +2584,30 @@ def main() -> None:
         questions = suggest_questions(G, communities, labels)
         tokens = {"input": 0, "output": 0}
         from graphify.export import _git_head as _gh
+
         _commit = _gh()
-        report = generate(G, communities, cohesion, labels, gods, surprises,
-                          {"warning": "cluster-only mode — file stats not available"},
-                          tokens, str(watch_path), suggested_questions=questions,
-                          min_community_size=min_community_size, built_at_commit=_commit)
+        report = generate(
+            G,
+            communities,
+            cohesion,
+            labels,
+            gods,
+            surprises,
+            {"warning": "cluster-only mode — file stats not available"},
+            tokens,
+            str(watch_path),
+            suggested_questions=questions,
+            min_community_size=min_community_size,
+            built_at_commit=_commit,
+        )
         (out / "GRAPH_REPORT.md").write_text(report, encoding="utf-8")
         from graphify.export import backup_if_protected as _backup
+
         _backup(out)
         to_json(G, communities, str(out / "graph.json"))
-        labels_path.write_text(json.dumps({str(k): v for k, v in labels.items()}, ensure_ascii=False), encoding="utf-8")
+        labels_path.write_text(
+            json.dumps({str(k): v for k, v in labels.items()}, ensure_ascii=False), encoding="utf-8"
+        )
 
         # Mirror watch.py pattern: gate to_html so core outputs (graph.json +
         # GRAPH_REPORT.md) always land. Honor --no-viz explicitly; otherwise
@@ -2433,20 +2617,27 @@ def main() -> None:
         if no_viz:
             if html_target.exists():
                 html_target.unlink()
-            print(f"Done - {len(communities)} communities. GRAPH_REPORT.md and graph.json updated (--no-viz; graph.html removed).")
+            print(
+                f"Done - {len(communities)} communities. GRAPH_REPORT.md and graph.json updated (--no-viz; graph.html removed)."
+            )
         else:
             try:
                 to_html(G, communities, str(html_target), community_labels=labels or None)
-                print(f"Done - {len(communities)} communities. GRAPH_REPORT.md, graph.json and graph.html updated.")
+                print(
+                    f"Done - {len(communities)} communities. GRAPH_REPORT.md, graph.json and graph.html updated."
+                )
             except ValueError as viz_err:
                 if html_target.exists():
                     html_target.unlink()
                 print(f"Skipped graph.html: {viz_err}")
-                print(f"Done - {len(communities)} communities. GRAPH_REPORT.md and graph.json updated.")
+                print(
+                    f"Done - {len(communities)} communities. GRAPH_REPORT.md and graph.json updated."
+                )
 
     elif cmd == "update":
         force = os.environ.get("GRAPHIFY_FORCE", "").lower() in ("1", "true", "yes")
         no_cluster = False
+        no_viz = False
         args = sys.argv[2:]
         watch_arg: str | None = None
         for a in args:
@@ -2456,6 +2647,9 @@ def main() -> None:
             if a == "--no-cluster":
                 no_cluster = True
                 continue
+            if a == "--no-viz":
+                no_viz = True
+                continue
             if a.startswith("-"):
                 print(f"error: unknown update option: {a}", file=sys.stderr)
                 sys.exit(2)
@@ -2477,13 +2671,22 @@ def main() -> None:
             print(f"error: path not found: {watch_path}", file=sys.stderr)
             sys.exit(1)
         from graphify.watch import _rebuild_code
+
         print(f"Re-extracting code files in {watch_path} (no LLM needed)...")
         # Interactive CLI: block on the per-repo lock rather than skip, so the
         # user sees their explicit `graphify update` complete instead of
         # exiting silently when a hook-driven rebuild happens to be running.
-        ok = _rebuild_code(watch_path, force=force, no_cluster=no_cluster, block_on_lock=True)
+        ok = _rebuild_code(
+            watch_path,
+            force=force,
+            no_cluster=no_cluster,
+            no_viz=no_viz,
+            block_on_lock=True,
+        )
         if ok:
-            print("Code graph updated. For doc/paper/image changes run /graphify --update in your AI assistant.")
+            print(
+                "Code graph updated. For doc/paper/image changes run /graphify --update in your AI assistant."
+            )
             if not (
                 os.environ.get("GEMINI_API_KEY")
                 or os.environ.get("GOOGLE_API_KEY")
@@ -2491,7 +2694,9 @@ def main() -> None:
                 or os.environ.get("DEEPSEEK_API_KEY")
                 or os.environ.get("GRAPHIFY_NO_TIPS")
             ):
-                print("Tip: set GEMINI_API_KEY or GOOGLE_API_KEY to use Gemini for semantic extraction.")
+                print(
+                    "Tip: set GEMINI_API_KEY or GOOGLE_API_KEY to use Gemini for semantic extraction."
+                )
         else:
             print("Nothing to update or rebuild failed - check output above.", file=sys.stderr)
             sys.exit(1)
@@ -2506,6 +2711,7 @@ def main() -> None:
             print("Usage: graphify check-update <path>", file=sys.stderr)
             sys.exit(1)
         from graphify.watch import check_update
+
         check_update(Path(sys.argv[2]).resolve())
         sys.exit(0)
     elif cmd == "tree":
@@ -2516,6 +2722,7 @@ def main() -> None:
         # showing top-K outbound edges per symbol.
         from typing import Optional as _Opt
         from graphify.tree_html import write_tree_html, DEFAULT_MAX_CHILDREN
+
         graph_path = Path(_GRAPHIFY_OUT) / "graph.json"
         output_path: "_Opt[Path]" = None
         root: "_Opt[str]" = None
@@ -2527,24 +2734,34 @@ def main() -> None:
         while i_arg < len(args):
             a = args[i_arg]
             if a == "--graph" and i_arg + 1 < len(args):
-                graph_path = Path(args[i_arg + 1]); i_arg += 2
+                graph_path = Path(args[i_arg + 1])
+                i_arg += 2
             elif a == "--output" and i_arg + 1 < len(args):
-                output_path = Path(args[i_arg + 1]); i_arg += 2
+                output_path = Path(args[i_arg + 1])
+                i_arg += 2
             elif a == "--root" and i_arg + 1 < len(args):
-                root = args[i_arg + 1]; i_arg += 2
+                root = args[i_arg + 1]
+                i_arg += 2
             elif a == "--max-children" and i_arg + 1 < len(args):
-                max_children = int(args[i_arg + 1]); i_arg += 2
+                max_children = int(args[i_arg + 1])
+                i_arg += 2
             elif a == "--top-k-edges" and i_arg + 1 < len(args):
-                top_k_edges = int(args[i_arg + 1]); i_arg += 2
+                top_k_edges = int(args[i_arg + 1])
+                i_arg += 2
             elif a == "--label" and i_arg + 1 < len(args):
-                project_label = args[i_arg + 1]; i_arg += 2
+                project_label = args[i_arg + 1]
+                i_arg += 2
             elif a in ("-h", "--help"):
                 print("Usage: graphify tree [--graph PATH] [--output HTML]")
                 print("  --graph PATH         path to graph.json (default graphify-out/graph.json)")
                 print("  --output HTML        output path (default graphify-out/GRAPH_TREE.html)")
-                print("  --root PATH          filesystem root (default: longest common dir of all source_files)")
+                print(
+                    "  --root PATH          filesystem root (default: longest common dir of all source_files)"
+                )
                 print("  --max-children N     cap visible children per node (default 200)")
-                print("  --top-k-edges N      pre-compute top-K outbound edges per symbol (default 12)")
+                print(
+                    "  --top-k-edges N      pre-compute top-K outbound edges per symbol (default 12)"
+                )
                 print("  --label NAME         project label shown in the page header")
                 return
             else:
@@ -2556,9 +2773,12 @@ def main() -> None:
         if output_path is None:
             output_path = graph_path.parent / "GRAPH_TREE.html"
         out = write_tree_html(
-            graph_path=graph_path, output_path=output_path,
-            root=root, max_children=max_children,
-            top_k_edges=top_k_edges, project_label=project_label,
+            graph_path=graph_path,
+            output_path=output_path,
+            root=root,
+            max_children=max_children,
+            top_k_edges=top_k_edges,
+            project_label=project_label,
         )
         size_kb = out.stat().st_size / 1024
         print(f"wrote {out} ({size_kb:.1f} KB)")
@@ -2583,6 +2803,7 @@ def main() -> None:
         _MERGE_MAX_NODES = 100_000
         import networkx as _nx
         from networkx.readwrite import json_graph as _jg
+
         def _load_graph(p: str):
             path_obj = Path(p)
             try:
@@ -2598,24 +2819,25 @@ def _load_graph(p: str):
                 return _jg.node_link_graph(data, edges="links"), data
             except TypeError:
                 return _jg.node_link_graph(data), data
+
         try:
-            G_cur, _ = _load_graph(_current_path)
-            G_oth, _ = _load_graph(_other_path)
+            current_graph, _ = _load_graph(_current_path)
+            other_graph, _ = _load_graph(_other_path)
         except Exception as exc:
             print(f"[graphify merge-driver] error loading graphs: {exc}", file=sys.stderr)
             sys.exit(1)  # surface the conflict so git doesn't accept a corrupt merge
-        merged = _nx.compose(G_cur, G_oth)
-        if merged.number_of_nodes() > _MERGE_MAX_NODES:
+        merged_graph = _nx.compose(current_graph, other_graph)
+        if merged_graph.number_of_nodes() > _MERGE_MAX_NODES:
             print(
-                f"[graphify merge-driver] merged graph has {merged.number_of_nodes()} nodes, "
+                f"[graphify merge-driver] merged graph has {merged_graph.number_of_nodes()} nodes, "
                 f"exceeds {_MERGE_MAX_NODES}-node cap; aborting merge.",
                 file=sys.stderr,
             )
             sys.exit(1)
         try:
-            out_data = _jg.node_link_data(merged, edges="links")
+            out_data = _jg.node_link_data(merged_graph, edges="links")
         except TypeError:
-            out_data = _jg.node_link_data(merged)
+            out_data = _jg.node_link_data(merged_graph)
         Path(_current_path).write_text(json.dumps(out_data, indent=2), encoding="utf-8")
         sys.exit(0)
 
@@ -2627,16 +2849,22 @@ def _load_graph(p: str):
         i = 0
         while i < len(args):
             if args[i] == "--out" and i + 1 < len(args):
-                out_path = Path(args[i + 1]); i += 2
+                out_path = Path(args[i + 1])
+                i += 2
             else:
-                graph_paths.append(Path(args[i])); i += 1
+                graph_paths.append(Path(args[i]))
+                i += 1
         if len(graph_paths) < 2:
-            print("Usage: graphify merge-graphs <graph1.json> <graph2.json> [...] [--out merged.json]", file=sys.stderr)
+            print(
+                "Usage: graphify merge-graphs <graph1.json> <graph2.json> [...] [--out merged.json]",
+                file=sys.stderr,
+            )
             sys.exit(1)
         import networkx as _nx
         from networkx.readwrite import json_graph as _jg
         from graphify.build import prefix_graph_for_global as _prefix
-        graphs = []
+
+        loaded_graphs = []
         for gp in graph_paths:
             if not gp.exists():
                 print(f"error: not found: {gp}", file=sys.stderr)
@@ -2648,27 +2876,32 @@ def _load_graph(p: str):
             if "links" not in data and "edges" in data:
                 data = dict(data, links=data["edges"])
             try:
-                G = _jg.node_link_graph(data, edges="links")
+                input_graph = _jg.node_link_graph(data, edges="links")
             except TypeError:
-                G = _jg.node_link_graph(data)
-            graphs.append(G)
-        merged = _nx.Graph()
-        for G, gp in zip(graphs, graph_paths):
+                input_graph = _jg.node_link_graph(data)
+            loaded_graphs.append(input_graph)
+        merged_graph = _nx.Graph()
+        for input_graph, gp in zip(loaded_graphs, graph_paths):
             repo_tag = gp.parent.parent.name  # graphify-out/../ → repo dir name
-            prefixed = _prefix(G, repo_tag)
-            merged = _nx.compose(merged, prefixed)
+            prefixed = _prefix(input_graph, repo_tag)
+            merged_graph = _nx.compose(merged_graph, prefixed)
         try:
-            out_data = _jg.node_link_data(merged, edges="links")
+            out_data = _jg.node_link_data(merged_graph, edges="links")
         except TypeError:
-            out_data = _jg.node_link_data(merged)
+            out_data = _jg.node_link_data(merged_graph)
         out_path.parent.mkdir(parents=True, exist_ok=True)
         out_path.write_text(json.dumps(out_data, indent=2), encoding="utf-8")
-        print(f"Merged {len(graphs)} graphs -> {merged.number_of_nodes()} nodes, {merged.number_of_edges()} edges")
+        print(
+            f"Merged {len(loaded_graphs)} graphs -> {merged_graph.number_of_nodes()} nodes, {merged_graph.number_of_edges()} edges"
+        )
         print(f"Written to: {out_path}")
 
     elif cmd == "clone":
         if len(sys.argv) < 3:
-            print("Usage: graphify clone <github-url> [--branch <branch>] [--out <dir>]", file=sys.stderr)
+            print(
+                "Usage: graphify clone <github-url> [--branch <branch>] [--out <dir>]",
+                file=sys.stderr,
+            )
             sys.exit(1)
         url = sys.argv[2]
         branch: str | None = None
@@ -2677,9 +2910,11 @@ def _load_graph(p: str):
         i = 0
         while i < len(args):
             if args[i] == "--branch" and i + 1 < len(args):
-                branch = args[i + 1]; i += 2
+                branch = args[i + 1]
+                i += 2
             elif args[i] == "--out" and i + 1 < len(args):
-                out_dir = Path(args[i + 1]); i += 2
+                out_dir = Path(args[i + 1])
+                i += 2
             else:
                 i += 1
         local_path = _clone_repo(url, branch=branch, out_dir=out_dir)
@@ -2689,15 +2924,29 @@ def _load_graph(p: str):
         subcmd = sys.argv[2] if len(sys.argv) > 2 else ""
         if subcmd not in ("html", "callflow-html", "obsidian", "wiki", "svg", "graphml", "neo4j"):
             print("Usage: graphify export <format>", file=sys.stderr)
-            print("  html      [--graph PATH] [--labels PATH] [--node-limit N] [--no-viz]", file=sys.stderr)
-            print("  callflow-html [GRAPH|DIR] [--graph PATH] [--labels PATH] [--report PATH] [--sections PATH] [--output HTML]", file=sys.stderr)
-            print("            [--lang auto|zh-CN|en] [--max-sections N] [--diagram-scale N]", file=sys.stderr)
+            print(
+                "  html      [--graph PATH] [--labels PATH] [--node-limit N] [--no-viz]",
+                file=sys.stderr,
+            )
+            print(
+                "  callflow-html [GRAPH|DIR] [--graph PATH] [--labels PATH] [--report PATH] [--sections PATH] [--output HTML]",
+                file=sys.stderr,
+            )
+            print(
+                "            [--lang auto|zh-CN|en] [--max-sections N] [--diagram-scale N]",
+                file=sys.stderr,
+            )
             print("  obsidian  [--graph PATH] [--labels PATH] [--dir PATH]", file=sys.stderr)
             print("  wiki      [--graph PATH] [--labels PATH]", file=sys.stderr)
             print("  svg       [--graph PATH] [--labels PATH]", file=sys.stderr)
             print("  graphml   [--graph PATH]", file=sys.stderr)
-            print("  neo4j     [--graph PATH] [--push URI] [--user U] [--password P]", file=sys.stderr)
-            print("            (or set NEO4J_PASSWORD instead of --password to keep it off argv)", file=sys.stderr)
+            print(
+                "  neo4j     [--graph PATH] [--push URI] [--user U] [--password P]", file=sys.stderr
+            )
+            print(
+                "            (or set NEO4J_PASSWORD instead of --password to keep it off argv)",
+                file=sys.stderr,
+            )
             sys.exit(1)
 
         # Parse shared args
@@ -2741,27 +2990,37 @@ def _load_graph(p: str):
                 report_path_explicit = True
                 i += 2
             elif a == "--sections" and i + 1 < len(args):
-                sections_path = Path(args[i + 1]); i += 2
+                sections_path = Path(args[i + 1])
+                i += 2
             elif a == "--output" and i + 1 < len(args):
                 callflow_output = Path(args[i + 1]).expanduser()
                 if not callflow_output.is_absolute():
                     callflow_output = Path.cwd() / callflow_output
                 i += 2
             elif a == "--lang" and i + 1 < len(args):
-                callflow_lang = args[i + 1]; i += 2
+                callflow_lang = args[i + 1]
+                i += 2
             elif a == "--max-sections" and i + 1 < len(args):
-                callflow_max_sections = int(args[i + 1]); i += 2
+                callflow_max_sections = int(args[i + 1])
+                i += 2
             elif a == "--diagram-scale" and i + 1 < len(args):
-                callflow_diagram_scale = float(args[i + 1]); i += 2
+                callflow_diagram_scale = float(args[i + 1])
+                i += 2
             elif a == "--max-diagram-nodes" and i + 1 < len(args):
-                callflow_max_diagram_nodes = int(args[i + 1]); i += 2
+                callflow_max_diagram_nodes = int(args[i + 1])
+                i += 2
             elif a == "--max-diagram-edges" and i + 1 < len(args):
-                callflow_max_diagram_edges = int(args[i + 1]); i += 2
+                callflow_max_diagram_edges = int(args[i + 1])
+                i += 2
             elif a in ("-h", "--help") and subcmd == "callflow-html":
-                print("Usage: graphify export callflow-html [GRAPH|DIR] [--graph PATH] [--labels PATH]")
+                print(
+                    "Usage: graphify export callflow-html [GRAPH|DIR] [--graph PATH] [--labels PATH]"
+                )
                 print("  --report PATH          path to GRAPH_REPORT.md")
                 print("  --sections PATH        JSON section definitions")
-                print("  --output HTML          output path (default graphify-out/<project>-callflow.html)")
+                print(
+                    "  --output HTML          output path (default graphify-out/<project>-callflow.html)"
+                )
                 print("  --lang LANG            auto, zh-CN, en, etc. (default auto)")
                 print("  --max-sections N       maximum auto-derived sections (default 15)")
                 print("  --diagram-scale N      Mermaid diagram scale (default 1.0)")
@@ -2769,17 +3028,23 @@ def _load_graph(p: str):
                 print("  --max-diagram-edges N  representative edges per section (default 24)")
                 sys.exit(0)
             elif a == "--node-limit" and i + 1 < len(args):
-                node_limit = int(args[i + 1]); i += 2
+                node_limit = int(args[i + 1])
+                i += 2
             elif a == "--no-viz":
-                no_viz = True; i += 1
+                no_viz = True
+                i += 1
             elif a == "--dir" and i + 1 < len(args):
-                obsidian_dir = Path(args[i + 1]); i += 2
+                obsidian_dir = Path(args[i + 1])
+                i += 2
             elif a == "--push" and i + 1 < len(args):
-                neo4j_uri = args[i + 1]; i += 2
+                neo4j_uri = args[i + 1]
+                i += 2
             elif a == "--user" and i + 1 < len(args):
-                neo4j_user = args[i + 1]; i += 2
+                neo4j_user = args[i + 1]
+                i += 2
             elif a == "--password" and i + 1 < len(args):
-                neo4j_password = args[i + 1]; i += 2
+                neo4j_password = args[i + 1]
+                i += 2
             elif subcmd == "callflow-html" and not a.startswith("-") and not graph_path_explicit:
                 candidate = Path(a)
                 if candidate.name == "graph.json" or candidate.suffix.lower() == ".json":
@@ -2804,11 +3069,15 @@ def _load_graph(p: str):
         report_path = report_path.expanduser()
 
         if not graph_path.exists():
-            print(f"error: graph not found: {graph_path}. Run /graphify <path> first.", file=sys.stderr)
+            print(
+                f"error: graph not found: {graph_path}. Run /graphify <path> first.",
+                file=sys.stderr,
+            )
             sys.exit(1)
 
         if subcmd == "callflow-html":
             from graphify.callflow_html import write_callflow_html as _write_callflow_html
+
             out = _write_callflow_html(
                 graph=graph_path,
                 report=report_path,
@@ -2826,7 +3095,6 @@ def _load_graph(p: str):
             sys.exit(0)
 
         from networkx.readwrite import json_graph as _jg
-        from graphify.build import build_from_json as _bfj
 
         _enforce_graph_size_cap_or_exit(graph_path)
         _raw = json.loads(graph_path.read_text(encoding="utf-8"))
@@ -2873,36 +3141,52 @@ def _load_graph(p: str):
 
         labels: dict[int, str] = {}
         if labels_path.exists():
-            labels = {int(k): v for k, v in json.loads(labels_path.read_text(encoding="utf-8")).items()}
+            labels = {
+                int(k): v for k, v in json.loads(labels_path.read_text(encoding="utf-8")).items()
+            }
 
         out_dir = graph_path.parent
 
         if subcmd == "html":
             from graphify.export import to_html as _to_html
+
             if no_viz:
                 html_target = out_dir / "graph.html"
                 if html_target.exists():
                     html_target.unlink()
                 print("--no-viz: skipped graph.html")
             else:
-                _to_html(G, communities, str(out_dir / "graph.html"),
-                         community_labels=labels or None, node_limit=node_limit)
+                _to_html(
+                    G,
+                    communities,
+                    str(out_dir / "graph.html"),
+                    community_labels=labels or None,
+                    node_limit=node_limit,
+                )
                 if G.number_of_nodes() <= node_limit:
-                    print(f"graph.html written - open in any browser, no server needed")
+                    print("graph.html written - open in any browser, no server needed")
 
         elif subcmd == "obsidian":
             from graphify.export import to_obsidian as _to_obsidian, to_canvas as _to_canvas
-            n = _to_obsidian(G, communities, str(obsidian_dir),
-                             community_labels=labels or None, cohesion=cohesion or None)
+
+            n = _to_obsidian(
+                G,
+                communities,
+                str(obsidian_dir),
+                community_labels=labels or None,
+                cohesion=cohesion or None,
+            )
             print(f"Obsidian vault: {n} notes in {obsidian_dir}/")
-            _to_canvas(G, communities, str(obsidian_dir / "graph.canvas"),
-                       community_labels=labels or None)
+            _to_canvas(
+                G, communities, str(obsidian_dir / "graph.canvas"), community_labels=labels or None
+            )
             print(f"Canvas: {obsidian_dir}/graph.canvas")
             print(f"Open {obsidian_dir}/ as a vault in Obsidian.")
 
         elif subcmd == "wiki":
             from graphify.wiki import to_wiki as _to_wiki
             from graphify.analyze import god_nodes as _god_nodes
+
             if not communities:
                 print(
                     "error: .graphify_analysis.json is missing or empty — refusing to export wiki to prevent data loss.\n"
@@ -2912,39 +3196,53 @@ def _load_graph(p: str):
                 sys.exit(1)
             if not gods_data:
                 gods_data = _god_nodes(G)
-            n = _to_wiki(G, communities, str(out_dir / "wiki"),
-                         community_labels=labels or None, cohesion=cohesion or None,
-                         god_nodes_data=gods_data)
+            n = _to_wiki(
+                G,
+                communities,
+                str(out_dir / "wiki"),
+                community_labels=labels or None,
+                cohesion=cohesion or None,
+                god_nodes_data=gods_data,
+            )
             print(f"Wiki: {n} articles written to {out_dir}/wiki/")
             print(f"  {out_dir}/wiki/index.md  ->  agent entry point")
 
         elif subcmd == "svg":
             from graphify.export import to_svg as _to_svg
-            _to_svg(G, communities, str(out_dir / "graph.svg"),
-                    community_labels=labels or None)
-            print(f"graph.svg written - embeds in Obsidian, Notion, GitHub READMEs")
+
+            _to_svg(G, communities, str(out_dir / "graph.svg"), community_labels=labels or None)
+            print("graph.svg written - embeds in Obsidian, Notion, GitHub READMEs")
 
         elif subcmd == "graphml":
             from graphify.export import to_graphml as _to_graphml
+
             _to_graphml(G, communities, str(out_dir / "graph.graphml"))
-            print(f"graph.graphml written - open in Gephi, yEd, or any GraphML tool")
+            print("graph.graphml written - open in Gephi, yEd, or any GraphML tool")
 
         elif subcmd == "neo4j":
             if neo4j_uri:
                 from graphify.export import push_to_neo4j as _push
+
                 if neo4j_password is None:
                     print("error: --password required for --push", file=sys.stderr)
                     sys.exit(1)
-                result = _push(G, uri=neo4j_uri, user=neo4j_user,
-                               password=neo4j_password, communities=communities)
+                result = _push(
+                    G,
+                    uri=neo4j_uri,
+                    user=neo4j_user,
+                    password=neo4j_password,
+                    communities=communities,
+                )
                 print(f"Pushed to Neo4j: {result['nodes']} nodes, {result['edges']} edges")
             else:
                 from graphify.export import to_cypher as _to_cypher
+
                 _to_cypher(G, str(out_dir / "cypher.txt"))
                 print(f"cypher.txt written - import with: cypher-shell < {out_dir}/cypher.txt")
 
     elif cmd == "benchmark":
         from graphify.benchmark import run_benchmark, print_benchmark
+
         graph_path = sys.argv[2] if len(sys.argv) > 2 else "graphify-out/graph.json"
         _enforce_graph_size_cap_or_exit(Path(graph_path))
         # Try to load corpus_words from detect output
@@ -2954,8 +3252,11 @@ def _load_graph(p: str):
             try:
                 detect_data = json.loads(detect_path.read_text(encoding="utf-8"))
                 corpus_words = detect_data.get("total_words")
-            except Exception:
-                pass
+            except Exception as exc:
+                print(
+                    f"[graphify] warning: could not read .graphify_detect.json: {exc}",
+                    file=sys.stderr,
+                )
         result = run_benchmark(graph_path, corpus_words=corpus_words)
         print_benchmark(result)
 
@@ -2967,6 +3268,7 @@ def _load_graph(p: str):
             global_list as _global_list,
             global_path as _global_path,
         )
+
         if subcmd == "add":
             # graphify global add <graph.json> [--as <tag>]
             args = sys.argv[3:]
@@ -2975,9 +3277,11 @@ def _load_graph(p: str):
             i = 0
             while i < len(args):
                 if args[i] == "--as" and i + 1 < len(args):
-                    tag = args[i + 1]; i += 2
+                    tag = args[i + 1]
+                    i += 2
                 elif not source:
-                    source = Path(args[i]); i += 1
+                    source = Path(args[i])
+                    i += 1
                 else:
                     i += 1
             if not source:
@@ -2989,19 +3293,24 @@ def _load_graph(p: str):
                 if result["skipped"]:
                     print(f"'{tag}' unchanged since last add - global graph not modified.")
                 else:
-                    print(f"Added '{tag}' to global graph: +{result['nodes_added']} nodes, "
-                          f"-{result['nodes_removed']} pruned. Global: {_global_path()}")
+                    print(
+                        f"Added '{tag}' to global graph: +{result['nodes_added']} nodes, "
+                        f"-{result['nodes_removed']} pruned. Global: {_global_path()}"
+                    )
             except Exception as exc:
-                print(f"error: {exc}", file=sys.stderr); sys.exit(1)
+                print(f"error: {exc}", file=sys.stderr)
+                sys.exit(1)
         elif subcmd == "remove":
             tag = sys.argv[3] if len(sys.argv) > 3 else ""
             if not tag:
-                print("Usage: graphify global remove <repo-tag>", file=sys.stderr); sys.exit(1)
+                print("Usage: graphify global remove <repo-tag>", file=sys.stderr)
+                sys.exit(1)
             try:
                 removed = _global_remove(tag)
                 print(f"Removed '{tag}' from global graph ({removed} nodes pruned).")
             except KeyError as exc:
-                print(f"error: {exc}", file=sys.stderr); sys.exit(1)
+                print(f"error: {exc}", file=sys.stderr)
+                sys.exit(1)
         elif subcmd == "list":
             repos = _global_list()
             if not repos:
@@ -3009,12 +3318,14 @@ def _load_graph(p: str):
             else:
                 print(f"Global graph: {_global_path()}")
                 for tag, info in repos.items():
-                    print(f"  {tag}: {info.get('node_count', '?')} nodes, added {info.get('added_at', '?')[:10]}")
+                    print(
+                        f"  {tag}: {info.get('node_count', '?')} nodes, added {info.get('added_at', '?')[:10]}"
+                    )
         elif subcmd == "path":
             print(_global_path())
         else:
-            print("Usage: graphify global [add|remove|list|path]", file=sys.stderr); sys.exit(1)
-
+            print("Usage: graphify global [add|remove|list|path]", file=sys.stderr)
+            sys.exit(1)
     elif cmd == "extract":
         # Headless full-pipeline extraction for CI / scripts (#698).
         # Runs detect -> AST extraction on code -> semantic LLM extraction on
@@ -3083,59 +3394,86 @@ def _parse_float(name: str, raw: str) -> float:
         while i < len(args):
             a = args[i]
             if a == "--backend" and i + 1 < len(args):
-                backend = args[i + 1]; i += 2
+                backend = args[i + 1]
+                i += 2
             elif a.startswith("--backend="):
-                backend = a.split("=", 1)[1]; i += 1
+                backend = a.split("=", 1)[1]
+                i += 1
             elif a == "--model" and i + 1 < len(args):
-                model = args[i + 1]; i += 2
+                model = args[i + 1]
+                i += 2
             elif a.startswith("--model="):
-                model = a.split("=", 1)[1]; i += 1
+                model = a.split("=", 1)[1]
+                i += 1
             elif a == "--mode" and i + 1 < len(args):
-                extract_mode = args[i + 1]; i += 2
+                extract_mode = args[i + 1]
+                i += 2
             elif a.startswith("--mode="):
-                extract_mode = a.split("=", 1)[1]; i += 1
+                extract_mode = a.split("=", 1)[1]
+                i += 1
             elif a == "--out" and i + 1 < len(args):
-                out_dir = Path(args[i + 1]); i += 2
+                out_dir = Path(args[i + 1])
+                i += 2
             elif a.startswith("--out="):
-                out_dir = Path(a.split("=", 1)[1]); i += 1
+                out_dir = Path(a.split("=", 1)[1])
+                i += 1
             elif a == "--no-cluster":
-                no_cluster = True; i += 1
+                no_cluster = True
+                i += 1
             elif a == "--dedup-llm":
-                dedup_llm = True; i += 1
+                dedup_llm = True
+                i += 1
             elif a == "--google-workspace":
-                google_workspace = True; i += 1
+                google_workspace = True
+                i += 1
             elif a == "--global":
-                global_merge = True; i += 1
+                global_merge = True
+                i += 1
             elif a == "--as" and i + 1 < len(args):
-                global_repo_tag = args[i + 1]; i += 2
+                global_repo_tag = args[i + 1]
+                i += 2
             elif a == "--max-workers" and i + 1 < len(args):
-                cli_max_workers = _parse_int("--max-workers", args[i + 1]); i += 2
+                cli_max_workers = _parse_int("--max-workers", args[i + 1])
+                i += 2
             elif a.startswith("--max-workers="):
-                cli_max_workers = _parse_int("--max-workers", a.split("=", 1)[1]); i += 1
+                cli_max_workers = _parse_int("--max-workers", a.split("=", 1)[1])
+                i += 1
             elif a == "--token-budget" and i + 1 < len(args):
-                cli_token_budget = _parse_int("--token-budget", args[i + 1]); i += 2
+                cli_token_budget = _parse_int("--token-budget", args[i + 1])
+                i += 2
             elif a.startswith("--token-budget="):
-                cli_token_budget = _parse_int("--token-budget", a.split("=", 1)[1]); i += 1
+                cli_token_budget = _parse_int("--token-budget", a.split("=", 1)[1])
+                i += 1
             elif a == "--max-concurrency" and i + 1 < len(args):
-                cli_max_concurrency = _parse_int("--max-concurrency", args[i + 1]); i += 2
+                cli_max_concurrency = _parse_int("--max-concurrency", args[i + 1])
+                i += 2
             elif a.startswith("--max-concurrency="):
-                cli_max_concurrency = _parse_int("--max-concurrency", a.split("=", 1)[1]); i += 1
+                cli_max_concurrency = _parse_int("--max-concurrency", a.split("=", 1)[1])
+                i += 1
             elif a == "--api-timeout" and i + 1 < len(args):
-                cli_api_timeout = _parse_float("--api-timeout", args[i + 1]); i += 2
+                cli_api_timeout = _parse_float("--api-timeout", args[i + 1])
+                i += 2
             elif a.startswith("--api-timeout="):
-                cli_api_timeout = _parse_float("--api-timeout", a.split("=", 1)[1]); i += 1
+                cli_api_timeout = _parse_float("--api-timeout", a.split("=", 1)[1])
+                i += 1
             elif a == "--resolution" and i + 1 < len(args):
-                cli_resolution = _parse_float("--resolution", args[i + 1]); i += 2
+                cli_resolution = _parse_float("--resolution", args[i + 1])
+                i += 2
             elif a.startswith("--resolution="):
-                cli_resolution = _parse_float("--resolution", a.split("=", 1)[1]); i += 1
+                cli_resolution = _parse_float("--resolution", a.split("=", 1)[1])
+                i += 1
             elif a == "--exclude-hubs" and i + 1 < len(args):
-                cli_exclude_hubs = float(args[i + 1]); i += 2
+                cli_exclude_hubs = float(args[i + 1])
+                i += 2
             elif a.startswith("--exclude-hubs="):
-                cli_exclude_hubs = float(a.split("=", 1)[1]); i += 1
+                cli_exclude_hubs = float(a.split("=", 1)[1])
+                i += 1
             elif a == "--exclude" and i + 1 < len(args):
-                cli_excludes.append(args[i + 1]); i += 2
+                cli_excludes.append(args[i + 1])
+                i += 2
             elif a.startswith("--exclude="):
-                cli_excludes.append(a.split("=", 1)[1]); i += 1
+                cli_excludes.append(a.split("=", 1)[1])
+                i += 1
             else:
                 i += 1
 
@@ -3170,6 +3508,7 @@ def _parse_float(name: str, raw: str) -> float:
             _format_backend_env_keys,
             _get_backend_api_key,
         )
+
         if backend is None:
             backend = _detect_backend()
             if backend is None:
@@ -3183,8 +3522,7 @@ def _parse_float(name: str, raw: str) -> float:
                 sys.exit(1)
         if backend not in _BACKENDS:
             print(
-                f"error: unknown backend '{backend}'. "
-                f"Available: {', '.join(sorted(_BACKENDS))}",
+                f"error: unknown backend '{backend}'. Available: {', '.join(sorted(_BACKENDS))}",
                 file=sys.stderr,
             )
             sys.exit(1)
@@ -3196,18 +3534,18 @@ def _parse_float(name: str, raw: str) -> float:
             allow_no_key = False
             if backend == "ollama":
                 from urllib.parse import urlparse
-                ollama_url = os.environ.get(
-                    "OLLAMA_BASE_URL",
-                    _BACKENDS["ollama"].get("base_url", ""),
+
+                ollama_url = str(
+                    os.environ.get(
+                        "OLLAMA_BASE_URL",
+                        _BACKENDS["ollama"].get("base_url", ""),
+                    )
                 )
                 try:
                     host = (urlparse(ollama_url).hostname or "").lower()
                 except Exception:
                     host = ""
-                allow_no_key = (
-                    host in ("localhost", "127.0.0.1", "::1")
-                    or host.startswith("127.")
-                )
+                allow_no_key = host in ("localhost", "127.0.0.1", "::1") or host.startswith("127.")
             elif backend == "bedrock":
                 allow_no_key = bool(
                     os.environ.get("AWS_PROFILE")
@@ -3217,6 +3555,7 @@ def _parse_float(name: str, raw: str) -> float:
                 )
             elif backend == "claude-cli":
                 import shutil as _shutil
+
                 allow_no_key = _shutil.which("claude") is not None
                 if not allow_no_key:
                     print(
@@ -3235,7 +3574,7 @@ def _parse_float(name: str, raw: str) -> float:
         # Resolve output dir. The user-facing contract is "<out>/graphify-out/"
         # so a fresh checkout writes graphify-out/ at the project root, matching
         # the skill.md pipeline.
-        out_root = (out_dir.resolve() if out_dir else target)
+        out_root = out_dir.resolve() if out_dir else target
         graphify_out = out_root / "graphify-out"
         graphify_out.mkdir(parents=True, exist_ok=True)
 
@@ -3244,6 +3583,7 @@ def _parse_float(name: str, raw: str) -> float:
             detect_incremental as _detect_incremental,
             save_manifest as _save_manifest,
         )
+
         manifest_path = graphify_out / "manifest.json"
         existing_graph_path = graphify_out / "graph.json"
         incremental_mode = manifest_path.exists() and existing_graph_path.exists()
@@ -3258,7 +3598,11 @@ def _parse_float(name: str, raw: str) -> float:
             )
         else:
             print(f"[graphify extract] scanning {target}")
-            detection = _detect(target, google_workspace=google_workspace or None, extra_excludes=cli_excludes or None)
+            detection = _detect(
+                target,
+                google_workspace=google_workspace or None,
+                extra_excludes=cli_excludes or None,
+            )
 
         files_by_type = detection.get("files", {})
         if incremental_mode:
@@ -3296,6 +3640,7 @@ def _parse_float(name: str, raw: str) -> float:
         ast_result: dict = {"nodes": [], "edges": [], "input_tokens": 0, "output_tokens": 0}
         if code_files:
             from graphify.extract import extract as _ast_extract
+
             ast_kwargs: dict = {"cache_root": target}
             if cli_max_workers is not None:
                 ast_kwargs["max_workers"] = cli_max_workers
@@ -3311,16 +3656,20 @@ def _parse_float(name: str, raw: str) -> float:
             check_semantic_cache as _check_semantic_cache,
             save_semantic_cache as _save_semantic_cache,
         )
+
         sem_result: dict = {
-            "nodes": [], "edges": [], "hyperedges": [],
-            "input_tokens": 0, "output_tokens": 0,
+            "nodes": [],
+            "edges": [],
+            "hyperedges": [],
+            "input_tokens": 0,
+            "output_tokens": 0,
         }
         sem_cache_hits = 0
         sem_cache_misses = 0
         if semantic_files:
             sem_paths_str = [str(p) for p in semantic_files]
-            cached_nodes, cached_edges, cached_hyperedges, uncached_paths = (
-                _check_semantic_cache(sem_paths_str, root=target)
+            cached_nodes, cached_edges, cached_hyperedges, uncached_paths = _check_semantic_cache(
+                sem_paths_str, root=target
             )
             sem_cache_hits = len(semantic_files) - len(uncached_paths)
             sem_cache_misses = len(uncached_paths)
@@ -3328,10 +3677,14 @@ def _parse_float(name: str, raw: str) -> float:
             sem_result["edges"].extend(cached_edges)
             sem_result["hyperedges"].extend(cached_hyperedges)
             if sem_cache_hits:
-                print(f"[graphify extract] semantic cache: {sem_cache_hits} hit / {sem_cache_misses} miss")
+                print(
+                    f"[graphify extract] semantic cache: {sem_cache_hits} hit / {sem_cache_misses} miss"
+                )
 
             if uncached_paths:
-                print(f"[graphify extract] semantic extraction on {len(uncached_paths)} files via {backend}...")
+                print(
+                    f"[graphify extract] semantic extraction on {len(uncached_paths)} files via {backend}..."
+                )
                 corpus_kwargs: dict = {
                     "backend": backend,
                     "model": model,
@@ -3349,6 +3702,7 @@ def _parse_float(name: str, raw: str) -> float:
                 # Also track per-chunk success so we can fail loudly when
                 # every chunk errors (e.g. missing backend SDK package).
                 _chunk_stats = {"total": 0, "succeeded": 0}
+
                 def _progress(idx: int, total: int, _result: dict) -> None:
                     _chunk_stats["total"] = total
                     _chunk_stats["succeeded"] += 1
@@ -3356,6 +3710,7 @@ def _progress(idx: int, total: int, _result: dict) -> None:
                         f"[graphify extract] chunk {idx + 1}/{total} done",
                         flush=True,
                     )
+
                 corpus_kwargs["on_chunk_done"] = _progress
 
                 try:
@@ -3371,7 +3726,13 @@ def _progress(idx: int, total: int, _result: dict) -> None:
                         f"[graphify extract] semantic extraction failed: {exc}",
                         file=sys.stderr,
                     )
-                    fresh = {"nodes": [], "edges": [], "hyperedges": [], "input_tokens": 0, "output_tokens": 0}
+                    fresh = {
+                        "nodes": [],
+                        "edges": [],
+                        "hyperedges": [],
+                        "input_tokens": 0,
+                        "output_tokens": 0,
+                    }
 
                 # on_chunk_done only fires after a chunk succeeds. If fresh
                 # semantic extraction was requested and no chunks completed,
@@ -3393,7 +3754,10 @@ def _progress(idx: int, total: int, _result: dict) -> None:
                         root=target,
                     )
                 except Exception as exc:
-                    print(f"[graphify extract] warning: could not write semantic cache: {exc}", file=sys.stderr)
+                    print(
+                        f"[graphify extract] warning: could not write semantic cache: {exc}",
+                        file=sys.stderr,
+                    )
                 sem_result["nodes"].extend(fresh.get("nodes", []))
                 sem_result["edges"].extend(fresh.get("edges", []))
                 sem_result["hyperedges"].extend(fresh.get("hyperedges", []))
@@ -3409,7 +3773,8 @@ def _progress(idx: int, total: int, _result: dict) -> None:
             "edges": list(ast_result.get("edges", [])) + list(sem_result.get("edges", [])),
             "hyperedges": list(sem_result.get("hyperedges", [])),
             "input_tokens": ast_result.get("input_tokens", 0) + sem_result.get("input_tokens", 0),
-            "output_tokens": ast_result.get("output_tokens", 0) + sem_result.get("output_tokens", 0),
+            "output_tokens": ast_result.get("output_tokens", 0)
+            + sem_result.get("output_tokens", 0),
         }
 
         graph_json_path = graphify_out / "graph.json"
@@ -3421,9 +3786,7 @@ def _progress(idx: int, total: int, _result: dict) -> None:
         # their semantic_hash empty so detect_incremental re-queues them (#933).
         _sem_extracted: set[str] = {
             n.get("source_file", "") for n in sem_result.get("nodes", [])
-        } | {
-            e.get("source_file", "") for e in sem_result.get("edges", [])
-        }
+        } | {e.get("source_file", "") for e in sem_result.get("edges", [])}
         _sem_extracted.discard("")
         _sem_types = {"document", "paper", "image"}
         _manifest_files = {
@@ -3435,13 +3798,10 @@ def _progress(idx: int, total: int, _result: dict) -> None:
             # --no-cluster: dump the raw merged extraction as graph.json.
             # No NetworkX, no community detection, no analysis sidecar.
             from graphify.export import backup_if_protected as _backup
+
             _backup(graphify_out)
-            graph_json_path.write_text(
-                json.dumps(merged, indent=2), encoding="utf-8"
-            )
-            cost = _estimate_cost(
-                backend, merged["input_tokens"], merged["output_tokens"]
-            )
+            graph_json_path.write_text(json.dumps(merged, indent=2), encoding="utf-8")
+            cost = _estimate_cost(backend, merged["input_tokens"], merged["output_tokens"])
             print(
                 f"[graphify extract] wrote {graph_json_path} — "
                 f"{len(merged['nodes'])} nodes, {len(merged['edges'])} edges "
@@ -3457,30 +3817,38 @@ def _progress(idx: int, total: int, _result: dict) -> None:
             try:
                 _save_manifest(_manifest_files, manifest_path=str(manifest_path), kind="both")
             except Exception as exc:
-                print(f"[graphify extract] warning: could not write manifest: {exc}", file=sys.stderr)
+                print(
+                    f"[graphify extract] warning: could not write manifest: {exc}", file=sys.stderr
+                )
             if global_merge:
                 from graphify.global_graph import global_add as _global_add
+
                 _tag = global_repo_tag or target.name
                 try:
                     result = _global_add(graphify_out / "graph.json", _tag)
                     if result["skipped"]:
                         print(f"[graphify global] '{_tag}' unchanged since last add - skipped.")
                     else:
-                        print(f"[graphify global] '{_tag}' merged into global graph "
-                              f"(+{result['nodes_added']} nodes, -{result['nodes_removed']} pruned).")
+                        print(
+                            f"[graphify global] '{_tag}' merged into global graph "
+                            f"(+{result['nodes_added']} nodes, -{result['nodes_removed']} pruned)."
+                        )
                 except Exception as exc:
-                    print(f"[graphify global] warning: failed to merge into global graph: {exc}", file=sys.stderr)
+                    print(
+                        f"[graphify global] warning: failed to merge into global graph: {exc}",
+                        file=sys.stderr,
+                    )
             sys.exit(0)
 
         # Build graph + cluster + score + write.
         from graphify.build import (
             build as _build,
-            build_from_json as _build_from_json,
             build_merge as _build_merge,
         )
         from graphify.cluster import cluster as _cluster, score_all as _score_all
         from graphify.export import to_json as _to_json
         from graphify.analyze import god_nodes as _god_nodes, surprising_connections as _surprising
+
         dedup_backend = backend if dedup_llm else None
         if incremental_mode:
             G = _build_merge(
@@ -3502,7 +3870,9 @@ def _progress(idx: int, total: int, _result: dict) -> None:
             )
             sys.exit(1)
 
-        communities = _cluster(G, resolution=cli_resolution, exclude_hubs_percentile=cli_exclude_hubs)
+        communities = _cluster(
+            G, resolution=cli_resolution, exclude_hubs_percentile=cli_exclude_hubs
+        )
         cohesion = _score_all(G, communities)
         try:
             gods = _god_nodes(G)
@@ -3514,6 +3884,7 @@ def _progress(idx: int, total: int, _result: dict) -> None:
             surprises = []
 
         from graphify.export import backup_if_protected as _backup
+
         _backup(graphify_out)
         _to_json(G, communities, str(graph_json_path), force=True)
         if merged.get("output_tokens", 0) > 0:
@@ -3522,16 +3893,22 @@ def _progress(idx: int, total: int, _result: dict) -> None:
             )
         if global_merge:
             from graphify.global_graph import global_add as _global_add
+
             _tag = global_repo_tag or target.name
             try:
                 result = _global_add(graphify_out / "graph.json", _tag)
                 if result["skipped"]:
                     print(f"[graphify global] '{_tag}' unchanged since last add - skipped.")
                 else:
-                    print(f"[graphify global] '{_tag}' merged into global graph "
-                          f"(+{result['nodes_added']} nodes, -{result['nodes_removed']} pruned).")
+                    print(
+                        f"[graphify global] '{_tag}' merged into global graph "
+                        f"(+{result['nodes_added']} nodes, -{result['nodes_removed']} pruned)."
+                    )
             except Exception as exc:
-                print(f"[graphify global] warning: failed to merge into global graph: {exc}", file=sys.stderr)
+                print(
+                    f"[graphify global] warning: failed to merge into global graph: {exc}",
+                    file=sys.stderr,
+                )
         analysis = {
             "communities": {str(k): v for k, v in communities.items()},
             "cohesion": {str(k): v for k, v in cohesion.items()},
@@ -3563,7 +3940,9 @@ def _progress(idx: int, total: int, _result: dict) -> None:
                 f"{len(deleted_files)} deleted"
             )
         elif sem_cache_hits:
-            print(f"[graphify extract] semantic cache: {sem_cache_hits} cached, {sem_cache_misses} re-extracted")
+            print(
+                f"[graphify extract] semantic cache: {sem_cache_hits} cached, {sem_cache_misses} re-extracted"
+            )
         if merged["input_tokens"] or merged["output_tokens"]:
             print(
                 f"[graphify extract] tokens: "
@@ -3580,26 +3959,31 @@ def _progress(idx: int, total: int, _result: dict) -> None:
         #   graphify-out/.graphify_uncached.txt  — paths that need extraction
         # Stdout: "Cache: N hit, M miss"
         from graphify.cache import check_semantic_cache
+
         if len(sys.argv) < 3:
             print("Usage: graphify cache-check <files_from> [--root <dir>]", file=sys.stderr)
             sys.exit(1)
         files_from = Path(sys.argv[2])
-        root = Path(".")
+        cache_root = Path(".")
         i = 3
         while i < len(sys.argv):
             if sys.argv[i] == "--root" and i + 1 < len(sys.argv):
-                root = Path(sys.argv[i + 1])
+                cache_root = Path(sys.argv[i + 1])
                 i += 2
             else:
                 i += 1
         files = [f for f in files_from.read_text(encoding="utf-8").splitlines() if f.strip()]
-        cached_nodes, cached_edges, cached_hyperedges, uncached = check_semantic_cache(files, root)
-        out = root / "graphify-out"
+        cached_nodes, cached_edges, cached_hyperedges, uncached = check_semantic_cache(
+            files, cache_root
+        )
+        out = cache_root / "graphify-out"
         out.mkdir(parents=True, exist_ok=True)
         if cached_nodes or cached_edges or cached_hyperedges:
             (out / ".graphify_cached.json").write_text(
-                json.dumps({"nodes": cached_nodes, "edges": cached_edges, "hyperedges": cached_hyperedges},
-                           ensure_ascii=False),
+                json.dumps(
+                    {"nodes": cached_nodes, "edges": cached_edges, "hyperedges": cached_hyperedges},
+                    ensure_ascii=False,
+                ),
                 encoding="utf-8",
             )
         (out / ".graphify_uncached.txt").write_text("\n".join(uncached), encoding="utf-8")
@@ -3610,6 +3994,7 @@ def _progress(idx: int, total: int, _result: dict) -> None:
         # Concatenates .graphify_chunk_*.json files written by semantic subagents.
         # Deduplicates nodes by id (first writer wins). Sums token counts.
         import glob as _glob
+
         if len(sys.argv) < 3:
             print("Usage: graphify merge-chunks <chunk_files...> --out <path>", file=sys.stderr)
             sys.exit(1)
@@ -3630,7 +4015,13 @@ def _progress(idx: int, total: int, _result: dict) -> None:
         for arg in chunk_args:
             expanded = _glob.glob(arg)
             chunk_files.extend(sorted(expanded) if expanded else [arg])
-        merged: dict = {"nodes": [], "edges": [], "hyperedges": [], "input_tokens": 0, "output_tokens": 0}
+        merged: dict = {
+            "nodes": [],
+            "edges": [],
+            "hyperedges": [],
+            "input_tokens": 0,
+            "output_tokens": 0,
+        }
         seen_ids: set[str] = set()
         for cf in chunk_files:
             try:
@@ -3658,7 +4049,10 @@ def _progress(idx: int, total: int, _result: dict) -> None:
         # Merges cached semantic results with freshly-extracted chunk results.
         # Deduplicates nodes by id (cached entries take priority over new ones).
         if len(sys.argv) < 3:
-            print("Usage: graphify merge-semantic --cached <path> --new <path> --out <path>", file=sys.stderr)
+            print(
+                "Usage: graphify merge-semantic --cached <path> --new <path> --out <path>",
+                file=sys.stderr,
+            )
             sys.exit(1)
         cached_path: Path | None = None
         new_path: Path | None = None
@@ -3666,19 +4060,30 @@ def _progress(idx: int, total: int, _result: dict) -> None:
         i = 2
         while i < len(sys.argv):
             if sys.argv[i] == "--cached" and i + 1 < len(sys.argv):
-                cached_path = Path(sys.argv[i + 1]); i += 2
+                cached_path = Path(sys.argv[i + 1])
+                i += 2
             elif sys.argv[i] == "--new" and i + 1 < len(sys.argv):
-                new_path = Path(sys.argv[i + 1]); i += 2
+                new_path = Path(sys.argv[i + 1])
+                i += 2
             elif sys.argv[i] == "--out" and i + 1 < len(sys.argv):
-                out_path2 = Path(sys.argv[i + 1]); i += 2
+                out_path2 = Path(sys.argv[i + 1])
+                i += 2
             else:
                 i += 1
         if not out_path2:
             print("error: --out <path> required", file=sys.stderr)
             sys.exit(1)
         empty: dict = {"nodes": [], "edges": [], "hyperedges": []}
-        cached_data = json.loads(cached_path.read_text(encoding="utf-8")) if cached_path and cached_path.exists() else empty
-        new_data = json.loads(new_path.read_text(encoding="utf-8")) if new_path and new_path.exists() else empty
+        cached_data = (
+            json.loads(cached_path.read_text(encoding="utf-8"))
+            if cached_path and cached_path.exists()
+            else empty
+        )
+        new_data = (
+            json.loads(new_path.read_text(encoding="utf-8"))
+            if new_path and new_path.exists()
+            else empty
+        )
         seen_ids2: set[str] = set()
         all_nodes: list[dict] = []
         for n in cached_data.get("nodes", []) + new_data.get("nodes", []):
diff --git a/graphify/affected.py b/graphify/affected.py
index 109eaa95e..0d81e6eda 100644
--- a/graphify/affected.py
+++ b/graphify/affected.py
@@ -3,7 +3,7 @@
 from collections import deque
 from dataclasses import dataclass
 from pathlib import Path
-from typing import Iterable
+from typing import Any, Iterable, cast
 
 import networkx as nx
 
@@ -87,8 +87,9 @@ def affected_nodes(
         current, current_depth = queue.popleft()
         if current_depth >= depth:
             continue
-        if hasattr(graph, "in_edges"):
-            incoming = graph.in_edges(current, data=True)
+        graph_any = cast(Any, graph)
+        if hasattr(graph_any, "in_edges"):
+            incoming = graph_any.in_edges(current, data=True)
         else:
             incoming = (
                 (source, target, data)
diff --git a/graphify/analyze.py b/graphify/analyze.py
index f3e08103d..0812d528d 100644
--- a/graphify/analyze.py
+++ b/graphify/analyze.py
@@ -1,9 +1,11 @@
 """Graph analysis: god nodes (most connected), surprising connections (cross-community), suggested questions."""
+
 from __future__ import annotations
 from pathlib import Path
 import networkx as nx
 
 from graphify.build import edge_data
+from graphify.detect import CODE_EXTENSIONS, IMAGE_EXTENSIONS, PAPER_EXTENSIONS
 
 # Language families — extensions sharing a runtime can legitimately call each other
 _LANG_FAMILY: dict[str, str] = {
@@ -53,6 +55,7 @@ def _is_file_node(G: nx.Graph, node_id: str) -> bool:
     source_file = attrs.get("source_file", "")
     if source_file:
         from pathlib import Path as _Path
+
         if label == _Path(source_file).name:
             return True
     # Method stub: AST extractor labels methods as '.method_name()'
@@ -65,12 +68,29 @@ def _is_file_node(G: nx.Graph, node_id: str) -> bool:
     return False
 
 
-_JSON_NOISE_LABELS: frozenset[str] = frozenset({
-    "start", "end", "name", "id", "type", "properties",
-    "value", "key", "data", "items", "title", "description", "version",
-    "dependencies", "devdependencies", "peerdependencies",
-    "optionaldependencies", "bundleddependencies", "bundledependencies",
-})
+_JSON_NOISE_LABELS: frozenset[str] = frozenset(
+    {
+        "start",
+        "end",
+        "name",
+        "id",
+        "type",
+        "properties",
+        "value",
+        "key",
+        "data",
+        "items",
+        "title",
+        "description",
+        "version",
+        "dependencies",
+        "devdependencies",
+        "peerdependencies",
+        "optionaldependencies",
+        "bundleddependencies",
+        "bundledependencies",
+    }
+)
 
 
 def _is_json_key_node(G: nx.Graph, node_id: str) -> bool:
@@ -92,13 +112,19 @@ def god_nodes(G: nx.Graph, top_n: int = 10) -> list[dict]:
     sorted_nodes = sorted(degree.items(), key=lambda x: x[1], reverse=True)
     result = []
     for node_id, deg in sorted_nodes:
-        if _is_file_node(G, node_id) or _is_concept_node(G, node_id) or _is_json_key_node(G, node_id):
+        if (
+            _is_file_node(G, node_id)
+            or _is_concept_node(G, node_id)
+            or _is_json_key_node(G, node_id)
+        ):
             continue
-        result.append({
-            "id": node_id,
-            "label": G.nodes[node_id].get("label", node_id),
-            "degree": deg,
-        })
+        result.append(
+            {
+                "id": node_id,
+                "label": G.nodes[node_id].get("label", node_id),
+                "degree": deg,
+            }
+        )
         if len(result) >= top_n:
             break
     return result
@@ -124,9 +150,7 @@ def surprising_connections(
     """
     # Identify unique source files (ignore empty/null source_file)
     source_files = {
-        data.get("source_file", "")
-        for _, data in G.nodes(data=True)
-        if data.get("source_file", "")
+        data.get("source_file", "") for _, data in G.nodes(data=True) if data.get("source_file", "")
     }
     is_multi_source = len(source_files) > 1
 
@@ -155,9 +179,6 @@ def _is_concept_node(G: nx.Graph, node_id: str) -> bool:
     return False
 
 
-from graphify.detect import CODE_EXTENSIONS, DOC_EXTENSIONS, PAPER_EXTENSIONS, IMAGE_EXTENSIONS
-
-
 def _file_category(path: str) -> str:
     ext = ("." + path.rsplit(".", 1)[-1].lower()) if "." in path else ""
     if ext in CODE_EXTENSIONS:
@@ -288,18 +309,20 @@ def _cross_file_surprises(G: nx.Graph, communities: dict[int, list[str]], top_n:
         tgt_id = data.get("_tgt", v)
         if tgt_id not in G.nodes:
             tgt_id = v
-        candidates.append({
-            "_score": score,
-            "source": G.nodes[src_id].get("label", src_id),
-            "target": G.nodes[tgt_id].get("label", tgt_id),
-            "source_files": [
-                G.nodes[src_id].get("source_file", ""),
-                G.nodes[tgt_id].get("source_file", ""),
-            ],
-            "confidence": data.get("confidence", "EXTRACTED"),
-            "relation": relation,
-            "why": "; ".join(reasons) if reasons else "cross-file semantic connection",
-        })
+        candidates.append(
+            {
+                "_score": score,
+                "source": G.nodes[src_id].get("label", src_id),
+                "target": G.nodes[tgt_id].get("label", tgt_id),
+                "source_files": [
+                    G.nodes[src_id].get("source_file", ""),
+                    G.nodes[tgt_id].get("source_file", ""),
+                ],
+                "confidence": data.get("confidence", "EXTRACTED"),
+                "relation": relation,
+                "why": "; ".join(reasons) if reasons else "cross-file semantic connection",
+            }
+        )
 
     candidates.sort(key=lambda x: x["_score"], reverse=True)
     for c in candidates:
@@ -334,17 +357,19 @@ def _cross_community_surprises(
         result = []
         for (u, v), score in top_edges:
             data = edge_data(G, u, v)
-            result.append({
-                "source": G.nodes[u].get("label", u),
-                "target": G.nodes[v].get("label", v),
-                "source_files": [
-                    G.nodes[u].get("source_file", ""),
-                    G.nodes[v].get("source_file", ""),
-                ],
-                "confidence": data.get("confidence", "EXTRACTED"),
-                "relation": data.get("relation", ""),
-                "note": f"Bridges graph structure (betweenness={score:.3f})",
-            })
+            result.append(
+                {
+                    "source": G.nodes[u].get("label", u),
+                    "target": G.nodes[v].get("label", v),
+                    "source_files": [
+                        G.nodes[u].get("source_file", ""),
+                        G.nodes[v].get("source_file", ""),
+                    ],
+                    "confidence": data.get("confidence", "EXTRACTED"),
+                    "relation": data.get("relation", ""),
+                    "note": f"Bridges graph structure (betweenness={score:.3f})",
+                }
+            )
         return result
 
     # Build node → community map
@@ -370,18 +395,20 @@ def _cross_community_surprises(
         tgt_id = data.get("_tgt", v)
         if tgt_id not in G.nodes:
             tgt_id = v
-        surprises.append({
-            "source": G.nodes[src_id].get("label", src_id),
-            "target": G.nodes[tgt_id].get("label", tgt_id),
-            "source_files": [
-                G.nodes[src_id].get("source_file", ""),
-                G.nodes[tgt_id].get("source_file", ""),
-            ],
-            "confidence": confidence,
-            "relation": relation,
-            "note": f"Bridges community {cid_u} → community {cid_v}",
-            "_pair": tuple(sorted([cid_u, cid_v])),
-        })
+        surprises.append(
+            {
+                "source": G.nodes[src_id].get("label", src_id),
+                "target": G.nodes[tgt_id].get("label", tgt_id),
+                "source_files": [
+                    G.nodes[src_id].get("source_file", ""),
+                    G.nodes[tgt_id].get("source_file", ""),
+                ],
+                "confidence": confidence,
+                "relation": relation,
+                "note": f"Bridges community {cid_u} → community {cid_v}",
+                "_pair": tuple(sorted([cid_u, cid_v])),
+            }
+        )
 
     # Sort: AMBIGUOUS first, then INFERRED, then EXTRACTED
     order = {"AMBIGUOUS": 0, "INFERRED": 1, "EXTRACTED": 2}
@@ -411,7 +438,9 @@ def suggest_questions(
     Each question has a 'type', 'question', and 'why' field.
     """
     if community_labels:
-        community_labels = {int(k) if isinstance(k, str) else k: v for k, v in community_labels.items()}
+        community_labels = {
+            int(k) if isinstance(k, str) else k: v for k, v in community_labels.items()
+        }
 
     questions = []
     node_community = _node_community_map(communities)
@@ -422,11 +451,13 @@ def suggest_questions(
             ul = G.nodes[u].get("label", u)
             vl = G.nodes[v].get("label", v)
             relation = data.get("relation", "related to")
-            questions.append({
-                "type": "ambiguous_edge",
-                "question": f"What is the exact relationship between `{ul}` and `{vl}`?",
-                "why": f"Edge tagged AMBIGUOUS (relation: {relation}) - confidence is low.",
-            })
+            questions.append(
+                {
+                    "type": "ambiguous_edge",
+                    "question": f"What is the exact relationship between `{ul}` and `{vl}`?",
+                    "why": f"Edge tagged AMBIGUOUS (relation: {relation}) - confidence is low.",
+                }
+            )
 
     # 2. Bridge nodes (high betweenness) → cross-cutting concern questions
     if G.number_of_edges() > 0:
@@ -434,24 +465,35 @@ def suggest_questions(
         betweenness = nx.betweenness_centrality(G, k=k, seed=42)
         # Top bridge nodes that are NOT file-level hubs
         bridges = sorted(
-            [(n, s) for n, s in betweenness.items()
-             if not _is_file_node(G, n) and not _is_concept_node(G, n) and s > 0],
+            [
+                (n, s)
+                for n, s in betweenness.items()
+                if not _is_file_node(G, n) and not _is_concept_node(G, n) and s > 0
+            ],
             key=lambda x: x[1],
             reverse=True,
         )[:3]
         for node_id, score in bridges:
             label = G.nodes[node_id].get("label", node_id)
             cid = node_community.get(node_id)
-            comm_label = community_labels.get(cid, f"Community {cid}") if cid is not None else "unknown"
+            comm_label = (
+                community_labels.get(cid, f"Community {cid}") if cid is not None else "unknown"
+            )
             neighbors = list(G.neighbors(node_id))
-            neighbor_comms = {node_community.get(n) for n in neighbors if node_community.get(n) != cid}
+            neighbor_comms = {
+                other_cid
+                for n in neighbors
+                if (other_cid := node_community.get(n)) is not None and other_cid != cid
+            }
             if neighbor_comms:
                 other_labels = [community_labels.get(c, f"Community {c}") for c in neighbor_comms]
-                questions.append({
-                    "type": "bridge_node",
-                    "question": f"Why does `{label}` connect `{comm_label}` to {', '.join(f'`{l}`' for l in other_labels)}?",
-                    "why": f"High betweenness centrality ({score:.3f}) - this node is a cross-community bridge.",
-                })
+                questions.append(
+                    {
+                        "type": "bridge_node",
+                        "question": f"Why does `{label}` connect `{comm_label}` to {', '.join(f'`{other}`' for other in other_labels)}?",
+                        "why": f"High betweenness centrality ({score:.3f}) - this node is a cross-community bridge.",
+                    }
+                )
 
     # 3. God nodes with many INFERRED edges → verification questions
     degree = dict(G.degree())
@@ -462,7 +504,8 @@ def suggest_questions(
     )[:5]
     for node_id, _ in top_nodes:
         inferred = [
-            (u, v, d) for u, v, d in G.edges(node_id, data=True)
+            (u, v, d)
+            for u, v, d in G.edges(node_id, data=True)
             if d.get("confidence") == "INFERRED"
         ]
         if len(inferred) >= 2:
@@ -478,48 +521,58 @@ def suggest_questions(
                     tgt_id = v
                 other_id = tgt_id if src_id == node_id else src_id
                 others.append(G.nodes[other_id].get("label", other_id))
-            questions.append({
-                "type": "verify_inferred",
-                "question": f"Are the {len(inferred)} inferred relationships involving `{label}` (e.g. with `{others[0]}` and `{others[1]}`) actually correct?",
-                "why": f"`{label}` has {len(inferred)} INFERRED edges - model-reasoned connections that need verification.",
-            })
+            questions.append(
+                {
+                    "type": "verify_inferred",
+                    "question": f"Are the {len(inferred)} inferred relationships involving `{label}` (e.g. with `{others[0]}` and `{others[1]}`) actually correct?",
+                    "why": f"`{label}` has {len(inferred)} INFERRED edges - model-reasoned connections that need verification.",
+                }
+            )
 
     # 4. Isolated or weakly-connected nodes → exploration questions
     isolated = [
-        n for n in G.nodes()
+        n
+        for n in G.nodes()
         if G.degree(n) <= 1 and not _is_file_node(G, n) and not _is_concept_node(G, n)
     ]
     if isolated:
         labels = [G.nodes[n].get("label", n) for n in isolated[:3]]
-        questions.append({
-            "type": "isolated_nodes",
-            "question": f"What connects {', '.join(f'`{l}`' for l in labels)} to the rest of the system?",
-            "why": f"{len(isolated)} weakly-connected nodes found - possible documentation gaps or missing edges.",
-        })
+        questions.append(
+            {
+                "type": "isolated_nodes",
+                "question": f"What connects {', '.join(f'`{label}`' for label in labels)} to the rest of the system?",
+                "why": f"{len(isolated)} weakly-connected nodes found - possible documentation gaps or missing edges.",
+            }
+        )
 
     # 5. Low-cohesion communities → structural questions
     from .cluster import cohesion_score
+
     for cid, nodes in communities.items():
         score = cohesion_score(G, nodes)
         if score < 0.15 and len(nodes) >= 5:
             label = community_labels.get(cid, f"Community {cid}")
-            questions.append({
-                "type": "low_cohesion",
-                "question": f"Should `{label}` be split into smaller, more focused modules?",
-                "why": f"Cohesion score {score} - nodes in this community are weakly interconnected.",
-            })
+            questions.append(
+                {
+                    "type": "low_cohesion",
+                    "question": f"Should `{label}` be split into smaller, more focused modules?",
+                    "why": f"Cohesion score {score} - nodes in this community are weakly interconnected.",
+                }
+            )
 
     if not questions:
-        return [{
-            "type": "no_signal",
-            "question": None,
-            "why": (
-                "Not enough signal to generate questions. "
-                "This usually means the corpus has no AMBIGUOUS edges, no bridge nodes, "
-                "no INFERRED relationships, and all communities are tightly cohesive. "
-                "Add more files or run with --mode deep to extract richer edges."
-            ),
-        }]
+        return [
+            {
+                "type": "no_signal",
+                "question": None,
+                "why": (
+                    "Not enough signal to generate questions. "
+                    "This usually means the corpus has no AMBIGUOUS edges, no bridge nodes, "
+                    "no INFERRED relationships, and all communities are tightly cohesive. "
+                    "Add more files or run with --mode deep to extract richer edges."
+                ),
+            }
+        ]
 
     return questions[:top_n]
 
@@ -542,13 +595,9 @@ def graph_diff(G_old: nx.Graph, G_new: nx.Graph) -> dict:
     added_node_ids = new_nodes - old_nodes
     removed_node_ids = old_nodes - new_nodes
 
-    new_nodes_list = [
-        {"id": n, "label": G_new.nodes[n].get("label", n)}
-        for n in added_node_ids
-    ]
+    new_nodes_list = [{"id": n, "label": G_new.nodes[n].get("label", n)} for n in added_node_ids]
     removed_nodes_list = [
-        {"id": n, "label": G_old.nodes[n].get("label", n)}
-        for n in removed_node_ids
+        {"id": n, "label": G_old.nodes[n].get("label", n)} for n in removed_node_ids
     ]
 
     def edge_key(G: nx.Graph, u: str, v: str, data: dict) -> tuple:
@@ -556,14 +605,8 @@ def edge_key(G: nx.Graph, u: str, v: str, data: dict) -> tuple:
             return (u, v, data.get("relation", ""))
         return (min(u, v), max(u, v), data.get("relation", ""))
 
-    old_edge_keys = {
-        edge_key(G_old, u, v, d)
-        for u, v, d in G_old.edges(data=True)
-    }
-    new_edge_keys = {
-        edge_key(G_new, u, v, d)
-        for u, v, d in G_new.edges(data=True)
-    }
+    old_edge_keys = {edge_key(G_old, u, v, d) for u, v, d in G_old.edges(data=True)}
+    new_edge_keys = {edge_key(G_new, u, v, d) for u, v, d in G_new.edges(data=True)}
 
     added_edge_keys = new_edge_keys - old_edge_keys
     removed_edge_keys = old_edge_keys - new_edge_keys
@@ -571,22 +614,26 @@ def edge_key(G: nx.Graph, u: str, v: str, data: dict) -> tuple:
     new_edges_list = []
     for u, v, d in G_new.edges(data=True):
         if edge_key(G_new, u, v, d) in added_edge_keys:
-            new_edges_list.append({
-                "source": u,
-                "target": v,
-                "relation": d.get("relation", ""),
-                "confidence": d.get("confidence", ""),
-            })
+            new_edges_list.append(
+                {
+                    "source": u,
+                    "target": v,
+                    "relation": d.get("relation", ""),
+                    "confidence": d.get("confidence", ""),
+                }
+            )
 
     removed_edges_list = []
     for u, v, d in G_old.edges(data=True):
         if edge_key(G_old, u, v, d) in removed_edge_keys:
-            removed_edges_list.append({
-                "source": u,
-                "target": v,
-                "relation": d.get("relation", ""),
-                "confidence": d.get("confidence", ""),
-            })
+            removed_edges_list.append(
+                {
+                    "source": u,
+                    "target": v,
+                    "relation": d.get("relation", ""),
+                    "confidence": d.get("confidence", ""),
+                }
+            )
 
     parts = []
     if new_nodes_list:
@@ -594,9 +641,13 @@ def edge_key(G: nx.Graph, u: str, v: str, data: dict) -> tuple:
     if new_edges_list:
         parts.append(f"{len(new_edges_list)} new edge{'s' if len(new_edges_list) != 1 else ''}")
     if removed_nodes_list:
-        parts.append(f"{len(removed_nodes_list)} node{'s' if len(removed_nodes_list) != 1 else ''} removed")
+        parts.append(
+            f"{len(removed_nodes_list)} node{'s' if len(removed_nodes_list) != 1 else ''} removed"
+        )
     if removed_edges_list:
-        parts.append(f"{len(removed_edges_list)} edge{'s' if len(removed_edges_list) != 1 else ''} removed")
+        parts.append(
+            f"{len(removed_edges_list)} edge{'s' if len(removed_edges_list) != 1 else ''} removed"
+        )
     summary = ", ".join(parts) if parts else "no changes"
 
     return {
diff --git a/graphify/benchmark.py b/graphify/benchmark.py
index eabade292..cf2ab4666 100644
--- a/graphify/benchmark.py
+++ b/graphify/benchmark.py
@@ -1,4 +1,5 @@
 """Token-reduction benchmark - measures how much context graphify saves vs naive full-corpus approach."""
+
 from __future__ import annotations
 import json
 import sys
@@ -66,11 +67,15 @@ def _query_subgraph_tokens(G: nx.Graph, question: str, depth: int = 3) -> int:
     lines = []
     for nid in visited:
         d = G.nodes[nid]
-        lines.append(f"NODE {d.get('label', nid)} src={d.get('source_file', '')} loc={d.get('source_location', '')}")
+        lines.append(
+            f"NODE {d.get('label', nid)} src={d.get('source_file', '')} loc={d.get('source_location', '')}"
+        )
     for u, v in edges_seen:
         if u in visited and v in visited:
             d = edge_data(G, u, v)
-            lines.append(f"EDGE {G.nodes[u].get('label', u)} --{d.get('relation', '')}--> {G.nodes[v].get('label', v)}")
+            lines.append(
+                f"EDGE {G.nodes[u].get('label', u)} --{d.get('relation', '')}--> {G.nodes[v].get('label', v)}"
+            )
 
     return _estimate_tokens("\n".join(lines))
 
@@ -99,6 +104,7 @@ def run_benchmark(
     Returns dict with: corpus_tokens, avg_query_tokens, reduction_ratio, per_question
     """
     from graphify.security import check_graph_file_size_cap
+
     check_graph_file_size_cap(Path(graph_path))
     data = json.loads(Path(graph_path).read_text(encoding="utf-8"))
     try:
@@ -108,16 +114,20 @@ def run_benchmark(
 
     if corpus_words is None:
         # Rough estimate: each node label is ~3 words, plus source context
-        corpus_words = G.number_of_nodes() * 50
+        estimated_corpus_words = G.number_of_nodes() * 50
+    else:
+        estimated_corpus_words = corpus_words
 
-    corpus_tokens = corpus_words * 100 // 75  # words → tokens (100 words ≈ 133 tokens)
+    corpus_tokens = estimated_corpus_words * 100 // 75  # words → tokens (100 words ≈ 133 tokens)
 
     qs = questions or _SAMPLE_QUESTIONS
     per_question = []
     for q in qs:
         qt = _query_subgraph_tokens(G, q)
         if qt > 0:
-            per_question.append({"question": q, "query_tokens": qt, "reduction": round(corpus_tokens / qt, 1)})
+            per_question.append(
+                {"question": q, "query_tokens": qt, "reduction": round(corpus_tokens / qt, 1)}
+            )
 
     if not per_question:
         return {"error": "No matching nodes found for sample questions. Build the graph first."}
@@ -127,7 +137,7 @@ def run_benchmark(
 
     return {
         "corpus_tokens": corpus_tokens,
-        "corpus_words": corpus_words,
+        "corpus_words": estimated_corpus_words,
         "nodes": G.number_of_nodes(),
         "edges": G.number_of_edges(),
         "avg_query_tokens": avg_query_tokens,
@@ -142,14 +152,16 @@ def print_benchmark(result: dict) -> None:
         print(f"Benchmark error: {result['error']}")
         return
 
-    print(f"\ngraphify token reduction benchmark")
+    print("\ngraphify token reduction benchmark")
     print(_hr(50))
     arrow = _safe("→", "->")
-    print(f"  Corpus:          {result['corpus_words']:,} words {arrow} ~{result['corpus_tokens']:,} tokens (naive)")
+    print(
+        f"  Corpus:          {result['corpus_words']:,} words {arrow} ~{result['corpus_tokens']:,} tokens (naive)"
+    )
     print(f"  Graph:           {result['nodes']:,} nodes, {result['edges']:,} edges")
     print(f"  Avg query cost:  ~{result['avg_query_tokens']:,} tokens")
     print(f"  Reduction:       {result['reduction_ratio']}x fewer tokens per query")
-    print(f"\n  Per question:")
+    print("\n  Per question:")
     for p in result["per_question"]:
         print(f"    [{p['reduction']}x] {p['question'][:55]}")
     print()
diff --git a/graphify/cache.py b/graphify/cache.py
index 2052cf7aa..73fff35e0 100644
--- a/graphify/cache.py
+++ b/graphify/cache.py
@@ -20,7 +20,7 @@ def _body_content(content: bytes) -> bytes:
     if text.startswith("---"):
         end = text.find("\n---", 3)
         if end != -1:
-            return text[end + 4:].encode()
+            return text[end + 4 :].encode()
     return content
 
 
@@ -86,6 +86,7 @@ def _flush_stat_index() -> None:
 def _normalize_path(path: Path) -> Path:
     """Normalize path for consistent cache keys across Windows path spellings."""
     import sys
+
     if sys.platform != "win32":
         return path
     s = str(path)
@@ -120,9 +121,7 @@ def file_hash(path: Path, root: Path = Path(".")) -> str:
     try:
         st = p.stat()
         entry = _stat_index.get(abs_key)
-        if (entry
-                and entry.get("size") == st.st_size
-                and entry.get("mtime_ns") == st.st_mtime_ns):
+        if entry and entry.get("size") == st.st_size and entry.get("mtime_ns") == st.st_mtime_ns:
             return entry["hash"]
     except OSError:
         pass
@@ -216,6 +215,7 @@ def save_cached(path: Path, result: dict, root: Path = Path("."), kind: str = "a
             # Windows: os.replace can fail with WinError 5 if the target is
             # briefly locked. Fall back to copy-then-delete.
             import shutil
+
             shutil.copy2(tmp_path, entry)
             os.unlink(tmp_path)
     except Exception:
@@ -313,7 +313,7 @@ def save_semantic_cache(
         src = e.get("source_file", "")
         if src:
             by_file[src]["edges"].append(e)
-    for h in (hyperedges or []):
+    for h in hyperedges or []:
         src = h.get("source_file", "")
         if src:
             by_file[src]["hyperedges"].append(h)
diff --git a/graphify/callflow_html.py b/graphify/callflow_html.py
index 6195adb98..9f67a9c07 100644
--- a/graphify/callflow_html.py
+++ b/graphify/callflow_html.py
@@ -28,6 +28,7 @@
 from pathlib import Path
 from collections import Counter, defaultdict
 from datetime import datetime, timezone
+from typing import Any, cast
 from html import escape
 
 
@@ -89,7 +90,8 @@
 # 2. Data loading and normalization helpers
 # ──────────────────────────────────────────────
 
-def read_json(path: str | Path, default=None):
+
+def read_json(path: str | Path | None, default: Any = None) -> Any:
     """Read JSON with a useful error message."""
     if not path:
         return default
@@ -204,7 +206,9 @@ def normalize_edge(raw: dict, index: int) -> dict | None:
     if not source or not target:
         return None
 
-    relation = first_present(edge, "relation", "type", "kind", "label", "predicate", default="relates")
+    relation = first_present(
+        edge, "relation", "type", "kind", "label", "predicate", default="relates"
+    )
     confidence = first_present(edge, "confidence", "evidence", "provenance", default="EXTRACTED")
     score = first_present(edge, "confidence_score", "score", "weight", "probability", default=1.0)
 
@@ -254,6 +258,7 @@ def load_graph(path: str | Path) -> tuple:
     """Load graph.json. Returns normalized (nodes, edges, hyperedges, metadata)."""
     if path:
         from graphify.security import check_graph_file_size_cap
+
         try:
             check_graph_file_size_cap(Path(path))
         except ValueError as exc:
@@ -262,16 +267,32 @@ def load_graph(path: str | Path) -> tuple:
     if not isinstance(data, dict):
         raise SystemExit(f"ERROR: graph file must contain a JSON object: {path}")
 
-    graph_block = data.get("graph") if isinstance(data.get("graph"), dict) else {}
-    meta_block = data.get("metadata") if isinstance(data.get("metadata"), dict) else {}
+    graph_block: dict[str, Any] = (
+        cast(dict[str, Any], data.get("graph")) if isinstance(data.get("graph"), dict) else {}
+    )
+    meta_block: dict[str, Any] = (
+        cast(dict[str, Any], data.get("metadata")) if isinstance(data.get("metadata"), dict) else {}
+    )
 
     node_link = _node_link_payload(data)
     if node_link:
         raw_nodes, raw_edges = node_link
     else:
-        raw_nodes = first_list(data.get("nodes"), data.get("vertices"), graph_block.get("nodes"), graph_block.get("vertices"))
-        raw_edges = first_list(data.get("links"), data.get("edges"), graph_block.get("links"), graph_block.get("edges"))
-    hyperedges = first_list(data.get("hyperedges"), graph_block.get("hyperedges"), data.get("groups"), graph_block.get("groups"))
+        raw_nodes = first_list(
+            data.get("nodes"),
+            data.get("vertices"),
+            graph_block.get("nodes"),
+            graph_block.get("vertices"),
+        )
+        raw_edges = first_list(
+            data.get("links"), data.get("edges"), graph_block.get("links"), graph_block.get("edges")
+        )
+    hyperedges = first_list(
+        data.get("hyperedges"),
+        graph_block.get("hyperedges"),
+        data.get("groups"),
+        graph_block.get("groups"),
+    )
 
     nodes = [normalize_node(n, i) for i, n in enumerate(raw_nodes) if isinstance(n, dict)]
     edges = []
@@ -282,9 +303,16 @@ def load_graph(path: str | Path) -> tuple:
         if edge:
             edges.append(edge)
 
-    meta = dict(graph_block)
+    meta: dict[str, Any] = dict(graph_block)
     meta.update(meta_block)
-    for key in ("built_at_commit", "commit", "project_name", "repo", "repository", "language_breakdown"):
+    for key in (
+        "built_at_commit",
+        "commit",
+        "project_name",
+        "repo",
+        "repository",
+        "language_breakdown",
+    ):
         if data.get(key) and not meta.get(key):
             meta[key] = data.get(key)
     if meta.get("commit") and not meta.get("built_at_commit"):
@@ -331,6 +359,7 @@ def load_report(path: str | Path | None) -> str:
 # 3. Mermaid-safe label helpers
 # ──────────────────────────────────────────────
 
+
 def safe_mermaid_text(text: str) -> str:
     """Sanitize text for use inside a Mermaid node label.
 
@@ -344,10 +373,10 @@ def safe_mermaid_text(text: str) -> str:
     """
     text = str(text or "")
     text = text.replace('"', "'")
-    text = text.replace('`', '')
-    text = text.replace('#', '')
-    text = text.replace('|', ' ')
-    text = text.replace('{', '').replace('}', '')
+    text = text.replace("`", "")
+    text = text.replace("#", "")
+    text = text.replace("|", " ")
+    text = text.replace("{", "").replace("}", "")
     text = text.replace("->>", " to ").replace("-->", " to ").replace("->", " to ")
     text = " ".join(text.split())
     return escape(text, quote=False)
@@ -424,7 +453,9 @@ def resolve_graphify_paths(args) -> dict:
     project_root = graphify_out.parent if graphify_out.name == "graphify-out" else base
     graph = Path(args.graph).expanduser() if args.graph else graphify_out / "graph.json"
     report = Path(args.report).expanduser() if args.report else graphify_out / "GRAPH_REPORT.md"
-    labels = Path(args.labels).expanduser() if args.labels else graphify_out / ".graphify_labels.json"
+    labels = (
+        Path(args.labels).expanduser() if args.labels else graphify_out / ".graphify_labels.json"
+    )
     sections = Path(args.sections).expanduser() if args.sections else None
     return {
         "base": project_root,
@@ -509,13 +540,39 @@ def node_kind(node: dict) -> str:
     if any(word in label for word in ("async", "await", "stream", "sse")):
         return "async"
     raw_label = str(node.get("label") or "")
-    hook_like = raw_label.startswith("use") and len(raw_label) > 3 and (raw_label[3].isupper() or raw_label[3] in "_-")
-    if any(word in label for word in ("component", "props", "hook", "store")) or hook_like or source_file.endswith((".tsx", ".jsx", ".vue", ".svelte")):
+    hook_like = (
+        raw_label.startswith("use")
+        and len(raw_label) > 3
+        and (raw_label[3].isupper() or raw_label[3] in "_-")
+    )
+    if (
+        any(word in label for word in ("component", "props", "hook", "store"))
+        or hook_like
+        or source_file.endswith((".tsx", ".jsx", ".vue", ".svelte"))
+    ):
         return "ui"
     raw = raw_label
     if raw[:1].isupper() and not raw.endswith("()"):
         return "klass"
-    if raw.endswith((".py", ".ts", ".tsx", ".js", ".jsx", ".go", ".rs", ".java", ".kt", ".rb", ".php", ".cs", ".swift", ".vue", ".svelte")):
+    if raw.endswith(
+        (
+            ".py",
+            ".ts",
+            ".tsx",
+            ".js",
+            ".jsx",
+            ".go",
+            ".rs",
+            ".java",
+            ".kt",
+            ".rb",
+            ".php",
+            ".cs",
+            ".swift",
+            ".vue",
+            ".svelte",
+        )
+    ):
         return "module"
     return "function"
 
@@ -632,6 +689,7 @@ def mermaid_class_defs() -> list:
 # 4. Community and section indexing
 # ──────────────────────────────────────────────
 
+
 def build_community_index(nodes: list) -> dict:
     """Map community_id (str) -> list of nodes."""
     idx = defaultdict(list)
@@ -652,7 +710,9 @@ def html_anchor_id(raw: str, fallback: str, used: set) -> str:
     base = base[:48].strip("-") or "section"
     candidate = base
     if candidate in used:
-        candidate = f"{base}-{hashlib.sha1(raw.encode('utf-8'), usedforsecurity=False).hexdigest()[:6]}"
+        candidate = (
+            f"{base}-{hashlib.sha1(raw.encode('utf-8'), usedforsecurity=False).hexdigest()[:6]}"
+        )
     suffix = 2
     while candidate in used:
         candidate = f"{base}-{suffix}"
@@ -688,11 +748,13 @@ def normalize_sections(sections: list, lang: str) -> list:
             continue
 
         sid = html_anchor_id(raw_id, f"section-{index}", used)
-        normalized.append({
-            "id": sid,
-            "name": raw_name,
-            "communities": normalize_communities(raw.get("communities", raw.get("community"))),
-        })
+        normalized.append(
+            {
+                "id": sid,
+                "name": raw_name,
+                "communities": normalize_communities(raw.get("communities", raw.get("community"))),
+            }
+        )
     return normalized
 
 
@@ -712,9 +774,22 @@ def label_for_community(cid: str, labels: dict, nodes: list, lang: str) -> str:
         "提取管线",
         "Extraction Pipeline",
         {
-            "extract", "extractor", "tree", "sitter", "parser", "language",
-            "python", "javascript", "typescript", "rust", "java", "go",
-            "ast", "calls", "imports", "multilang",
+            "extract",
+            "extractor",
+            "tree",
+            "sitter",
+            "parser",
+            "language",
+            "python",
+            "javascript",
+            "typescript",
+            "rust",
+            "java",
+            "go",
+            "ast",
+            "calls",
+            "imports",
+            "multilang",
         },
     ),
     (
@@ -722,8 +797,17 @@ def label_for_community(cid: str, labels: dict, nodes: list, lang: str) -> str:
         "图谱构建",
         "Graph Build",
         {
-            "build", "graph", "merge", "dedup", "node", "edge", "hyperedge",
-            "json", "schema", "normalize", "confidence",
+            "build",
+            "graph",
+            "merge",
+            "dedup",
+            "node",
+            "edge",
+            "hyperedge",
+            "json",
+            "schema",
+            "normalize",
+            "confidence",
         },
     ),
     (
@@ -731,8 +815,18 @@ def label_for_community(cid: str, labels: dict, nodes: list, lang: str) -> str:
         "分析聚类",
         "Analysis & Clustering",
         {
-            "cluster", "community", "leiden", "cohesion", "analyze", "god",
-            "surprise", "question", "query", "path", "explain", "benchmark",
+            "cluster",
+            "community",
+            "leiden",
+            "cohesion",
+            "analyze",
+            "god",
+            "surprise",
+            "question",
+            "query",
+            "path",
+            "explain",
+            "benchmark",
         },
     ),
     (
@@ -740,8 +834,18 @@ def label_for_community(cid: str, labels: dict, nodes: list, lang: str) -> str:
         "输出文档",
         "Outputs & Docs",
         {
-            "export", "html", "wiki", "obsidian", "canvas", "svg", "graphml",
-            "report", "callflow", "mermaid", "tree", "documentation",
+            "export",
+            "html",
+            "wiki",
+            "obsidian",
+            "canvas",
+            "svg",
+            "graphml",
+            "report",
+            "callflow",
+            "mermaid",
+            "tree",
+            "documentation",
         },
     ),
     (
@@ -749,9 +853,20 @@ def label_for_community(cid: str, labels: dict, nodes: list, lang: str) -> str:
         "CLI 与技能安装",
         "CLI & Skill Installers",
         {
-            "main", "install", "uninstall", "skill", "agent", "claude",
-            "codex", "opencode", "aider", "copilot", "kiro", "vscode",
-            "hook", "command",
+            "main",
+            "install",
+            "uninstall",
+            "skill",
+            "agent",
+            "claude",
+            "codex",
+            "opencode",
+            "aider",
+            "copilot",
+            "kiro",
+            "vscode",
+            "hook",
+            "command",
         },
     ),
     (
@@ -759,9 +874,21 @@ def label_for_community(cid: str, labels: dict, nodes: list, lang: str) -> str:
         "摄取与增量更新",
         "Ingestion & Updates",
         {
-            "ingest", "fetch", "download", "url", "html", "markdown",
-            "cache", "manifest", "watch", "update", "incremental",
-            "transcribe", "video", "audio", "google",
+            "ingest",
+            "fetch",
+            "download",
+            "url",
+            "html",
+            "markdown",
+            "cache",
+            "manifest",
+            "watch",
+            "update",
+            "incremental",
+            "transcribe",
+            "video",
+            "audio",
+            "google",
         },
     ),
     (
@@ -769,8 +896,17 @@ def label_for_community(cid: str, labels: dict, nodes: list, lang: str) -> str:
         "服务 API",
         "Serving API",
         {
-            "serve", "api", "request", "response", "endpoint", "router",
-            "handle", "upload", "search", "delete", "enrich",
+            "serve",
+            "api",
+            "request",
+            "response",
+            "endpoint",
+            "router",
+            "handle",
+            "upload",
+            "search",
+            "delete",
+            "enrich",
         },
     ),
     (
@@ -778,8 +914,17 @@ def label_for_community(cid: str, labels: dict, nodes: list, lang: str) -> str:
         "安全与全局图",
         "Security & Global Graph",
         {
-            "security", "safe", "ssrf", "xss", "path", "traversal",
-            "global", "prefix", "prune", "repo", "clone",
+            "security",
+            "safe",
+            "ssrf",
+            "xss",
+            "path",
+            "traversal",
+            "global",
+            "prefix",
+            "prune",
+            "repo",
+            "clone",
         },
     ),
     (
@@ -787,8 +932,14 @@ def label_for_community(cid: str, labels: dict, nodes: list, lang: str) -> str:
         "测试与样例",
         "Tests & Fixtures",
         {
-            "test", "tests", "fixture", "fixtures", "sample", "assert",
-            "pytest", "mock",
+            "test",
+            "tests",
+            "fixture",
+            "fixtures",
+            "sample",
+            "assert",
+            "pytest",
+            "mock",
         },
     ),
 ]
@@ -826,14 +977,24 @@ def _rank_grouped_sections(grouped: dict, max_sections: int) -> tuple[list, list
     return selected, overflow_communities
 
 
-def derive_sections_from_communities(nodes: list, labels: dict, lang: str, max_sections: int) -> list:
+def derive_sections_from_communities(
+    nodes: list, labels: dict, lang: str, max_sections: int
+) -> list:
     """Derive architecture-oriented sections when no sections JSON is supplied."""
     comm_idx = build_community_index(nodes)
-    sections = [{"id": "overview", "name": pick_text(lang, "架构总览", "Architecture Overview"), "communities": []}]
+    sections = [
+        {
+            "id": "overview",
+            "name": pick_text(lang, "架构总览", "Architecture Overview"),
+            "communities": [],
+        }
+    ]
     grouped = {}
     unassigned = []
 
-    for cid, community_nodes in sorted(comm_idx.items(), key=lambda item: (-len(item[1]), str(item[0]))):
+    for cid, community_nodes in sorted(
+        comm_idx.items(), key=lambda item: (-len(item[1]), str(item[0]))
+    ):
         label = label_for_community(cid, labels, community_nodes, lang)
         text = _community_text(community_nodes, label)
         best = None
@@ -861,7 +1022,9 @@ def derive_sections_from_communities(nodes: list, labels: dict, lang: str, max_s
         else:
             unassigned.append((cid, community_nodes, label))
 
-    selected, overflow_communities = _rank_grouped_sections(grouped, max(1, int(max_sections or 15)) - 1)
+    selected, overflow_communities = _rank_grouped_sections(
+        grouped, max(1, int(max_sections or 15)) - 1
+    )
     sections.extend(
         {"id": sec["id"], "name": sec["name"], "communities": sec["communities"]}
         for sec in selected
@@ -869,15 +1032,19 @@ def derive_sections_from_communities(nodes: list, labels: dict, lang: str, max_s
 
     remaining_slots = max(0, int(max_sections or 15) - (len(sections) - 1) - 1)
     for cid, community_nodes, label in unassigned[:remaining_slots]:
-        sections.append({"id": str(label or f"community-{cid}"), "name": label, "communities": [cid]})
+        sections.append(
+            {"id": str(label or f"community-{cid}"), "name": label, "communities": [cid]}
+        )
 
     other_communities = overflow_communities + [cid for cid, _, _ in unassigned[remaining_slots:]]
     if other_communities:
-        sections.append({
-            "id": "other",
-            "name": pick_text(lang, "其他", "Other"),
-            "communities": other_communities,
-        })
+        sections.append(
+            {
+                "id": "other",
+                "name": pick_text(lang, "其他", "Other"),
+                "communities": other_communities,
+            }
+        )
     return sections
 
 
@@ -905,6 +1072,7 @@ def node_in_section(node_id: str, section_node_ids: set) -> bool:
 # 5. Edge analysis
 # ──────────────────────────────────────────────
 
+
 def classify_edges(edges: list, section_nodes_map: dict) -> dict:
     """Classify edges as intra-section or inter-section.
 
@@ -958,14 +1126,15 @@ def should_include_edge(edge: dict) -> bool:
 # 6. Mermaid diagram generators
 # ──────────────────────────────────────────────
 
-def node_degree_scores(edges: list) -> Counter:
+
+def node_degree_scores(edges: list) -> dict[str, float]:
     """Score nodes by useful edge participation."""
-    scores = Counter()
+    scores: defaultdict[str, float] = defaultdict(float)
     for edge in edges:
         score = edge_score(edge)
-        scores[edge.get("source", "")] += score
-        scores[edge.get("target", "")] += score
-    return scores
+        scores[str(edge.get("source", ""))] += score
+        scores[str(edge.get("target", ""))] += score
+    return dict(scores)
 
 
 def node_importance(node: dict) -> float:
@@ -1037,7 +1206,10 @@ def fallback_key(node: dict) -> tuple:
 
 def node_label(node: dict) -> str:
     """Build a readable Mermaid node label."""
-    label = humanize_label(node.get("label") or node.get("id"), node.get("source_file", ""))
+    label = humanize_label(
+        str(node.get("label") or node.get("id") or ""),
+        node.get("source_file", ""),
+    )
     source_file = safe_file_path(node.get("source_file", ""))
     if source_file and not label.endswith(Path(source_file).name):
         return f"{safe_mermaid_text(label)}<br/><small>{safe_mermaid_text(source_file)}</small>"
@@ -1056,7 +1228,9 @@ def group_nodes_by_file(nodes: list) -> dict:
 def section_edge_summary(classified_edges: dict) -> dict:
     """Aggregate inter-section edge counts and relation names."""
     node_section = classified_edges.get("node_section", {})
-    summary = defaultdict(lambda: {"count": 0, "relations": Counter()})
+    summary: defaultdict[tuple[Any, Any], dict[str, Any]] = defaultdict(
+        lambda: {"count": 0, "relations": Counter()}
+    )
     for edge in classified_edges.get("inter", []):
         if not should_include_edge(edge):
             continue
@@ -1070,9 +1244,14 @@ def section_edge_summary(classified_edges: dict) -> dict:
     return summary
 
 
-def generate_overview_graph(sections: list, section_nodes_map: dict,
-                             classified_edges: dict, labels: dict, lang: str,
-                             diagram_scale: float) -> str:
+def generate_overview_graph(
+    sections: list,
+    section_nodes_map: dict,
+    classified_edges: dict,
+    labels: dict,
+    lang: str,
+    diagram_scale: float,
+) -> str:
     """Generate a readable section-level architecture overview."""
     lines = [mermaid_init(diagram_scale, "LR")]
     section_defs = [sec for sec in sections if sec["id"] != "overview"]
@@ -1088,7 +1267,9 @@ def generate_overview_graph(sections: list, section_nodes_map: dict,
         lines.append(f"    class {sid} module;")
 
     aggregated = section_edge_summary(classified_edges)
-    for (src, tgt), data in sorted(aggregated.items(), key=lambda item: item[1]["count"], reverse=True)[:12]:
+    for (src, tgt), data in sorted(
+        aggregated.items(), key=lambda item: item[1]["count"], reverse=True
+    )[:12]:
         src_id = mermaid_section_id(src)
         tgt_id = mermaid_section_id(tgt)
         relation, _ = data["relations"].most_common(1)[0]
@@ -1099,19 +1280,29 @@ def generate_overview_graph(sections: list, section_nodes_map: dict,
 
     if not aggregated and len(section_defs) > 1:
         for prev, cur in zip(section_defs, section_defs[1:]):
-            lines.append(f"    {mermaid_section_id(prev['id'])} -.-> {mermaid_section_id(cur['id'])}")
+            lines.append(
+                f"    {mermaid_section_id(prev['id'])} -.-> {mermaid_section_id(cur['id'])}"
+            )
 
     lines.extend(mermaid_class_defs())
     return "\n".join(lines)
 
 
-def generate_section_flowchart(section_id: str, section_name: str,
-                                nodes: list, edges: list, lang: str,
-                                diagram_scale: float, max_nodes: int,
-                                max_edges: int) -> str:
+def generate_section_flowchart(
+    section_id: str,
+    section_name: str,
+    nodes: list,
+    edges: list,
+    lang: str,
+    diagram_scale: float,
+    max_nodes: int,
+    max_edges: int,
+) -> str:
     """Generate a compact, human-readable call-flow chart for a section."""
     lines = [mermaid_init(diagram_scale, "LR")]
-    lines.append(f"    %% Section: {safe_mermaid_text(section_name)} ({len(nodes)} nodes, {len(edges)} edges)")
+    lines.append(
+        f"    %% Section: {safe_mermaid_text(section_name)} ({len(nodes)} nodes, {len(edges)} edges)"
+    )
 
     if not nodes:
         empty_label = pick_text(lang, f"{section_name} - 无节点", f"{section_name} - no nodes")
@@ -1122,12 +1313,14 @@ def generate_section_flowchart(section_id: str, section_name: str,
     selected_nodes = select_diagram_nodes(nodes, edges, max_nodes)
     selected_ids = {node.get("id") for node in selected_nodes}
     visible_edges = [
-        edge for edge in preferred_edges(edges, allow_structure=False)
+        edge
+        for edge in preferred_edges(edges, allow_structure=False)
         if edge.get("source") in selected_ids and edge.get("target") in selected_ids
     ]
     if not visible_edges:
         visible_edges = [
-            edge for edge in preferred_edges(edges, allow_structure=True)
+            edge
+            for edge in preferred_edges(edges, allow_structure=True)
             if edge.get("source") in selected_ids and edge.get("target") in selected_ids
         ]
 
@@ -1160,7 +1353,9 @@ def generate_section_flowchart(section_id: str, section_name: str,
     omitted_nodes = max(0, len(nodes) - len(selected_nodes))
     omitted_edges = max(0, len(visible_edges) - included)
     if omitted_nodes or omitted_edges:
-        lines.append(f"    %% Omitted for readability: {omitted_nodes} nodes, {omitted_edges} edges")
+        lines.append(
+            f"    %% Omitted for readability: {omitted_nodes} nodes, {omitted_edges} edges"
+        )
     lines.extend(class_lines)
     lines.extend(mermaid_class_defs())
     return "\n".join(lines)
@@ -1170,6 +1365,7 @@ def generate_section_flowchart(section_id: str, section_name: str,
 # 7. HTML generators
 # ──────────────────────────────────────────────
 
+
 def generate_nav(sections: list) -> str:
     """Generate the sticky navigation bar."""
     links = []
@@ -1186,21 +1382,33 @@ def node_display_name(node: dict | None, fallback: str = "") -> str:
     return humanize_label(label, node.get("source_file", ""))
 
 
-def format_node_refs(node_ids: set, node_by_id: dict, lang: str, empty_text: str, limit: int = 3) -> str:
+def format_node_refs(
+    node_ids: set, node_by_id: dict, lang: str, empty_text: str, limit: int = 3
+) -> str:
     """Render node references as readable labels instead of internal IDs."""
     if not node_ids:
         return escape(empty_text)
     parts = []
-    for nid in sorted(node_ids, key=lambda item: node_display_name(node_by_id.get(item), item).lower())[:limit]:
+    for nid in sorted(
+        node_ids, key=lambda item: node_display_name(node_by_id.get(item), item).lower()
+    )[:limit]:
         node = node_by_id.get(nid)
         label = node_display_name(node, nid)
         source = safe_file_path((node or {}).get("source_file", ""))
         if source:
-            parts.append(f"<code>{escape(label)}</code><br><small style=\"color:var(--muted)\">{escape(source)}</small>")
+            parts.append(
+                f'<code>{escape(label)}</code><br><small style="color:var(--muted)">{escape(source)}</small>'
+            )
         else:
             parts.append(f"<code>{escape(label)}</code>")
     if len(node_ids) > limit:
-        parts.append(escape(pick_text(lang, f"+{len(node_ids) - limit} 个更多", f"+{len(node_ids) - limit} more")))
+        parts.append(
+            escape(
+                pick_text(
+                    lang, f"+{len(node_ids) - limit} 个更多", f"+{len(node_ids) - limit} more"
+                )
+            )
+        )
     return "<br>".join(parts)
 
 
@@ -1297,31 +1505,71 @@ def _describe_node(label: str, source_file: str, file_type: str, lang: str) -> s
     if file_type == "rationale":
         return pick_text(lang, f"设计说明：{label}", f"Design note for {label}.")
     if file_type == "document":
-        return pick_text(lang, f"文档入口，描述 {label} 相关能力。", f"Documentation node describing {label}.")
+        return pick_text(
+            lang, f"文档入口，描述 {label} 相关能力。", f"Documentation node describing {label}."
+        )
     if label.endswith(".py") or label.endswith(".tsx") or label.endswith(".ts"):
-        return pick_text(lang, f"{source} 中的模块文件，承载该层主要实现。", f"Module file in {source}.")
+        return pick_text(
+            lang, f"{source} 中的模块文件，承载该层主要实现。", f"Module file in {source}."
+        )
     if "config" in lower:
-        return pick_text(lang, "读取、解析或持久化项目配置。", "Reads, resolves, or persists project configuration.")
+        return pick_text(
+            lang,
+            "读取、解析或持久化项目配置。",
+            "Reads, resolves, or persists project configuration.",
+        )
     if "scan" in lower:
-        return pick_text(lang, "触发项目扫描或处理扫描状态。", "Starts scanning or handles scan status.")
+        return pick_text(
+            lang, "触发项目扫描或处理扫描状态。", "Starts scanning or handles scan status."
+        )
     if "ingest" in lower or "clone" in lower or "git" in lower:
-        return pick_text(lang, "把本地目录或远程仓库转换为分析上下文。", "Turns a local path or remote repository into analysis context.")
+        return pick_text(
+            lang,
+            "把本地目录或远程仓库转换为分析上下文。",
+            "Turns a local path or remote repository into analysis context.",
+        )
     if "prompt" in lower:
-        return pick_text(lang, "构造发送给 LLM 的结构化提示。", "Builds structured prompts for model calls.")
+        return pick_text(
+            lang, "构造发送给 LLM 的结构化提示。", "Builds structured prompts for model calls."
+        )
     if "analy" in lower:
-        return pick_text(lang, "编排分析流程并产出结构化文档数据。", "Orchestrates analysis and returns structured documentation data.")
+        return pick_text(
+            lang,
+            "编排分析流程并产出结构化文档数据。",
+            "Orchestrates analysis and returns structured documentation data.",
+        )
     if "graph" in lower or "dependency" in lower:
-        return pick_text(lang, "构建依赖关系并提供排序或图形化数据。", "Builds dependency relationships and graph data.")
+        return pick_text(
+            lang,
+            "构建依赖关系并提供排序或图形化数据。",
+            "Builds dependency relationships and graph data.",
+        )
     if "export" in lower or "markdown" in lower or "html" in lower:
-        return pick_text(lang, "将文档数据导出为目标格式。", "Exports documentation data to a target format.")
+        return pick_text(
+            lang, "将文档数据导出为目标格式。", "Exports documentation data to a target format."
+        )
     if "chat" in lower or "rag" in lower or "retrieve" in lower:
-        return pick_text(lang, "支撑检索增强问答或流式聊天。", "Supports retrieval-augmented Q&A or streaming chat.")
+        return pick_text(
+            lang,
+            "支撑检索增强问答或流式聊天。",
+            "Supports retrieval-augmented Q&A or streaming chat.",
+        )
     if "wiki" in lower or "page" in lower or "sidebar" in lower:
-        return pick_text(lang, "组织文档页面、侧边栏或内容读取。", "Organizes documentation pages, navigation, or content lookup.")
+        return pick_text(
+            lang,
+            "组织文档页面、侧边栏或内容读取。",
+            "Organizes documentation pages, navigation, or content lookup.",
+        )
     if "cache" in lower or "hash" in lower:
-        return pick_text(lang, "缓存分析结果或生成缓存键。", "Caches analysis results or computes cache keys.")
+        return pick_text(
+            lang, "缓存分析结果或生成缓存键。", "Caches analysis results or computes cache keys."
+        )
     if "test" in lower:
-        return pick_text(lang, "验证导入、入口点或版本等基础行为。", "Verifies imports, entry points, or version behavior.")
+        return pick_text(
+            lang,
+            "验证导入、入口点或版本等基础行为。",
+            "Verifies imports, entry points, or version behavior.",
+        )
     return pick_text(lang, f"{source} 中的 {label} 节点。", f"{label} node in {source}.")
 
 
@@ -1370,7 +1618,9 @@ def derive_flow_chain(sections: list, classified_edges: dict) -> str:
     seen = {start}
     current = start
     while len(chain) < min(7, len(order)):
-        candidates = [(count, tgt) for tgt, count in outgoing.get(current, {}).items() if tgt not in seen]
+        candidates = [
+            (count, tgt) for tgt, count in outgoing.get(current, {}).items() if tgt not in seen
+        ]
         if candidates:
             _, nxt = max(candidates)
         else:
@@ -1384,9 +1634,14 @@ def derive_flow_chain(sections: list, classified_edges: dict) -> str:
     return " -> ".join(section_names.get(sid, sid) for sid in chain)
 
 
-def generate_overview_cards(meta: dict, report_text: str, sections: list,
-                            section_nodes_map: dict, classified_edges: dict,
-                            lang: str) -> str:
+def generate_overview_cards(
+    meta: dict,
+    report_text: str,
+    sections: list,
+    section_nodes_map: dict,
+    classified_edges: dict,
+    lang: str,
+) -> str:
     """Generate generic overview cards."""
     rows = []
     for sec in sections:
@@ -1400,14 +1655,18 @@ def generate_overview_cards(meta: dict, report_text: str, sections: list,
 
     flow = derive_flow_chain(sections, classified_edges)
     layer_title = pick_text(lang, "架构层次", "Architecture Layers")
-    layer_cols = pick_text(lang, "<tr><th>层</th><th>节点</th><th>社区</th></tr>", "<tr><th>Layer</th><th>Nodes</th><th>Communities</th></tr>")
+    layer_cols = pick_text(
+        lang,
+        "<tr><th>层</th><th>节点</th><th>社区</th></tr>",
+        "<tr><th>Layer</th><th>Nodes</th><th>Communities</th></tr>",
+    )
     flow_title = pick_text(lang, "核心数据流", "Core Flow")
     return f"""<div class="grid">
   <div class="card">
     <h4>{layer_title}</h4>
     <table style="width:100%;font-size:0.85rem;">
       {layer_cols}
-      {''.join(rows)}
+      {"".join(rows)}
     </table>
   </div>
   <div class="card">
@@ -1421,12 +1680,40 @@ def section_keywords(nodes: list, limit: int = 5) -> list:
     """Pick representative words from labels and file names."""
     counts = Counter()
     stopwords = {
-        "the", "and", "for", "with", "from", "this", "that", "class", "function",
-        "method", "file", "src", "lib", "core", "index", "main", "init", "py",
-        "ts", "tsx", "js", "jsx", "go", "rs", "java", "html", "css",
+        "the",
+        "and",
+        "for",
+        "with",
+        "from",
+        "this",
+        "that",
+        "class",
+        "function",
+        "method",
+        "file",
+        "src",
+        "lib",
+        "core",
+        "index",
+        "main",
+        "init",
+        "py",
+        "ts",
+        "tsx",
+        "js",
+        "jsx",
+        "go",
+        "rs",
+        "java",
+        "html",
+        "css",
     }
     for node in nodes:
-        text = f"{node.get('label', '')} {node.get('source_file', '')}".replace("/", " ").replace("_", " ").replace("-", " ")
+        text = (
+            f"{node.get('label', '')} {node.get('source_file', '')}".replace("/", " ")
+            .replace("_", " ")
+            .replace("-", " ")
+        )
         for raw in text.split():
             word = "".join(ch for ch in raw.lower() if ch.isalnum())
             if len(word) < 3 or word in stopwords:
@@ -1475,10 +1762,16 @@ def generate_section_cards(sec: dict, nodes: list, section_edges: list, lang: st
     else:
         file_rows = f'<tr><td colspan="2">{escape(pick_text(lang, "无源文件映射", "No source file mapping"))}</td></tr>'
 
-    relation_counts = Counter(edge.get("relation", "relates") for edge in section_edges if should_include_edge(edge))
-    relation_text = ", ".join(f"{relation_label(rel, lang)} x{count}" for rel, count in relation_counts.most_common(4))
+    relation_counts = Counter(
+        edge.get("relation", "relates") for edge in section_edges if should_include_edge(edge)
+    )
+    relation_text = ", ".join(
+        f"{relation_label(rel, lang)} x{count}" for rel, count in relation_counts.most_common(4)
+    )
     if not relation_text:
-        relation_text = pick_text(lang, "未检测到高置信调用边", "No high-confidence call edges detected")
+        relation_text = pick_text(
+            lang, "未检测到高置信调用边", "No high-confidence call edges detected"
+        )
     note = pick_text(
         lang,
         f"本节由 graphify 社区聚类生成。关系概况：{relation_text}。图表优先展示高置信、跨节点调用或使用关系，完整节点清单位于表格中。",
@@ -1506,6 +1799,7 @@ def generate_section_cards(sec: dict, nodes: list, section_edges: list, lang: st
 # 8. Main entry point
 # ──────────────────────────────────────────────
 
+
 class CallflowOptions:
     """Options for call-flow architecture HTML generation."""
 
@@ -1615,27 +1909,39 @@ def write_callflow_html(
 
     # Load data
     nodes, edges, hyperedges, meta = load_graph(paths["graph"])
-    labels = load_labels(paths["labels"])
-    lang = detect_lang(args.lang, nodes, labels)
+    loaded_labels = load_labels(paths["labels"])
+    lang = detect_lang(args.lang, nodes, loaded_labels)
     if paths["sections"]:
-        sections = load_sections(paths["sections"])
+        flow_sections = load_sections(paths["sections"])
     else:
-        sections = derive_sections_from_communities(nodes, labels, lang, args.max_sections)
-    sections = normalize_sections(sections, lang)
+        flow_sections = derive_sections_from_communities(
+            nodes, loaded_labels, lang, args.max_sections
+        )
+    flow_sections = normalize_sections(flow_sections, lang)
     report_text = load_report(paths["report"])
 
     if not nodes:
         raise ValueError("graph.json contains 0 nodes")
-    if len(sections) <= 1:
+    if len(flow_sections) <= 1:
         raise ValueError("no sections defined")
 
     if verbose and len(nodes) >= 5000:
-        print("WARNING: Large graph -- Mermaid rendering may be slow. Consider --max-sections 5.", file=sys.stderr)
+        print(
+            "WARNING: Large graph -- Mermaid rendering may be slow. Consider --max-sections 5.",
+            file=sys.stderr,
+        )
 
     node_ids = {node.get("id") for node in nodes}
-    missing_endpoint_edges = [edge for edge in edges if edge.get("source") not in node_ids or edge.get("target") not in node_ids]
+    missing_endpoint_edges = [
+        edge
+        for edge in edges
+        if edge.get("source") not in node_ids or edge.get("target") not in node_ids
+    ]
     if verbose and missing_endpoint_edges:
-        print(f"WARNING: {len(missing_endpoint_edges)} edges reference nodes not present in graph.json.", file=sys.stderr)
+        print(
+            f"WARNING: {len(missing_endpoint_edges)} edges reference nodes not present in graph.json.",
+            file=sys.stderr,
+        )
 
     meta["project_name"] = infer_project_name(str(paths["graph"]), meta)
     meta["node_count"] = len(nodes)
@@ -1650,13 +1956,13 @@ def write_callflow_html(
         output_path = paths["graphify_out"] / f"{safe_filename(meta['project_name'])}-callflow.html"
 
     if verbose:
-        print(f"Loaded: {len(nodes)} nodes, {len(edges)} edges, {len(sections)} sections")
+        print(f"Loaded: {len(nodes)} nodes, {len(edges)} edges, {len(flow_sections)} sections")
         print(f"Graph: {paths['graph']}")
 
     # Build index
     comm_idx = build_community_index(nodes)
     meta["community_count"] = len(comm_idx)
-    section_nodes_map = build_section_node_map(sections, comm_idx)
+    section_nodes_map = build_section_node_map(flow_sections, comm_idx)
     classified = classify_edges(edges, section_nodes_map)
 
     # Build HTML
@@ -1684,19 +1990,31 @@ def write_callflow_html(
 """)
 
     # Header + nav
-    html.append(generate_header(sections, meta, lang))
+    html.append(generate_header(flow_sections, meta, lang))
 
     # ── Architecture Overview (Section "overview") ──
-    overview_name = sections[0].get("name", "Architecture Overview") if sections else "Architecture Overview"
+    overview_name = (
+        flow_sections[0].get("name", "Architecture Overview")
+        if flow_sections
+        else "Architecture Overview"
+    )
     html.append(f"""<!-- ====== Architecture Overview ====== -->
 <h2 id="overview">1. {escape(str(overview_name))}</h2>
 
 <div class="mermaid">
 """)
-    html.append(generate_overview_graph(sections, section_nodes_map, classified, labels, lang, args.diagram_scale))
+    html.append(
+        generate_overview_graph(
+            flow_sections, section_nodes_map, classified, loaded_labels, lang, args.diagram_scale
+        )
+    )
     html.append("""</div>
 """)
-    html.append(generate_overview_cards(meta, report_text, sections, section_nodes_map, classified, lang))
+    html.append(
+        generate_overview_cards(
+            meta, report_text, flow_sections, section_nodes_map, classified, lang
+        )
+    )
     report_card = _report_highlights(report_text, lang)
     if report_card:
         html.append(f'<div class="grid">\n  {report_card}\n</div>')
@@ -1704,7 +2022,7 @@ def write_callflow_html(
 
     # ── Per-section content ──
     section_num = 1  # overview was #1
-    for sec in sections:
+    for sec in flow_sections:
         if sec["id"] == "overview":
             continue
         section_num += 1
@@ -1769,7 +2087,7 @@ def write_callflow_html(
         html.append("</div>\n<hr>")
 
     # ── Section: Statistics ──
-    total_sections = sum(1 for s in sections if s["id"] != "overview")
+    total_sections = sum(1 for s in flow_sections if s["id"] != "overview")
     html.append(f"""<h2 id="stats">Project Statistics</h2>
 
 <div class="grid">
@@ -1786,9 +2104,9 @@ def write_callflow_html(
   <div class="card">
     <h4>Edge Confidence</h4>
     <table style="width:100%;font-size:0.85rem;">
-      <tr><td>EXTRACTED</td><td>{sum(1 for e in edges if e.get('confidence') == 'EXTRACTED')}</td></tr>
-      <tr><td>INFERRED</td><td>{sum(1 for e in edges if e.get('confidence') == 'INFERRED')}</td></tr>
-      <tr><td>AMBIGUOUS</td><td>{sum(1 for e in edges if e.get('confidence') == 'AMBIGUOUS')}</td></tr>
+      <tr><td>EXTRACTED</td><td>{sum(1 for e in edges if e.get("confidence") == "EXTRACTED")}</td></tr>
+      <tr><td>INFERRED</td><td>{sum(1 for e in edges if e.get("confidence") == "INFERRED")}</td></tr>
+      <tr><td>AMBIGUOUS</td><td>{sum(1 for e in edges if e.get("confidence") == "AMBIGUOUS")}</td></tr>
     </table>
   </div>
 </div>
@@ -1796,8 +2114,8 @@ def write_callflow_html(
 
     # ── Footer ──
     html.append(f"""<div style="text-align:center; padding:40px 0; color: var(--muted); font-size:0.9rem;">
-  <p>{escape(str(meta.get('project_name', 'Project')))} — Architecture Documentation</p>
-  <p>Generated: {datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M UTC')} · graphify callflow-html</p>
+  <p>{escape(str(meta.get("project_name", "Project")))} — Architecture Documentation</p>
+  <p>Generated: {datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")} · graphify callflow-html</p>
 </div>
 """)
 
@@ -1967,11 +2285,13 @@ def write_callflow_html(
     # Summary
     mermaid_count = output.count('<div class="mermaid">')
     table_count = output.count('<table class="call-table">')
-    section_count = output.count('<h2 id=')
+    section_count = output.count("<h2 id=")
 
     if verbose:
         print(f"Call-flow HTML written: {output_path}")
-        print(f"  Sections: {section_count}  |  Mermaid diagrams: {mermaid_count}  |  Call tables: {table_count}")
+        print(
+            f"  Sections: {section_count}  |  Mermaid diagrams: {mermaid_count}  |  Call tables: {table_count}"
+        )
         print("  Diagrams use Mermaid init directives plus interactive zoom/pan controls.")
 
     return output_path
@@ -1981,18 +2301,44 @@ def main():
     parser = argparse.ArgumentParser(
         description="Generate call-flow architecture HTML from graphify knowledge graph outputs"
     )
-    parser.add_argument("project", nargs="?", default=None, help="Project root or graphify output directory")
+    parser.add_argument(
+        "project", nargs="?", default=None, help="Project root or graphify output directory"
+    )
     parser.add_argument("--graphify-out", default=None, help="Path to graphify output directory")
     parser.add_argument("--graph", default=None, help="Path to graph.json")
     parser.add_argument("--report", default=None, help="Path to GRAPH_REPORT.md")
     parser.add_argument("--labels", default=None, help="Path to .graphify_labels.json")
-    parser.add_argument("--sections", default=None, help="Path to sections JSON file; auto-derived when omitted")
+    parser.add_argument(
+        "--sections", default=None, help="Path to sections JSON file; auto-derived when omitted"
+    )
     parser.add_argument("--output", default=None, help="Output HTML path")
-    parser.add_argument("--lang", default="auto", help="HTML language: auto, zh-CN, en, etc. (default: auto)")
-    parser.add_argument("--max-sections", type=int, default=15, help="Maximum auto-derived sections, excluding overview")
-    parser.add_argument("--diagram-scale", type=float, default=1.0, help="Mermaid-native diagram scale via init directive (0.65-1.8)")
-    parser.add_argument("--max-diagram-nodes", type=int, default=18, help="Maximum representative nodes per section diagram")
-    parser.add_argument("--max-diagram-edges", type=int, default=24, help="Maximum representative edges per section diagram")
+    parser.add_argument(
+        "--lang", default="auto", help="HTML language: auto, zh-CN, en, etc. (default: auto)"
+    )
+    parser.add_argument(
+        "--max-sections",
+        type=int,
+        default=15,
+        help="Maximum auto-derived sections, excluding overview",
+    )
+    parser.add_argument(
+        "--diagram-scale",
+        type=float,
+        default=1.0,
+        help="Mermaid-native diagram scale via init directive (0.65-1.8)",
+    )
+    parser.add_argument(
+        "--max-diagram-nodes",
+        type=int,
+        default=18,
+        help="Maximum representative nodes per section diagram",
+    )
+    parser.add_argument(
+        "--max-diagram-edges",
+        type=int,
+        default=24,
+        help="Maximum representative edges per section diagram",
+    )
     args = parser.parse_args()
 
     try:
diff --git a/graphify/cluster.py b/graphify/cluster.py
index a567e0ccf..04e9c2729 100644
--- a/graphify/cluster.py
+++ b/graphify/cluster.py
@@ -1,4 +1,5 @@
 """Community detection on NetworkX graphs. Uses Leiden (graspologic) if available, falls back to Louvain (networkx). Splits oversized communities. Returns cohesion scores."""
+
 from __future__ import annotations
 import contextlib
 import inspect
@@ -46,6 +47,7 @@ def _partition(G: nx.Graph, resolution: float = 1.0) -> dict[str, int]:
 
     try:
         from graspologic.partition import leiden
+
         lsig = inspect.signature(leiden).parameters
         kwargs: dict = {}
         if "random_seed" in lsig:
@@ -64,7 +66,7 @@ def _partition(G: nx.Graph, resolution: float = 1.0) -> dict[str, int]:
         finally:
             sys.stderr = old_stderr
         return result
-    except ImportError:
+    except (ImportError, SyntaxError, Warning):
         pass
 
     # Fallback: networkx louvain (available since networkx 2.7).
@@ -77,10 +79,10 @@ def _partition(G: nx.Graph, resolution: float = 1.0) -> dict[str, int]:
     return {node: cid for cid, nodes in enumerate(communities) for node in nodes}
 
 
-_MAX_COMMUNITY_FRACTION = 0.25   # communities larger than 25% of graph get split
-_MIN_SPLIT_SIZE = 10             # only split if community has at least this many nodes
-_COHESION_SPLIT_THRESHOLD = 0.05 # re-split communities with cohesion below this
-_COHESION_SPLIT_MIN_SIZE = 50    # only cohesion-split if community has at least this many nodes
+_MAX_COMMUNITY_FRACTION = 0.25  # communities larger than 25% of graph get split
+_MIN_SPLIT_SIZE = 10  # only split if community has at least this many nodes
+_COHESION_SPLIT_THRESHOLD = 0.05  # re-split communities with cohesion below this
+_COHESION_SPLIT_MIN_SIZE = 50  # only cohesion-split if community has at least this many nodes
 
 
 def cluster(
@@ -171,7 +173,10 @@ def cluster(
     # that bridge otherwise-unrelated subsystems (e.g. CLAUDE.md connected to everything).
     second_pass: list[list[str]] = []
     for nodes in final_communities:
-        if len(nodes) >= _COHESION_SPLIT_MIN_SIZE and cohesion_score(G, nodes) < _COHESION_SPLIT_THRESHOLD:
+        if (
+            len(nodes) >= _COHESION_SPLIT_MIN_SIZE
+            and cohesion_score(G, nodes) < _COHESION_SPLIT_THRESHOLD
+        ):
             splits = _split_community(G, nodes)
             second_pass.extend(splits if len(splits) > 1 else [nodes])
         else:
diff --git a/graphify/detect.py b/graphify/detect.py
index 36a0a184f..3b9ed81a8 100644
--- a/graphify/detect.py
+++ b/graphify/detect.py
@@ -5,6 +5,7 @@
 import os
 import re
 import shlex
+import sys
 from enum import Enum
 from pathlib import Path
 
@@ -25,23 +26,110 @@ class FileType(str, Enum):
 
 _MANIFEST_PATH = "graphify-out/manifest.json"
 
-CODE_EXTENSIONS = {'.py', '.ts', '.tsx', '.js', '.jsx', '.mjs', '.ejs', '.ets', '.go', '.rs', '.java', '.groovy', '.gradle', '.cpp', '.cc', '.cxx', '.c', '.h', '.hpp', '.rb', '.swift', '.kt', '.kts', '.cs', '.scala', '.php', '.lua', '.luau', '.toc', '.zig', '.ps1', '.ex', '.exs', '.m', '.mm', '.jl', '.vue', '.svelte', '.astro', '.dart', '.v', '.sv', '.svh', '.sql', '.r', '.f', '.F', '.f90', '.F90', '.f95', '.F95', '.f03', '.F03', '.f08', '.F08', '.pas', '.pp', '.dpr', '.dpk', '.lpr', '.inc', '.dfm', '.lfm', '.lpk', '.sh', '.bash', '.json', '.dm', '.dme', '.dmi', '.dmm', '.dmf', '.sln', '.csproj', '.fsproj', '.vbproj', '.razor', '.cshtml'}
-DOC_EXTENSIONS = {'.md', '.mdx', '.qmd', '.txt', '.rst', '.html', '.yaml', '.yml'}
-PAPER_EXTENSIONS = {'.pdf'}
-IMAGE_EXTENSIONS = {'.png', '.jpg', '.jpeg', '.gif', '.webp', '.svg'}
-OFFICE_EXTENSIONS = {'.docx', '.xlsx'}
-VIDEO_EXTENSIONS = {'.mp4', '.mov', '.webm', '.mkv', '.avi', '.m4v', '.mp3', '.wav', '.m4a', '.ogg'}
+CODE_EXTENSIONS = {
+    ".py",
+    ".ts",
+    ".tsx",
+    ".js",
+    ".jsx",
+    ".mjs",
+    ".ejs",
+    ".ets",
+    ".go",
+    ".rs",
+    ".java",
+    ".groovy",
+    ".gradle",
+    ".cpp",
+    ".cc",
+    ".cxx",
+    ".c",
+    ".h",
+    ".hpp",
+    ".rb",
+    ".swift",
+    ".kt",
+    ".kts",
+    ".cs",
+    ".scala",
+    ".php",
+    ".lua",
+    ".luau",
+    ".toc",
+    ".zig",
+    ".ps1",
+    ".ex",
+    ".exs",
+    ".m",
+    ".mm",
+    ".jl",
+    ".vue",
+    ".svelte",
+    ".astro",
+    ".dart",
+    ".v",
+    ".sv",
+    ".svh",
+    ".sql",
+    ".r",
+    ".f",
+    ".F",
+    ".f90",
+    ".F90",
+    ".f95",
+    ".F95",
+    ".f03",
+    ".F03",
+    ".f08",
+    ".F08",
+    ".pas",
+    ".pp",
+    ".dpr",
+    ".dpk",
+    ".lpr",
+    ".inc",
+    ".dfm",
+    ".lfm",
+    ".lpk",
+    ".sh",
+    ".bash",
+    ".json",
+    ".dm",
+    ".dme",
+    ".dmi",
+    ".dmm",
+    ".dmf",
+    ".sln",
+    ".csproj",
+    ".fsproj",
+    ".vbproj",
+    ".razor",
+    ".cshtml",
+}
+DOC_EXTENSIONS = {".md", ".mdx", ".qmd", ".txt", ".rst", ".html", ".yaml", ".yml"}
+PAPER_EXTENSIONS = {".pdf"}
+IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg"}
+OFFICE_EXTENSIONS = {".docx", ".xlsx"}
+VIDEO_EXTENSIONS = {".mp4", ".mov", ".webm", ".mkv", ".avi", ".m4v", ".mp3", ".wav", ".m4a", ".ogg"}
 
-CORPUS_WARN_THRESHOLD = 50_000    # words - below this, warn "you may not need a graph"
+CORPUS_WARN_THRESHOLD = 50_000  # words - below this, warn "you may not need a graph"
 CORPUS_UPPER_THRESHOLD = 500_000  # words - above this, warn about token cost
-FILE_COUNT_UPPER = 500             # files - above this, warn about token cost
+FILE_COUNT_UPPER = 500  # files - above this, warn about token cost
 
 # Parent directories whose contents are always sensitive.
 # Checked against path.parts[:-1] (parents only) so a root-level file named
 # "credentials" or "secrets" is not falsely flagged by this stage.
-_SENSITIVE_DIRS = frozenset({
-    ".ssh", ".gnupg", ".aws", ".gcloud", "secrets", ".secrets", "credentials",
-})
+_SENSITIVE_DIRS = frozenset(
+    {
+        ".ssh",
+        ".gnupg",
+        ".aws",
+        ".gcloud",
+        "secrets",
+        ".secrets",
+        "credentials",
+    }
+)
 
 # Files that may contain secrets - skip silently.
 # Uses lookarounds instead of \b so underscore-prefixed names like api_token.txt
@@ -51,30 +139,33 @@ class FileType(str, Enum):
 # `token` is kept separate because its longer suffix "izer"/"ize" is the only
 # common false-positive; other keywords have no such well-known derivatives.
 _SENSITIVE_PATTERNS = [
-    re.compile(r'(^|[\\/])\.(env|envrc)(\.|$)', re.IGNORECASE),
-    re.compile(r'\.(pem|key|p12|pfx|cert|crt|der|p8)$', re.IGNORECASE),
-    re.compile(r'(?<![a-zA-Z0-9])(credential|secret|passwd|password|private_key)s?(?![a-zA-Z])', re.IGNORECASE),
-    re.compile(r'(?<![a-zA-Z0-9])tokens?(?![a-zA-Z])', re.IGNORECASE),
-    re.compile(r'(id_rsa|id_dsa|id_ecdsa|id_ed25519)(\.pub)?$'),
-    re.compile(r'(\.netrc|\.pgpass|\.htpasswd)$', re.IGNORECASE),
-    re.compile(r'(aws_credentials|gcloud_credentials|service.account)', re.IGNORECASE),
+    re.compile(r"(^|[\\/])\.(env|envrc)(\.|$)", re.IGNORECASE),
+    re.compile(r"\.(pem|key|p12|pfx|cert|crt|der|p8)$", re.IGNORECASE),
+    re.compile(
+        r"(?<![a-zA-Z0-9])(credential|secret|passwd|password|private_key)s?(?![a-zA-Z])",
+        re.IGNORECASE,
+    ),
+    re.compile(r"(?<![a-zA-Z0-9])tokens?(?![a-zA-Z])", re.IGNORECASE),
+    re.compile(r"(id_rsa|id_dsa|id_ecdsa|id_ed25519)(\.pub)?$"),
+    re.compile(r"(\.netrc|\.pgpass|\.htpasswd)$", re.IGNORECASE),
+    re.compile(r"(aws_credentials|gcloud_credentials|service.account)", re.IGNORECASE),
 ]
 
 # Signals that a .md/.txt file is actually a converted academic paper
 _PAPER_SIGNALS = [
-    re.compile(r'\barxiv\b', re.IGNORECASE),
-    re.compile(r'\bdoi\s*:', re.IGNORECASE),
-    re.compile(r'\babstract\b', re.IGNORECASE),
-    re.compile(r'\bproceedings\b', re.IGNORECASE),
-    re.compile(r'\bjournal\b', re.IGNORECASE),
-    re.compile(r'\bpreprint\b', re.IGNORECASE),
-    re.compile(r'\\cite\{'),          # LaTeX citation
-    re.compile(r'\[\d+\]'),           # Numbered citation [1], [23] (inline)
-    re.compile(r'\[\n\d+\n\]'),       # Numbered citation spread across lines (markdown conversion)
-    re.compile(r'eq\.\s*\d+|equation\s+\d+', re.IGNORECASE),
-    re.compile(r'\d{4}\.\d{4,5}'),   # arXiv ID like 1706.03762
-    re.compile(r'\bwe propose\b', re.IGNORECASE),   # common academic phrasing
-    re.compile(r'\bliterature\b', re.IGNORECASE),   # "from the literature"
+    re.compile(r"\barxiv\b", re.IGNORECASE),
+    re.compile(r"\bdoi\s*:", re.IGNORECASE),
+    re.compile(r"\babstract\b", re.IGNORECASE),
+    re.compile(r"\bproceedings\b", re.IGNORECASE),
+    re.compile(r"\bjournal\b", re.IGNORECASE),
+    re.compile(r"\bpreprint\b", re.IGNORECASE),
+    re.compile(r"\\cite\{"),  # LaTeX citation
+    re.compile(r"\[\d+\]"),  # Numbered citation [1], [23] (inline)
+    re.compile(r"\[\n\d+\n\]"),  # Numbered citation spread across lines (markdown conversion)
+    re.compile(r"eq\.\s*\d+|equation\s+\d+", re.IGNORECASE),
+    re.compile(r"\d{4}\.\d{4,5}"),  # arXiv ID like 1706.03762
+    re.compile(r"\bwe propose\b", re.IGNORECASE),  # common academic phrasing
+    re.compile(r"\bliterature\b", re.IGNORECASE),  # "from the literature"
 ]
 _PAPER_SIGNAL_THRESHOLD = 3  # need at least this many signals to call it a paper
 
@@ -106,10 +197,24 @@ def _looks_like_paper(path: Path) -> bool:
 
 
 _SHEBANG_CODE_INTERPRETERS = {
-    "python", "python3", "python2",
-    "ruby", "perl", "node", "nodejs",
-    "bash", "sh", "dash", "zsh", "fish", "ksh", "tcsh",
-    "lua", "php", "julia", "Rscript",
+    "python",
+    "python3",
+    "python2",
+    "ruby",
+    "perl",
+    "node",
+    "nodejs",
+    "bash",
+    "sh",
+    "dash",
+    "zsh",
+    "fish",
+    "ksh",
+    "tcsh",
+    "lua",
+    "php",
+    "julia",
+    "Rscript",
 }
 
 
@@ -152,7 +257,7 @@ def _env_command_args(args: list[str], *, allow_split: bool = True) -> list[str]
         arg = args[i]
 
         if arg == "--":
-            return args[i + 1:]
+            return args[i + 1 :]
 
         # Split-string forms: tokenize the packed payload, then re-parse it
         # as env args (so leading assignments/flags inside the payload are
@@ -162,36 +267,36 @@ def _env_command_args(args: list[str], *, allow_split: bool = True) -> list[str]
                 if i + 1 >= len(args):
                     return []
                 return _env_command_args(
-                    _split_env_s(" ".join(args[i + 1:]), []),
+                    _split_env_s(" ".join(args[i + 1 :]), []),
                     allow_split=False,
                 )
             if arg.startswith("-S") and len(arg) > 2:
                 return _env_command_args(
-                    _split_env_s(arg[2:], args[i + 1:]),
+                    _split_env_s(arg[2:], args[i + 1 :]),
                     allow_split=False,
                 )
             if arg == "-vS":
                 if i + 1 >= len(args):
                     return []
                 return _env_command_args(
-                    _split_env_s(" ".join(args[i + 1:]), []),
+                    _split_env_s(" ".join(args[i + 1 :]), []),
                     allow_split=False,
                 )
             if arg.startswith("-vS") and len(arg) > 3:
                 return _env_command_args(
-                    _split_env_s(arg[3:], args[i + 1:]),
+                    _split_env_s(arg[3:], args[i + 1 :]),
                     allow_split=False,
                 )
             if arg.startswith("--split-string="):
                 return _env_command_args(
-                    _split_env_s(arg.split("=", 1)[1], args[i + 1:]),
+                    _split_env_s(arg.split("=", 1)[1], args[i + 1 :]),
                     allow_split=False,
                 )
             if arg == "--split-string":
                 if i + 1 >= len(args):
                     return []
                 return _env_command_args(
-                    _split_env_s(args[i + 1], args[i + 2:]),
+                    _split_env_s(args[i + 1], args[i + 2 :]),
                     allow_split=False,
                 )
 
@@ -203,11 +308,7 @@ def _env_command_args(args: list[str], *, allow_split: bool = True) -> list[str]
             continue
 
         # Clumped short option + operand
-        if (
-            arg.startswith(("-u", "-C", "-P", "-a"))
-            and len(arg) > 2
-            and not arg.startswith("--")
-        ):
+        if arg.startswith(("-u", "-C", "-P", "-a")) and len(arg) > 2 and not arg.startswith("--"):
             i += 1
             continue
 
@@ -217,8 +318,16 @@ def _env_command_args(args: list[str], *, allow_split: bool = True) -> list[str]
             continue
 
         # No-operand flags
-        if arg in {"-", "-i", "-0", "-v", "--ignore-environment", "--null",
-                   "--debug", "--list-signal-handling"}:
+        if arg in {
+            "-",
+            "-i",
+            "-0",
+            "-v",
+            "--ignore-environment",
+            "--null",
+            "--debug",
+            "--list-signal-handling",
+        }:
             i += 1
             continue
 
@@ -320,6 +429,7 @@ def extract_pdf_text(path: Path) -> str:
     """Extract plain text from a PDF file using pypdf."""
     try:
         from pypdf import PdfReader
+
         reader = PdfReader(str(path))
         pages = []
         for page in reader.pages:
@@ -335,11 +445,11 @@ def docx_to_markdown(path: Path) -> str:
     """Convert a .docx file to markdown text using python-docx."""
     try:
         from docx import Document
-        from docx.oxml.ns import qn
+
         doc = Document(str(path))
         lines = []
         for para in doc.paragraphs:
-            style = para.style.name if para.style else ""
+            style = str(para.style.name or "") if para.style else ""
             text = para.text.strip()
             if not text:
                 lines.append("")
@@ -375,6 +485,7 @@ def xlsx_to_markdown(path: Path) -> str:
     """Convert an .xlsx file to markdown text using openpyxl."""
     try:
         import openpyxl
+
         wb = openpyxl.load_workbook(str(path), read_only=True, data_only=True)
         sections = []
         for sheet_name in wb.sheetnames:
@@ -407,6 +518,7 @@ def xlsx_extract_structure(path: Path) -> dict:
     Returns a nodes/edges dict compatible with the graphify extract pipeline.
     Used in addition to xlsx_to_markdown so Claude sees both structure and content.
     """
+
     def _nid(*parts: str) -> str:
         return re.sub(r"[^a-z0-9_]", "_", "_".join(p.lower() for p in parts).strip("_"))
 
@@ -428,21 +540,43 @@ def _nid(*parts: str) -> str:
     stem = re.sub(r"[^a-z0-9]", "_", path.stem.lower())
     str_path = str(path)
     file_nid = _nid(str_path)
-    nodes: list[dict] = [{"id": file_nid, "label": path.name, "file_type": "document",
-                           "source_file": str_path, "source_location": None}]
+    nodes: list[dict] = [
+        {
+            "id": file_nid,
+            "label": path.name,
+            "file_type": "document",
+            "source_file": str_path,
+            "source_location": None,
+        }
+    ]
     edges: list[dict] = []
     seen: set[str] = {file_nid}
 
     def _add(nid: str, label: str) -> None:
         if nid not in seen:
             seen.add(nid)
-            nodes.append({"id": nid, "label": label, "file_type": "document",
-                           "source_file": str_path, "source_location": None})
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "document",
+                    "source_file": str_path,
+                    "source_location": None,
+                }
+            )
 
     def _edge(src: str, tgt: str, relation: str) -> None:
-        edges.append({"source": src, "target": tgt, "relation": relation,
-                       "confidence": "EXTRACTED", "source_file": str_path,
-                       "source_location": None, "weight": 1.0})
+        edges.append(
+            {
+                "source": src,
+                "target": tgt,
+                "relation": relation,
+                "confidence": "EXTRACTED",
+                "source_file": str_path,
+                "source_location": None,
+                "weight": 1.0,
+            }
+        )
 
     for sheet_name in wb.sheetnames:
         ws = wb[sheet_name]
@@ -461,18 +595,28 @@ def _edge(src: str, tgt: str, relation: str) -> None:
                 if ref:
                     try:
                         from openpyxl.utils import range_boundaries
+
                         min_col, min_row, max_col, _ = range_boundaries(ref)
-                        header_row = list(ws.iter_rows(min_row=min_row, max_row=min_row,
-                                                       min_col=min_col, max_col=max_col,
-                                                       values_only=True))
+                        header_row = list(
+                            ws.iter_rows(
+                                min_row=min_row,
+                                max_row=min_row,
+                                min_col=min_col,
+                                max_col=max_col,
+                                values_only=True,
+                            )
+                        )
                         if header_row:
                             for col_name in header_row[0]:
                                 if col_name:
                                     col_nid = _nid(stem, tbl.name, str(col_name))
                                     _add(col_nid, str(col_name))
                                     _edge(tbl_nid, col_nid, "contains")
-                    except Exception:
-                        pass
+                    except Exception as exc:
+                        print(
+                            f"[graphify] warning: could not read spreadsheet table columns: {exc}",
+                            file=sys.stderr,
+                        )
         else:
             # Fallback: first non-empty row as column headers
             for row in ws.iter_rows(max_row=1, values_only=True):
@@ -485,8 +629,8 @@ def _edge(src: str, tgt: str, relation: str) -> None:
 
     try:
         wb.close()
-    except Exception:
-        pass
+    except Exception as exc:
+        print(f"[graphify] warning: could not close workbook {path}: {exc}", file=sys.stderr)
 
     return {"nodes": nodes, "edges": edges}
 
@@ -511,6 +655,7 @@ def convert_office_file(path: Path, out_dir: Path) -> Path | None:
     out_dir.mkdir(parents=True, exist_ok=True)
     # Use a stable name derived from the original path to avoid collisions
     import hashlib
+
     name_hash = hashlib.sha256(str(path.resolve()).encode()).hexdigest()[:8]
     out_path = out_dir / f"{path.stem}_{name_hash}.md"
     out_path.write_text(
@@ -536,33 +681,64 @@ def count_words(path: Path) -> int:
 
 # Directory names to always skip - venvs, caches, build artifacts, deps
 _SKIP_DIRS = {
-    "venv", ".venv", "env", ".env",
-    "node_modules", "__pycache__", ".git",
-    "dist", "build", "target", "out",
-    "site-packages", "lib64",
-    ".pytest_cache", ".mypy_cache", ".ruff_cache",
-    ".tox", ".eggs", "*.egg-info",
+    "venv",
+    ".venv",
+    "env",
+    ".env",
+    "node_modules",
+    "__pycache__",
+    ".git",
+    "dist",
+    "build",
+    "target",
+    "out",
+    "site-packages",
+    "lib64",
+    ".pytest_cache",
+    ".mypy_cache",
+    ".ruff_cache",
+    ".tox",
+    ".eggs",
+    "*.egg-info",
     "graphify-out",  # never treat own output as source input (#524)
     # Coverage/test-artefact dirs — generated, never architecturally meaningful
-    "coverage", "lcov-report",              # Vitest/Istanbul/nyc HTML reports (#870)
-    "visual-tests", "visual-test",          # Playwright/visual-regression bundles (#869)
-    "__snapshots__", "snapshots",           # Jest/Vitest snapshot dirs
-    "storybook-static",                     # Storybook production build output
-    "dist-protected",                       # Protected dist variants (same noise as dist)
+    "coverage",
+    "lcov-report",  # Vitest/Istanbul/nyc HTML reports (#870)
+    "visual-tests",
+    "visual-test",  # Playwright/visual-regression bundles (#869)
+    "__snapshots__",
+    "snapshots",  # Jest/Vitest snapshot dirs
+    "storybook-static",  # Storybook production build output
+    "dist-protected",  # Protected dist variants (same noise as dist)
     # Framework cache/build dirs — generated, never architecturally meaningful (#873)
-    ".next", ".nuxt", ".turbo", ".angular",
-    ".idea", ".cache", ".parcel-cache", ".svelte-kit", ".terraform", ".serverless",
+    ".next",
+    ".nuxt",
+    ".turbo",
+    ".angular",
+    ".idea",
+    ".cache",
+    ".parcel-cache",
+    ".svelte-kit",
+    ".terraform",
+    ".serverless",
     ".graphify",  # graphify's own extraction cache — never index self-generated data
     ".worktrees",  # git worktree convention (#947) — sibling checkouts, always redundant
 }
 
 # Large generated files that are never useful to extract
 _SKIP_FILES = {
-    "package-lock.json", "yarn.lock", "pnpm-lock.yaml",
-    "Cargo.lock", "poetry.lock", "Gemfile.lock",
-    "composer.lock", "go.sum", "go.work.sum",
+    "package-lock.json",
+    "yarn.lock",
+    "pnpm-lock.yaml",
+    "Cargo.lock",
+    "poetry.lock",
+    "Gemfile.lock",
+    "composer.lock",
+    "go.sum",
+    "go.work.sum",
 }
 
+
 def _is_noise_dir(part: str, parent: "Path | None" = None) -> bool:
     """Return True if this directory name looks like a venv, cache, or dep dir."""
     if part in _SKIP_DIRS:
@@ -671,6 +847,7 @@ def _is_ignored(path: Path, root: Path, patterns: list[tuple[Path, str]]) -> boo
 
     def _eval(target: Path) -> bool:
         """Apply last-match-wins to a single target path."""
+
         def _matches(rel: str, p: str, anchored: bool) -> bool:
             if anchored:
                 return fnmatch.fnmatch(rel, p)
@@ -682,7 +859,7 @@ def _matches(rel: str, p: str, anchored: bool) -> bool:
             for i, part in enumerate(parts):
                 if fnmatch.fnmatch(part, p):
                     return True
-                if fnmatch.fnmatch("/".join(parts[:i + 1]), p):
+                if fnmatch.fnmatch("/".join(parts[: i + 1]), p):
                     return True
             return False
 
@@ -782,7 +959,7 @@ def _matches(rel: str, p: str, anchored: bool) -> bool:
         for i, part in enumerate(parts):
             if fnmatch.fnmatch(part, p):
                 return True
-            if fnmatch.fnmatch("/".join(parts[:i + 1]), p):
+            if fnmatch.fnmatch("/".join(parts[: i + 1]), p):
                 return True
         return False
 
@@ -866,7 +1043,13 @@ def _auto_follow_symlinks(root: Path) -> bool:
     return False
 
 
-def detect(root: Path, *, follow_symlinks: bool | None = None, google_workspace: bool | None = None, extra_excludes: list[str] | None = None) -> dict:
+def detect(
+    root: Path,
+    *,
+    follow_symlinks: bool | None = None,
+    google_workspace: bool | None = None,
+    extra_excludes: list[str] | None = None,
+) -> dict:
     root = root.resolve()
     if follow_symlinks is None:
         follow_symlinks = _auto_follow_symlinks(root)
@@ -889,8 +1072,6 @@ def detect(root: Path, *, follow_symlinks: bool | None = None, google_workspace:
             line = _parse_gitignore_line(pat)
             if line:
                 ignore_patterns.append((root, line))
-    include_patterns = _load_graphifyinclude(root)
-
     # Always include graphify-out/memory/ - query results filed back into the graph
     memory_dir = root / "graphify-out" / "memory"
     scan_paths = [root]
@@ -918,7 +1099,8 @@ def detect(root: Path, *, follow_symlinks: bool | None = None, google_workspace:
                 # pruning so negated files inside can still be reached.
                 has_negation = any(p.startswith("!") for _, p in ignore_patterns)
                 dirnames[:] = [
-                    d for d in dirnames
+                    d
+                    for d in dirnames
                     if not _is_noise_dir(d, dp)
                     and (has_negation or not _is_ignored(dp / d, root, ignore_patterns))
                 ]
@@ -949,13 +1131,14 @@ def detect(root: Path, *, follow_symlinks: bool | None = None, google_workspace:
             if p.suffix.lower() in GOOGLE_WORKSPACE_EXTENSIONS:
                 if not google_workspace:
                     skipped_sensitive.append(
-                        str(p)
-                        + " [Google Workspace shortcut skipped - pass --google-workspace "
+                        str(p) + " [Google Workspace shortcut skipped - pass --google-workspace "
                         "or set GRAPHIFY_GOOGLE_WORKSPACE=1]"
                     )
                     continue
                 try:
-                    md_path = convert_google_workspace_file(p, converted_dir, xlsx_to_markdown=xlsx_to_markdown)
+                    md_path = convert_google_workspace_file(
+                        p, converted_dir, xlsx_to_markdown=xlsx_to_markdown
+                    )
                 except Exception as exc:
                     skipped_sensitive.append(str(p) + f" [Google Workspace export failed: {exc}]")
                     continue
@@ -965,7 +1148,9 @@ def detect(root: Path, *, follow_symlinks: bool | None = None, google_workspace:
                     files[ftype].append(str(md_path))
                     total_words += count_words(md_path)
                 else:
-                    skipped_sensitive.append(str(p) + " [Google Workspace export produced no readable text]")
+                    skipped_sensitive.append(
+                        str(p) + " [Google Workspace export produced no readable text]"
+                    )
                 continue
             # Office files: convert to markdown sidecar so subagents can read them
             if p.suffix.lower() in OFFICE_EXTENSIONS:
@@ -977,7 +1162,9 @@ def detect(root: Path, *, follow_symlinks: bool | None = None, google_workspace:
                     total_words += count_words(md_path)
                 else:
                     # Conversion failed (library not installed) - skip with note
-                    skipped_sensitive.append(str(p) + " [office conversion failed - pip install graphifyy[office]]")
+                    skipped_sensitive.append(
+                        str(p) + " [office conversion failed - pip install graphifyy[office]]"
+                    )
                 continue
             files[ftype].append(str(p))
             if ftype != FileType.VIDEO:
@@ -1015,6 +1202,7 @@ def detect(root: Path, *, follow_symlinks: bool | None = None, google_workspace:
 def _md5_file(path: Path) -> str:
     """MD5 of file contents streamed in 64KB chunks — for change detection only."""
     import hashlib as _hl
+
     h = _hl.md5(usedforsecurity=False)
     try:
         with path.open("rb") as f:
@@ -1092,7 +1280,9 @@ def _normalise_entry(entry):
                 entry["semantic_hash"] = h
             else:
                 # Preserve semantic_hash only when content is unchanged
-                entry["semantic_hash"] = prev.get("semantic_hash", "") if h == prev.get("ast_hash", "") else ""
+                entry["semantic_hash"] = (
+                    prev.get("semantic_hash", "") if h == prev.get("ast_hash", "") else ""
+                )
             manifest[f] = entry
     Path(manifest_path).parent.mkdir(parents=True, exist_ok=True)
     Path(manifest_path).write_text(json.dumps(manifest, indent=2), encoding="utf-8")
@@ -1130,7 +1320,12 @@ def detect_incremental(
     incremental runs. ``None`` (default) means auto-detect: ``True`` when ``root``
     contains at least one direct symlinked child, ``False`` otherwise.
     """
-    full = detect(root, follow_symlinks=follow_symlinks, google_workspace=google_workspace, extra_excludes=extra_excludes)
+    full = detect(
+        root,
+        follow_symlinks=follow_symlinks,
+        google_workspace=google_workspace,
+        extra_excludes=extra_excludes,
+    )
     manifest = load_manifest(manifest_path)
 
     if not manifest:
@@ -1158,7 +1353,11 @@ def detect_incremental(
             elif isinstance(stored, dict):
                 # Normalise legacy {mtime, hash} to new schema
                 if "hash" in stored and "ast_hash" not in stored:
-                    stored = {"mtime": stored.get("mtime", 0), "ast_hash": stored["hash"], "semantic_hash": ""}
+                    stored = {
+                        "mtime": stored.get("mtime", 0),
+                        "ast_hash": stored["hash"],
+                        "semantic_hash": "",
+                    }
                 hash_key = "semantic_hash" if kind == "semantic" else "ast_hash"
                 stored_hash = stored.get(hash_key, "")
                 # Missing semantic_hash means update ran but extract hasn't — always re-extract
diff --git a/graphify/export.py b/graphify/export.py
index ff127c0b2..b8726092e 100644
--- a/graphify/export.py
+++ b/graphify/export.py
@@ -7,9 +7,11 @@
 import os
 import re
 import shutil
+import sys
 from collections import Counter
 from datetime import date
 from pathlib import Path
+from typing import Any, cast
 import networkx as nx
 from networkx.readwrite import json_graph
 from graphify.security import sanitize_label
@@ -54,13 +56,18 @@ def backup_if_protected(out_dir: Path) -> "Path | None":
         try:
             labels = json.loads(labels_file.read_text(encoding="utf-8"))
             is_curated = any(v != f"Community {k}" for k, v in labels.items())
-        except Exception:
-            pass
+        except Exception as exc:
+            print(
+                f"[graphify] warning: could not read community labels for backup check: {exc}",
+                file=sys.stderr,
+            )
 
     if not is_semantic and not is_curated:
         return None
 
-    reason = "+".join(filter(None, ["semantic" if is_semantic else "", "curated" if is_curated else ""]))
+    reason = "+".join(
+        filter(None, ["semantic" if is_semantic else "", "curated" if is_curated else ""])
+    )
     today = date.today().isoformat()
     backup_dir = out / today
     graph_src = out / "graph.json"
@@ -83,16 +90,19 @@ def backup_if_protected(out_dir: Path) -> "Path | None":
                 try:
                     shutil.copy2(src, backup_dir / name)
                     copied += 1
-                except Exception:
-                    pass
+                except Exception as exc:
+                    print(f"[graphify] warning: could not back up {src}: {exc}", file=sys.stderr)
         if copied:
             print(f"[graphify] backed up {reason} graph ({copied} files) -> {backup_dir.name}/")
         return backup_dir
     except Exception as exc:
-        import sys
-        print(f"[graphify] warning: backup failed ({exc}) - continuing with overwrite", file=sys.stderr)
+        print(
+            f"[graphify] warning: backup failed ({exc}) - continuing with overwrite",
+            file=sys.stderr,
+        )
         return None
 
+
 def _obsidian_tag(name: str) -> str:
     """Sanitize a community name for use as an Obsidian tag.
 
@@ -104,6 +114,7 @@ def _obsidian_tag(name: str) -> str:
 
 def _strip_diacritics(text: str) -> str:
     import unicodedata
+
     nfkd = unicodedata.normalize("NFKD", text)
     return "".join(c for c in nfkd if not unicodedata.combining(c))
 
@@ -147,8 +158,16 @@ def _yaml_str(s: str) -> str:
 
 
 COMMUNITY_COLORS = [
-    "#4E79A7", "#F28E2B", "#E15759", "#76B7B2", "#59A14F",
-    "#EDC948", "#B07AA1", "#FF9DA7", "#9C755F", "#BAB0AC",
+    "#4E79A7",
+    "#F28E2B",
+    "#E15759",
+    "#76B7B2",
+    "#59A14F",
+    "#EDC948",
+    "#B07AA1",
+    "#FF9DA7",
+    "#9C755F",
+    "#BAB0AC",
 ]
 
 MAX_NODES_FOR_VIZ = 5_000
@@ -161,6 +180,7 @@ def _viz_node_limit() -> int:
     Set to 0 to disable HTML viz unconditionally (useful for CI runners).
     """
     import os
+
     raw = os.environ.get("GRAPHIFY_VIZ_NODE_LIMIT")
     if raw is None or not raw.strip():
         return MAX_NODES_FOR_VIZ
@@ -472,35 +492,46 @@ def attach_hyperedges(G: nx.Graph, hyperedges: list) -> None:
 def _git_head() -> str | None:
     """Return the current git HEAD commit hash, or None if not in a git repo."""
     import subprocess as _sp
+
     try:
-        r = _sp.run(["git", "rev-parse", "HEAD"], capture_output=True, text=True, timeout=3)
+        r = _sp.run(["git", "rev-parse", "HEAD"], capture_output=True, text=True, timeout=3)  # nosec B603 B607
         return r.stdout.strip() if r.returncode == 0 else None
     except Exception:
         return None
 
 
-def to_json(G: nx.Graph, communities: dict[int, list[str]], output_path: str, *, force: bool = False, built_at_commit: str | None = None) -> bool:
+def to_json(
+    G: nx.Graph,
+    communities: dict[int, list[str]],
+    output_path: str,
+    *,
+    force: bool = False,
+    built_at_commit: str | None = None,
+) -> bool:
     # Safety check: refuse to silently shrink an existing graph (#479)
     existing_path = Path(output_path)
     if not force and existing_path.exists():
         try:
             from graphify.security import check_graph_file_size_cap
+
             check_graph_file_size_cap(existing_path)
             existing_data = json.loads(existing_path.read_text(encoding="utf-8"))
             existing_n = len(existing_data.get("nodes", []))
             new_n = G.number_of_nodes()
             if new_n < existing_n:
-                import sys as _sys
                 print(
                     f"[graphify] WARNING: new graph has {new_n} nodes but existing "
                     f"graph.json has {existing_n}. Refusing to overwrite — you may be "
                     f"missing chunk files from a previous session. "
                     f"Pass force=True to override.",
-                    file=_sys.stderr,
+                    file=sys.stderr,
                 )
                 return False
-        except Exception:
-            pass  # unreadable existing file — proceed with write
+        except Exception as exc:
+            print(
+                f"[graphify] warning: could not inspect existing graph before write: {exc}",
+                file=sys.stderr,
+            )
 
     node_community = _node_community_map(communities)
     try:
@@ -541,8 +572,7 @@ def prune_dangling_edges(graph_data: dict) -> tuple[dict, int]:
     links_key = "links" if "links" in graph_data else "edges"
     before = len(graph_data[links_key])
     graph_data[links_key] = [
-        e for e in graph_data[links_key]
-        if e["source"] in node_ids and e["target"] in node_ids
+        e for e in graph_data[links_key] if e["source"] in node_ids and e["target"] in node_ids
     ]
     return graph_data, before - len(graph_data[links_key])
 
@@ -566,12 +596,7 @@ def _cypher_escape(s: str) -> str:
     """
     # First normalise: drop NUL and other C0 control chars except tab.
     s = "".join(ch for ch in s if ch >= " " or ch == "\t")
-    return (
-        s.replace("\\", "\\\\")
-         .replace("'", "\\'")
-         .replace("\n", "\\n")
-         .replace("\r", "\\r")
-    )
+    return s.replace("\\", "\\\\").replace("'", "\\'").replace("\n", "\\n").replace("\r", "\\r")
 
 
 # Restrict identifier-position values (labels and relationship types are NOT
@@ -645,8 +670,13 @@ def to_html(
             # Build aggregated community meta-graph
             from collections import Counter as _Counter
             import networkx as _nx
-            print(f"Graph has {G.number_of_nodes()} nodes (above {limit} limit). Building aggregated community view...")
-            node_to_community = {nid: cid for cid, members in communities.items() for nid in members}
+
+            print(
+                f"Graph has {G.number_of_nodes()} nodes (above {limit} limit). Building aggregated community view..."
+            )
+            node_to_community = {
+                nid: cid for cid, members in communities.items() for nid in members
+            }
             meta = _nx.Graph()
             for cid, members in communities.items():
                 meta.add_node(str(cid), label=(community_labels or {}).get(cid, f"Community {cid}"))
@@ -656,8 +686,13 @@ def to_html(
                 if cu is not None and cv is not None and cu != cv:
                     edge_counts[(min(cu, cv), max(cu, cv))] += 1
             for (cu, cv), w in edge_counts.items():
-                meta.add_edge(str(cu), str(cv), weight=w,
-                              relation=f"{w} cross-community edges", confidence="AGGREGATED")
+                meta.add_edge(
+                    str(cu),
+                    str(cv),
+                    weight=w,
+                    relation=f"{w} cross-community edges",
+                    confidence="AGGREGATED",
+                )
             if meta.number_of_nodes() <= 1:
                 print("Single community - aggregated view not useful. Skipping graph.html.")
                 return
@@ -666,10 +701,11 @@ def to_html(
             # Remap hyperedges from semantic node IDs to community IDs
             raw_hyperedges = G.graph.get("hyperedges", [])
             if raw_hyperedges:
-                remapped = []
+                remapped: list[dict[str, Any]] = []
                 for he in raw_hyperedges:
                     he_members = he.get("nodes") or he.get("members") or []
-                    comm_ids, seen = [], set()
+                    comm_ids: list[str] = []
+                    seen: set[str] = set()
                     for nid in he_members:
                         c = node_to_community.get(nid)
                         if c is None:
@@ -681,15 +717,24 @@ def to_html(
                         comm_ids.append(s)
                     if len(comm_ids) < 2:
                         continue
-                    remapped.append({
-                        "id": he.get("id", ""),
-                        "label": he.get("label") or he.get("relation", "").replace("_", " "),
-                        "nodes": comm_ids,
-                    })
+                    remapped.append(
+                        {
+                            "id": he.get("id", ""),
+                            "label": he.get("label") or he.get("relation", "").replace("_", " "),
+                            "nodes": comm_ids,
+                        }
+                    )
                 meta.graph["hyperedges"] = remapped
-            to_html(meta, meta_communities, output_path,
-                    community_labels=community_labels, member_counts=mc)
-            print(f"graph.html written (aggregated: {meta.number_of_nodes()} community nodes, {meta.number_of_edges()} cross-community edges)")
+            to_html(
+                meta,
+                meta_communities,
+                output_path,
+                community_labels=community_labels,
+                member_counts=mc,
+            )
+            print(
+                f"graph.html written (aggregated: {meta.number_of_nodes()} community nodes, {meta.number_of_edges()} cross-community edges)"
+            )
             print("Tip: run with --obsidian for full node-level detail.")
             return
         raise ValueError(
@@ -718,19 +763,27 @@ def to_html(
             size = 10 + 30 * (deg / max_deg)
             # Only show label for high-degree nodes by default; others show on hover
             font_size = 12 if deg >= max_deg * 0.15 else 0
-        vis_nodes.append({
-            "id": node_id,
-            "label": label,
-            "color": {"background": color, "border": color, "highlight": {"background": "#ffffff", "border": color}},
-            "size": round(size, 1),
-            "font": {"size": font_size, "color": "#ffffff"},
-            "title": _html.escape(label),
-            "community": cid,
-            "community_name": sanitize_label((community_labels or {}).get(cid, f"Community {cid}")),
-            "source_file": sanitize_label(str(data.get("source_file") or "")),
-            "file_type": data.get("file_type", ""),
-            "degree": deg,
-        })
+        vis_nodes.append(
+            {
+                "id": node_id,
+                "label": label,
+                "color": {
+                    "background": color,
+                    "border": color,
+                    "highlight": {"background": "#ffffff", "border": color},
+                },
+                "size": round(size, 1),
+                "font": {"size": font_size, "color": "#ffffff"},
+                "title": _html.escape(label),
+                "community": cid,
+                "community_name": sanitize_label(
+                    (community_labels or {}).get(cid, f"Community {cid}")
+                ),
+                "source_file": sanitize_label(str(data.get("source_file") or "")),
+                "file_type": data.get("file_type", ""),
+                "degree": deg,
+            }
+        )
 
     # Build edges list. Restore original edge direction from _src/_tgt
     # (stashed by build.py for exactly this reason): undirected NetworkX
@@ -742,23 +795,29 @@ def to_html(
         relation = data.get("relation", "")
         true_src = data.get("_src", u)
         true_tgt = data.get("_tgt", v)
-        vis_edges.append({
-            "from": true_src,
-            "to": true_tgt,
-            "label": relation,
-            "title": _html.escape(f"{relation} [{confidence}]"),
-            "dashes": confidence != "EXTRACTED",
-            "width": 2 if confidence == "EXTRACTED" else 1,
-            "color": {"opacity": 0.7 if confidence == "EXTRACTED" else 0.35},
-            "confidence": confidence,
-        })
+        vis_edges.append(
+            {
+                "from": true_src,
+                "to": true_tgt,
+                "label": relation,
+                "title": _html.escape(f"{relation} [{confidence}]"),
+                "dashes": confidence != "EXTRACTED",
+                "width": 2 if confidence == "EXTRACTED" else 1,
+                "color": {"opacity": 0.7 if confidence == "EXTRACTED" else 0.35},
+                "confidence": confidence,
+            }
+        )
 
     # Build community legend data
     legend_data = []
     for cid in sorted((community_labels or {}).keys()):
         color = COMMUNITY_COLORS[cid % len(COMMUNITY_COLORS)]
         lbl = _html.escape(sanitize_label((community_labels or {}).get(cid, f"Community {cid}")))
-        n = member_counts.get(cid, len(communities.get(cid, []))) if member_counts else len(communities.get(cid, []))
+        n = (
+            member_counts.get(cid, len(communities.get(cid, [])))
+            if member_counts
+            else len(communities.get(cid, []))
+        )
         legend_data.append({"cid": cid, "color": color, "label": lbl, "count": n})
 
     # Escape </script> sequences so embedded JSON cannot break out of the script tag
@@ -837,7 +896,11 @@ def to_obsidian(
     # Map node_id → safe filename so wikilinks stay consistent.
     # Deduplicate: if two nodes produce the same filename, append a numeric suffix.
     def safe_name(label: str) -> str:
-        cleaned = re.sub(r'[\\/*?:"<>|#^[\]]', "", label.replace("\r\n", " ").replace("\r", " ").replace("\n", " ")).strip()
+        cleaned = re.sub(
+            r'[\\/*?:"<>|#^[\]]',
+            "",
+            label.replace("\r\n", " ").replace("\r", " ").replace("\n", " "),
+        ).strip()
         # Strip trailing .md/.mdx/.markdown so "CLAUDE.md" doesn't become "CLAUDE.md.md"
         cleaned = re.sub(r"\.(md|mdx|qmd|markdown)$", "", cleaned, flags=re.IGNORECASE)
         return cleaned or "unnamed"
@@ -975,8 +1038,10 @@ def _community_reach(node_id: str) -> int:
         # Cohesion + member count summary
         if coh_value is not None:
             cohesion_desc = (
-                "tightly connected" if coh_value >= 0.7
-                else "moderately connected" if coh_value >= 0.4
+                "tightly connected"
+                if coh_value >= 0.7
+                else "moderately connected"
+                if coh_value >= 0.4
                 else "loosely connected"
             )
             lines.append(f"**Cohesion:** {coh_value:.2f} - {cohesion_desc}")
@@ -1019,7 +1084,9 @@ def _community_reach(node_id: str) -> int:
                     else f"Community {other_cid}"
                 )
                 other_safe = safe_name(other_name)
-                lines.append(f"- {edge_count} edge{'s' if edge_count != 1 else ''} to [[_COMMUNITY_{other_safe}]]")
+                lines.append(
+                    f"- {edge_count} edge{'s' if edge_count != 1 else ''} to [[_COMMUNITY_{other_safe}]]"
+                )
             lines.append("")
 
         # Top bridge nodes - highest degree nodes that connect to other communities
@@ -1051,7 +1118,10 @@ def _community_reach(node_id: str) -> int:
         "colorGroups": [
             {
                 "query": f"tag:#community/{label.replace(' ', '_')}",
-                "color": {"a": 1, "rgb": int(COMMUNITY_COLORS[cid % len(COMMUNITY_COLORS)].lstrip('#'), 16)}
+                "color": {
+                    "a": 1,
+                    "rgb": int(COMMUNITY_COLORS[cid % len(COMMUNITY_COLORS)].lstrip("#"), 16),
+                },
             }
             for cid, label in sorted((community_labels or {}).items())
         ]
@@ -1078,7 +1148,11 @@ def to_canvas(
     CANVAS_COLORS = ["1", "2", "3", "4", "5", "6"]  # red, orange, yellow, green, cyan, purple
 
     def safe_name(label: str) -> str:
-        cleaned = re.sub(r'[\\/*?:"<>|#^[\]]', "", label.replace("\r\n", " ").replace("\r", " ").replace("\n", " ")).strip()
+        cleaned = re.sub(
+            r'[\\/*?:"<>|#^[\]]',
+            "",
+            label.replace("\r\n", " ").replace("\r", " ").replace("\n", " "),
+        ).strip()
         cleaned = re.sub(r"\.(md|mdx|qmd|markdown)$", "", cleaned, flags=re.IGNORECASE)
         return cleaned or "unnamed"
 
@@ -1104,8 +1178,6 @@ def safe_name(label: str) -> str:
 
     # Lay out communities in a grid
     gap = 80
-    group_x_offsets: list[int] = []
-    group_y_offsets: list[int] = []
 
     # Precompute group sizes so we can calculate offsets
     sorted_cids = sorted(communities.keys())
@@ -1168,16 +1240,18 @@ def safe_name(label: str) -> str:
         canvas_color = CANVAS_COLORS[idx % len(CANVAS_COLORS)]
 
         # Group node
-        canvas_nodes.append({
-            "id": f"g{cid}",
-            "type": "group",
-            "label": community_name,
-            "x": gx,
-            "y": gy,
-            "width": gw,
-            "height": gh,
-            "color": canvas_color,
-        })
+        canvas_nodes.append(
+            {
+                "id": f"g{cid}",
+                "type": "group",
+                "label": community_name,
+                "x": gx,
+                "y": gy,
+                "width": gw,
+                "height": gh,
+                "color": canvas_color,
+            }
+        )
 
         # Node cards inside the group - rows of 3
         sorted_members = sorted(members, key=lambda n: G.nodes[n].get("label", n))
@@ -1187,15 +1261,17 @@ def safe_name(label: str) -> str:
             nx_x = gx + 20 + col * (180 + 20)
             nx_y = gy + 80 + row * (60 + 20)
             fname = node_filenames.get(node_id, safe_name(G.nodes[node_id].get("label", node_id)))
-            canvas_nodes.append({
-                "id": f"n_{node_id}",
-                "type": "file",
-                "file": f"{fname}.md",
-                "x": nx_x,
-                "y": nx_y,
-                "width": 180,
-                "height": 60,
-            })
+            canvas_nodes.append(
+                {
+                    "id": f"n_{node_id}",
+                    "type": "file",
+                    "file": f"{fname}.md",
+                    "x": nx_x,
+                    "y": nx_y,
+                    "width": 180,
+                    "height": 60,
+                }
+            )
 
     # Generate edges - only between nodes both in canvas, cap at 200 highest-weight
     all_edges_weighted: list[tuple[float, str, str, str]] = []
@@ -1209,12 +1285,14 @@ def safe_name(label: str) -> str:
 
     all_edges_weighted.sort(key=lambda x: -x[0])
     for weight, u, v, label in all_edges_weighted[:200]:
-        canvas_edges.append({
-            "id": f"e_{u}_{v}",
-            "fromNode": f"n_{u}",
-            "toNode": f"n_{v}",
-            "label": label,
-        })
+        canvas_edges.append(
+            {
+                "id": f"e_{u}_{v}",
+                "fromNode": f"n_{u}",
+                "toNode": f"n_{v}",
+                "label": label,
+            }
+        )
 
     canvas_data = {"nodes": canvas_nodes, "edges": canvas_edges}
     Path(output_path).write_text(json.dumps(canvas_data, indent=2), encoding="utf-8")  # nosec
@@ -1237,14 +1315,15 @@ def push_to_neo4j(
     try:
         from neo4j import GraphDatabase
     except ImportError as e:
-        raise ImportError(
-            "neo4j driver not installed. Run: pip install neo4j"
-        ) from e
+        raise ImportError("neo4j driver not installed. Run: pip install neo4j") from e
 
     node_community = _node_community_map(communities) if communities else {}
 
     def _safe_rel(relation: str) -> str:
-        return re.sub(r"[^A-Z0-9_]", "_", relation.upper().replace(" ", "_").replace("-", "_")) or "RELATED_TO"
+        return (
+            re.sub(r"[^A-Z0-9_]", "_", relation.upper().replace(" ", "_").replace("-", "_"))
+            or "RELATED_TO"
+        )
 
     def _safe_label(label: str) -> str:
         """Sanitize a Neo4j node label to prevent Cypher injection."""
@@ -1256,6 +1335,7 @@ def _safe_label(label: str) -> str:
     edges_pushed = 0
 
     with driver.session() as session:
+        session_any = cast(Any, session)
         for node_id, data in G.nodes(data=True):
             props = {k: v for k, v in data.items() if isinstance(v, (str, int, float, bool))}
             props["id"] = node_id
@@ -1263,7 +1343,7 @@ def _safe_label(label: str) -> str:
             if cid is not None:
                 props["community"] = cid
             ftype = _safe_label(data.get("file_type", "Entity").capitalize())
-            session.run(
+            session_any.run(
                 f"MERGE (n:{ftype} {{id: $id}}) SET n += $props",
                 id=node_id,
                 props=props,
@@ -1273,7 +1353,7 @@ def _safe_label(label: str) -> str:
         for u, v, data in G.edges(data=True):
             rel = _safe_rel(data.get("relation", "RELATED_TO"))
             props = {k: v for k, v in data.items() if isinstance(v, (str, int, float, bool))}
-            session.run(
+            session_any.run(
                 f"MATCH (a {{id: $src}}), (b {{id: $tgt}}) "
                 f"MERGE (a)-[r:{rel}]->(b) SET r += $props",
                 src=u,
@@ -1319,6 +1399,7 @@ def to_svg(
     """
     try:
         import matplotlib
+
         matplotlib.use("Agg")
         import matplotlib.pyplot as plt
         import matplotlib.patches as mpatches
@@ -1336,7 +1417,9 @@ def to_svg(
     degree = dict(G.degree())
     max_deg = max(degree.values(), default=1) or 1
 
-    node_colors = [COMMUNITY_COLORS[node_community.get(n, 0) % len(COMMUNITY_COLORS)] for n in G.nodes()]
+    node_colors = [
+        COMMUNITY_COLORS[node_community.get(n, 0) % len(COMMUNITY_COLORS)] for n in G.nodes()
+    ]
     node_sizes = [300 + 1200 * (degree.get(n, 1) / max_deg) for n in G.nodes()]
 
     # Draw edges - dashed for non-EXTRACTED
@@ -1346,14 +1429,25 @@ def to_svg(
         alpha = 0.6 if conf == "EXTRACTED" else 0.3
         x0, y0 = pos[u]
         x1, y1 = pos[v]
-        ax.plot([x0, x1], [y0, y1], color="#aaaaaa", linewidth=0.8,
-                linestyle=style, alpha=alpha, zorder=1)
+        ax.plot(
+            [x0, x1],
+            [y0, y1],
+            color="#aaaaaa",
+            linewidth=0.8,
+            linestyle=style,
+            alpha=alpha,
+            zorder=1,
+        )
 
-    nx.draw_networkx_nodes(G, pos, ax=ax, node_color=node_colors,
-                           node_size=node_sizes, alpha=0.9)
-    nx.draw_networkx_labels(G, pos, ax=ax,
-                            labels={n: G.nodes[n].get("label", n) for n in G.nodes()},
-                            font_size=7, font_color="white")
+    nx.draw_networkx_nodes(G, pos, ax=ax, node_color=node_colors, node_size=node_sizes, alpha=0.9)
+    nx.draw_networkx_labels(
+        G,
+        pos,
+        ax=ax,
+        labels={n: G.nodes[n].get("label", n) for n in G.nodes()},
+        font_size=7,
+        font_color="white",
+    )
 
     # Legend
     if community_labels:
@@ -1364,10 +1458,15 @@ def to_svg(
             )
             for cid, label in sorted(community_labels.items())
         ]
-        ax.legend(handles=patches, loc="upper left", framealpha=0.7,
-                  facecolor="#2a2a4e", labelcolor="white", fontsize=8)
+        ax.legend(
+            handles=patches,
+            loc="upper left",
+            framealpha=0.7,
+            facecolor="#2a2a4e",
+            labelcolor="white",
+            fontsize=8,
+        )
 
     plt.tight_layout()
-    plt.savefig(output_path, format="svg", bbox_inches="tight",
-                facecolor=fig.get_facecolor())
+    plt.savefig(output_path, format="svg", bbox_inches="tight", facecolor=fig.get_facecolor())
     plt.close(fig)
diff --git a/graphify/extract.py b/graphify/extract.py
index 1d396a697..3a7222b66 100644
--- a/graphify/extract.py
+++ b/graphify/extract.py
@@ -1,44 +1,123 @@
 """Deterministic structural extraction from source code using tree-sitter. Outputs nodes+edges dicts."""
+
 from __future__ import annotations
 import importlib
 import json
+import logging
 import os
 import re
 import sys
 import unicodedata
 from dataclasses import dataclass, field
 from pathlib import Path
-from typing import Callable, Any
+from collections.abc import Iterator
+from typing import Any, Callable
 from .cache import load_cached, save_cached
 from .mcp_ingest import extract_mcp_config, is_mcp_config_path
 
 _RECURSION_LIMIT = 10_000
+_LOG = logging.getLogger(__name__)
 
 # Language built-in globals that AST may classify as call targets when used as
 # constructors or coercion functions (e.g. String(x), Number(x), Boolean(x)).
 # Without this filter they become god-nodes accumulating spurious edges from
 # every call site. Filter applied at same-file and cross-file resolution.
 # See issue #726.
-_LANGUAGE_BUILTIN_GLOBALS: frozenset[str] = frozenset({
-    # JavaScript / TypeScript ECMAScript built-ins
-    "String", "Number", "Boolean", "Object", "Array", "Symbol", "BigInt",
-    "Date", "RegExp", "Error", "TypeError", "RangeError", "SyntaxError",
-    "ReferenceError", "EvalError", "URIError",
-    "Promise", "Map", "Set", "WeakMap", "WeakSet", "JSON", "Math",
-    "Reflect", "Proxy", "Intl",
-    "parseInt", "parseFloat", "isNaN", "isFinite",
-    "encodeURIComponent", "decodeURIComponent", "encodeURI", "decodeURI",
-    # Browser / Node common globals
-    "URL", "URLSearchParams", "FormData", "Blob", "File",
-    "Headers", "Request", "Response", "AbortController", "AbortSignal",
-    "TextEncoder", "TextDecoder", "console",
-    # Python built-in callables
-    "str", "int", "float", "bool", "list", "dict", "set", "tuple", "bytes",
-    "len", "range", "enumerate", "zip", "map", "filter", "sum", "min", "max",
-    "print", "open", "isinstance", "type", "super", "sorted", "reversed",
-    "any", "all", "abs", "round", "next", "iter", "hash", "id", "repr",
-    "callable", "getattr", "setattr", "hasattr", "delattr", "vars", "dir",
-})
+_LANGUAGE_BUILTIN_GLOBALS: frozenset[str] = frozenset(
+    {
+        # JavaScript / TypeScript ECMAScript built-ins
+        "String",
+        "Number",
+        "Boolean",
+        "Object",
+        "Array",
+        "Symbol",
+        "BigInt",
+        "Date",
+        "RegExp",
+        "Error",
+        "TypeError",
+        "RangeError",
+        "SyntaxError",
+        "ReferenceError",
+        "EvalError",
+        "URIError",
+        "Promise",
+        "Map",
+        "Set",
+        "WeakMap",
+        "WeakSet",
+        "JSON",
+        "Math",
+        "Reflect",
+        "Proxy",
+        "Intl",
+        "parseInt",
+        "parseFloat",
+        "isNaN",
+        "isFinite",
+        "encodeURIComponent",
+        "decodeURIComponent",
+        "encodeURI",
+        "decodeURI",
+        # Browser / Node common globals
+        "URL",
+        "URLSearchParams",
+        "FormData",
+        "Blob",
+        "File",
+        "Headers",
+        "Request",
+        "Response",
+        "AbortController",
+        "AbortSignal",
+        "TextEncoder",
+        "TextDecoder",
+        "console",
+        # Python built-in callables
+        "str",
+        "int",
+        "float",
+        "bool",
+        "list",
+        "dict",
+        "set",
+        "tuple",
+        "bytes",
+        "len",
+        "range",
+        "enumerate",
+        "zip",
+        "map",
+        "filter",
+        "sum",
+        "min",
+        "max",
+        "print",
+        "open",
+        "isinstance",
+        "type",
+        "super",
+        "sorted",
+        "reversed",
+        "any",
+        "all",
+        "abs",
+        "round",
+        "next",
+        "iter",
+        "hash",
+        "id",
+        "repr",
+        "callable",
+        "getattr",
+        "setattr",
+        "hasattr",
+        "delattr",
+        "vars",
+        "dir",
+    }
+)
 
 
 def _raise_recursion_limit() -> None:
@@ -92,14 +171,33 @@ def _file_stem(path: Path) -> str:
 _JS_INDEX_FILES = ("index.ts", "index.tsx", "index.svelte", "index.js", "index.jsx", "index.mjs")
 
 
-SEMANTIC_RELATIONS = frozenset({
-    "inherits", "implements", "mixes_in", "embeds", "references",
-    "calls", "imports", "imports_from", "re_exports", "contains", "method",
-})
+SEMANTIC_RELATIONS = frozenset(
+    {
+        "inherits",
+        "implements",
+        "mixes_in",
+        "embeds",
+        "references",
+        "calls",
+        "imports",
+        "imports_from",
+        "re_exports",
+        "contains",
+        "method",
+    }
+)
 
-REFERENCE_CONTEXTS = frozenset({
-    "field", "parameter_type", "return_type", "generic_arg", "attribute", "value", "type",
-})
+REFERENCE_CONTEXTS = frozenset(
+    {
+        "field",
+        "parameter_type",
+        "return_type",
+        "generic_arg",
+        "attribute",
+        "value",
+        "type",
+    }
+)
 
 
 def _source_location(line: int | str | None) -> str | None:
@@ -173,9 +271,9 @@ def _strip_jsonc(text: str) -> str:
     """
     # Remove block and line comments while leaving string literals untouched.
     pattern = re.compile(
-        r'"(?:\\.|[^"\\])*"'    # double-quoted string (with escapes)
-        r"|/\*.*?\*/"           # /* block comment */
-        r"|//[^\n]*",           # // line comment
+        r'"(?:\\.|[^"\\])*"'  # double-quoted string (with escapes)
+        r"|/\*.*?\*/"  # /* block comment */
+        r"|//[^\n]*",  # // line comment
         re.DOTALL,
     )
 
@@ -205,7 +303,11 @@ def _read_tsconfig_aliases(tsconfig: Path, base_dir: Path, seen: set) -> dict[st
     try:
         raw = tsconfig.read_text(encoding="utf-8")
     except Exception as e:
-        print(f"  warning: could not read {tsconfig} ({type(e).__name__}: {e})", file=sys.stderr, flush=True)
+        print(
+            f"  warning: could not read {tsconfig} ({type(e).__name__}: {e})",
+            file=sys.stderr,
+            flush=True,
+        )
         return {}
     try:
         data = json.loads(raw)
@@ -213,10 +315,18 @@ def _read_tsconfig_aliases(tsconfig: Path, base_dir: Path, seen: set) -> dict[st
         try:
             data = json.loads(_strip_jsonc(raw))
         except json.JSONDecodeError as e:
-            print(f"  warning: failed to parse {tsconfig} as JSON/JSONC ({e.msg} at line {e.lineno} col {e.colno})", file=sys.stderr, flush=True)
+            print(
+                f"  warning: failed to parse {tsconfig} as JSON/JSONC ({e.msg} at line {e.lineno} col {e.colno})",
+                file=sys.stderr,
+                flush=True,
+            )
             return {}
     except Exception as e:
-        print(f"  warning: failed to parse {tsconfig} ({type(e).__name__}: {e})", file=sys.stderr, flush=True)
+        print(
+            f"  warning: failed to parse {tsconfig} ({type(e).__name__}: {e})",
+            file=sys.stderr,
+            flush=True,
+        )
         return {}
 
     aliases: dict[str, str] = {}
@@ -317,7 +427,8 @@ def _load_workspace_packages(start_dir: Path) -> dict[str, Path]:
                 continue
             try:
                 data = json.loads(manifest.read_text(encoding="utf-8"))
-            except Exception:
+            except Exception as exc:
+                _LOG.debug("could not read package manifest %s: %s", manifest, exc)
                 continue
             name = data.get("name")
             if isinstance(name, str) and name:
@@ -331,8 +442,8 @@ def _package_entry_candidates(package_dir: Path, subpath: str) -> list[Path]:
     manifest_data: dict[str, Any] = {}
     try:
         manifest_data = json.loads(manifest.read_text(encoding="utf-8"))
-    except Exception:
-        pass
+    except Exception as exc:
+        _LOG.debug("could not read package manifest %s: %s", manifest, exc)
 
     if subpath:
         return [package_dir / subpath]
@@ -366,7 +477,7 @@ def _resolve_workspace_import(raw: str, start_dir: Path) -> Path | None:
         if raw == package_name:
             subpath = ""
         elif raw.startswith(package_name + "/"):
-            subpath = raw[len(package_name) + 1:]
+            subpath = raw[len(package_name) + 1 :]
         else:
             continue
         for candidate in _package_entry_candidates(package_dir, subpath):
@@ -394,7 +505,7 @@ def _resolve_js_module_path(raw: str | Path, start_dir: Path | None = None) -> P
     aliases = _load_tsconfig_aliases(start_dir)
     for alias_prefix, alias_base in aliases.items():
         if raw == alias_prefix or raw.startswith(alias_prefix + "/"):
-            rest = raw[len(alias_prefix):].lstrip("/")
+            rest = raw[len(alias_prefix) :].lstrip("/")
             return _resolve_js_import_path(Path(os.path.normpath(Path(alias_base) / rest)))
 
     return _resolve_workspace_import(raw, start_dir)
@@ -402,10 +513,11 @@ def _resolve_js_module_path(raw: str | Path, start_dir: Path | None = None) -> P
 
 # ── LanguageConfig dataclass ─────────────────────────────────────────────────
 
+
 @dataclass
 class LanguageConfig:
-    ts_module: str                                   # e.g. "tree_sitter_python"
-    ts_language_fn: str = "language"                 # attr to call: e.g. tslang.language()
+    ts_module: str  # e.g. "tree_sitter_python"
+    ts_language_fn: str = "language"  # attr to call: e.g. tslang.language()
 
     class_types: frozenset = frozenset()
     function_types: frozenset = frozenset()
@@ -422,12 +534,12 @@ class LanguageConfig:
 
     # Body detection
     body_field: str = "body"
-    body_fallback_child_types: tuple = ()   # e.g. ("declaration_list", "compound_statement")
+    body_fallback_child_types: tuple = ()  # e.g. ("declaration_list", "compound_statement")
 
     # Call name extraction
-    call_function_field: str = "function"           # field on call node for callee
+    call_function_field: str = "function"  # field on call node for callee
     call_accessor_node_types: frozenset = frozenset()  # member/attribute nodes
-    call_accessor_field: str = "attribute"          # field on accessor for method name
+    call_accessor_field: str = "attribute"  # field on accessor for method name
 
     # Stop recursion at these types in walk_calls
     function_boundary_types: frozenset = frozenset()
@@ -447,22 +559,57 @@ class LanguageConfig:
 
 # ── Generic helpers ───────────────────────────────────────────────────────────
 
-def _read_text(node, source: bytes) -> str:
-    return source[node.start_byte:node.end_byte].decode("utf-8", errors="replace")
-
 
-_PYTHON_TYPE_CONTAINERS = frozenset({
-    "list", "dict", "set", "tuple", "frozenset", "type",
-    "List", "Dict", "Set", "Tuple", "FrozenSet", "Type",
-    "Optional", "Union", "Sequence", "Iterable", "Mapping", "MutableMapping",
-    "Iterator", "Callable", "Awaitable", "AsyncIterable", "AsyncIterator", "Coroutine",
-    "Generator", "AsyncGenerator", "ContextManager", "AsyncContextManager",
-    "Annotated", "ClassVar", "Final", "Literal", "Concatenate", "ParamSpec", "TypeVar",
-    "None", "Ellipsis",
-})
+def _read_text(node, source: bytes) -> str:
+    return source[node.start_byte : node.end_byte].decode("utf-8", errors="replace")
+
+
+_PYTHON_TYPE_CONTAINERS = frozenset(
+    {
+        "list",
+        "dict",
+        "set",
+        "tuple",
+        "frozenset",
+        "type",
+        "List",
+        "Dict",
+        "Set",
+        "Tuple",
+        "FrozenSet",
+        "Type",
+        "Optional",
+        "Union",
+        "Sequence",
+        "Iterable",
+        "Mapping",
+        "MutableMapping",
+        "Iterator",
+        "Callable",
+        "Awaitable",
+        "AsyncIterable",
+        "AsyncIterator",
+        "Coroutine",
+        "Generator",
+        "AsyncGenerator",
+        "ContextManager",
+        "AsyncContextManager",
+        "Annotated",
+        "ClassVar",
+        "Final",
+        "Literal",
+        "Concatenate",
+        "ParamSpec",
+        "TypeVar",
+        "None",
+        "Ellipsis",
+    }
+)
 
 
-def _python_collect_type_refs(node, source: bytes, generic: bool, out: list[tuple[str, str]]) -> None:
+def _python_collect_type_refs(
+    node, source: bytes, generic: bool, out: list[tuple[str, str]]
+) -> None:
     """Walk a Python type annotation; append (name, role) where role is 'type' or 'generic_arg'.
 
     Builtin/typing containers (list, dict, Optional, Union, …) are not emitted as refs themselves,
@@ -537,7 +684,9 @@ def _csharp_classify_base(name: str, interface_names: set[str]) -> str:
     return "inherits"
 
 
-def _csharp_collect_type_refs(node, source: bytes, generic: bool, out: list[tuple[str, str]]) -> None:
+def _csharp_collect_type_refs(
+    node, source: bytes, generic: bool, out: list[tuple[str, str]]
+) -> None:
     """Walk a C# type expression; append (name, role) tuples (role is 'type' or 'generic_arg')."""
     if node is None:
         return
@@ -1160,7 +1309,10 @@ def _find_body(node, config: LanguageConfig):
 
 # ── Import handlers ───────────────────────────────────────────────────────────
 
-def _import_python(node, source: bytes, file_nid: str, stem: str, edges: list, str_path: str) -> None:
+
+def _import_python(
+    node, source: bytes, file_nid: str, stem: str, edges: list, str_path: str
+) -> None:
     t = node.type
     if t == "import_statement":
         for child in node.children:
@@ -1168,16 +1320,18 @@ def _import_python(node, source: bytes, file_nid: str, stem: str, edges: list, s
                 raw = _read_text(child, source)
                 module_name = raw.split(" as ")[0].strip().lstrip(".")
                 tgt_nid = _make_id(module_name)
-                edges.append({
-                    "source": file_nid,
-                    "target": tgt_nid,
-                    "relation": "imports",
-                    "context": "import",
-                    "confidence": "EXTRACTED",
-                    "source_file": str_path,
-                    "source_location": f"L{node.start_point[0] + 1}",
-                    "weight": 1.0,
-                })
+                edges.append(
+                    {
+                        "source": file_nid,
+                        "target": tgt_nid,
+                        "relation": "imports",
+                        "context": "import",
+                        "confidence": "EXTRACTED",
+                        "source_file": str_path,
+                        "source_location": f"L{node.start_point[0] + 1}",
+                        "weight": 1.0,
+                    }
+                )
     elif t == "import_from_statement":
         module_node = node.child_by_field_name("module_name")
         if module_node:
@@ -1193,16 +1347,18 @@ def _import_python(node, source: bytes, file_nid: str, stem: str, edges: list, s
                 tgt_nid = _make_id(str(base / rel))
             else:
                 tgt_nid = _make_id(raw)
-            edges.append({
-                "source": file_nid,
-                "target": tgt_nid,
-                "relation": "imports_from",
-                "context": "import",
-                "confidence": "EXTRACTED",
-                "source_file": str_path,
-                "source_location": f"L{node.start_point[0] + 1}",
-                "weight": 1.0,
-            })
+            edges.append(
+                {
+                    "source": file_nid,
+                    "target": tgt_nid,
+                    "relation": "imports_from",
+                    "context": "import",
+                    "confidence": "EXTRACTED",
+                    "source_file": str_path,
+                    "source_location": f"L{node.start_point[0] + 1}",
+                    "weight": 1.0,
+                }
+            )
 
 
 def _resolve_js_import_target(raw: str, str_path: str) -> "tuple[str, Path | None] | None":
@@ -1228,7 +1384,11 @@ def _import_js(node, source: bytes, file_nid: str, stem: str, edges: list, str_p
     # Only handle export_statement if it has a `from` clause (re-export).
     # Pure exports like `export const x = 1` or `export { localVar }` have no source module.
     if is_reexport:
-        has_from = any(child.type == "from" or (_read_text(child, source) == "from") for child in node.children if child.type in ("from", "identifier"))
+        has_from = any(
+            child.type == "from" or (_read_text(child, source) == "from")
+            for child in node.children
+            if child.type in ("from", "identifier")
+        )
         if not has_from:
             # Check for string child (source path) as a more reliable indicator
             has_from = any(child.type == "string" for child in node.children)
@@ -1243,16 +1403,18 @@ def _import_js(node, source: bytes, file_nid: str, stem: str, edges: list, str_p
             if resolved is None:
                 break
             tgt_nid, resolved_path = resolved
-            edges.append({
-                "source": file_nid,
-                "target": tgt_nid,
-                "relation": "imports_from",
-                "context": "re-export" if is_reexport else "import",
-                "confidence": "EXTRACTED",
-                "source_file": str_path,
-                "source_location": f"L{node.start_point[0] + 1}",
-                "weight": 1.0,
-            })
+            edges.append(
+                {
+                    "source": file_nid,
+                    "target": tgt_nid,
+                    "relation": "imports_from",
+                    "context": "re-export" if is_reexport else "import",
+                    "confidence": "EXTRACTED",
+                    "source_file": str_path,
+                    "source_location": f"L{node.start_point[0] + 1}",
+                    "weight": 1.0,
+                }
+            )
             break
 
     # Emit symbol-level edges for named imports/re-exports from local/aliased files.
@@ -1277,16 +1439,18 @@ def _import_js(node, source: bytes, file_nid: str, stem: str, edges: list, str_p
                                 sym = _read_text(name_node, source)
                                 if sym == "default":
                                     continue  # skip default re-exports for ID matching
-                                edges.append({
-                                    "source": file_nid,
-                                    "target": _make_id(target_stem, sym),
-                                    "relation": "re_exports",
-                                    "context": "re-export",
-                                    "confidence": "EXTRACTED",
-                                    "source_file": str_path,
-                                    "source_location": f"L{line}",
-                                    "weight": 1.0,
-                                })
+                                edges.append(
+                                    {
+                                        "source": file_nid,
+                                        "target": _make_id(target_stem, sym),
+                                        "relation": "re_exports",
+                                        "context": "re-export",
+                                        "confidence": "EXTRACTED",
+                                        "source_file": str_path,
+                                        "source_location": f"L{line}",
+                                        "weight": 1.0,
+                                    }
+                                )
         else:
             # Handle: import { Foo, type Bar } from './bar'
             for child in node.children:
@@ -1298,20 +1462,23 @@ def _import_js(node, source: bytes, file_nid: str, stem: str, edges: list, str_p
                                     name_node = spec.child_by_field_name("name")
                                     if name_node:
                                         sym = _read_text(name_node, source)
-                                        edges.append({
-                                            "source": file_nid,
-                                            "target": _make_id(target_stem, sym),
-                                            "relation": "imports",
-                                            "context": "import",
-                                            "confidence": "EXTRACTED",
-                                            "source_file": str_path,
-                                            "source_location": f"L{line}",
-                                            "weight": 1.0,
-                                        })
-
-
-def _dynamic_import_js(node, source: bytes, caller_nid: str, str_path: str, edges: list,
-                       seen_dyn_pairs: set) -> bool:
+                                        edges.append(
+                                            {
+                                                "source": file_nid,
+                                                "target": _make_id(target_stem, sym),
+                                                "relation": "imports",
+                                                "context": "import",
+                                                "confidence": "EXTRACTED",
+                                                "source_file": str_path,
+                                                "source_location": f"L{line}",
+                                                "weight": 1.0,
+                                            }
+                                        )
+
+
+def _dynamic_import_js(
+    node, source: bytes, caller_nid: str, str_path: str, edges: list, seen_dyn_pairs: set
+) -> bool:
     """Detect dynamic import() calls in JS/TS and emit imports_from edges.
 
     Handles patterns like:
@@ -1358,16 +1525,18 @@ def _dynamic_import_js(node, source: bytes, caller_nid: str, str_path: str, edge
         pair = (caller_nid, tgt_nid)
         if pair not in seen_dyn_pairs:
             seen_dyn_pairs.add(pair)
-            edges.append({
-                "source": caller_nid,
-                "target": tgt_nid,
-                "relation": "imports_from",
-                "context": "import",
-                "confidence": "EXTRACTED",
-                "source_file": str_path,
-                "source_location": f"L{node.start_point[0] + 1}",
-                "weight": 1.0,
-            })
+            edges.append(
+                {
+                    "source": caller_nid,
+                    "target": tgt_nid,
+                    "relation": "imports_from",
+                    "context": "import",
+                    "confidence": "EXTRACTED",
+                    "source_file": str_path,
+                    "source_location": f"L{node.start_point[0] + 1}",
+                    "weight": 1.0,
+                }
+            )
         break
     return True
 
@@ -1398,16 +1567,18 @@ def _walk_scoped(n) -> str:
             )
             if module_name:
                 tgt_nid = _make_id(module_name)
-                edges.append({
-                    "source": file_nid,
-                    "target": tgt_nid,
-                    "relation": "imports",
-                    "context": "import",
-                    "confidence": "EXTRACTED",
-                    "source_file": str_path,
-                    "source_location": f"L{node.start_point[0] + 1}",
-                    "weight": 1.0,
-                })
+                edges.append(
+                    {
+                        "source": file_nid,
+                        "target": tgt_nid,
+                        "relation": "imports",
+                        "context": "import",
+                        "confidence": "EXTRACTED",
+                        "source_file": str_path,
+                        "source_location": f"L{node.start_point[0] + 1}",
+                        "weight": 1.0,
+                    }
+                )
             break
 
 
@@ -1435,7 +1606,24 @@ def _import_c(node, source: bytes, file_nid: str, stem: str, edges: list, str_pa
                 resolved = _resolve_c_include_path(raw, str_path)
                 if resolved is not None:
                     tgt_nid = _make_id(str(resolved))
-                    edges.append({
+                    edges.append(
+                        {
+                            "source": file_nid,
+                            "target": tgt_nid,
+                            "relation": "imports",
+                            "context": "import",
+                            "confidence": "EXTRACTED",
+                            "source_file": str_path,
+                            "source_location": f"L{node.start_point[0] + 1}",
+                            "weight": 1.0,
+                        }
+                    )
+                    break
+            module_name = raw.split("/")[-1].split(".")[0]
+            if module_name:
+                tgt_nid = _make_id(module_name)
+                edges.append(
+                    {
                         "source": file_nid,
                         "target": tgt_nid,
                         "relation": "imports",
@@ -1444,97 +1632,98 @@ def _import_c(node, source: bytes, file_nid: str, stem: str, edges: list, str_pa
                         "source_file": str_path,
                         "source_location": f"L{node.start_point[0] + 1}",
                         "weight": 1.0,
-                    })
-                    break
-            module_name = raw.split("/")[-1].split(".")[0]
-            if module_name:
-                tgt_nid = _make_id(module_name)
-                edges.append({
-                    "source": file_nid,
-                    "target": tgt_nid,
-                    "relation": "imports",
-                    "context": "import",
-                    "confidence": "EXTRACTED",
-                    "source_file": str_path,
-                    "source_location": f"L{node.start_point[0] + 1}",
-                    "weight": 1.0,
-                })
+                    }
+                )
             break
 
 
-def _import_csharp(node, source: bytes, file_nid: str, stem: str, edges: list, str_path: str) -> None:
+def _import_csharp(
+    node, source: bytes, file_nid: str, stem: str, edges: list, str_path: str
+) -> None:
     for child in node.children:
         if child.type in ("qualified_name", "identifier", "name_equals"):
             raw = _read_text(child, source)
             module_name = raw.split(".")[-1].strip()
             if module_name:
                 tgt_nid = _make_id(module_name)
-                edges.append({
-                    "source": file_nid,
-                    "target": tgt_nid,
-                    "relation": "imports",
-                    "context": "import",
-                    "confidence": "EXTRACTED",
-                    "source_file": str_path,
-                    "source_location": f"L{node.start_point[0] + 1}",
-                    "weight": 1.0,
-                })
+                edges.append(
+                    {
+                        "source": file_nid,
+                        "target": tgt_nid,
+                        "relation": "imports",
+                        "context": "import",
+                        "confidence": "EXTRACTED",
+                        "source_file": str_path,
+                        "source_location": f"L{node.start_point[0] + 1}",
+                        "weight": 1.0,
+                    }
+                )
             break
 
 
-def _import_kotlin(node, source: bytes, file_nid: str, stem: str, edges: list, str_path: str) -> None:
+def _import_kotlin(
+    node, source: bytes, file_nid: str, stem: str, edges: list, str_path: str
+) -> None:
     path_node = node.child_by_field_name("path")
     if path_node:
         raw = _read_text(path_node, source)
         module_name = raw.split(".")[-1].strip()
         if module_name:
             tgt_nid = _make_id(module_name)
-            edges.append({
-                "source": file_nid,
-                "target": tgt_nid,
-                "relation": "imports",
-                "context": "import",
-                "confidence": "EXTRACTED",
-                "source_file": str_path,
-                "source_location": f"L{node.start_point[0] + 1}",
-                "weight": 1.0,
-            })
+            edges.append(
+                {
+                    "source": file_nid,
+                    "target": tgt_nid,
+                    "relation": "imports",
+                    "context": "import",
+                    "confidence": "EXTRACTED",
+                    "source_file": str_path,
+                    "source_location": f"L{node.start_point[0] + 1}",
+                    "weight": 1.0,
+                }
+            )
         return
     # Fallback: find identifier child
     for child in node.children:
         if child.type == "identifier":
             raw = _read_text(child, source)
             tgt_nid = _make_id(raw)
-            edges.append({
-                "source": file_nid,
-                "target": tgt_nid,
-                "relation": "imports",
-                "context": "import",
-                "confidence": "EXTRACTED",
-                "source_file": str_path,
-                "source_location": f"L{node.start_point[0] + 1}",
-                "weight": 1.0,
-            })
+            edges.append(
+                {
+                    "source": file_nid,
+                    "target": tgt_nid,
+                    "relation": "imports",
+                    "context": "import",
+                    "confidence": "EXTRACTED",
+                    "source_file": str_path,
+                    "source_location": f"L{node.start_point[0] + 1}",
+                    "weight": 1.0,
+                }
+            )
             break
 
 
-def _import_scala(node, source: bytes, file_nid: str, stem: str, edges: list, str_path: str) -> None:
+def _import_scala(
+    node, source: bytes, file_nid: str, stem: str, edges: list, str_path: str
+) -> None:
     for child in node.children:
         if child.type in ("stable_id", "identifier"):
             raw = _read_text(child, source)
             module_name = raw.split(".")[-1].strip("{} ")
             if module_name and module_name != "_":
                 tgt_nid = _make_id(module_name)
-                edges.append({
-                    "source": file_nid,
-                    "target": tgt_nid,
-                    "relation": "imports",
-                    "context": "import",
-                    "confidence": "EXTRACTED",
-                    "source_file": str_path,
-                    "source_location": f"L{node.start_point[0] + 1}",
-                    "weight": 1.0,
-                })
+                edges.append(
+                    {
+                        "source": file_nid,
+                        "target": tgt_nid,
+                        "relation": "imports",
+                        "context": "import",
+                        "confidence": "EXTRACTED",
+                        "source_file": str_path,
+                        "source_location": f"L{node.start_point[0] + 1}",
+                        "weight": 1.0,
+                    }
+                )
             break
 
 
@@ -1545,21 +1734,24 @@ def _import_php(node, source: bytes, file_nid: str, stem: str, edges: list, str_
             module_name = raw.split("\\")[-1].strip()
             if module_name:
                 tgt_nid = _make_id(module_name)
-                edges.append({
-                    "source": file_nid,
-                    "target": tgt_nid,
-                    "relation": "imports",
-                    "context": "import",
-                    "confidence": "EXTRACTED",
-                    "source_file": str_path,
-                    "source_location": f"L{node.start_point[0] + 1}",
-                    "weight": 1.0,
-                })
+                edges.append(
+                    {
+                        "source": file_nid,
+                        "target": tgt_nid,
+                        "relation": "imports",
+                        "context": "import",
+                        "confidence": "EXTRACTED",
+                        "source_file": str_path,
+                        "source_location": f"L{node.start_point[0] + 1}",
+                        "weight": 1.0,
+                    }
+                )
             break
 
 
 # ── C/C++ function name helpers ───────────────────────────────────────────────
 
+
 def _get_c_func_name(node, source: bytes) -> str | None:
     """Recursively unwrap declarator to find the innermost identifier (C)."""
     if node.type == "identifier":
@@ -1594,6 +1786,7 @@ def _get_cpp_func_name(node, source: bytes) -> str | None:
 
 # ── JS/TS extra walk for arrow functions ──────────────────────────────────────
 
+
 def _find_require_call(value_node):
     """Return the call_expression node if `value_node` is a `require(...)` call
     or `require(...).x` member access. Otherwise None."""
@@ -1609,7 +1802,9 @@ def _find_require_call(value_node):
     return None
 
 
-def _require_imports_js(node, source: bytes, file_nid: str, stem: str, edges: list, str_path: str) -> bool:
+def _require_imports_js(
+    node, source: bytes, file_nid: str, stem: str, edges: list, str_path: str
+) -> bool:
     """Detect CommonJS require imports inside lexical_declaration / variable_declaration.
 
     Handles three patterns:
@@ -1647,16 +1842,18 @@ def _require_imports_js(node, source: bytes, file_nid: str, stem: str, edges: li
             continue
         tgt_nid, resolved_path = resolved
         line = node.start_point[0] + 1
-        edges.append({
-            "source": file_nid,
-            "target": tgt_nid,
-            "relation": "imports_from",
-            "context": "import",
-            "confidence": "EXTRACTED",
-            "source_file": str_path,
-            "source_location": f"L{line}",
-            "weight": 1.0,
-        })
+        edges.append(
+            {
+                "source": file_nid,
+                "target": tgt_nid,
+                "relation": "imports_from",
+                "context": "import",
+                "confidence": "EXTRACTED",
+                "source_file": str_path,
+                "source_location": f"L{line}",
+                "weight": 1.0,
+            }
+        )
         found = True
 
         # Symbol-level edges for destructured / accessor binders.
@@ -1679,22 +1876,35 @@ def _require_imports_js(node, source: bytes, file_nid: str, stem: str, edges: li
                 sym_names.append(_read_text(prop, source))
         if target_stem is not None:
             for sym in sym_names:
-                edges.append({
-                    "source": file_nid,
-                    "target": _make_id(target_stem, sym),
-                    "relation": "imports",
-                    "context": "import",
-                    "confidence": "EXTRACTED",
-                    "source_file": str_path,
-                    "source_location": f"L{line}",
-                    "weight": 1.0,
-                })
+                edges.append(
+                    {
+                        "source": file_nid,
+                        "target": _make_id(target_stem, sym),
+                        "relation": "imports",
+                        "context": "import",
+                        "confidence": "EXTRACTED",
+                        "source_file": str_path,
+                        "source_location": f"L{line}",
+                        "weight": 1.0,
+                    }
+                )
     return found
 
 
-def _js_extra_walk(node, source: bytes, file_nid: str, stem: str, str_path: str,
-                   nodes: list, edges: list, seen_ids: set, function_bodies: list,
-                   parent_class_nid: str | None, add_node_fn, add_edge_fn) -> bool:
+def _js_extra_walk(
+    node,
+    source: bytes,
+    file_nid: str,
+    stem: str,
+    str_path: str,
+    nodes: list,
+    edges: list,
+    seen_ids: set,
+    function_bodies: list,
+    parent_class_nid: str | None,
+    add_node_fn,
+    add_edge_fn,
+) -> bool:
     """Handle lexical_declaration (arrow functions, CJS requires, module-level const literals) for JS/TS. Returns True if handled."""
     if node.type in ("lexical_declaration", "variable_declaration"):
         # CJS require imports — emit edges, do not block other lexical_declaration handling
@@ -1734,7 +1944,11 @@ def _js_extra_walk(node, source: bytes, file_nid: str, stem: str, str_path: str,
                                 function_bodies.append((func_nid, body))
                             arrow_found = True
                     elif value and value.type in (
-                        "object", "array", "as_expression", "call_expression", "new_expression",
+                        "object",
+                        "array",
+                        "as_expression",
+                        "call_expression",
+                        "new_expression",
                     ):
                         # Module-level const with literal/object/array/factory value
                         name_node = child.child_by_field_name("name")
@@ -1756,10 +1970,22 @@ def _js_extra_walk(node, source: bytes, file_nid: str, stem: str, str_path: str,
 
 # ── C# extra walk for namespace declarations ──────────────────────────────────
 
-def _csharp_extra_walk(node, source: bytes, file_nid: str, stem: str, str_path: str,
-                       nodes: list, edges: list, seen_ids: set, function_bodies: list,
-                       parent_class_nid: str | None, add_node_fn, add_edge_fn,
-                       walk_fn) -> bool:
+
+def _csharp_extra_walk(
+    node,
+    source: bytes,
+    file_nid: str,
+    stem: str,
+    str_path: str,
+    nodes: list,
+    edges: list,
+    seen_ids: set,
+    function_bodies: list,
+    parent_class_nid: str | None,
+    add_node_fn,
+    add_edge_fn,
+    walk_fn,
+) -> bool:
     """Handle namespace_declaration for C#. Returns True if handled."""
     if node.type == "namespace_declaration":
         name_node = node.child_by_field_name("name")
@@ -1779,9 +2005,21 @@ def _csharp_extra_walk(node, source: bytes, file_nid: str, stem: str, str_path:
 
 # ── Swift extra walk for enum cases ──────────────────────────────────────────
 
-def _swift_extra_walk(node, source: bytes, file_nid: str, stem: str, str_path: str,
-                      nodes: list, edges: list, seen_ids: set, function_bodies: list,
-                      parent_class_nid: str | None, add_node_fn, add_edge_fn) -> bool:
+
+def _swift_extra_walk(
+    node,
+    source: bytes,
+    file_nid: str,
+    stem: str,
+    str_path: str,
+    nodes: list,
+    edges: list,
+    seen_ids: set,
+    function_bodies: list,
+    parent_class_nid: str | None,
+    add_node_fn,
+    add_edge_fn,
+) -> bool:
     """Handle enum_entry for Swift. Returns True if handled."""
     if node.type == "enum_entry" and parent_class_nid:
         for child in node.children:
@@ -1819,27 +2057,33 @@ def _swift_extra_walk(node, source: bytes, file_nid: str, stem: str, str_path: s
     call_function_field="function",
     call_accessor_node_types=frozenset({"member_expression"}),
     call_accessor_field="property",
-    function_boundary_types=frozenset({"function_declaration", "arrow_function", "method_definition"}),
+    function_boundary_types=frozenset(
+        {"function_declaration", "arrow_function", "method_definition"}
+    ),
     import_handler=_import_js,
 )
 
 _TS_CONFIG = LanguageConfig(
     ts_module="tree_sitter_typescript",
     ts_language_fn="language_typescript",
-    class_types=frozenset({
-        "class_declaration",
-        "abstract_class_declaration",  # TS abstract class
-        "interface_declaration",   # parity with Java/C#
-        "enum_declaration",        # named enums
-        "type_alias_declaration",  # named type aliases
-    }),
+    class_types=frozenset(
+        {
+            "class_declaration",
+            "abstract_class_declaration",  # TS abstract class
+            "interface_declaration",  # parity with Java/C#
+            "enum_declaration",  # named enums
+            "type_alias_declaration",  # named type aliases
+        }
+    ),
     function_types=frozenset({"function_declaration", "method_definition"}),
     import_types=frozenset({"import_statement", "export_statement"}),
     call_types=frozenset({"call_expression", "new_expression"}),
     call_function_field="function",
     call_accessor_node_types=frozenset({"member_expression"}),
     call_accessor_field="property",
-    function_boundary_types=frozenset({"function_declaration", "arrow_function", "method_definition"}),
+    function_boundary_types=frozenset(
+        {"function_declaration", "arrow_function", "method_definition"}
+    ),
     import_handler=_import_js,
 )
 
@@ -1980,7 +2224,14 @@ def _swift_extra_walk(node, source: bytes, file_nid: str, stem: str, str_path: s
     class_types=frozenset({"class_declaration"}),
     function_types=frozenset({"function_definition", "method_declaration"}),
     import_types=frozenset({"namespace_use_clause"}),
-    call_types=frozenset({"function_call_expression", "member_call_expression", "scoped_call_expression", "class_constant_access_expression"}),
+    call_types=frozenset(
+        {
+            "function_call_expression",
+            "member_call_expression",
+            "scoped_call_expression",
+            "class_constant_access_expression",
+        }
+    ),
     static_prop_types=frozenset({"scoped_property_access_expression"}),
     helper_fn_names=frozenset({"config"}),
     container_bind_methods=frozenset({"bind", "singleton", "scoped", "instance"}),
@@ -2037,6 +2288,7 @@ def _import_lua(node, source: bytes, file_nid: str, stem: str, edges: list, str_
     """Extract require('module') from Lua variable_declaration nodes."""
     text = _read_text(node, source)
     import re
+
     m = re.search(r"""require\s*[\('"]\s*['"]?([^'")\s]+)""", text)
     if m:
         raw_module = m.group(1)
@@ -2073,21 +2325,25 @@ def _import_lua(node, source: bytes, file_nid: str, stem: str, edges: list, str_
 )
 
 
-def _import_swift(node, source: bytes, file_nid: str, stem: str, edges: list, str_path: str) -> None:
+def _import_swift(
+    node, source: bytes, file_nid: str, stem: str, edges: list, str_path: str
+) -> None:
     for child in node.children:
         if child.type == "identifier":
             raw = _read_text(child, source)
             tgt_nid = _make_id(raw)
-            edges.append({
-                "source": file_nid,
-                "target": tgt_nid,
-                "relation": "imports",
-                "context": "import",
-                "confidence": "EXTRACTED",
-                "source_file": str_path,
-                "source_location": f"L{node.start_point[0] + 1}",
-                "weight": 1.0,
-            })
+            edges.append(
+                {
+                    "source": file_nid,
+                    "target": tgt_nid,
+                    "relation": "imports",
+                    "context": "import",
+                    "confidence": "EXTRACTED",
+                    "source_file": str_path,
+                    "source_location": f"L{node.start_point[0] + 1}",
+                    "weight": 1.0,
+                }
+            )
             break
 
 
@@ -2115,7 +2371,9 @@ def _read_csharp_type_name(node, source: bytes) -> str | None:
 _SWIFT_CONFIG = LanguageConfig(
     ts_module="tree_sitter_swift",
     class_types=frozenset({"class_declaration", "protocol_declaration"}),
-    function_types=frozenset({"function_declaration", "init_declaration", "deinit_declaration", "subscript_declaration"}),
+    function_types=frozenset(
+        {"function_declaration", "init_declaration", "deinit_declaration", "subscript_declaration"}
+    ),
     import_types=frozenset({"import_declaration"}),
     call_types=frozenset({"call_expression"}),
     call_function_field="",
@@ -2123,23 +2381,31 @@ def _read_csharp_type_name(node, source: bytes) -> str | None:
     call_accessor_field="",
     name_fallback_child_types=("simple_identifier", "type_identifier", "user_type"),
     body_fallback_child_types=("class_body", "protocol_body", "function_body", "enum_class_body"),
-    function_boundary_types=frozenset({"function_declaration", "init_declaration", "deinit_declaration", "subscript_declaration"}),
+    function_boundary_types=frozenset(
+        {"function_declaration", "init_declaration", "deinit_declaration", "subscript_declaration"}
+    ),
     import_handler=_import_swift,
 )
 
 # ── Generic extractor ─────────────────────────────────────────────────────────
 
+
 def _extract_generic(path: Path, config: LanguageConfig) -> dict:
     """Generic AST extractor driven by LanguageConfig."""
     try:
         mod = importlib.import_module(config.ts_module)
         from tree_sitter import Language, Parser
+
         lang_fn = getattr(mod, config.ts_language_fn, None)
         if lang_fn is None:
             # Fallback for PHP: try "language_php" then "language"
             lang_fn = getattr(mod, "language", None)
         if lang_fn is None:
-            return {"nodes": [], "edges": [], "error": f"No language function in {config.ts_module}"}
+            return {
+                "nodes": [],
+                "edges": [],
+                "error": f"No language function in {config.ts_module}",
+            }
         language = Language(lang_fn())
     except ImportError:
         return {"nodes": [], "edges": [], "error": f"{config.ts_module} not installed"}
@@ -2188,17 +2454,25 @@ def _extract_generic(path: Path, config: LanguageConfig) -> dict:
     def add_node(nid: str, label: str, line: int) -> None:
         if nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({
-                "id": nid,
-                "label": label,
-                "file_type": "code",
-                "source_file": str_path,
-                "source_location": f"L{line}",
-            })
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
 
-    def add_edge(src: str, tgt: str, relation: str, line: int,
-                 confidence: str = "EXTRACTED", weight: float = 1.0,
-                 context: str | None = None) -> None:
+    def add_edge(
+        src: str,
+        tgt: str,
+        relation: str,
+        line: int,
+        confidence: str = "EXTRACTED",
+        weight: float = 1.0,
+        context: str | None = None,
+    ) -> None:
         edge = {
             "source": src,
             "target": tgt,
@@ -2274,13 +2548,15 @@ def walk(node, parent_class_nid: str | None = None) -> None:
                             if base_nid not in seen_ids:
                                 base_nid = _make_id(base)
                                 if base_nid not in seen_ids:
-                                    nodes.append({
-                                        "id": base_nid,
-                                        "label": base,
-                                        "file_type": "code",
-                                        "source_file": "",
-                                        "source_location": "",
-                                    })
+                                    nodes.append(
+                                        {
+                                            "id": base_nid,
+                                            "label": base,
+                                            "file_type": "code",
+                                            "source_file": "",
+                                            "source_location": "",
+                                        }
+                                    )
                                     seen_ids.add(base_nid)
                             add_edge(class_nid, base_nid, "inherits", line)
 
@@ -2448,7 +2724,8 @@ def _php_emit_base(base_name: str, rel: str, at_line: int) -> None:
                         if sub.type == "generic_name":
                             name_child = sub.child_by_field_name("name")
                             base = (
-                                _read_text(name_child, source) if name_child
+                                _read_text(name_child, source)
+                                if name_child
                                 else _read_text(sub.children[0], source)
                             )
                         elif sub.type == "qualified_name":
@@ -2461,13 +2738,15 @@ def _php_emit_base(base_name: str, rel: str, at_line: int) -> None:
                         if base_nid not in seen_ids:
                             base_nid = _make_id(base)
                             if base_nid not in seen_ids:
-                                nodes.append({
-                                    "id": base_nid,
-                                    "label": base,
-                                    "file_type": "code",
-                                    "source_file": "",
-                                    "source_location": "",
-                                })
+                                nodes.append(
+                                    {
+                                        "id": base_nid,
+                                        "label": base,
+                                        "file_type": "code",
+                                        "source_file": "",
+                                        "source_location": "",
+                                    }
+                                )
                                 seen_ids.add(base_nid)
                         relation = _csharp_classify_base(base, csharp_interface_names)
                         add_edge(class_nid, base_nid, relation, line)
@@ -2482,11 +2761,17 @@ def _php_emit_base(base_name: str, rel: str, at_line: int) -> None:
                                     _csharp_collect_type_refs(arg, source, True, refs)
                                     for ref_name, _role in refs:
                                         target = ensure_named_node(ref_name, line)
-                                        add_edge(class_nid, target, "references", line,
-                                                 context="generic_arg")
+                                        add_edge(
+                                            class_nid,
+                                            target,
+                                            "references",
+                                            line,
+                                            context="generic_arg",
+                                        )
 
             # Java-specific: extends (superclass) / implements (interfaces) / interface-extends
             if config.ts_module == "tree_sitter_java":
+
                 def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
                     if not base_name:
                         return
@@ -2494,13 +2779,15 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
                     if base_nid not in seen_ids:
                         base_nid = _make_id(base_name)
                         if base_nid not in seen_ids:
-                            nodes.append({
-                                "id": base_nid,
-                                "label": base_name,
-                                "file_type": "code",
-                                "source_file": "",
-                                "source_location": "",
-                            })
+                            nodes.append(
+                                {
+                                    "id": base_nid,
+                                    "label": base_name,
+                                    "file_type": "code",
+                                    "source_file": "",
+                                    "source_location": "",
+                                }
+                            )
                             seen_ids.add(base_nid)
                     add_edge(class_nid, base_nid, rel, at_line)
 
@@ -2526,7 +2813,9 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
                                 if sub.type == "type_list":
                                     for tid in sub.children:
                                         if tid.type == "type_identifier":
-                                            _emit_java_parent(_read_text(tid, source), "inherits", line)
+                                            _emit_java_parent(
+                                                _read_text(tid, source), "inherits", line
+                                            )
 
             # Scala: extends_clause carries `extends Base with Trait1 with Trait2`.
             # The first base after `extends` is `inherits`; each subsequent
@@ -2612,13 +2901,15 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
                         if base_nid not in seen_ids:
                             base_nid = _make_id(base)
                             if base_nid not in seen_ids:
-                                nodes.append({
-                                    "id": base_nid,
-                                    "label": base,
-                                    "file_type": "code",
-                                    "source_file": "",
-                                    "source_location": "",
-                                })
+                                nodes.append(
+                                    {
+                                        "id": base_nid,
+                                        "label": base,
+                                        "file_type": "code",
+                                        "source_file": "",
+                                        "source_location": "",
+                                    }
+                                )
                                 seen_ids.add(base_nid)
                         add_edge(class_nid, base_nid, "inherits", line)
 
@@ -2647,9 +2938,11 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
                                 break
                     elif c.type == "array_creation_expression":
                         array_node = c
-                if (prop_name is None
-                        or prop_name not in config.event_listener_properties
-                        or array_node is None):
+                if (
+                    prop_name is None
+                    or prop_name not in config.event_listener_properties
+                    or array_node is None
+                ):
                     continue
                 handled_event_listener = True
                 for entry in array_node.children:
@@ -2683,9 +2976,11 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
             if handled_event_listener:
                 return
 
-        if (config.ts_module == "tree_sitter_c_sharp"
-                and t == "field_declaration"
-                and parent_class_nid):
+        if (
+            config.ts_module == "tree_sitter_c_sharp"
+            and t == "field_declaration"
+            and parent_class_nid
+        ):
             type_node = node.child_by_field_name("type")
             if type_node is None:
                 for child in node.children:
@@ -2696,8 +2991,13 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
             type_name = _read_csharp_type_name(type_node, source)
             if type_name:
                 line = node.start_point[0] + 1
-                add_edge(parent_class_nid, ensure_named_node(type_name, line),
-                         "references", line, context="field")
+                add_edge(
+                    parent_class_nid,
+                    ensure_named_node(type_name, line),
+                    "references",
+                    line,
+                    context="field",
+                )
             return
 
         if (config.ts_module == "tree_sitter_php"
@@ -3075,21 +3375,55 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
 
         # JS/TS arrow functions and C# namespaces — language-specific extra handling
         if config.ts_module in ("tree_sitter_javascript", "tree_sitter_typescript"):
-            if _js_extra_walk(node, source, file_nid, stem, str_path,
-                              nodes, edges, seen_ids, function_bodies,
-                              parent_class_nid, add_node, add_edge):
+            if _js_extra_walk(
+                node,
+                source,
+                file_nid,
+                stem,
+                str_path,
+                nodes,
+                edges,
+                seen_ids,
+                function_bodies,
+                parent_class_nid,
+                add_node,
+                add_edge,
+            ):
                 return
 
         if config.ts_module == "tree_sitter_c_sharp":
-            if _csharp_extra_walk(node, source, file_nid, stem, str_path,
-                                   nodes, edges, seen_ids, function_bodies,
-                                   parent_class_nid, add_node, add_edge, walk):
+            if _csharp_extra_walk(
+                node,
+                source,
+                file_nid,
+                stem,
+                str_path,
+                nodes,
+                edges,
+                seen_ids,
+                function_bodies,
+                parent_class_nid,
+                add_node,
+                add_edge,
+                walk,
+            ):
                 return
 
         if config.ts_module == "tree_sitter_swift":
-            if _swift_extra_walk(node, source, file_nid, stem, str_path,
-                                  nodes, edges, seen_ids, function_bodies,
-                                  parent_class_nid, add_node, add_edge):
+            if _swift_extra_walk(
+                node,
+                source,
+                file_nid,
+                stem,
+                str_path,
+                nodes,
+                edges,
+                seen_ids,
+                function_bodies,
+                parent_class_nid,
+                add_node,
+                add_edge,
+            ):
                 return
 
         # Python's `@property` / `@staticmethod` / `@classmethod` wrap the
@@ -3113,7 +3447,7 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
     walk(root)
 
     # ── Call-graph pass ───────────────────────────────────────────────────────
-    label_to_nid: dict[str, str] = {}     # case-sensitive (Ruby, C#, Java, Kotlin, etc.)
+    label_to_nid: dict[str, str] = {}  # case-sensitive (Ruby, C#, Java, Kotlin, etc.)
     label_to_nid_ci: dict[str, str] = {}  # case-insensitive (PHP functions/classes)
     for n in nodes:
         raw = n["label"]
@@ -3146,8 +3480,9 @@ def walk_calls(node, caller_nid: str) -> None:
         if node.type in config.call_types:
             # JS/TS dynamic imports: await import('./foo.js')
             if config.ts_module in ("tree_sitter_javascript", "tree_sitter_typescript"):
-                if _dynamic_import_js(node, source, caller_nid, str_path,
-                                      edges, seen_dyn_import_pairs):
+                if _dynamic_import_js(
+                    node, source, caller_nid, str_path, edges, seen_dyn_import_pairs
+                ):
                     # Still recurse into children (import().then(...) may have calls)
                     for child in node.children:
                         walk_calls(child, caller_nid)
@@ -3236,18 +3571,28 @@ def walk_calls(node, caller_nid: str) -> None:
                         callee_name = _read_text(name_node, source)
             elif config.ts_module == "tree_sitter_cpp":
                 # C++: function field, then field_expression/qualified_identifier
-                func_node = node.child_by_field_name(config.call_function_field) if config.call_function_field else None
+                func_node = (
+                    node.child_by_field_name(config.call_function_field)
+                    if config.call_function_field
+                    else None
+                )
                 if func_node:
                     if func_node.type == "identifier":
                         callee_name = _read_text(func_node, source)
                     elif func_node.type in ("field_expression", "qualified_identifier"):
                         is_member_call = True
-                        name = func_node.child_by_field_name("field") or func_node.child_by_field_name("name")
+                        name = func_node.child_by_field_name(
+                            "field"
+                        ) or func_node.child_by_field_name("name")
                         if name:
                             callee_name = _read_text(name, source)
             else:
                 # Generic: get callee from call_function_field
-                func_node = node.child_by_field_name(config.call_function_field) if config.call_function_field else None
+                func_node = (
+                    node.child_by_field_name(config.call_function_field)
+                    if config.call_function_field
+                    else None
+                )
                 if func_node:
                     if func_node.type == "identifier":
                         callee_name = _read_text(func_node, source)
@@ -3268,28 +3613,32 @@ def walk_calls(node, caller_nid: str) -> None:
                     if pair not in seen_call_pairs:
                         seen_call_pairs.add(pair)
                         line = node.start_point[0] + 1
-                        edges.append({
-                            "source": caller_nid,
-                            "target": tgt_nid,
-                            "relation": "calls",
-                            "context": "call",
-                            "confidence": "EXTRACTED",
-                            "source_file": str_path,
-                            "source_location": f"L{line}",
-                            "weight": 1.0,
-                        })
+                        edges.append(
+                            {
+                                "source": caller_nid,
+                                "target": tgt_nid,
+                                "relation": "calls",
+                                "context": "call",
+                                "confidence": "EXTRACTED",
+                                "source_file": str_path,
+                                "source_location": f"L{line}",
+                                "weight": 1.0,
+                            }
+                        )
                 elif callee_name and not tgt_nid:
                     # Callee not in this file — save for cross-file resolution in extract()
-                    raw_calls.append({
-                        "caller_nid": caller_nid,
-                        "callee": callee_name,
-                        "is_member_call": is_member_call,
-                        "source_file": str_path,
-                        "source_location": f"L{node.start_point[0] + 1}",
-                    })
+                    raw_calls.append(
+                        {
+                            "caller_nid": caller_nid,
+                            "callee": callee_name,
+                            "is_member_call": is_member_call,
+                            "source_file": str_path,
+                            "source_location": f"L{node.start_point[0] + 1}",
+                        }
+                    )
 
             # Helper function calls: config('foo.bar') → uses_config edge to "foo"
-            if (callee_name and callee_name in config.helper_fn_names):
+            if callee_name and callee_name in config.helper_fn_names:
                 args_node = node.child_by_field_name("arguments")
                 first_key: str | None = None
                 if args_node:
@@ -3307,29 +3656,34 @@ def walk_calls(node, caller_nid: str) -> None:
                             break
                 if first_key:
                     segment = first_key.split(".")[0]
-                    tgt_nid = (label_to_nid_ci.get(segment.lower())
-                               or label_to_nid_ci.get(f"{segment}.php".lower()))
+                    tgt_nid = label_to_nid_ci.get(segment.lower()) or label_to_nid_ci.get(
+                        f"{segment}.php".lower()
+                    )
                     if tgt_nid and tgt_nid != caller_nid:
                         relation = f"uses_{callee_name}"
                         pair3 = (caller_nid, tgt_nid, relation)
                         if pair3 not in seen_helper_ref_pairs:
                             seen_helper_ref_pairs.add(pair3)
                             line = node.start_point[0] + 1
-                            edges.append({
-                                "source": caller_nid,
-                                "target": tgt_nid,
-                                "relation": relation,
-                                "confidence": "EXTRACTED",
-                                "confidence_score": 1.0,
-                                "source_file": str_path,
-                                "source_location": f"L{line}",
-                                "weight": 1.0,
-                            })
+                            edges.append(
+                                {
+                                    "source": caller_nid,
+                                    "target": tgt_nid,
+                                    "relation": relation,
+                                    "confidence": "EXTRACTED",
+                                    "confidence_score": 1.0,
+                                    "source_file": str_path,
+                                    "source_location": f"L{line}",
+                                    "weight": 1.0,
+                                }
+                            )
 
             # Service container bindings: $this->app->bind(Foo::class, Bar::class)
-            if (node.type == "member_call_expression"
-                    and callee_name
-                    and callee_name in config.container_bind_methods):
+            if (
+                node.type == "member_call_expression"
+                and callee_name
+                and callee_name in config.container_bind_methods
+            ):
                 args_node = node.child_by_field_name("arguments")
                 class_args: list[str] = []
                 if args_node:
@@ -3353,16 +3707,18 @@ def walk_calls(node, caller_nid: str) -> None:
                         if pair3 not in seen_bind_pairs:
                             seen_bind_pairs.add(pair3)
                             line = node.start_point[0] + 1
-                            edges.append({
-                                "source": contract_nid,
-                                "target": impl_nid,
-                                "relation": "bound_to",
-                                "confidence": "EXTRACTED",
-                                "confidence_score": 1.0,
-                                "source_file": str_path,
-                                "source_location": f"L{line}",
-                                "weight": 1.0,
-                            })
+                            edges.append(
+                                {
+                                    "source": contract_nid,
+                                    "target": impl_nid,
+                                    "relation": "bound_to",
+                                    "confidence": "EXTRACTED",
+                                    "confidence_score": 1.0,
+                                    "source_file": str_path,
+                                    "source_location": f"L{line}",
+                                    "weight": 1.0,
+                                }
+                            )
 
         # Static property access: Foo::$bar → uses_static_prop edge
         if node.type in config.static_prop_types:
@@ -3380,19 +3736,24 @@ def walk_calls(node, caller_nid: str) -> None:
                     if pair3 not in seen_static_ref_pairs:
                         seen_static_ref_pairs.add(pair3)
                         line = node.start_point[0] + 1
-                        edges.append({
-                            "source": caller_nid,
-                            "target": tgt_nid,
-                            "relation": "uses_static_prop",
-                            "confidence": "EXTRACTED",
-                            "confidence_score": 1.0,
-                            "source_file": str_path,
-                            "source_location": f"L{line}",
-                            "weight": 1.0,
-                        })
+                        edges.append(
+                            {
+                                "source": caller_nid,
+                                "target": tgt_nid,
+                                "relation": "uses_static_prop",
+                                "confidence": "EXTRACTED",
+                                "confidence_score": 1.0,
+                                "source_file": str_path,
+                                "source_location": f"L{line}",
+                                "weight": 1.0,
+                            }
+                        )
 
         # PHP class constant access: Foo::BAR → references_constant edge
-        if config.ts_module == "tree_sitter_php" and node.type == "class_constant_access_expression":
+        if (
+            config.ts_module == "tree_sitter_php"
+            and node.type == "class_constant_access_expression"
+        ):
             class_name = _php_class_const_scope(node)
             if class_name:
                 tgt_nid = label_to_nid_ci.get(class_name.lower())
@@ -3401,16 +3762,18 @@ def walk_calls(node, caller_nid: str) -> None:
                     if pair3 not in seen_static_ref_pairs:
                         seen_static_ref_pairs.add(pair3)
                         line = node.start_point[0] + 1
-                        edges.append({
-                            "source": caller_nid,
-                            "target": tgt_nid,
-                            "relation": "references_constant",
-                            "confidence": "EXTRACTED",
-                            "confidence_score": 1.0,
-                            "source_file": str_path,
-                            "source_location": f"L{line}",
-                            "weight": 1.0,
-                        })
+                        edges.append(
+                            {
+                                "source": caller_nid,
+                                "target": tgt_nid,
+                                "relation": "references_constant",
+                                "confidence": "EXTRACTED",
+                                "confidence_score": 1.0,
+                                "source_file": str_path,
+                                "source_location": f"L{line}",
+                                "weight": 1.0,
+                            }
+                        )
 
         for child in node.children:
             walk_calls(child, caller_nid)
@@ -3429,23 +3792,27 @@ def walk_calls(node, caller_nid: str) -> None:
         if pair2 in seen_listen_pairs:
             continue
         seen_listen_pairs.add(pair2)
-        edges.append({
-            "source": event_nid,
-            "target": listener_nid,
-            "relation": "listened_by",
-            "confidence": "EXTRACTED",
-            "confidence_score": 1.0,
-            "source_file": str_path,
-            "source_location": f"L{line}",
-            "weight": 1.0,
-        })
+        edges.append(
+            {
+                "source": event_nid,
+                "target": listener_nid,
+                "relation": "listened_by",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": str_path,
+                "source_location": f"L{line}",
+                "weight": 1.0,
+            }
+        )
 
     # ── Clean edges ───────────────────────────────────────────────────────────
     valid_ids = seen_ids
     clean_edges = []
     for edge in edges:
         src, tgt = edge["source"], edge["target"]
-        if src in valid_ids and (tgt in valid_ids or edge["relation"] in ("imports", "imports_from", "re_exports")):
+        if src in valid_ids and (
+            tgt in valid_ids or edge["relation"] in ("imports", "imports_from", "re_exports")
+        ):
             clean_edges.append(edge)
 
     result = {"nodes": nodes, "edges": clean_edges, "raw_calls": raw_calls}
@@ -3456,7 +3823,15 @@ def walk_calls(node, caller_nid: str) -> None:
 
 # ── Python rationale extraction ───────────────────────────────────────────────
 
-_RATIONALE_PREFIXES = ("# NOTE:", "# IMPORTANT:", "# HACK:", "# WHY:", "# RATIONALE:", "# TODO:", "# FIXME:")
+_RATIONALE_PREFIXES = (
+    "# NOTE:",
+    "# IMPORTANT:",
+    "# HACK:",
+    "# WHY:",
+    "# RATIONALE:",
+    "# TODO:",
+    "# FIXME:",
+)
 
 
 def _is_autogenerated_python(source: bytes) -> bool:
@@ -3470,9 +3845,11 @@ def _is_autogenerated_python(source: bytes) -> bool:
     if any(m in head for m in ("DO NOT EDIT", "@generated", "Generated by the protocol buffer")):
         return True
     # Alembic / Flask-Migrate revision files
-    if (re.search(r"^revision\s*[:=]", head, re.MULTILINE)
-            and "def upgrade(" in head
-            and "down_revision" in head):
+    if (
+        re.search(r"^revision\s*[:=]", head, re.MULTILINE)
+        and "def upgrade(" in head
+        and "down_revision" in head
+    ):
         return True
     # Django migrations
     if "class Migration(migrations.Migration)" in head and "operations" in head:
@@ -3487,6 +3864,7 @@ def _extract_python_rationale(path: Path, result: dict) -> None:
     try:
         import tree_sitter_python as tspython
         from tree_sitter import Language, Parser
+
         language = Language(tspython.language())
         parser = Parser(language)
         source = path.read_bytes()
@@ -3509,7 +3887,9 @@ def _get_docstring(body_node) -> tuple[str, int] | None:
             if child.type == "expression_statement":
                 for sub in child.children:
                     if sub.type in ("string", "concatenated_string"):
-                        text = source[sub.start_byte:sub.end_byte].decode("utf-8", errors="replace")
+                        text = source[sub.start_byte : sub.end_byte].decode(
+                            "utf-8", errors="replace"
+                        )
                         text = text.strip("\"'").strip('"""').strip("'''").strip()
                         if len(text) > 20:
                             return text, child.start_point[0] + 1
@@ -3521,22 +3901,26 @@ def _add_rationale(text: str, line: int, parent_nid: str) -> None:
         rid = _make_id(stem, "rationale", str(line))
         if rid not in seen_ids:
             seen_ids.add(rid)
-            nodes.append({
-                "id": rid,
-                "label": label,
-                "file_type": "rationale",
+            nodes.append(
+                {
+                    "id": rid,
+                    "label": label,
+                    "file_type": "rationale",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
+        edges.append(
+            {
+                "source": rid,
+                "target": parent_nid,
+                "relation": "rationale_for",
+                "confidence": "EXTRACTED",
                 "source_file": str_path,
                 "source_location": f"L{line}",
-            })
-        edges.append({
-            "source": rid,
-            "target": parent_nid,
-            "relation": "rationale_for",
-            "confidence": "EXTRACTED",
-            "source_file": str_path,
-            "source_location": f"L{line}",
-            "weight": 1.0,
-        })
+                "weight": 1.0,
+            }
+        )
 
     # Module-level docstring — skip for auto-generated files (Alembic, Django
     # migrations, protobuf stubs, etc.) whose module docstrings are revision
@@ -3553,7 +3937,9 @@ def walk_docstrings(node, parent_nid: str) -> None:
             name_node = node.child_by_field_name("name")
             body = node.child_by_field_name("body")
             if name_node and body:
-                class_name = source[name_node.start_byte:name_node.end_byte].decode("utf-8", errors="replace")
+                class_name = source[name_node.start_byte : name_node.end_byte].decode(
+                    "utf-8", errors="replace"
+                )
                 nid = _make_id(stem, class_name)
                 ds = _get_docstring(body)
                 if ds:
@@ -3565,8 +3951,14 @@ def walk_docstrings(node, parent_nid: str) -> None:
             name_node = node.child_by_field_name("name")
             body = node.child_by_field_name("body")
             if name_node and body:
-                func_name = source[name_node.start_byte:name_node.end_byte].decode("utf-8", errors="replace")
-                nid = _make_id(parent_nid, func_name) if parent_nid != file_nid else _make_id(stem, func_name)
+                func_name = source[name_node.start_byte : name_node.end_byte].decode(
+                    "utf-8", errors="replace"
+                )
+                nid = (
+                    _make_id(parent_nid, func_name)
+                    if parent_nid != file_nid
+                    else _make_id(stem, func_name)
+                )
                 ds = _get_docstring(body)
                 if ds:
                     _add_rationale(ds[0], ds[1], nid)
@@ -3586,6 +3978,7 @@ def walk_docstrings(node, parent_nid: str) -> None:
 
 # ── Public API ────────────────────────────────────────────────────────────────
 
+
 def extract_python(path: Path) -> dict:
     """Extract classes, functions, and imports from a .py file via tree-sitter AST."""
     result = _extract_generic(path, _PYTHON_CONFIG)
@@ -3615,6 +4008,7 @@ def extract_svelte(path: Path) -> dict:
     result = _extract_generic(path, _JS_CONFIG)
     try:
         import re as _re
+
         src = path.read_text(encoding="utf-8", errors="replace")
         existing_ids = {n["id"] for n in result.get("nodes", [])}
         # Source file node ID must match the one _extract_generic creates:
@@ -3642,7 +4036,7 @@ def extract_svelte(path: Path) -> dict:
                 resolved_alias = None
                 for alias_prefix, alias_base in aliases.items():
                     if raw == alias_prefix or raw.startswith(alias_prefix + "/"):
-                        rest = raw[len(alias_prefix):].lstrip("/")
+                        rest = raw[len(alias_prefix) :].lstrip("/")
                         resolved_alias = Path(os.path.normpath(Path(alias_base) / rest))
                         break
                 if resolved_alias is not None:
@@ -3659,34 +4053,42 @@ def extract_svelte(path: Path) -> dict:
                     stub_source_file = raw
             if node_id in existing_ids:
                 # Edge target already a real node - just add the edge, don't add a node.
-                result.setdefault("edges", []).append({
-                    "source": file_node_id, "target": node_id,
-                    "relation": "dynamic_import", "confidence": "EXTRACTED",
-                    "source_file": str(path),
-                })
+                result.setdefault("edges", []).append(
+                    {
+                        "source": file_node_id,
+                        "target": node_id,
+                        "relation": "dynamic_import",
+                        "confidence": "EXTRACTED",
+                        "source_file": str(path),
+                    }
+                )
                 continue
-            result.setdefault("nodes", []).append({
-                "id": node_id, "label": raw,
-                "file_type": "code", "source_file": stub_source_file,
-                "confidence": "EXTRACTED",
-            })
-            result.setdefault("edges", []).append({
-                "source": file_node_id, "target": node_id,
-                "relation": "dynamic_import", "confidence": "EXTRACTED",
-                "source_file": str(path),
-            })
+            result.setdefault("nodes", []).append(
+                {
+                    "id": node_id,
+                    "label": raw,
+                    "file_type": "code",
+                    "source_file": stub_source_file,
+                    "confidence": "EXTRACTED",
+                }
+            )
+            result.setdefault("edges", []).append(
+                {
+                    "source": file_node_id,
+                    "target": node_id,
+                    "relation": "dynamic_import",
+                    "confidence": "EXTRACTED",
+                    "source_file": str(path),
+                }
+            )
             existing_ids.add(node_id)
         # Static imports inside <script> blocks. The JS tree-sitter parser fed
         # the full .svelte file produces a top-level ERROR node (HTML markup
         # is not valid JS), so import_statement nodes are never reached and
         # static imports are silently dropped (#713). Regex over each script
         # body recovers them.
-        script_re = _re.compile(
-            r"<script\b[^>]*>([\s\S]*?)</script\s*>", _re.IGNORECASE
-        )
-        static_import_re = _re.compile(
-            r"""import\s+(?:[^'"`;]+?\s+from\s+)?['"]([^'"]+)['"]"""
-        )
+        script_re = _re.compile(r"<script\b[^>]*>([\s\S]*?)</script\s*>", _re.IGNORECASE)
+        static_import_re = _re.compile(r"""import\s+(?:[^'"`;]+?\s+from\s+)?['"]([^'"]+)['"]""")
         for script_match in script_re.finditer(src):
             script_body = script_match.group(1)
             for m in static_import_re.finditer(script_body):
@@ -3705,7 +4107,7 @@ def extract_svelte(path: Path) -> dict:
                     resolved_alias = None
                     for alias_prefix, alias_base in aliases.items():
                         if raw == alias_prefix or raw.startswith(alias_prefix + "/"):
-                            rest = raw[len(alias_prefix):].lstrip("/")
+                            rest = raw[len(alias_prefix) :].lstrip("/")
                             resolved_alias = Path(os.path.normpath(Path(alias_base) / rest))
                             break
                     if resolved_alias is not None:
@@ -3718,25 +4120,37 @@ def extract_svelte(path: Path) -> dict:
                         node_id = _make_id(module_name)
                         stub_source_file = raw
                 if node_id in existing_ids:
-                    result.setdefault("edges", []).append({
-                        "source": file_node_id, "target": node_id,
-                        "relation": "imports_from", "confidence": "EXTRACTED",
-                        "source_file": str(path),
-                    })
+                    result.setdefault("edges", []).append(
+                        {
+                            "source": file_node_id,
+                            "target": node_id,
+                            "relation": "imports_from",
+                            "confidence": "EXTRACTED",
+                            "source_file": str(path),
+                        }
+                    )
                     continue
-                result.setdefault("nodes", []).append({
-                    "id": node_id, "label": raw,
-                    "file_type": "code", "source_file": stub_source_file,
-                    "confidence": "EXTRACTED",
-                })
-                result.setdefault("edges", []).append({
-                    "source": file_node_id, "target": node_id,
-                    "relation": "imports_from", "confidence": "EXTRACTED",
-                    "source_file": str(path),
-                })
+                result.setdefault("nodes", []).append(
+                    {
+                        "id": node_id,
+                        "label": raw,
+                        "file_type": "code",
+                        "source_file": stub_source_file,
+                        "confidence": "EXTRACTED",
+                    }
+                )
+                result.setdefault("edges", []).append(
+                    {
+                        "source": file_node_id,
+                        "target": node_id,
+                        "relation": "imports_from",
+                        "confidence": "EXTRACTED",
+                        "source_file": str(path),
+                    }
+                )
                 existing_ids.add(node_id)
-    except Exception:
-        pass
+    except Exception as exc:
+        _LOG.debug("svelte import fallback failed for %s: %s", path, exc)
     return result
 
 
@@ -3756,6 +4170,7 @@ def extract_astro(path: Path) -> dict:
     result = _extract_generic(path, _JS_CONFIG)
     try:
         import re as _re
+
         src = path.read_text(encoding="utf-8", errors="replace")
         existing_ids = {n["id"] for n in result.get("nodes", [])}
         file_node_id = _make_id(str(path))
@@ -3775,7 +4190,7 @@ def extract_astro(path: Path) -> dict:
                 resolved_alias = None
                 for alias_prefix, alias_base in aliases.items():
                     if raw == alias_prefix or raw.startswith(alias_prefix + "/"):
-                        rest = raw[len(alias_prefix):].lstrip("/")
+                        rest = raw[len(alias_prefix) :].lstrip("/")
                         resolved_alias = Path(os.path.normpath(Path(alias_base) / rest))
                         break
                 if resolved_alias is not None:
@@ -3789,35 +4204,41 @@ def extract_astro(path: Path) -> dict:
                     node_id = _make_id(module_name)
                     stub_source_file = raw
             if node_id in existing_ids:
-                result.setdefault("edges", []).append({
-                    "source": file_node_id, "target": node_id,
-                    "relation": "dynamic_import", "confidence": "EXTRACTED",
-                    "source_file": str(path),
-                })
+                result.setdefault("edges", []).append(
+                    {
+                        "source": file_node_id,
+                        "target": node_id,
+                        "relation": "dynamic_import",
+                        "confidence": "EXTRACTED",
+                        "source_file": str(path),
+                    }
+                )
                 continue
-            result.setdefault("nodes", []).append({
-                "id": node_id, "label": raw,
-                "file_type": "code", "source_file": stub_source_file,
-                "confidence": "EXTRACTED",
-            })
-            result.setdefault("edges", []).append({
-                "source": file_node_id, "target": node_id,
-                "relation": "dynamic_import", "confidence": "EXTRACTED",
-                "source_file": str(path),
-            })
+            result.setdefault("nodes", []).append(
+                {
+                    "id": node_id,
+                    "label": raw,
+                    "file_type": "code",
+                    "source_file": stub_source_file,
+                    "confidence": "EXTRACTED",
+                }
+            )
+            result.setdefault("edges", []).append(
+                {
+                    "source": file_node_id,
+                    "target": node_id,
+                    "relation": "dynamic_import",
+                    "confidence": "EXTRACTED",
+                    "source_file": str(path),
+                }
+            )
             existing_ids.add(node_id)
         # Static imports: scan the `---...---` frontmatter at the file head plus any
         # client-side <script> blocks. Both are TS/JS regions but live inside a file
         # the JS tree-sitter parser cannot validate as a whole.
-        frontmatter_re = _re.compile(
-            r"\A\s*---\s*\r?\n([\s\S]*?)\r?\n---\s*(?:\r?\n|\Z)"
-        )
-        script_re = _re.compile(
-            r"<script\b[^>]*>([\s\S]*?)</script\s*>", _re.IGNORECASE
-        )
-        static_import_re = _re.compile(
-            r"""import\s+(?:[^'"`;]+?\s+from\s+)?['"]([^'"]+)['"]"""
-        )
+        frontmatter_re = _re.compile(r"\A\s*---\s*\r?\n([\s\S]*?)\r?\n---\s*(?:\r?\n|\Z)")
+        script_re = _re.compile(r"<script\b[^>]*>([\s\S]*?)</script\s*>", _re.IGNORECASE)
+        static_import_re = _re.compile(r"""import\s+(?:[^'"`;]+?\s+from\s+)?['"]([^'"]+)['"]""")
         regions: list[str] = []
         fm = frontmatter_re.search(src)
         if fm:
@@ -3841,7 +4262,7 @@ def extract_astro(path: Path) -> dict:
                     resolved_alias = None
                     for alias_prefix, alias_base in aliases.items():
                         if raw == alias_prefix or raw.startswith(alias_prefix + "/"):
-                            rest = raw[len(alias_prefix):].lstrip("/")
+                            rest = raw[len(alias_prefix) :].lstrip("/")
                             resolved_alias = Path(os.path.normpath(Path(alias_base) / rest))
                             break
                     if resolved_alias is not None:
@@ -3854,25 +4275,37 @@ def extract_astro(path: Path) -> dict:
                         node_id = _make_id(module_name)
                         stub_source_file = raw
                 if node_id in existing_ids:
-                    result.setdefault("edges", []).append({
-                        "source": file_node_id, "target": node_id,
-                        "relation": "imports_from", "confidence": "EXTRACTED",
-                        "source_file": str(path),
-                    })
+                    result.setdefault("edges", []).append(
+                        {
+                            "source": file_node_id,
+                            "target": node_id,
+                            "relation": "imports_from",
+                            "confidence": "EXTRACTED",
+                            "source_file": str(path),
+                        }
+                    )
                     continue
-                result.setdefault("nodes", []).append({
-                    "id": node_id, "label": raw,
-                    "file_type": "code", "source_file": stub_source_file,
-                    "confidence": "EXTRACTED",
-                })
-                result.setdefault("edges", []).append({
-                    "source": file_node_id, "target": node_id,
-                    "relation": "imports_from", "confidence": "EXTRACTED",
-                    "source_file": str(path),
-                })
+                result.setdefault("nodes", []).append(
+                    {
+                        "id": node_id,
+                        "label": raw,
+                        "file_type": "code",
+                        "source_file": stub_source_file,
+                        "confidence": "EXTRACTED",
+                    }
+                )
+                result.setdefault("edges", []).append(
+                    {
+                        "source": file_node_id,
+                        "target": node_id,
+                        "relation": "imports_from",
+                        "confidence": "EXTRACTED",
+                        "source_file": str(path),
+                    }
+                )
                 existing_ids.add(node_id)
-    except Exception:
-        pass
+    except Exception as exc:
+        _LOG.debug("astro import fallback failed for %s: %s", path, exc)
     return result
 
 
@@ -3885,6 +4318,7 @@ def _is_spock_file(path: Path, ts_result: dict) -> bool:
     """Return True when the file contains Spock-style ``def "feature"()`` methods
     that tree-sitter-groovy cannot parse, detected by checking the raw source."""
     import re as _re
+
     _SPOCK_FEATURE_RE = _re.compile(r"""^\s*def\s+[\"']""", _re.MULTILINE)
     try:
         return bool(_SPOCK_FEATURE_RE.search(path.read_text(errors="replace")))
@@ -3898,6 +4332,7 @@ def _extract_spock_fallback(path: Path, ts_result: dict) -> dict:
     (which survive reliably) with class and feature-method nodes extracted via regex.
     """
     import re as _re
+
     source = path.read_text(errors="replace")
     str_path = str(path)
     stem = _file_stem(path)
@@ -3913,25 +4348,30 @@ def _extract_spock_fallback(path: Path, ts_result: dict) -> dict:
     def _add_node(nid: str, label: str, line: int) -> None:
         if nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({
-                "id": nid,
-                "label": label,
-                "file_type": "code",
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
+
+    def _add_edge(
+        src: str, tgt: str, relation: str, line: int, confidence: str = "EXTRACTED"
+    ) -> None:
+        edges.append(
+            {
+                "source": src,
+                "target": tgt,
+                "relation": relation,
+                "confidence": confidence,
                 "source_file": str_path,
                 "source_location": f"L{line}",
-            })
-
-    def _add_edge(src: str, tgt: str, relation: str, line: int,
-                  confidence: str = "EXTRACTED") -> None:
-        edges.append({
-            "source": src,
-            "target": tgt,
-            "relation": relation,
-            "confidence": confidence,
-            "source_file": str_path,
-            "source_location": f"L{line}",
-            "weight": 1.0,
-        })
+                "weight": 1.0,
+            }
+        )
 
     lines_text = source.splitlines()
 
@@ -4035,14 +4475,22 @@ def extract_php(path: Path) -> dict:
 def extract_blade(path: Path) -> dict:
     """Extract @include, <livewire:> components, and wire:click bindings from Blade templates."""
     import re
+
     try:
         src = path.read_text(encoding="utf-8", errors="replace")
     except OSError:
         return {"error": f"cannot read {path}"}
 
     file_nid = _make_id(str(path))
-    nodes = [{"id": file_nid, "label": path.name, "file_type": "code",
-              "source_file": str(path), "source_location": None}]
+    nodes = [
+        {
+            "id": file_nid,
+            "label": path.name,
+            "file_type": "code",
+            "source_file": str(path),
+            "source_location": None,
+        }
+    ]
     edges = []
 
     # @include('path.to.partial') or @include("path.to.partial")
@@ -4050,31 +4498,79 @@ def extract_blade(path: Path) -> dict:
         tgt = m.group(1).replace(".", "/")
         tgt_nid = _make_id(tgt)
         if tgt_nid not in {n["id"] for n in nodes}:
-            nodes.append({"id": tgt_nid, "label": m.group(1), "file_type": "code",
-                          "source_file": str(path), "source_location": None})
-        edges.append({"source": file_nid, "target": tgt_nid, "relation": "includes",
-                      "confidence": "EXTRACTED", "confidence_score": 1.0,
-                      "source_file": str(path), "source_location": None, "weight": 1.0})
+            nodes.append(
+                {
+                    "id": tgt_nid,
+                    "label": m.group(1),
+                    "file_type": "code",
+                    "source_file": str(path),
+                    "source_location": None,
+                }
+            )
+        edges.append(
+            {
+                "source": file_nid,
+                "target": tgt_nid,
+                "relation": "includes",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": str(path),
+                "source_location": None,
+                "weight": 1.0,
+            }
+        )
 
     # <livewire:component.name /> or <livewire:component.name>
     for m in re.finditer(r"<livewire:([\w.\-]+)", src):
         tgt_nid = _make_id(m.group(1))
         if tgt_nid not in {n["id"] for n in nodes}:
-            nodes.append({"id": tgt_nid, "label": m.group(1), "file_type": "code",
-                          "source_file": str(path), "source_location": None})
-        edges.append({"source": file_nid, "target": tgt_nid, "relation": "uses_component",
-                      "confidence": "EXTRACTED", "confidence_score": 1.0,
-                      "source_file": str(path), "source_location": None, "weight": 1.0})
+            nodes.append(
+                {
+                    "id": tgt_nid,
+                    "label": m.group(1),
+                    "file_type": "code",
+                    "source_file": str(path),
+                    "source_location": None,
+                }
+            )
+        edges.append(
+            {
+                "source": file_nid,
+                "target": tgt_nid,
+                "relation": "uses_component",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": str(path),
+                "source_location": None,
+                "weight": 1.0,
+            }
+        )
 
     # wire:click="methodName"
     for m in re.finditer(r'wire:click=["\']([^"\']+)["\']', src):
         tgt_nid = _make_id(m.group(1))
         if tgt_nid not in {n["id"] for n in nodes}:
-            nodes.append({"id": tgt_nid, "label": m.group(1), "file_type": "code",
-                          "source_file": str(path), "source_location": None})
-        edges.append({"source": file_nid, "target": tgt_nid, "relation": "binds_method",
-                      "confidence": "EXTRACTED", "confidence_score": 1.0,
-                      "source_file": str(path), "source_location": None, "weight": 1.0})
+            nodes.append(
+                {
+                    "id": tgt_nid,
+                    "label": m.group(1),
+                    "file_type": "code",
+                    "source_file": str(path),
+                    "source_location": None,
+                }
+            )
+        edges.append(
+            {
+                "source": file_nid,
+                "target": tgt_nid,
+                "relation": "binds_method",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": str(path),
+                "source_location": None,
+                "weight": 1.0,
+            }
+        )
 
     return {"nodes": nodes, "edges": edges}
 
@@ -4089,8 +4585,15 @@ def extract_dart(path: Path) -> dict:
     # Use stem (not str(path)) for child IDs to keep them machine-independent.
     stem = _file_stem(path)
     file_nid = _make_id(str(path))
-    nodes = [{"id": file_nid, "label": path.name, "file_type": "code",
-              "source_file": str(path), "source_location": None}]
+    nodes = [
+        {
+            "id": file_nid,
+            "label": path.name,
+            "file_type": "code",
+            "source_file": str(path),
+            "source_location": None,
+        }
+    ]
     edges = []
     defined: set[str] = set()
 
@@ -4098,11 +4601,27 @@ def extract_dart(path: Path) -> dict:
     for m in re.finditer(r"^\s*(?:abstract\s+)?(?:class|mixin)\s+(\w+)", src, re.MULTILINE):
         nid = _make_id(stem, m.group(1))
         if nid not in defined:
-            nodes.append({"id": nid, "label": m.group(1), "file_type": "code",
-                          "source_file": str(path), "source_location": None})
-            edges.append({"source": file_nid, "target": nid, "relation": "defines",
-                          "confidence": "EXTRACTED", "confidence_score": 1.0,
-                          "source_file": str(path), "source_location": None, "weight": 1.0})
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": m.group(1),
+                    "file_type": "code",
+                    "source_file": str(path),
+                    "source_location": None,
+                }
+            )
+            edges.append(
+                {
+                    "source": file_nid,
+                    "target": nid,
+                    "relation": "defines",
+                    "confidence": "EXTRACTED",
+                    "confidence_score": 1.0,
+                    "source_file": str(path),
+                    "source_location": None,
+                    "weight": 1.0,
+                }
+            )
             defined.add(nid)
 
     # Top-level and member functions/methods
@@ -4112,11 +4631,27 @@ def extract_dart(path: Path) -> dict:
             continue
         nid = _make_id(stem, name)
         if nid not in defined:
-            nodes.append({"id": nid, "label": name, "file_type": "code",
-                          "source_file": str(path), "source_location": None})
-            edges.append({"source": file_nid, "target": nid, "relation": "defines",
-                          "confidence": "EXTRACTED", "confidence_score": 1.0,
-                          "source_file": str(path), "source_location": None, "weight": 1.0})
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": name,
+                    "file_type": "code",
+                    "source_file": str(path),
+                    "source_location": None,
+                }
+            )
+            edges.append(
+                {
+                    "source": file_nid,
+                    "target": nid,
+                    "relation": "defines",
+                    "confidence": "EXTRACTED",
+                    "confidence_score": 1.0,
+                    "source_file": str(path),
+                    "source_location": None,
+                    "weight": 1.0,
+                }
+            )
             defined.add(nid)
 
     # import 'package:...' or import '...'
@@ -4124,12 +4659,28 @@ def extract_dart(path: Path) -> dict:
         pkg = m.group(1)
         tgt_nid = _make_id(pkg)
         if tgt_nid not in defined:
-            nodes.append({"id": tgt_nid, "label": pkg, "file_type": "code",
-                          "source_file": str(path), "source_location": None})
+            nodes.append(
+                {
+                    "id": tgt_nid,
+                    "label": pkg,
+                    "file_type": "code",
+                    "source_file": str(path),
+                    "source_location": None,
+                }
+            )
             defined.add(tgt_nid)
-        edges.append({"source": file_nid, "target": tgt_nid, "relation": "imports",
-                      "confidence": "EXTRACTED", "confidence_score": 1.0,
-                      "source_file": str(path), "source_location": None, "weight": 1.0})
+        edges.append(
+            {
+                "source": file_nid,
+                "target": tgt_nid,
+                "relation": "imports",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": str(path),
+                "source_location": None,
+                "weight": 1.0,
+            }
+        )
 
     return {"nodes": nodes, "edges": edges}
 
@@ -4160,15 +4711,37 @@ def extract_verilog(path: Path) -> dict:
     def add_node(nid: str, label: str, line: int) -> None:
         if nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({"id": nid, "label": label, "file_type": "code",
-                          "source_file": str_path, "source_location": f"L{line}",
-                          "confidence_score": 1.0})
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                    "confidence_score": 1.0,
+                }
+            )
 
-    def add_edge(src: str, tgt: str, relation: str, line: int,
-                 confidence: str = "EXTRACTED", score: float = 1.0) -> None:
-        edges.append({"source": src, "target": tgt, "relation": relation,
-                      "confidence": confidence, "confidence_score": score,
-                      "source_file": str_path, "source_location": f"L{line}", "weight": 1.0})
+    def add_edge(
+        src: str,
+        tgt: str,
+        relation: str,
+        line: int,
+        confidence: str = "EXTRACTED",
+        score: float = 1.0,
+    ) -> None:
+        edges.append(
+            {
+                "source": src,
+                "target": tgt,
+                "relation": relation,
+                "confidence": confidence,
+                "confidence_score": score,
+                "source_file": str_path,
+                "source_location": f"L{line}",
+                "weight": 1.0,
+            }
+        )
 
     file_nid = _make_id(str(path))
     add_node(file_nid, path.name, 1)
@@ -4244,7 +4817,11 @@ def extract_sql(path: Path) -> dict:
         import tree_sitter_sql as tssql
         from tree_sitter import Language, Parser
     except ImportError:
-        return {"nodes": [], "edges": [], "error": "tree_sitter_sql not installed. Run: pip install tree-sitter-sql"}
+        return {
+            "nodes": [],
+            "edges": [],
+            "error": "tree_sitter_sql not installed. Run: pip install tree-sitter-sql",
+        }
 
     try:
         language = Language(tssql.language())
@@ -4258,14 +4835,21 @@ def extract_sql(path: Path) -> dict:
     stem = _file_stem(path)
     str_path = str(path)
     file_nid = _make_id(str_path)
-    nodes: list[dict] = [{"id": file_nid, "label": path.name, "file_type": "code",
-                           "source_file": str_path, "source_location": None}]
+    nodes: list[dict] = [
+        {
+            "id": file_nid,
+            "label": path.name,
+            "file_type": "code",
+            "source_file": str_path,
+            "source_location": None,
+        }
+    ]
     edges: list[dict] = []
     seen_ids: set[str] = {file_nid}
     table_nids: dict[str, str] = {}  # name → nid for reference resolution
 
     def _read(n) -> str:
-        return source[n.start_byte:n.end_byte].decode("utf-8", errors="replace")
+        return source[n.start_byte : n.end_byte].decode("utf-8", errors="replace")
 
     def _obj_name(n) -> str | None:
         for c in n.children:
@@ -4276,16 +4860,39 @@ def _obj_name(n) -> str | None:
     def _add_node(nid: str, label: str, line: int) -> None:
         if nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({"id": nid, "label": label, "file_type": "code",
-                           "source_file": str_path, "source_location": f"L{line}"})
-            edges.append({"source": file_nid, "target": nid, "relation": "contains",
-                           "confidence": "EXTRACTED", "source_file": str_path,
-                           "source_location": f"L{line}", "weight": 1.0})
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
+            edges.append(
+                {
+                    "source": file_nid,
+                    "target": nid,
+                    "relation": "contains",
+                    "confidence": "EXTRACTED",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                    "weight": 1.0,
+                }
+            )
 
     def _add_edge(src: str, tgt: str, relation: str, line: int) -> None:
-        edges.append({"source": src, "target": tgt, "relation": relation,
-                       "confidence": "EXTRACTED", "source_file": str_path,
-                       "source_location": f"L{line}", "weight": 1.0})
+        edges.append(
+            {
+                "source": src,
+                "target": tgt,
+                "relation": relation,
+                "confidence": "EXTRACTED",
+                "source_file": str_path,
+                "source_location": f"L{line}",
+                "weight": 1.0,
+            }
+        )
 
     def walk(node) -> None:
         t = node.type
@@ -4314,7 +4921,9 @@ def walk(node) -> None:
                                         ref_name = _read(cc)
                                         break
                                 if ref_name:
-                                    ref_nid = table_nids.get(ref_name.lower()) or _make_id(stem, ref_name)
+                                    ref_nid = table_nids.get(ref_name.lower()) or _make_id(
+                                        stem, ref_name
+                                    )
                                     _add_edge(nid, ref_nid, "references", line)
                                     seen_refs.add(ref_name.lower())
                             elif cd.type == "constraints":
@@ -4331,7 +4940,9 @@ def walk(node) -> None:
                                             ref_name = _read(cc)
                                             break
                                     if ref_name:
-                                        ref_nid = table_nids.get(ref_name.lower()) or _make_id(stem, ref_name)
+                                        ref_nid = table_nids.get(ref_name.lower()) or _make_id(
+                                            stem, ref_name
+                                        )
                                         _add_edge(nid, ref_nid, "references", line)
                                         seen_refs.add(ref_name.lower())
                         if has_error:
@@ -4339,10 +4950,14 @@ def walk(node) -> None:
                             # nodes that make the parser drop the trailing constraints block.
                             # Regex-scan the raw column_definitions text as fallback.
                             col_text = _read(col)
-                            for rm in re.finditer(r"\bREFERENCES\s+([\w$]+)", col_text, re.IGNORECASE):
-                                ref_name = rm.group(1)
+                            for rm in re.finditer(
+                                r"\bREFERENCES\s+([\w$]+)", col_text, re.IGNORECASE
+                            ):
+                                ref_name = str(rm.group(1))
                                 if ref_name.lower() not in seen_refs:
-                                    ref_nid = table_nids.get(ref_name.lower()) or _make_id(stem, ref_name)
+                                    ref_nid = table_nids.get(ref_name.lower()) or _make_id(
+                                        stem, ref_name
+                                    )
                                     _add_edge(nid, ref_nid, "references", line)
                                     seen_refs.add(ref_name.lower())
 
@@ -4422,7 +5037,8 @@ def walk(node) -> None:
             m = re.match(
                 r"CREATE\s+(?:OR\s+(?:REPLACE|ALTER)\s+)?"
                 r"(PROCEDURE|TRIGGER|FUNCTION)\s+([\w$]+)",
-                text, re.IGNORECASE,
+                text,
+                re.IGNORECASE,
             )
             if m:
                 obj_type = m.group(1).upper()
@@ -4437,8 +5053,19 @@ def walk(node) -> None:
                         tbl_nid = table_nids.get(tbl.lower()) or _make_id(stem, tbl)
                         _add_edge(obj_nid, tbl_nid, "triggers", line)
                 _NON_TABLES = {
-                    "select", "where", "set", "dual", "null", "true", "false",
-                    "first", "skip", "rows", "next", "only", "lateral",
+                    "select",
+                    "where",
+                    "set",
+                    "dual",
+                    "null",
+                    "true",
+                    "false",
+                    "first",
+                    "skip",
+                    "rows",
+                    "next",
+                    "only",
+                    "lateral",
                 }
                 seen_tbls: set[str] = set()
                 for rm in re.finditer(r"\b(?:FROM|JOIN|INTO)\s+([\w$]+)", text, re.IGNORECASE):
@@ -4466,8 +5093,7 @@ def _walk_from_refs(node, caller_nid: str, line: int) -> None:
                         if cc.type == "object_reference":
                             tbl = _read(cc)
                             tbl_nid = _make_id(stem, tbl)
-                            _add_edge(caller_nid, tbl_nid, "reads_from",
-                                      c.start_point[0] + 1)
+                            _add_edge(caller_nid, tbl_nid, "reads_from", c.start_point[0] + 1)
         for child in node.children:
             _walk_from_refs(child, caller_nid, line)
 
@@ -4489,7 +5115,7 @@ def _walk_from_refs(node, caller_nid: str, line: int) -> None:
         if tbl_nid is None:
             continue
         tbl_line = src_text[: m.start()].count("\n") + 1
-        tail = src_text[m.start():]
+        tail = src_text[m.start() :]
         end = re.search(r"(?:^|\n)(?:CREATE|SET\s+TERM|ALTER)\s", tail[1:], re.IGNORECASE)
         block = tail[: end.start() + 1] if end else tail
         for rm in re.finditer(r"\bREFERENCES\s+([\w$]+)", block, re.IGNORECASE):
@@ -4514,6 +5140,7 @@ def extract_swift(path: Path) -> dict:
 
 # ── Julia extractor (custom walk) ────────────────────────────────────────────
 
+
 def extract_julia(path: Path) -> dict:
     """Extract modules, structs, functions, imports, and calls from a .jl file."""
     try:
@@ -4536,22 +5163,30 @@ def extract_julia(path: Path) -> dict:
     nodes: list[dict] = []
     edges: list[dict] = []
     seen_ids: set[str] = set()
-    function_bodies: list[tuple[str, object]] = []
+    function_bodies: list[tuple[str, Any]] = []
 
     def add_node(nid: str, label: str, line: int) -> None:
         if nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({
-                "id": nid,
-                "label": label,
-                "file_type": "code",
-                "source_file": str_path,
-                "source_location": f"L{line}",
-            })
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
 
-    def add_edge(src: str, tgt: str, relation: str, line: int,
-                 confidence: str = "EXTRACTED", weight: float = 1.0,
-                 context: str | None = None) -> None:
+    def add_edge(
+        src: str,
+        tgt: str,
+        relation: str,
+        line: int,
+        confidence: str = "EXTRACTED",
+        weight: float = 1.0,
+        context: str | None = None,
+    ) -> None:
         edge = {
             "source": src,
             "target": tgt,
@@ -4598,15 +5233,27 @@ def walk_calls(body_node, func_nid: str) -> None:
             if callee.type == "identifier":
                 callee_name = _read_text(callee, source)
                 target_nid = _make_id(stem, callee_name)
-                add_edge(func_nid, target_nid, "calls", body_node.start_point[0] + 1,
-                         confidence="EXTRACTED", context="call")
+                add_edge(
+                    func_nid,
+                    target_nid,
+                    "calls",
+                    body_node.start_point[0] + 1,
+                    confidence="EXTRACTED",
+                    context="call",
+                )
             # Method call: obj.method(...)
             elif callee.type == "field_expression" and len(callee.children) >= 3:
                 method_node = callee.children[-1]
                 method_name = _read_text(method_node, source)
                 target_nid = _make_id(stem, method_name)
-                add_edge(func_nid, target_nid, "calls", body_node.start_point[0] + 1,
-                         confidence="EXTRACTED", context="call")
+                add_edge(
+                    func_nid,
+                    target_nid,
+                    "calls",
+                    body_node.start_point[0] + 1,
+                    confidence="EXTRACTED",
+                    context="call",
+                )
         for child in body_node.children:
             walk_calls(child, func_nid)
 
@@ -4767,18 +5414,20 @@ def _cpp_preprocess(path: Path) -> bytes:
     """
     import shutil
     import subprocess
-    if not shutil.which("cpp"):
+
+    cpp = shutil.which("cpp")
+    if not cpp:
         return path.read_bytes()
     try:
-        result = subprocess.run(
-            ["cpp", "-w", "-P", "-nostdinc", "-I", "/dev/null", str(path)],
+        result = subprocess.run(  # nosec B603
+            [cpp, "-w", "-P", "-nostdinc", "-I", "/dev/null", str(path)],
             capture_output=True,
             timeout=30,
         )
         if result.returncode == 0 and result.stdout:
             return result.stdout
-    except Exception:
-        pass
+    except Exception as exc:
+        _LOG.debug("cpp preprocessing failed for %s: %s", path, exc)
     return path.read_bytes()
 
 
@@ -4808,22 +5457,30 @@ def extract_fortran(path: Path) -> dict:
     nodes: list[dict] = []
     edges: list[dict] = []
     seen_ids: set[str] = set()
-    scope_bodies: list[tuple[str, object]] = []
+    scope_bodies: list[tuple[str, Any]] = []
 
     def add_node(nid: str, label: str, line: int) -> None:
         if nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({
-                "id": nid,
-                "label": label,
-                "file_type": "code",
-                "source_file": str_path,
-                "source_location": f"L{line}",
-            })
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
 
-    def add_edge(src: str, tgt: str, relation: str, line: int,
-                 confidence: str = "EXTRACTED", weight: float = 1.0,
-                 context: str | None = None) -> None:
+    def add_edge(
+        src: str,
+        tgt: str,
+        relation: str,
+        line: int,
+        confidence: str = "EXTRACTED",
+        weight: float = 1.0,
+        context: str | None = None,
+    ) -> None:
         edge = {
             "source": src,
             "target": tgt,
@@ -4915,8 +5572,14 @@ def walk_calls(node, scope_nid: str) -> None:
             if name_node:
                 callee = _read_text(name_node, source).lower()
                 target_nid = _make_id(stem, callee)
-                add_edge(scope_nid, target_nid, "calls", node.start_point[0] + 1,
-                         confidence="EXTRACTED", context="call")
+                add_edge(
+                    scope_nid,
+                    target_nid,
+                    "calls",
+                    node.start_point[0] + 1,
+                    confidence="EXTRACTED",
+                    context="call",
+                )
         for child in node.children:
             walk_calls(child, scope_nid)
 
@@ -4997,7 +5660,9 @@ def walk(node, scope_nid: str) -> None:
         if t == "use_statement":
             line = node.start_point[0] + 1
             # tree-sitter-fortran uses module_name node for the used module
-            name_node = next((c for c in node.children if c.type in ("module_name", "name", "identifier")), None)
+            name_node = next(
+                (c for c in node.children if c.type in ("module_name", "name", "identifier")), None
+            )
             if name_node:
                 mod_name = _read_text(name_node, source).lower()
                 imp_nid = _make_id(mod_name)
@@ -5011,8 +5676,10 @@ def walk(node, scope_nid: str) -> None:
     walk(root, file_nid)
 
     _stmt_headers = {
-        "subroutine_statement", "function_statement",
-        "program_statement", "module_statement",
+        "subroutine_statement",
+        "function_statement",
+        "program_statement",
+        "module_statement",
     }
     for scope_nid, body_node in scope_bodies:
         for child in body_node.children:
@@ -5024,6 +5691,7 @@ def walk(node, scope_nid: str) -> None:
 
 # ── Go extractor (custom walk) ────────────────────────────────────────────────
 
+
 def extract_go(path: Path) -> dict:
     """Extract functions, methods, type declarations, and imports from a .go file."""
     try:
@@ -5055,17 +5723,25 @@ def extract_go(path: Path) -> dict:
     def add_node(nid: str, label: str, line: int) -> None:
         if nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({
-                "id": nid,
-                "label": label,
-                "file_type": "code",
-                "source_file": str_path,
-                "source_location": f"L{line}",
-            })
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
 
-    def add_edge(src: str, tgt: str, relation: str, line: int,
-                 confidence: str = "EXTRACTED", weight: float = 1.0,
-                 context: str | None = None) -> None:
+    def add_edge(
+        src: str,
+        tgt: str,
+        relation: str,
+        line: int,
+        confidence: str = "EXTRACTED",
+        weight: float = 1.0,
+        context: str | None = None,
+    ) -> None:
         edge = {
             "source": src,
             "target": tgt,
@@ -5263,10 +5939,18 @@ def walk(node) -> None:
                                 # Prefix with go_pkg_ so stdlib names (e.g. "context")
                                 # don't collide with local files of the same basename.
                                 tgt_nid = _make_id("go", "pkg", raw)
-                                add_edge(file_nid, tgt_nid, "imports_from", spec.start_point[0] + 1, context="import")
+                                add_edge(
+                                    file_nid,
+                                    tgt_nid,
+                                    "imports_from",
+                                    spec.start_point[0] + 1,
+                                    context="import",
+                                )
                                 # Track local name (alias or last path segment)
                                 alias = spec.child_by_field_name("name")
-                                local_name = _read_text(alias, source) if alias else raw.split("/")[-1]
+                                local_name = (
+                                    _read_text(alias, source) if alias else raw.split("/")[-1]
+                                )
                                 if local_name and local_name != "_" and local_name != ".":
                                     go_imported_pkgs.add(local_name)
                 elif child.type == "import_spec":
@@ -5274,7 +5958,13 @@ def walk(node) -> None:
                     if path_node:
                         raw = _read_text(path_node, source).strip('"')
                         tgt_nid = _make_id("go", "pkg", raw)
-                        add_edge(file_nid, tgt_nid, "imports_from", child.start_point[0] + 1, context="import")
+                        add_edge(
+                            file_nid,
+                            tgt_nid,
+                            "imports_from",
+                            child.start_point[0] + 1,
+                            context="import",
+                        )
                         alias = child.child_by_field_name("name")
                         local_name = _read_text(alias, source) if alias else raw.split("/")[-1]
                         if local_name and local_name != "_" and local_name != ".":
@@ -5321,24 +6011,28 @@ def walk_calls(node, caller_nid: str) -> None:
                     if pair not in seen_call_pairs:
                         seen_call_pairs.add(pair)
                         line = node.start_point[0] + 1
-                        edges.append({
-                            "source": caller_nid,
-                            "target": tgt_nid,
-                            "relation": "calls",
-                            "context": "call",
-                            "confidence": "EXTRACTED",
-                            "source_file": str_path,
-                            "source_location": f"L{line}",
-                            "weight": 1.0,
-                        })
+                        edges.append(
+                            {
+                                "source": caller_nid,
+                                "target": tgt_nid,
+                                "relation": "calls",
+                                "context": "call",
+                                "confidence": "EXTRACTED",
+                                "source_file": str_path,
+                                "source_location": f"L{line}",
+                                "weight": 1.0,
+                            }
+                        )
                 elif callee_name:
-                    raw_calls.append({
-                        "caller_nid": caller_nid,
-                        "callee": callee_name,
-                        "is_member_call": is_member_call,
-                        "source_file": str_path,
-                        "source_location": f"L{node.start_point[0] + 1}",
-                    })
+                    raw_calls.append(
+                        {
+                            "caller_nid": caller_nid,
+                            "callee": callee_name,
+                            "is_member_call": is_member_call,
+                            "source_file": str_path,
+                            "source_location": f"L{node.start_point[0] + 1}",
+                        }
+                    )
         for child in node.children:
             walk_calls(child, caller_nid)
 
@@ -5349,7 +6043,9 @@ def walk_calls(node, caller_nid: str) -> None:
     clean_edges = []
     for edge in edges:
         src, tgt = edge["source"], edge["target"]
-        if src in valid_ids and (tgt in valid_ids or edge["relation"] in ("imports", "imports_from")):
+        if src in valid_ids and (
+            tgt in valid_ids or edge["relation"] in ("imports", "imports_from")
+        ):
             clean_edges.append(edge)
 
     return {"nodes": nodes, "edges": clean_edges, "raw_calls": raw_calls}
@@ -5360,13 +6056,51 @@ def walk_calls(node, caller_nid: str) -> None:
 # Common Rust trait/stdlib method names that appear in virtually every codebase.
 # Resolving these cross-file produces spurious INFERRED edges across crate
 # boundaries (issue #908) — skip them from the unresolved-call queue entirely.
-_RUST_TRAIT_METHOD_BLOCKLIST: frozenset[str] = frozenset({
-    "new", "default", "parse", "from_str", "now", "clone", "into", "from",
-    "to_string", "to_owned", "len", "is_empty", "iter", "next", "build",
-    "start", "run", "init", "app", "get", "set", "push", "pop", "insert",
-    "remove", "contains", "collect", "map", "filter", "unwrap", "expect",
-    "ok", "err", "some", "none", "send", "recv", "lock", "read", "write",
-})
+_RUST_TRAIT_METHOD_BLOCKLIST: frozenset[str] = frozenset(
+    {
+        "new",
+        "default",
+        "parse",
+        "from_str",
+        "now",
+        "clone",
+        "into",
+        "from",
+        "to_string",
+        "to_owned",
+        "len",
+        "is_empty",
+        "iter",
+        "next",
+        "build",
+        "start",
+        "run",
+        "init",
+        "app",
+        "get",
+        "set",
+        "push",
+        "pop",
+        "insert",
+        "remove",
+        "contains",
+        "collect",
+        "map",
+        "filter",
+        "unwrap",
+        "expect",
+        "ok",
+        "err",
+        "some",
+        "none",
+        "send",
+        "recv",
+        "lock",
+        "read",
+        "write",
+    }
+)
+
 
 def extract_rust(path: Path) -> dict:
     """Extract functions, structs, enums, traits, impl methods, and use declarations from a .rs file."""
@@ -5395,17 +6129,25 @@ def extract_rust(path: Path) -> dict:
     def add_node(nid: str, label: str, line: int) -> None:
         if nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({
-                "id": nid,
-                "label": label,
-                "file_type": "code",
-                "source_file": str_path,
-                "source_location": f"L{line}",
-            })
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
 
-    def add_edge(src: str, tgt: str, relation: str, line: int,
-                 confidence: str = "EXTRACTED", weight: float = 1.0,
-                 context: str | None = None) -> None:
+    def add_edge(
+        src: str,
+        tgt: str,
+        relation: str,
+        line: int,
+        confidence: str = "EXTRACTED",
+        weight: float = 1.0,
+        context: str | None = None,
+    ) -> None:
         edge = {
             "source": src,
             "target": tgt,
@@ -5563,7 +6305,9 @@ def walk(node, parent_impl_nid: str | None = None) -> None:
                 module_name = clean.split("::")[-1].strip()
                 if module_name:
                     tgt_nid = _make_id(module_name)
-                    add_edge(file_nid, tgt_nid, "imports_from", node.start_point[0] + 1, context="import")
+                    add_edge(
+                        file_nid, tgt_nid, "imports_from", node.start_point[0] + 1, context="import"
+                    )
             return
 
         for child in node.children:
@@ -5611,24 +6355,28 @@ def walk_calls(node, caller_nid: str) -> None:
                     if pair not in seen_call_pairs:
                         seen_call_pairs.add(pair)
                         line = node.start_point[0] + 1
-                        edges.append({
-                            "source": caller_nid,
-                            "target": tgt_nid,
-                            "relation": "calls",
-                            "context": "call",
-                            "confidence": "EXTRACTED",
-                            "source_file": str_path,
-                            "source_location": f"L{line}",
-                            "weight": 1.0,
-                        })
+                        edges.append(
+                            {
+                                "source": caller_nid,
+                                "target": tgt_nid,
+                                "relation": "calls",
+                                "context": "call",
+                                "confidence": "EXTRACTED",
+                                "source_file": str_path,
+                                "source_location": f"L{line}",
+                                "weight": 1.0,
+                            }
+                        )
                 elif not is_scoped_call and callee_name.lower() not in _RUST_TRAIT_METHOD_BLOCKLIST:
-                    raw_calls.append({
-                        "caller_nid": caller_nid,
-                        "callee": callee_name,
-                        "is_member_call": is_member_call,
-                        "source_file": str_path,
-                        "source_location": f"L{node.start_point[0] + 1}",
-                    })
+                    raw_calls.append(
+                        {
+                            "caller_nid": caller_nid,
+                            "callee": callee_name,
+                            "is_member_call": is_member_call,
+                            "source_file": str_path,
+                            "source_location": f"L{node.start_point[0] + 1}",
+                        }
+                    )
         for child in node.children:
             walk_calls(child, caller_nid)
 
@@ -5639,7 +6387,9 @@ def walk_calls(node, caller_nid: str) -> None:
     clean_edges = []
     for edge in edges:
         src, tgt = edge["source"], edge["target"]
-        if src in valid_ids and (tgt in valid_ids or edge["relation"] in ("imports", "imports_from")):
+        if src in valid_ids and (
+            tgt in valid_ids or edge["relation"] in ("imports", "imports_from")
+        ):
             clean_edges.append(edge)
 
     return {"nodes": nodes, "edges": clean_edges, "raw_calls": raw_calls}
@@ -5647,6 +6397,7 @@ def walk_calls(node, caller_nid: str) -> None:
 
 # ── Zig ───────────────────────────────────────────────────────────────────────
 
+
 def extract_zig(path: Path) -> dict:
     """Extract functions, structs, enums, unions, and imports from a .zig file."""
     try:
@@ -5674,15 +6425,34 @@ def extract_zig(path: Path) -> dict:
     def add_node(nid: str, label: str, line: int) -> None:
         if nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({"id": nid, "label": label, "file_type": "code",
-                          "source_file": str_path, "source_location": f"L{line}"})
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
 
-    def add_edge(src: str, tgt: str, relation: str, line: int,
-                 confidence: str = "EXTRACTED", weight: float = 1.0,
-                 context: str | None = None) -> None:
-        edge = {"source": src, "target": tgt, "relation": relation,
-                "confidence": confidence, "source_file": str_path,
-                "source_location": f"L{line}", "weight": weight}
+    def add_edge(
+        src: str,
+        tgt: str,
+        relation: str,
+        line: int,
+        confidence: str = "EXTRACTED",
+        weight: float = 1.0,
+        context: str | None = None,
+    ) -> None:
+        edge = {
+            "source": src,
+            "target": tgt,
+            "relation": relation,
+            "confidence": confidence,
+            "source_file": str_path,
+            "source_location": f"L{line}",
+            "weight": weight,
+        }
         if context:
             edge["context"] = context
         edges.append(edge)
@@ -5707,8 +6477,7 @@ def _extract_import(node) -> None:
                             module_name = raw.split("/")[-1].split(".")[0]
                             if module_name:
                                 tgt_nid = _make_id(module_name)
-                                add_edge(file_nid, tgt_nid, "imports_from",
-                                         node.start_point[0] + 1)
+                                add_edge(file_nid, tgt_nid, "imports_from", node.start_point[0] + 1)
                             return
             elif child.type == "field_expression":
                 _extract_import(child)
@@ -5741,9 +6510,13 @@ def walk(node, parent_struct_nid: str | None = None) -> None:
             for child in node.children:
                 if child.type == "identifier":
                     name_node = child
-                elif child.type in ("struct_declaration", "enum_declaration",
-                                    "union_declaration", "builtin_function",
-                                    "field_expression"):
+                elif child.type in (
+                    "struct_declaration",
+                    "enum_declaration",
+                    "union_declaration",
+                    "builtin_function",
+                    "field_expression",
+                ):
                     value_node = child
 
             if value_node and value_node.type == "struct_declaration":
@@ -5787,36 +6560,48 @@ def walk_calls(node, caller_nid: str) -> None:
                 fn_text = _read_text(fn, source)
                 callee = fn_text.split(".")[-1]
                 is_member_call = "." in fn_text
-                tgt_nid = next((n["id"] for n in nodes if n["label"] in
-                                (f"{callee}()", f".{callee}()")), None)
+                tgt_nid = next(
+                    (n["id"] for n in nodes if n["label"] in (f"{callee}()", f".{callee}()")), None
+                )
                 if tgt_nid and tgt_nid != caller_nid:
                     pair = (caller_nid, tgt_nid)
                     if pair not in seen_call_pairs:
                         seen_call_pairs.add(pair)
-                        add_edge(caller_nid, tgt_nid, "calls",
-                                 node.start_point[0] + 1,
-                                 confidence="EXTRACTED", weight=1.0)
+                        add_edge(
+                            caller_nid,
+                            tgt_nid,
+                            "calls",
+                            node.start_point[0] + 1,
+                            confidence="EXTRACTED",
+                            weight=1.0,
+                        )
                 elif callee:
-                    raw_calls.append({
-                        "caller_nid": caller_nid,
-                        "callee": callee,
-                        "is_member_call": is_member_call,
-                        "source_file": str_path,
-                        "source_location": f"L{node.start_point[0] + 1}",
-                    })
+                    raw_calls.append(
+                        {
+                            "caller_nid": caller_nid,
+                            "callee": callee,
+                            "is_member_call": is_member_call,
+                            "source_file": str_path,
+                            "source_location": f"L{node.start_point[0] + 1}",
+                        }
+                    )
         for child in node.children:
             walk_calls(child, caller_nid)
 
     for caller_nid, body_node in function_bodies:
         walk_calls(body_node, caller_nid)
 
-    clean_edges = [e for e in edges if e["source"] in seen_ids and
-                   (e["target"] in seen_ids or e["relation"] == "imports_from")]
+    clean_edges = [
+        e
+        for e in edges
+        if e["source"] in seen_ids and (e["target"] in seen_ids or e["relation"] == "imports_from")
+    ]
     return {"nodes": nodes, "edges": clean_edges, "raw_calls": raw_calls}
 
 
 # ── PowerShell ────────────────────────────────────────────────────────────────
 
+
 def extract_powershell(path: Path) -> dict:
     """Extract functions, classes, methods, and using statements from a .ps1 file."""
     try:
@@ -5844,15 +6629,34 @@ def extract_powershell(path: Path) -> dict:
     def add_node(nid: str, label: str, line: int) -> None:
         if nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({"id": nid, "label": label, "file_type": "code",
-                          "source_file": str_path, "source_location": f"L{line}"})
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
 
-    def add_edge(src: str, tgt: str, relation: str, line: int,
-                 confidence: str = "EXTRACTED", weight: float = 1.0,
-                 context: str | None = None) -> None:
-        edge = {"source": src, "target": tgt, "relation": relation,
-                "confidence": confidence, "source_file": str_path,
-                "source_location": f"L{line}", "weight": weight}
+    def add_edge(
+        src: str,
+        tgt: str,
+        relation: str,
+        line: int,
+        confidence: str = "EXTRACTED",
+        weight: float = 1.0,
+        context: str | None = None,
+    ) -> None:
+        edge = {
+            "source": src,
+            "target": tgt,
+            "relation": relation,
+            "confidence": confidence,
+            "source_file": str_path,
+            "source_location": f"L{line}",
+            "weight": weight,
+        }
         if context:
             edge["context"] = context
         edges.append(edge)
@@ -5860,11 +6664,31 @@ def add_edge(src: str, tgt: str, relation: str, line: int,
     file_nid = _make_id(str(path))
     add_node(file_nid, path.name, 1)
 
-    _PS_SKIP = frozenset({
-        "using", "return", "if", "else", "elseif", "foreach", "for",
-        "while", "do", "switch", "try", "catch", "finally", "throw",
-        "break", "continue", "exit", "param", "begin", "process", "end",
-    })
+    _PS_SKIP = frozenset(
+        {
+            "using",
+            "return",
+            "if",
+            "else",
+            "elseif",
+            "foreach",
+            "for",
+            "while",
+            "do",
+            "switch",
+            "try",
+            "catch",
+            "finally",
+            "throw",
+            "break",
+            "continue",
+            "exit",
+            "param",
+            "begin",
+            "process",
+            "end",
+        }
+    )
 
     def _find_script_block_body(node):
         for child in node.children:
@@ -5993,12 +6817,14 @@ def walk(node, parent_class_nid: str | None = None) -> None:
                             for el in child.children:
                                 if el.type == "generic_token":
                                     tokens.append(_read_text(el, source))
-                    module_tokens = [t for t in tokens
-                                     if t.lower() not in ("namespace", "module", "assembly")]
+                    module_tokens = [
+                        t for t in tokens if t.lower() not in ("namespace", "module", "assembly")
+                    ]
                     if module_tokens:
                         module_name = module_tokens[-1].split(".")[-1]
-                        add_edge(file_nid, _make_id(module_name), "imports_from",
-                                 node.start_point[0] + 1)
+                        add_edge(
+                            file_nid, _make_id(module_name), "imports_from", node.start_point[0] + 1
+                        )
             return
 
         for child in node.children:
@@ -6023,30 +6849,41 @@ def walk_calls(node, caller_nid: str) -> None:
                         pair = (caller_nid, tgt_nid)
                         if pair not in seen_call_pairs:
                             seen_call_pairs.add(pair)
-                            add_edge(caller_nid, tgt_nid, "calls",
-                                     node.start_point[0] + 1,
-                                     confidence="EXTRACTED", weight=1.0)
+                            add_edge(
+                                caller_nid,
+                                tgt_nid,
+                                "calls",
+                                node.start_point[0] + 1,
+                                confidence="EXTRACTED",
+                                weight=1.0,
+                            )
                     elif cmd_text:
-                        raw_calls.append({
-                            "caller_nid": caller_nid,
-                            "callee": cmd_text,
-                            "is_member_call": False,
-                            "source_file": str_path,
-                            "source_location": f"L{node.start_point[0] + 1}",
-                        })
+                        raw_calls.append(
+                            {
+                                "caller_nid": caller_nid,
+                                "callee": cmd_text,
+                                "is_member_call": False,
+                                "source_file": str_path,
+                                "source_location": f"L{node.start_point[0] + 1}",
+                            }
+                        )
         for child in node.children:
             walk_calls(child, caller_nid)
 
     for caller_nid, body_node in function_bodies:
         walk_calls(body_node, caller_nid)
 
-    clean_edges = [e for e in edges if e["source"] in seen_ids and
-                   (e["target"] in seen_ids or e["relation"] == "imports_from")]
+    clean_edges = [
+        e
+        for e in edges
+        if e["source"] in seen_ids and (e["target"] in seen_ids or e["relation"] == "imports_from")
+    ]
     return {"nodes": nodes, "edges": clean_edges, "raw_calls": raw_calls}
 
 
 # ── Cross-file import resolution ──────────────────────────────────────────────
 
+
 def _source_key(source_file: str, root: Path) -> str:
     if not source_file:
         return ""
@@ -6094,8 +6931,7 @@ def _disambiguate_colliding_node_ids(
         if old_id in ambiguous_ids:
             continue
         candidates = {
-            node["id"] for node in group
-            if isinstance(node.get("id"), str) and node["id"] != old_id
+            node["id"] for node in group if isinstance(node.get("id"), str) and node["id"] != old_id
         }
         if len(candidates) == 1:
             unambiguous_remaps[old_id] = next(iter(candidates))
@@ -6288,13 +7124,15 @@ def ensure_symbol_node(path: Path, name: str, line: int) -> str:
             return existing
         node_id = _make_id(_file_stem(path), name)
         symbol_nodes[(resolved_path, name)] = node_id
-        nodes.append({
-            "id": node_id,
-            "label": name,
-            "file_type": "code",
-            "source_file": str(path),
-            "source_location": f"L{line}",
-        })
+        nodes.append(
+            {
+                "id": node_id,
+                "label": name,
+                "file_type": "code",
+                "source_file": str(path),
+                "source_location": f"L{line}",
+            }
+        )
         return node_id
 
     existing_edges = {
@@ -6307,21 +7145,25 @@ def ensure_symbol_node(path: Path, name: str, line: int) -> str:
         for edge in edges
     }
 
-    def add_edge(source: str, target: str, relation: str, context: str, line: int, source_path: Path) -> None:
+    def add_edge(
+        source: str, target: str, relation: str, context: str, line: int, source_path: Path
+    ) -> None:
         key = (source, target, relation, context or "")
         if key in existing_edges:
             return
         existing_edges.add(key)
-        edges.append({
-            "source": source,
-            "target": target,
-            "relation": relation,
-            "context": context,
-            "confidence": "EXTRACTED",
-            "source_file": str(source_path),
-            "source_location": f"L{line}",
-            "weight": 1.0,
-        })
+        edges.append(
+            {
+                "source": source,
+                "target": target,
+                "relation": relation,
+                "context": context,
+                "confidence": "EXTRACTED",
+                "source_file": str(source_path),
+                "source_location": f"L{line}",
+                "weight": 1.0,
+            }
+        )
 
     for declaration in facts.declarations:
         ensure_symbol_node(declaration.file_path, declaration.name, declaration.line)
@@ -6393,7 +7235,9 @@ def add_edge(source: str, target: str, relation: str, context: str, line: int, s
                     export_fact.file_path,
                 )
 
-    def resolve_exported_origin(target_path: Path, imported_name: str, seen: set[tuple[Path, str]] | None = None) -> tuple[Path, str]:
+    def resolve_exported_origin(
+        target_path: Path, imported_name: str, seen: set[tuple[Path, str]] | None = None
+    ) -> tuple[Path, str]:
         target_path = target_path.resolve()
         key = (target_path, imported_name)
         if seen is None:
@@ -6455,11 +7299,14 @@ def resolve_exported_origin(target_path: Path, imported_name: str, seen: set[tup
 def _parse_js_tree(path: Path):
     try:
         from tree_sitter import Language, Parser
+
         if path.suffix in (".ts", ".tsx"):
             import tree_sitter_typescript as tstypescript
+
             language = Language(tstypescript.language_typescript())
         else:
             import tree_sitter_javascript as tsjavascript
+
             language = Language(tsjavascript.language())
         source = path.read_bytes()
         parser = Parser(language)
@@ -6468,7 +7315,7 @@ def _parse_js_tree(path: Path):
         return None
 
 
-def _walk_js_tree(node):
+def _walk_js_tree(node: Any) -> Iterator[Any]:
     yield node
     for child in node.children:
         yield from _walk_js_tree(child)
@@ -6595,10 +7442,23 @@ def _js_call_identifier(node, source: bytes) -> str | None:
     return None
 
 
-_JS_PRIMITIVE_TYPES = frozenset({
-    "string", "number", "boolean", "any", "unknown", "void", "never",
-    "object", "null", "undefined", "bigint", "symbol", "this",
-})
+_JS_PRIMITIVE_TYPES = frozenset(
+    {
+        "string",
+        "number",
+        "boolean",
+        "any",
+        "unknown",
+        "void",
+        "never",
+        "object",
+        "null",
+        "undefined",
+        "bigint",
+        "symbol",
+        "this",
+    }
+)
 
 
 def _ts_heritage_clause_entries(clause_node, source: bytes) -> list[str]:
@@ -6678,24 +7538,31 @@ def _ts_collect_type_refs(node, source: bytes, generic: bool, out: list[tuple[st
                 _ts_collect_type_refs(c, source, generic, out)
 
 
-def _ts_walk_class_members(class_node, source: bytes, path: Path, class_nid: str,
-                            facts: _SymbolResolutionFacts) -> None:
+def _ts_walk_class_members(
+    class_node, source: bytes, path: Path, class_nid: str, facts: _SymbolResolutionFacts
+) -> None:
     """Emit type-relation and type-reference use facts for a class declaration node."""
-    line = class_node.start_point[0] + 1
     for child in class_node.children:
         if child.type == "class_heritage":
             for clause in child.children:
                 if clause.type == "extends_clause":
                     for name in _ts_heritage_clause_entries(clause, source):
                         facts.uses.append(
-                            _SymbolUseFact(path, class_nid, name, "inherits", "type",
-                                           clause.start_point[0] + 1)
+                            _SymbolUseFact(
+                                path, class_nid, name, "inherits", "type", clause.start_point[0] + 1
+                            )
                         )
                 elif clause.type == "implements_clause":
                     for name in _ts_heritage_clause_entries(clause, source):
                         facts.uses.append(
-                            _SymbolUseFact(path, class_nid, name, "implements", "type",
-                                           clause.start_point[0] + 1)
+                            _SymbolUseFact(
+                                path,
+                                class_nid,
+                                name,
+                                "implements",
+                                "type",
+                                clause.start_point[0] + 1,
+                            )
                         )
 
     body = class_node.child_by_field_name("body")
@@ -6746,15 +7613,12 @@ def _ts_walk_class_members(class_node, source: bytes, path: Path, class_nid: str
             _ts_collect_type_refs(type_anno, source, False, refs)
             for name, role in refs:
                 ctx = "generic_arg" if role == "generic_arg" else "field"
-                facts.uses.append(
-                    _SymbolUseFact(path, class_nid, name, "references", ctx, m_line)
-                )
+                facts.uses.append(_SymbolUseFact(path, class_nid, name, "references", ctx, m_line))
 
 
 def _collect_js_symbol_resolution_facts(paths: list[Path], facts: _SymbolResolutionFacts) -> None:
     js_paths = [
-        path for path in paths
-        if path.suffix in _JS_CACHE_BYPASS_SUFFIXES and path.suffix != ".vue"
+        path for path in paths if path.suffix in _JS_CACHE_BYPASS_SUFFIXES and path.suffix != ".vue"
     ]
     if not js_paths:
         return
@@ -6798,9 +7662,7 @@ def _collect_js_symbol_resolution_facts(paths: list[Path], facts: _SymbolResolut
 
         for node in _walk_js_tree(root_node):
             for alias, target in _js_lexical_aliases(node, source):
-                facts.aliases.append(
-                    _SymbolAliasFact(path, alias, target, node.start_point[0] + 1)
-                )
+                facts.aliases.append(_SymbolAliasFact(path, alias, target, node.start_point[0] + 1))
 
     for path in js_paths:
         resolved_path = path.resolve()
@@ -6913,6 +7775,7 @@ def _parse_python_tree(path: Path):
     try:
         from tree_sitter import Language, Parser
         import tree_sitter_python as tspython
+
         source = path.read_bytes()
         parser = Parser(Language(tspython.language()))
         return source, parser.parse(source).root_node
@@ -6920,7 +7783,7 @@ def _parse_python_tree(path: Path):
         return None
 
 
-def _walk_python_tree(node):
+def _walk_python_tree(node: Any) -> Iterator[Any]:
     yield node
     for child in node.children:
         yield from _walk_python_tree(child)
@@ -6966,12 +7829,16 @@ def _python_imported_names(node, source: bytes) -> list[tuple[str, str]]:
             if name_node is None:
                 continue
             name = _read_text(name_node, source)
-            local = _read_text(alias_node, source) if alias_node is not None else name.split(".")[-1]
+            local = (
+                _read_text(alias_node, source) if alias_node is not None else name.split(".")[-1]
+            )
             names.append((name, local))
     return names
 
 
-def _resolve_python_module_path(module_name: str, current_path: Path, root: Path, level: int) -> Path | None:
+def _resolve_python_module_path(
+    module_name: str, current_path: Path, root: Path, level: int
+) -> Path | None:
     if level > 0:
         base = current_path.parent
         for _ in range(level - 1):
@@ -6992,7 +7859,9 @@ def _resolve_python_module_path(module_name: str, current_path: Path, root: Path
     return None
 
 
-def _python_top_level_function_bodies(path: Path, root_node, source: bytes) -> list[tuple[str, object]]:
+def _python_top_level_function_bodies(
+    path: Path, root_node, source: bytes
+) -> list[tuple[str, object]]:
     bodies: list[tuple[str, object]] = []
     stem = _file_stem(path)
     for node in root_node.children:
@@ -7163,7 +8032,7 @@ def _resolve_cross_file_imports(
 
     # Pass 2: for each file, find `from .X import A, B, C` and resolve
     new_edges: list[dict] = []
     stem_to_path: dict[str, Path] = {_file_stem(p): p for p in paths}
 
     for file_result, path in zip(per_file, paths):
         stem = _file_stem(path)
@@ -7173,7 +8042,8 @@ def _resolve_cross_file_imports(
         # Excludes rationale nodes whose labels happen not to end in ")" or ".py"
         # but which must never be treated as importing entities (#563).
         local_classes = [
-            n["id"] for n in file_result.get("nodes", [])
+            n["id"]
+            for n in file_result.get("nodes", [])
             if n.get("source_file") == str_path
             and not n["label"].endswith((")", ".py"))
             and n["id"] != _make_id(stem)  # exclude file-level node
@@ -7186,7 +8056,8 @@ def _resolve_cross_file_imports(
         try:
             source = path.read_bytes()
             tree = parser.parse(source)
-        except Exception:
+        except Exception as exc:
+            _LOG.debug("python import parse failed for %s: %s", path, exc)
             continue
 
         def walk_imports(node) -> None:
@@ -7203,7 +8074,9 @@ def walk_imports(node) -> None:
                     if child.type == "relative_import":
                         for sub in child.children:
                             if sub.type == "dotted_name":
-                                raw = source[sub.start_byte:sub.end_byte].decode("utf-8", errors="replace")
+                                raw = source[sub.start_byte : sub.end_byte].decode(
+                                    "utf-8", errors="replace"
+                                )
                                 bare = raw.split(".")[-1]
                                 # Resolve relative import to exact qualified stem.
                                 candidate = path.parent / f"{bare}.py"
@@ -7211,7 +8084,9 @@ def walk_imports(node) -> None:
                                 break
                         break
                     if child.type == "dotted_name" and target_fq is None:
-                        raw = source[child.start_byte:child.end_byte].decode("utf-8", errors="replace")
+                        raw = source[child.start_byte : child.end_byte].decode(
+                            "utf-8", errors="replace"
+                        )
                         bare = raw.split(".")[-1]
                         target_fq = bare_to_qualified.get(bare)
 
@@ -7230,14 +8105,18 @@ def walk_imports(node) -> None:
                         continue
                     if child.type == "dotted_name":
                         imported_names.append(
-                            source[child.start_byte:child.end_byte].decode("utf-8", errors="replace")
+                            source[child.start_byte : child.end_byte].decode(
+                                "utf-8", errors="replace"
+                            )
                         )
                     elif child.type == "aliased_import":
                         # `import X as Y` - take the original name
                         name_node = child.child_by_field_name("name")
                         if name_node:
                             imported_names.append(
-                                source[name_node.start_byte:name_node.end_byte].decode("utf-8", errors="replace")
+                                source[name_node.start_byte : name_node.end_byte].decode(
+                                    "utf-8", errors="replace"
+                                )
                             )
 
                 line = node.start_point[0] + 1
@@ -7245,15 +8124,17 @@ def walk_imports(node) -> None:
                     tgt_nid = stem_to_entities[target_fq].get(name)
                     if tgt_nid:
                         for src_class_nid in local_classes:
-                            new_edges.append({
-                                "source": src_class_nid,
-                                "target": tgt_nid,
-                                "relation": "uses",
-                                "confidence": "INFERRED",
-                                "source_file": str_path,
-                                "source_location": f"L{line}",
-                                "weight": 0.8,
-                            })
+                            new_edges.append(
+                                {
+                                    "source": src_class_nid,
+                                    "target": tgt_nid,
+                                    "relation": "uses",
+                                    "confidence": "INFERRED",
+                                    "source_file": str_path,
+                                    "source_location": f"L{line}",
+                                    "weight": 0.8,
+                                }
+                            )
             for child in node.children:
                 walk_imports(child)
 
@@ -7318,8 +8199,10 @@ def _merge_swift_extensions(
     rewritten: list[dict] = []
     seen_keys: set[tuple] = set()
     for e in all_edges:
-        src = remap.get(e.get("source"), e.get("source"))
-        tgt = remap.get(e.get("target"), e.get("target"))
+        raw_src = e.get("source")
+        raw_tgt = e.get("target")
+        src = remap.get(raw_src, raw_src) if isinstance(raw_src, str) else raw_src
+        tgt = remap.get(raw_tgt, raw_tgt) if isinstance(raw_tgt, str) else raw_tgt
         if src == tgt:
             continue
         e["source"] = src
@@ -7374,15 +8257,16 @@ def _resolve_cross_file_java_imports(
         try:
             source = path.read_bytes()
             tree = parser.parse(source)
-        except Exception:
+        except Exception as exc:
+            _LOG.debug("java import parse failed for %s: %s", path, exc)
             continue
 
         def walk(n) -> None:
             if n.type == "import_declaration":
                 raw = _read_text(n, source).strip()
-                body = raw[len("import"):].strip().rstrip(";").strip()
+                body = raw[len("import") :].strip().rstrip(";").strip()
                 if body.startswith("static "):
-                    body = body[len("static "):].strip()
+                    body = body[len("static ") :].strip()
                 if body.endswith(".*"):
                     return
                 parts = body.split(".")
@@ -7399,16 +8283,18 @@ def walk(n) -> None:
                     if key in seen_pairs:
                         continue
                     seen_pairs.add(key)
-                    new_edges.append({
-                        "source": file_nid,
-                        "target": tgt_nid,
-                        "relation": "imports",
-                        "confidence": "EXTRACTED",
-                        "confidence_score": 1.0,
-                        "source_file": str(path),
-                        "source_location": f"L{at_line}",
-                        "weight": 1.0,
-                    })
+                    new_edges.append(
+                        {
+                            "source": file_nid,
+                            "target": tgt_nid,
+                            "relation": "imports",
+                            "confidence": "EXTRACTED",
+                            "confidence_score": 1.0,
+                            "source_file": str(path),
+                            "source_location": f"L{at_line}",
+                            "weight": 1.0,
+                        }
+                    )
             for child in n.children:
                 walk(child)
 
@@ -7444,15 +8330,34 @@ def extract_objc(path: Path) -> dict:
     def add_node(nid: str, label: str, line: int) -> None:
         if nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({"id": nid, "label": label, "file_type": "code",
-                          "source_file": str_path, "source_location": f"L{line}"})
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
 
-    def add_edge(src: str, tgt: str, relation: str, line: int,
-                 confidence: str = "EXTRACTED", weight: float = 1.0,
-                 context: str | None = None) -> None:
-        edge = {"source": src, "target": tgt, "relation": relation,
-                "confidence": confidence, "source_file": str_path,
-                "source_location": f"L{line}", "weight": weight}
+    def add_edge(
+        src: str,
+        tgt: str,
+        relation: str,
+        line: int,
+        confidence: str = "EXTRACTED",
+        weight: float = 1.0,
+        context: str | None = None,
+    ) -> None:
+        edge = {
+            "source": src,
+            "target": tgt,
+            "relation": relation,
+            "confidence": confidence,
+            "source_file": str_path,
+            "source_location": f"L{line}",
+            "weight": weight,
+        }
         if context:
             edge["context"] = context
         edges.append(edge)
@@ -7461,7 +8366,7 @@ def add_edge(src: str, tgt: str, relation: str, line: int,
     add_node(file_nid, path.name, 1)
 
     def _read(node) -> str:
-        return source[node.start_byte:node.end_byte].decode("utf-8", errors="replace")
+        return source[node.start_byte : node.end_byte].decode("utf-8", errors="replace")
 
     def _get_name(node, field: str) -> str | None:
         n = node.child_by_field_name(field)
@@ -7609,6 +8514,7 @@ def walk(node, parent_nid: str | None = None) -> None:
     all_method_nids = {n["id"] for n in nodes if n["id"] != file_nid}
     seen_calls: set[tuple[str, str]] = set()
     for caller_nid, body_node in method_bodies:
+
         def walk_calls(n) -> None:
             if n.type == "message_expression":
                 # [receiver selector]
@@ -7629,10 +8535,18 @@ def walk_calls(n) -> None:
                                 pair = (caller_nid, candidate)
                                 if pair not in seen_calls and caller_nid != candidate:
                                     seen_calls.add(pair)
-                                    add_edge(caller_nid, candidate, "calls", body_node.start_point[0] + 1,
-                                             confidence="EXTRACTED", weight=1.0, context="call")
+                                    add_edge(
+                                        caller_nid,
+                                        candidate,
+                                        "calls",
+                                        body_node.start_point[0] + 1,
+                                        confidence="EXTRACTED",
+                                        weight=1.0,
+                                        context="call",
+                                    )
             for child in n.children:
                 walk_calls(child)
+
         walk_calls(body_node)
 
     return {"nodes": nodes, "edges": edges, "input_tokens": 0, "output_tokens": 0}
@@ -7665,15 +8579,34 @@ def extract_elixir(path: Path) -> dict:
     def add_node(nid: str, label: str, line: int) -> None:
         if nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({"id": nid, "label": label, "file_type": "code",
-                          "source_file": str_path, "source_location": f"L{line}"})
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
 
-    def add_edge(src: str, tgt: str, relation: str, line: int,
-                 confidence: str = "EXTRACTED", weight: float = 1.0,
-                 context: str | None = None) -> None:
-        edge = {"source": src, "target": tgt, "relation": relation,
-                "confidence": confidence, "source_file": str_path,
-                "source_location": f"L{line}", "weight": weight}
+    def add_edge(
+        src: str,
+        tgt: str,
+        relation: str,
+        line: int,
+        confidence: str = "EXTRACTED",
+        weight: float = 1.0,
+        context: str | None = None,
+    ) -> None:
+        edge = {
+            "source": src,
+            "target": tgt,
+            "relation": relation,
+            "confidence": confidence,
+            "source_file": str_path,
+            "source_location": f"L{line}",
+            "weight": weight,
+        }
         if context:
             edge["context"] = context
         edges.append(edge)
@@ -7686,7 +8619,7 @@ def add_edge(src: str, tgt: str, relation: str, line: int,
     def _get_alias_text(node) -> str | None:
         for child in node.children:
             if child.type == "alias":
-                return source[child.start_byte:child.end_byte].decode("utf-8", errors="replace")
+                return source[child.start_byte : child.end_byte].decode("utf-8", errors="replace")
         return None
 
     def walk(node, parent_module_nid: str | None = None) -> None:
@@ -7711,7 +8644,9 @@ def walk(node, parent_module_nid: str | None = None) -> None:
                 walk(child, parent_module_nid)
             return
 
-        keyword = source[identifier_node.start_byte:identifier_node.end_byte].decode("utf-8", errors="replace")
+        keyword = source[identifier_node.start_byte : identifier_node.end_byte].decode(
+            "utf-8", errors="replace"
+        )
         line = node.start_point[0] + 1
 
         if keyword == "defmodule":
@@ -7733,10 +8668,14 @@ def walk(node, parent_module_nid: str | None = None) -> None:
                     if child.type == "call":
                         for sub in child.children:
                             if sub.type == "identifier":
-                                func_name = source[sub.start_byte:sub.end_byte].decode("utf-8", errors="replace")
+                                func_name = source[sub.start_byte : sub.end_byte].decode(
+                                    "utf-8", errors="replace"
+                                )
                                 break
                     elif child.type == "identifier":
-                        func_name = source[child.start_byte:child.end_byte].decode("utf-8", errors="replace")
+                        func_name = source[child.start_byte : child.end_byte].decode(
+                            "utf-8", errors="replace"
+                        )
                         break
             if not func_name:
                 return
@@ -7770,12 +8709,29 @@ def walk(node, parent_module_nid: str | None = None) -> None:
 
     seen_call_pairs: set[tuple[str, str]] = set()
     raw_calls: list[dict] = []
-    _SKIP_KEYWORDS = frozenset({
-        "def", "defp", "defmodule", "defmacro", "defmacrop",
-        "defstruct", "defprotocol", "defimpl", "defguard",
-        "alias", "import", "require", "use",
-        "if", "unless", "case", "cond", "with", "for",
-    })
+    _SKIP_KEYWORDS = frozenset(
+        {
+            "def",
+            "defp",
+            "defmodule",
+            "defmacro",
+            "defmacrop",
+            "defstruct",
+            "defprotocol",
+            "defimpl",
+            "defguard",
+            "alias",
+            "import",
+            "require",
+            "use",
+            "if",
+            "unless",
+            "case",
+            "cond",
+            "with",
+            "for",
+        }
+    )
 
     def walk_calls(node, caller_nid: str) -> None:
         if node.type != "call":
@@ -7784,7 +8740,7 @@ def walk_calls(node, caller_nid: str) -> None:
             return
         for child in node.children:
             if child.type == "identifier":
-                kw = source[child.start_byte:child.end_byte].decode("utf-8", errors="replace")
+                kw = source[child.start_byte : child.end_byte].decode("utf-8", errors="replace")
                 if kw in _SKIP_KEYWORDS:
                     for c in node.children:
                         walk_calls(c, caller_nid)
@@ -7795,13 +8751,17 @@ def walk_calls(node, caller_nid: str) -> None:
         for child in node.children:
             if child.type == "dot":
                 is_member_call = True
-                dot_text = source[child.start_byte:child.end_byte].decode("utf-8", errors="replace")
+                dot_text = source[child.start_byte : child.end_byte].decode(
+                    "utf-8", errors="replace"
+                )
                 parts = dot_text.rstrip(".").split(".")
                 if parts:
                     callee_name = parts[-1]
                 break
             if child.type == "identifier":
-                callee_name = source[child.start_byte:child.end_byte].decode("utf-8", errors="replace")
+                callee_name = source[child.start_byte : child.end_byte].decode(
+                    "utf-8", errors="replace"
+                )
                 break
         if callee_name and callee_name not in _LANGUAGE_BUILTIN_GLOBALS:
             tgt_nid = label_to_nid.get(callee_name)
@@ -7809,26 +8769,43 @@ def walk_calls(node, caller_nid: str) -> None:
                 pair = (caller_nid, tgt_nid)
                 if pair not in seen_call_pairs:
                     seen_call_pairs.add(pair)
-                    add_edge(caller_nid, tgt_nid, "calls",
-                             node.start_point[0] + 1, confidence="EXTRACTED", weight=1.0,
-                             context="call")
+                    add_edge(
+                        caller_nid,
+                        tgt_nid,
+                        "calls",
+                        node.start_point[0] + 1,
+                        confidence="EXTRACTED",
+                        weight=1.0,
+                        context="call",
+                    )
             else:
-                raw_calls.append({
-                    "caller_nid": caller_nid,
-                    "callee": callee_name,
-                    "is_member_call": is_member_call,
-                    "source_file": str_path,
-                    "source_location": f"L{node.start_point[0] + 1}",
-                })
+                raw_calls.append(
+                    {
+                        "caller_nid": caller_nid,
+                        "callee": callee_name,
+                        "is_member_call": is_member_call,
+                        "source_file": str_path,
+                        "source_location": f"L{node.start_point[0] + 1}",
+                    }
+                )
         for child in node.children:
             walk_calls(child, caller_nid)
 
     for caller_nid, body in function_bodies:
         walk_calls(body, caller_nid)
 
-    clean_edges = [e for e in edges if e["source"] in seen_ids and
-                   (e["target"] in seen_ids or e["relation"] == "imports")]
-    return {"nodes": nodes, "edges": clean_edges, "raw_calls": raw_calls, "input_tokens": 0, "output_tokens": 0}
+    clean_edges = [
+        e
+        for e in edges
+        if e["source"] in seen_ids and (e["target"] in seen_ids or e["relation"] == "imports")
+    ]
+    return {
+        "nodes": nodes,
+        "edges": clean_edges,
+        "raw_calls": raw_calls,
+        "input_tokens": 0,
+        "output_tokens": 0,
+    }
 
 
 def extract_markdown(path: Path) -> dict:
@@ -7864,14 +8841,35 @@ def extract_markdown(path: Path) -> dict:
     def add_node(nid: str, label: str, line: int, file_type: str = "document") -> None:
         if nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({"id": nid, "label": label, "file_type": file_type,
-                          "source_file": str_path, "source_location": f"L{line}"})
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": file_type,
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
 
-    def add_edge(src: str, tgt: str, relation: str, line: int,
-                 confidence: str = "EXTRACTED", weight: float = 1.0) -> None:
-        edges.append({"source": src, "target": tgt, "relation": relation,
-                      "confidence": confidence, "source_file": str_path,
-                      "source_location": f"L{line}", "weight": weight})
+    def add_edge(
+        src: str,
+        tgt: str,
+        relation: str,
+        line: int,
+        confidence: str = "EXTRACTED",
+        weight: float = 1.0,
+    ) -> None:
+        edges.append(
+            {
+                "source": src,
+                "target": tgt,
+                "relation": relation,
+                "confidence": confidence,
+                "source_file": str_path,
+                "source_location": f"L{line}",
+                "weight": weight,
+            }
+        )
 
     file_nid = _make_id(str(path))
     add_node(file_nid, path.name, 1)
@@ -7895,7 +8893,7 @@ def add_edge(src: str, tgt: str, relation: str, line: int,
             continue
 
         # Detect headings: # Heading, ## Heading, etc.
-        heading_match = re.match(r'^(#{1,6})\s+(.+)', line_text)
+        heading_match = re.match(r"^(#{1,6})\s+(.+)", line_text)
         if heading_match:
             level = len(heading_match.group(1))
             title = heading_match.group(2).strip()
@@ -8041,32 +9039,98 @@ def _pascal_resolve_class(from_path: Path, class_name: str) -> str | None:
     r"\s*;",
     re.IGNORECASE,
 )
-_PAS_BEGIN_END_TOKEN_RE = re.compile(
-    r"\b(begin|end|case|try|asm|record)\b", re.IGNORECASE
-)
+_PAS_BEGIN_END_TOKEN_RE = re.compile(r"\b(begin|end|case|try|asm|record)\b", re.IGNORECASE)
 _PAS_CALL_RE = re.compile(r"\b([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\s*[(;]")
-_PAS_KEYWORDS = frozenset({
-    "begin", "end", "if", "then", "else", "while", "do", "for", "to",
-    "downto", "repeat", "until", "case", "of", "try", "finally", "except",
-    "with", "inherited", "result", "var", "const", "type", "nil", "true",
-    "false", "exit", "break", "continue", "uses", "unit", "program",
-    "library", "interface", "implementation", "initialization", "finalization",
-    "procedure", "function", "constructor", "destructor", "class", "record",
-    "object", "array", "string", "integer", "boolean", "real", "char",
-    "writeln", "write", "readln", "read", "assigned", "length", "high",
-    "low", "inc", "dec", "new", "dispose", "setlength", "copy", "pos",
-    "trim", "format", "inttostr", "strtoint", "ord", "chr", "sizeof",
-    "create", "free", "destroy",
-})
+_PAS_KEYWORDS = frozenset(
+    {
+        "begin",
+        "end",
+        "if",
+        "then",
+        "else",
+        "while",
+        "do",
+        "for",
+        "to",
+        "downto",
+        "repeat",
+        "until",
+        "case",
+        "of",
+        "try",
+        "finally",
+        "except",
+        "with",
+        "inherited",
+        "result",
+        "var",
+        "const",
+        "type",
+        "nil",
+        "true",
+        "false",
+        "exit",
+        "break",
+        "continue",
+        "uses",
+        "unit",
+        "program",
+        "library",
+        "interface",
+        "implementation",
+        "initialization",
+        "finalization",
+        "procedure",
+        "function",
+        "constructor",
+        "destructor",
+        "class",
+        "record",
+        "object",
+        "array",
+        "string",
+        "integer",
+        "boolean",
+        "real",
+        "char",
+        "writeln",
+        "write",
+        "readln",
+        "read",
+        "assigned",
+        "length",
+        "high",
+        "low",
+        "inc",
+        "dec",
+        "new",
+        "dispose",
+        "setlength",
+        "copy",
+        "pos",
+        "trim",
+        "format",
+        "inttostr",
+        "strtoint",
+        "ord",
+        "chr",
+        "sizeof",
+        "create",
+        "free",
+        "destroy",
+    }
+)
 
 
 def _pascal_strip_comments(text: str) -> str:
     """Strip Pascal comments ({}, (* *), //) while preserving newlines."""
+
     def _sub(m: re.Match) -> str:
         tok = m.group(0)
         if tok.startswith("'"):
             return tok
         return "".join(c if c == "\n" else " " for c in tok)
+
     return _PAS_TOKEN_RE.sub(_sub, text)
 
 
@@ -8080,11 +9144,9 @@ def _pascal_split_sections(text: str) -> tuple[str, int, str, int]:
     if iface_m and impl_m:
         iface_off = iface_m.end()
         impl_off = impl_m.end()
-        end_m = re.search(
-            r"\b(initialization|finalization)\b", text[impl_off:], re.IGNORECASE
-        )
+        end_m = re.search(r"\b(initialization|finalization)\b", text[impl_off:], re.IGNORECASE)
         impl_end = impl_off + end_m.start() if end_m else len(text)
-        return text[iface_off:impl_m.start()], iface_off, text[impl_off:impl_end], impl_off
+        return text[iface_off : impl_m.start()], iface_off, text[impl_off:impl_end], impl_off
     return "", 0, text, 0
 
 
@@ -8161,13 +9223,15 @@ def _extract_pascal_regex(path: Path) -> dict:
     def _add_node(nid: str, label: str, line: int) -> None:
         if nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({
-                "id": nid,
-                "label": label,
-                "file_type": "code",
-                "source_file": str_path,
-                "source_location": f"L{line}",
-            })
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
 
     def _add_edge(src: str, tgt: str, relation: str, line: int, context: str | None = None) -> None:
         edge: dict = {
@@ -8234,7 +9298,7 @@ def _lineno(text: str, offset: int) -> int:
 
         # Find class body (up to next end;)
         end_m = _PAS_END_SEMI_RE.search(search_text, hm.end())
-        body_text = search_text[hm.end():end_m.start()] if end_m else ""
+        body_text = search_text[hm.end() : end_m.start()] if end_m else ""
         body_off = search_off + hm.end()
 
         # Forward method declarations inside the class body
@@ -8315,7 +9379,7 @@ def extract_pascal(path: Path) -> dict:
     extraction works out of the box without an extra pip install.
     """
     try:
-        import tree_sitter_pascal as tspascal
+        import tree_sitter_pascal as tspascal  # type: ignore[reportMissingImports]
         from tree_sitter import Language, Parser
     except ImportError:
         return _extract_pascal_regex(path)
@@ -8337,25 +9401,38 @@ def extract_pascal(path: Path) -> dict:
     proc_bodies: list[tuple[str, Any]] = []
 
     def _read(node) -> str:  # type: ignore[no-untyped-def]
-        return source[node.start_byte:node.end_byte].decode("utf-8", errors="replace")
+        return source[node.start_byte : node.end_byte].decode("utf-8", errors="replace")
 
     def add_node(nid: str, label: str, line: int) -> None:
         if nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({
-                "id": nid, "label": label, "file_type": "code",
-                "source_file": str_path, "source_location": f"L{line}",
-            })
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
 
     def add_edge(
-        src: str, tgt: str, relation: str, line: int,
-        confidence: str = "EXTRACTED", weight: float = 1.0,
+        src: str,
+        tgt: str,
+        relation: str,
+        line: int,
+        confidence: str = "EXTRACTED",
+        weight: float = 1.0,
         context: str | None = None,
     ) -> None:
         edge: dict[str, Any] = {
-            "source": src, "target": tgt, "relation": relation,
-            "confidence": confidence, "source_file": str_path,
-            "source_location": f"L{line}", "weight": weight,
+            "source": src,
+            "target": tgt,
+            "relation": relation,
+            "confidence": confidence,
+            "source_file": str_path,
+            "source_location": f"L{line}",
+            "weight": weight,
         }
         if context:
             edge["context"] = context
@@ -8463,7 +9540,8 @@ def walk(node, parent_nid: str) -> None:  # type: ignore[no-untyped-def]
             proc_nid = _make_id(stem, name)
             add_node(proc_nid, label, line)
             add_edge(
-                container, proc_nid,
+                container,
+                proc_nid,
                 "method" if container != parent_nid else "contains",
                 line,
             )
@@ -8478,8 +9556,7 @@ def walk(node, parent_nid: str) -> None:  # type: ignore[no-untyped-def]
 
     # Second pass: resolve calls inside procedure/function bodies
     all_procs: dict[str, str] = {
-        n["label"].removesuffix("()").lower(): n["id"]
-        for n in nodes if n["id"] != file_nid
+        n["label"].removesuffix("()").lower(): n["id"] for n in nodes if n["id"] != file_nid
     }
     seen_call_pairs: set[tuple[str, str]] = set()
 
@@ -8497,8 +9574,11 @@ def walk_calls(node, caller_nid: str) -> None:  # type: ignore[no-untyped-def]
                     if pair not in seen_call_pairs:
                         seen_call_pairs.add(pair)
                         add_edge(
-                            caller_nid, callee_nid, "calls",
-                            node.start_point[0] + 1, context="call",
+                            caller_nid,
+                            callee_nid,
+                            "calls",
+                            node.start_point[0] + 1,
+                            context="call",
                         )
         elif node.type == "statement":
             # Pascal bare procedure calls with no args: `Reset;`
@@ -8512,8 +9592,11 @@ def walk_calls(node, caller_nid: str) -> None:  # type: ignore[no-untyped-def]
                     if pair not in seen_call_pairs:
                         seen_call_pairs.add(pair)
                         add_edge(
-                            caller_nid, callee_nid, "calls",
-                            node.start_point[0] + 1, context="call",
+                            caller_nid,
+                            callee_nid,
+                            "calls",
+                            node.start_point[0] + 1,
+                            context="call",
                         )
         for child in node.children:
             walk_calls(child, caller_nid)
@@ -8552,6 +9635,7 @@ def extract_lazarus_form(path: Path) -> dict:
         return {"nodes": [], "edges": [], "error": str(e)}
 
     import re
+
     str_path = str(path)
     stem = _file_stem(path)
     nodes: list[dict] = []
@@ -8562,13 +9646,21 @@ def extract_lazarus_form(path: Path) -> dict:
     def add_node(nid: str, label: str, line: int) -> None:
         if nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({
-                "id": nid, "label": label, "file_type": "code",
-                "source_file": str_path, "source_location": f"L{line}",
-            })
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
 
     def add_edge(
-        src: str, tgt: str, relation: str, line: int,
+        src: str,
+        tgt: str,
+        relation: str,
+        line: int,
         context: str | None = None,
     ) -> None:
         key = (src, tgt, relation)
@@ -8576,9 +9668,13 @@ def add_edge(
             return
         seen_edge_pairs.add(key)
         edge: dict[str, Any] = {
-            "source": src, "target": tgt, "relation": relation,
-            "confidence": "EXTRACTED", "source_file": str_path,
-            "source_location": f"L{line}", "weight": 1.0,
+            "source": src,
+            "target": tgt,
+            "relation": relation,
+            "confidence": "EXTRACTED",
+            "source_file": str_path,
+            "source_location": f"L{line}",
+            "weight": 1.0,
         }
         if context:
             edge["context"] = context
@@ -8641,7 +9737,8 @@ def extract_delphi_form(path: Path) -> dict:
     # Detect binary DFM: Delphi binary resource streams start with FF 0A
     if raw[:2] == b"\xff\x0a":
         return {
-            "nodes": [], "edges": [],
+            "nodes": [],
+            "edges": [],
             "error": f"binary DFM (convert to text in Delphi IDE to index): {path.name}",
         }
 
@@ -8652,6 +9749,7 @@ def extract_delphi_form(path: Path) -> dict:
         return {"nodes": [], "edges": [], "error": str(e)}
 
     import re
+
     str_path = str(path)
     stem = _file_stem(path)
     nodes: list[dict] = []
@@ -8662,13 +9760,21 @@ def extract_delphi_form(path: Path) -> dict:
     def add_node(nid: str, label: str, line: int) -> None:
         if nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({
-                "id": nid, "label": label, "file_type": "code",
-                "source_file": str_path, "source_location": f"L{line}",
-            })
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
 
     def add_edge(
-        src: str, tgt: str, relation: str, line: int,
+        src: str,
+        tgt: str,
+        relation: str,
+        line: int,
         context: str | None = None,
     ) -> None:
         key = (src, tgt, relation)
@@ -8676,9 +9782,13 @@ def add_edge(
             return
         seen_edge_pairs.add(key)
         edge: dict[str, Any] = {
-            "source": src, "target": tgt, "relation": relation,
-            "confidence": "EXTRACTED", "source_file": str_path,
-            "source_location": f"L{line}", "weight": 1.0,
+            "source": src,
+            "target": tgt,
+            "relation": relation,
+            "confidence": "EXTRACTED",
+            "source_file": str_path,
+            "source_location": f"L{line}",
+            "weight": 1.0,
         }
         if context:
             edge["context"] = context
@@ -8687,9 +9797,9 @@ def add_edge(
     file_nid = _make_id(str(path))
     add_node(file_nid, path.name, 1)
 
-    obj_re   = re.compile(r"^\s*object\s+\w+\s*:\s*(\w+)", re.IGNORECASE)
+    obj_re = re.compile(r"^\s*object\s+\w+\s*:\s*(\w+)", re.IGNORECASE)
     event_re = re.compile(r"^\s*On\w+\s*=\s*(\w+)", re.IGNORECASE)
-    end_re   = re.compile(r"^\s*end\s*$", re.IGNORECASE)
+    end_re = re.compile(r"^\s*end\s*$", re.IGNORECASE)
     stack: list[str] = [file_nid]
 
     for lineno, line in enumerate(text.splitlines(), 1):
@@ -8756,7 +9866,8 @@ def extract_lazarus_package(path: Path) -> dict:
     - package --contains--> listed unit
     """
     try:
-        import xml.etree.ElementTree as ET
+        from defusedxml import ElementTree as ET
+
         src = path.read_bytes()
     except OSError as e:
         return {"nodes": [], "edges": [], "error": str(e)}
@@ -8764,8 +9875,7 @@ def extract_lazarus_package(path: Path) -> dict:
     if len(src) > _PROJECT_XML_MAX_BYTES:
         return {"nodes": [], "edges": [], "error": "package file too large"}
     if not _project_xml_is_safe(src):
-        return {"nodes": [], "edges": [],
-                "error": "refusing XML with DOCTYPE/ENTITY declaration"}
+        return {"nodes": [], "edges": [], "error": "refusing XML with DOCTYPE/ENTITY declaration"}
 
     try:
         xml_root = ET.fromstring(src)
@@ -8781,16 +9891,25 @@ def extract_lazarus_package(path: Path) -> dict:
     def add_node(nid: str, label: str) -> None:
         if nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({
-                "id": nid, "label": label, "file_type": "code",
-                "source_file": str_path, "source_location": "L1",
-            })
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": "L1",
+                }
+            )
 
     def add_edge(src: str, tgt: str, relation: str, context: str | None = None) -> None:
         edge: dict[str, Any] = {
-            "source": src, "target": tgt, "relation": relation,
-            "confidence": "EXTRACTED", "source_file": str_path,
-            "source_location": "L1", "weight": 1.0,
+            "source": src,
+            "target": tgt,
+            "relation": relation,
+            "confidence": "EXTRACTED",
+            "source_file": str_path,
+            "source_location": "L1",
+            "weight": 1.0,
         }
         if context:
             edge["context"] = context
@@ -8801,6 +9920,7 @@ def add_edge(src: str, tgt: str, relation: str, context: str | None = None) -> N
 
     name_elem = xml_root.find(".//Package/Name")
     pkg_name = name_elem.get("Value") if name_elem is not None else path.stem
+    pkg_name = pkg_name or path.stem
     pkg_nid = _make_id(stem, pkg_name)
     add_node(pkg_nid, pkg_name)
     add_edge(file_nid, pkg_nid, "contains")
@@ -8836,12 +9956,11 @@ def _check_tree_sitter_version() -> None:
     try:
         from tree_sitter import LANGUAGE_VERSION
     except ImportError:
-        raise ImportError(
-            "tree-sitter is not installed. Run: pip install 'tree-sitter>=0.23.0'"
-        )
+        raise ImportError("tree-sitter is not installed. Run: pip install 'tree-sitter>=0.23.0'")
     # Language API v2 starts at LANGUAGE_VERSION 14
     if LANGUAGE_VERSION < 14:
         import tree_sitter as _ts
+
         raise RuntimeError(
             f"tree-sitter {getattr(_ts, '__version__', 'unknown')} is too old. "
             f"graphify requires tree-sitter >= 0.23.0 (Language API v2). "
@@ -8879,18 +9998,37 @@ def extract_bash(path: Path) -> dict:
     def add_node(nid: str, label: str, line: int, kind: str = "code") -> None:
         if nid and nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({"id": nid, "label": label, "file_type": "code",
-                          "source_file": str_path, "source_location": f"L{line}",
-                          "metadata": sanitize_metadata({"language": "bash", "kind": kind})})  # noqa: E501
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                    "metadata": sanitize_metadata({"language": "bash", "kind": kind}),
+                }
+            )
 
-    def add_edge(src: str, tgt: str, relation: str, line: int,
-                 confidence: str = "EXTRACTED", weight: float = 1.0,
-                 context: str | None = None) -> None:
+    def add_edge(
+        src: str,
+        tgt: str,
+        relation: str,
+        line: int,
+        confidence: str = "EXTRACTED",
+        weight: float = 1.0,
+        context: str | None = None,
+    ) -> None:
         if not src or not tgt or src == tgt:
             return
-        edge = {"source": src, "target": tgt, "relation": relation,
-                "confidence": confidence, "source_file": str_path,
-                "source_location": f"L{line}", "weight": weight}
+        edge = {
+            "source": src,
+            "target": tgt,
+            "relation": relation,
+            "confidence": confidence,
+            "source_file": str_path,
+            "source_location": f"L{line}",
+            "weight": weight,
+        }
         if context:
             edge["context"] = context
         edges.append(edge)
@@ -8908,13 +10046,15 @@ def add_edge(src: str, tgt: str, relation: str, line: int,
     # or expansion, not a real function call. Token-level filtering misses
     # these because `$(build)` exposes `build` as a child command whose name
     # token has no metacharacters — only the parent does.
-    _BASH_EXPANSION_PARENTS = frozenset({
-        "command_substitution",
-        "process_substitution",
-    })
+    _BASH_EXPANSION_PARENTS = frozenset(
+        {
+            "command_substitution",
+            "process_substitution",
+        }
+    )
 
     def text(node) -> str:
-        return source[node.start_byte:node.end_byte].decode("utf-8", errors="replace")
+        return source[node.start_byte : node.end_byte].decode("utf-8", errors="replace")
 
     def is_inside_expansion(node) -> bool:
         parent = node.parent
@@ -8967,9 +10107,14 @@ def walk_calls(body_node, func_nid: str, seen_calls: set) -> None:
                         key = (func_nid, tgt)
                         if tgt and key not in seen_calls:
                             seen_calls.add(key)
-                            add_edge(func_nid, tgt, "calls",
-                                     child.start_point[0] + 1,
-                                     confidence="EXTRACTED", context="call")
+                            add_edge(
+                                func_nid,
+                                tgt,
+                                "calls",
+                                child.start_point[0] + 1,
+                                confidence="EXTRACTED",
+                                context="call",
+                            )
             walk_calls(child, func_nid, seen_calls)
 
     def walk(node, parent_nid: str) -> None:
@@ -9005,9 +10150,11 @@ def walk(node, parent_nid: str) -> None:
                 cmd = literal(cmd_name_node)
                 if cmd in _BASH_SOURCE_COMMANDS and cmd not in defined_functions:
                     # find the path argument (first word after command name)
-                    args = [c for c in node.children
-                            if c.type in ("word", "string", "concatenation")
-                            and c != cmd_name_node]
+                    args = [
+                        c
+                        for c in node.children
+                        if c.type in ("word", "string", "concatenation") and c != cmd_name_node
+                    ]
                     if args:
                         raw = _read_text(args[0], source).strip().strip("'\"")
                         line = node.start_point[0] + 1
@@ -9019,13 +10166,11 @@ def walk(node, parent_nid: str) -> None:
                             # the project tree (B-1).
                             if resolved.exists():
                                 tgt_nid = _make_id(str(resolved))
-                                add_edge(file_nid, tgt_nid, "imports_from", line,
-                                         context="import")
+                                add_edge(file_nid, tgt_nid, "imports_from", line, context="import")
                         else:
                             tgt_nid = _make_id(raw)
                             if tgt_nid:
-                                add_edge(file_nid, tgt_nid, "imports", line,
-                                         context="import")
+                                add_edge(file_nid, tgt_nid, "imports", line, context="import")
             return
 
         if t == "declaration_command":
@@ -9074,6 +10219,7 @@ def _prescan_functions(node) -> None:
 
 # ── .NET project files (.sln, .csproj, .razor) ──────────────────────────────
 
+
 def extract_sln(path: Path) -> dict:
     """Extract projects and inter-project dependencies from a .sln file."""
     try:
@@ -9083,16 +10229,21 @@ def extract_sln(path: Path) -> dict:
 
     file_nid = _make_id(str(path))
     str_path = str(path)
-    nodes: list[dict] = [{"id": file_nid, "label": path.name, "file_type": "code",
-                          "source_file": str_path, "source_location": None}]
+    nodes: list[dict] = [
+        {
+            "id": file_nid,
+            "label": path.name,
+            "file_type": "code",
+            "source_file": str_path,
+            "source_location": None,
+        }
+    ]
     edges: list[dict] = []
     seen_ids: set[str] = set()
     seen_ids.add(file_nid)
 
-    _PROJECT_RE = re.compile(
-        r'Project\("[^"]*"\)\s*=\s*"([^"]+)"\s*,\s*"([^"]+)"\s*,\s*"([^"]*)"'
-    )
-    _DEP_RE = re.compile(r'\{([0-9a-fA-F-]+)\}\s*=\s*\{([0-9a-fA-F-]+)\}')
+    _PROJECT_RE = re.compile(r'Project\("[^"]*"\)\s*=\s*"([^"]+)"\s*,\s*"([^"]+)"\s*,\s*"([^"]*)"')
+    _DEP_RE = re.compile(r"\{([0-9a-fA-F-]+)\}\s*=\s*\{([0-9a-fA-F-]+)\}")
 
     guid_to_nid: dict[str, str] = {}
 
@@ -9108,18 +10259,33 @@ def extract_sln(path: Path) -> dict:
         proj_nid = _make_id(abs_proj)
         if proj_nid and proj_nid not in seen_ids:
             seen_ids.add(proj_nid)
-            nodes.append({"id": proj_nid, "label": proj_name,
-                          "file_type": "code", "source_file": abs_proj,
-                          "source_location": None})
-            edges.append({"source": file_nid, "target": proj_nid,
-                          "relation": "contains", "confidence": "EXTRACTED",
-                          "source_file": str_path, "weight": 1.0})
+            nodes.append(
+                {
+                    "id": proj_nid,
+                    "label": proj_name,
+                    "file_type": "code",
+                    "source_file": abs_proj,
+                    "source_location": None,
+                }
+            )
+            edges.append(
+                {
+                    "source": file_nid,
+                    "target": proj_nid,
+                    "relation": "contains",
+                    "confidence": "EXTRACTED",
+                    "source_file": str_path,
+                    "weight": 1.0,
+                }
+            )
         if proj_guid:
             guid_to_nid[proj_guid.lower()] = proj_nid
 
     in_dep_section = False
     current_proj_guid: str | None = None
-    _PROJECT_LINE_RE = re.compile(r'Project\("[^"]*"\)\s*=\s*"[^"]+"\s*,\s*"[^"]+"\s*,\s*"\{([^}]+)\}"')
+    _PROJECT_LINE_RE = re.compile(
+        r'Project\("[^"]*"\)\s*=\s*"[^"]+"\s*,\s*"[^"]+"\s*,\s*"\{([^}]+)\}"'
+    )
     for line in src.splitlines():
         proj_line_m = _PROJECT_LINE_RE.search(line)
         if proj_line_m:
@@ -9141,16 +10307,23 @@ def extract_sln(path: Path) -> dict:
                 from_nid = guid_to_nid.get(current_proj_guid)
                 to_nid = guid_to_nid.get(to_guid)
                 if from_nid and to_nid and from_nid != to_nid:
-                    edges.append({"source": from_nid, "target": to_nid,
-                                  "relation": "imports", "confidence": "EXTRACTED",
-                                  "source_file": str_path, "weight": 1.0})
+                    edges.append(
+                        {
+                            "source": from_nid,
+                            "target": to_nid,
+                            "relation": "imports",
+                            "confidence": "EXTRACTED",
+                            "source_file": str_path,
+                            "weight": 1.0,
+                        }
+                    )
 
     return {"nodes": nodes, "edges": edges}
 
 
 def extract_csproj(path: Path) -> dict:
     """Extract packages, project refs, and target framework from a .csproj/.fsproj/.vbproj."""
-    import xml.etree.ElementTree as ET
+    from defusedxml import ElementTree as ET
 
     try:
         src = path.read_bytes()
@@ -9160,8 +10333,7 @@ def extract_csproj(path: Path) -> dict:
     if len(src) > _PROJECT_XML_MAX_BYTES:
         return {"nodes": [], "edges": [], "error": "project file too large"}
     if not _project_xml_is_safe(src):
-        return {"nodes": [], "edges": [],
-                "error": "refusing XML with DOCTYPE/ENTITY declaration"}
+        return {"nodes": [], "edges": [], "error": "refusing XML with DOCTYPE/ENTITY declaration"}
 
     try:
         tree = ET.fromstring(src)
@@ -9170,8 +10342,15 @@ def extract_csproj(path: Path) -> dict:
 
     file_nid = _make_id(str(path))
     str_path = str(path)
-    nodes: list[dict] = [{"id": file_nid, "label": path.name, "file_type": "code",
-                          "source_file": str_path, "source_location": None}]
+    nodes: list[dict] = [
+        {
+            "id": file_nid,
+            "label": path.name,
+            "file_type": "code",
+            "source_file": str_path,
+            "source_location": None,
+        }
+    ]
     edges: list[dict] = []
     seen_ids: set[str] = set()
     seen_ids.add(file_nid)
@@ -9189,12 +10368,25 @@ def find_all(tag: str):
             fw_nid = _make_id("framework", tf.text.strip())
             if fw_nid and fw_nid not in seen_ids:
                 seen_ids.add(fw_nid)
-                nodes.append({"id": fw_nid, "label": tf.text.strip(),
-                              "file_type": "concept", "source_file": str_path,
-                              "source_location": None})
-                edges.append({"source": file_nid, "target": fw_nid,
-                              "relation": "references", "confidence": "EXTRACTED",
-                              "source_file": str_path, "weight": 1.0})
+                nodes.append(
+                    {
+                        "id": fw_nid,
+                        "label": tf.text.strip(),
+                        "file_type": "concept",
+                        "source_file": str_path,
+                        "source_location": None,
+                    }
+                )
+                edges.append(
+                    {
+                        "source": file_nid,
+                        "target": fw_nid,
+                        "relation": "references",
+                        "confidence": "EXTRACTED",
+                        "source_file": str_path,
+                        "weight": 1.0,
+                    }
+                )
 
     for tf in find_all("TargetFrameworks"):
         if tf.text:
@@ -9204,12 +10396,25 @@ def find_all(tag: str):
                     fw_nid = _make_id("framework", fw)
                     if fw_nid and fw_nid not in seen_ids:
                         seen_ids.add(fw_nid)
-                        nodes.append({"id": fw_nid, "label": fw,
-                                      "file_type": "concept", "source_file": str_path,
-                                      "source_location": None})
-                        edges.append({"source": file_nid, "target": fw_nid,
-                                      "relation": "references", "confidence": "EXTRACTED",
-                                      "source_file": str_path, "weight": 1.0})
+                        nodes.append(
+                            {
+                                "id": fw_nid,
+                                "label": fw,
+                                "file_type": "concept",
+                                "source_file": str_path,
+                                "source_location": None,
+                            }
+                        )
+                        edges.append(
+                            {
+                                "source": file_nid,
+                                "target": fw_nid,
+                                "relation": "references",
+                                "confidence": "EXTRACTED",
+                                "source_file": str_path,
+                                "weight": 1.0,
+                            }
+                        )
 
     for pkg in find_all("PackageReference"):
         name = pkg.get("Include") or pkg.get("include") or ""
@@ -9220,12 +10425,25 @@ def find_all(tag: str):
         label = f"{name} ({version})" if version else name
         if pkg_nid and pkg_nid not in seen_ids:
             seen_ids.add(pkg_nid)
-            nodes.append({"id": pkg_nid, "label": label,
-                          "file_type": "code", "source_file": str_path,
-                          "source_location": None})
-        edges.append({"source": file_nid, "target": pkg_nid,
-                      "relation": "imports", "confidence": "EXTRACTED",
-                      "source_file": str_path, "weight": 1.0})
+            nodes.append(
+                {
+                    "id": pkg_nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": None,
+                }
+            )
+        edges.append(
+            {
+                "source": file_nid,
+                "target": pkg_nid,
+                "relation": "imports",
+                "confidence": "EXTRACTED",
+                "source_file": str_path,
+                "weight": 1.0,
+            }
+        )
 
     for proj in find_all("ProjectReference"):
         ref_path = proj.get("Include") or proj.get("include") or ""
@@ -9240,24 +10458,50 @@ def find_all(tag: str):
         if proj_nid and proj_nid not in seen_ids:
             seen_ids.add(proj_nid)
             proj_label = Path(ref_path_norm).name
-            nodes.append({"id": proj_nid, "label": proj_label,
-                          "file_type": "code", "source_file": abs_ref,
-                          "source_location": None})
-        edges.append({"source": file_nid, "target": proj_nid,
-                      "relation": "imports", "confidence": "EXTRACTED",
-                      "source_file": str_path, "weight": 1.0})
+            nodes.append(
+                {
+                    "id": proj_nid,
+                    "label": proj_label,
+                    "file_type": "code",
+                    "source_file": abs_ref,
+                    "source_location": None,
+                }
+            )
+        edges.append(
+            {
+                "source": file_nid,
+                "target": proj_nid,
+                "relation": "imports",
+                "confidence": "EXTRACTED",
+                "source_file": str_path,
+                "weight": 1.0,
+            }
+        )
 
     sdk = tree.get("Sdk") or ""
     if sdk:
         sdk_nid = _make_id("sdk", sdk)
         if sdk_nid and sdk_nid not in seen_ids:
             seen_ids.add(sdk_nid)
-            nodes.append({"id": sdk_nid, "label": sdk,
-                          "file_type": "concept", "source_file": str_path,
-                          "source_location": None})
-            edges.append({"source": file_nid, "target": sdk_nid,
-                          "relation": "references", "confidence": "EXTRACTED",
-                          "source_file": str_path, "weight": 1.0})
+            nodes.append(
+                {
+                    "id": sdk_nid,
+                    "label": sdk,
+                    "file_type": "concept",
+                    "source_file": str_path,
+                    "source_location": None,
+                }
+            )
+            edges.append(
+                {
+                    "source": file_nid,
+                    "target": sdk_nid,
+                    "relation": "references",
+                    "confidence": "EXTRACTED",
+                    "source_file": str_path,
+                    "weight": 1.0,
+                }
+            )
 
     return {"nodes": nodes, "edges": edges}
 
@@ -9271,8 +10515,15 @@ def extract_razor(path: Path) -> dict:
 
     file_nid = _make_id(str(path))
     str_path = str(path)
-    nodes: list[dict] = [{"id": file_nid, "label": path.name, "file_type": "code",
-                          "source_file": str_path, "source_location": None}]
+    nodes: list[dict] = [
+        {
+            "id": file_nid,
+            "label": path.name,
+            "file_type": "code",
+            "source_file": str_path,
+            "source_location": None,
+        }
+    ]
     edges: list[dict] = []
     seen_ids: set[str] = set()
     seen_ids.add(file_nid)
@@ -9283,31 +10534,44 @@ def _add_ref(target_name: str, relation: str, line: int) -> None:
             return
         if tgt_nid not in seen_ids:
             seen_ids.add(tgt_nid)
-            nodes.append({"id": tgt_nid, "label": target_name,
-                          "file_type": "code", "source_file": str_path,
-                          "source_location": f"L{line}"})
-        edges.append({"source": file_nid, "target": tgt_nid,
-                      "relation": relation, "confidence": "EXTRACTED",
-                      "source_file": str_path, "source_location": f"L{line}",
-                      "weight": 1.0})
+            nodes.append(
+                {
+                    "id": tgt_nid,
+                    "label": target_name,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
+        edges.append(
+            {
+                "source": file_nid,
+                "target": tgt_nid,
+                "relation": relation,
+                "confidence": "EXTRACTED",
+                "source_file": str_path,
+                "source_location": f"L{line}",
+                "weight": 1.0,
+            }
+        )
 
     for i, line in enumerate(src.splitlines(), 1):
-        m = re.match(r'@using\s+([\w.]+)', line)
+        m = re.match(r"@using\s+([\w.]+)", line)
         if m:
             _add_ref(m.group(1), "imports", i)
             continue
 
-        m = re.match(r'@inject\s+([\w.<>\[\]]+)\s+(\w+)', line)
+        m = re.match(r"@inject\s+([\w.<>\[\]]+)\s+(\w+)", line)
         if m:
             _add_ref(m.group(1), "imports", i)
             continue
 
-        m = re.match(r'@inherits\s+([\w.<>\[\]]+)', line)
+        m = re.match(r"@inherits\s+([\w.<>\[\]]+)", line)
         if m:
             _add_ref(m.group(1), "inherits", i)
             continue
 
-        m = re.match(r'@model\s+([\w.<>\[\]]+)', line)
+        m = re.match(r"@model\s+([\w.<>\[\]]+)", line)
         if m:
             _add_ref(m.group(1), "references", i)
             continue
@@ -9318,44 +10582,81 @@ def _add_ref(target_name: str, relation: str, line: int) -> None:
             route_nid = _make_id("route", route)
             if route_nid and route_nid not in seen_ids:
                 seen_ids.add(route_nid)
-                nodes.append({"id": route_nid, "label": f"route:{route}",
-                              "file_type": "concept", "source_file": str_path,
-                              "source_location": f"L{i}"})
-                edges.append({"source": file_nid, "target": route_nid,
-                              "relation": "references", "confidence": "EXTRACTED",
-                              "source_file": str_path, "weight": 1.0})
+                nodes.append(
+                    {
+                        "id": route_nid,
+                        "label": f"route:{route}",
+                        "file_type": "concept",
+                        "source_file": str_path,
+                        "source_location": f"L{i}",
+                    }
+                )
+                edges.append(
+                    {
+                        "source": file_nid,
+                        "target": route_nid,
+                        "relation": "references",
+                        "confidence": "EXTRACTED",
+                        "source_file": str_path,
+                        "weight": 1.0,
+                    }
+                )
             continue
 
-    _COMPONENT_RE = re.compile(r'<([A-Z][A-Za-z0-9]+)[\s/>]')
-    _HTML_TAGS = frozenset({
-        "DOCTYPE", "Html", "Head", "Body", "Div", "Span", "Table", "Form",
-        "Input", "Button", "Select", "Option", "Label", "Textarea",
-        "Script", "Style", "Link", "Meta", "Title", "Header", "Footer",
-        "Nav", "Main", "Section", "Article", "Aside",
-    })
+    _COMPONENT_RE = re.compile(r"<([A-Z][A-Za-z0-9]+)[\s/>]")
+    _HTML_TAGS = frozenset(
+        {
+            "DOCTYPE",
+            "Html",
+            "Head",
+            "Body",
+            "Div",
+            "Span",
+            "Table",
+            "Form",
+            "Input",
+            "Button",
+            "Select",
+            "Option",
+            "Label",
+            "Textarea",
+            "Script",
+            "Style",
+            "Link",
+            "Meta",
+            "Title",
+            "Header",
+            "Footer",
+            "Nav",
+            "Main",
+            "Section",
+            "Article",
+            "Aside",
+        }
+    )
     for m in _COMPONENT_RE.finditer(src):
         comp_name = m.group(1)
         if comp_name in _HTML_TAGS:
             continue
-        line_num = src[:m.start()].count("\n") + 1
+        line_num = src[: m.start()].count("\n") + 1
         _add_ref(comp_name, "calls", line_num)
 
-    _CODE_BLOCK_RE = re.compile(r'@code\s*\{', re.MULTILINE)
+    _CODE_BLOCK_RE = re.compile(r"@code\s*\{", re.MULTILINE)
     for m in _CODE_BLOCK_RE.finditer(src):
         block_start = m.end()
         depth = 1
         pos = block_start
         while pos < len(src) and depth > 0:
-            if src[pos] == '{':
+            if src[pos] == "{":
                 depth += 1
-            elif src[pos] == '}':
+            elif src[pos] == "}":
                 depth -= 1
             pos += 1
-        code_block = src[block_start:pos - 1] if depth == 0 else ""
+        code_block = src[block_start : pos - 1] if depth == 0 else ""
 
         _METHOD_RE = re.compile(
-            r'(?:public|private|protected|internal|static|async|override|virtual|abstract)\s+'
-            r'[\w<>\[\],\s]+\s+(\w+)\s*\('
+            r"(?:public|private|protected|internal|static|async|override|virtual|abstract)\s+"
+            r"[\w<>\[\],\s]+\s+(\w+)\s*\("
         )
         for mm in _METHOD_RE.finditer(code_block):
             method_name = mm.group(1)
@@ -9364,12 +10665,25 @@ def _add_ref(target_name: str, relation: str, line: int) -> None:
             method_nid = _make_id(_file_stem(path), method_name)
             if method_nid and method_nid not in seen_ids:
                 seen_ids.add(method_nid)
-                nodes.append({"id": method_nid, "label": method_name,
-                              "file_type": "code", "source_file": str_path,
-                              "source_location": f"L{method_line}"})
-                edges.append({"source": file_nid, "target": method_nid,
-                              "relation": "contains", "confidence": "EXTRACTED",
-                              "source_file": str_path, "weight": 1.0})
+                nodes.append(
+                    {
+                        "id": method_nid,
+                        "label": method_name,
+                        "file_type": "code",
+                        "source_file": str_path,
+                        "source_location": f"L{method_line}",
+                    }
+                )
+                edges.append(
+                    {
+                        "source": file_nid,
+                        "target": method_nid,
+                        "relation": "contains",
+                        "confidence": "EXTRACTED",
+                        "source_file": str_path,
+                        "weight": 1.0,
+                    }
+                )
 
     return {"nodes": nodes, "edges": edges}
 
@@ -9406,24 +10720,42 @@ def extract_json(path: Path) -> dict:
     seen_ids: set[str] = set()
 
     # Keys whose string values become imports (package.json dep blocks)
-    _DEP_KEYS = frozenset({
-        "dependencies", "devDependencies", "peerDependencies",
-        "optionalDependencies", "bundleDependencies", "bundledDependencies",
-    })
+    _DEP_KEYS = frozenset(
+        {
+            "dependencies",
+            "devDependencies",
+            "peerDependencies",
+            "optionalDependencies",
+            "bundleDependencies",
+            "bundledDependencies",
+        }
+    )
 
     def add_node(nid: str, label: str, line: int) -> None:
         if nid and nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({"id": nid, "label": label, "file_type": "code",
-                          "source_file": str_path, "source_location": f"L{line}"})
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
 
-    def add_edge(src: str, tgt: str, relation: str, line: int,
-                 context: str | None = None) -> None:
+    def add_edge(src: str, tgt: str, relation: str, line: int, context: str | None = None) -> None:
         if not src or not tgt or src == tgt:
             return
-        edge = {"source": src, "target": tgt, "relation": relation,
-                "confidence": "EXTRACTED", "source_file": str_path,
-                "source_location": f"L{line}", "weight": 1.0}
+        edge = {
+            "source": src,
+            "target": tgt,
+            "relation": relation,
+            "confidence": "EXTRACTED",
+            "source_file": str_path,
+            "source_location": f"L{line}",
+            "weight": 1.0,
+        }
         if context:
             edge["context"] = context
         edges.append(edge)
@@ -9442,14 +10774,15 @@ def _key_text(pair_node) -> str | None:
                 return _read_text(content, source)
             # fallback: strip surrounding quotes
             raw = _read_text(key_node, source)
-            return raw.strip('"\'')
+            return raw.strip("\"'")
         return _read_text(key_node, source)
 
     def _val_node(pair_node):
         return pair_node.child_by_field_name("value")
 
-    def walk_object(obj_node, parent_nid: str, parent_key: str | None,
-                    depth: int, pair_count: list) -> None:
+    def walk_object(
+        obj_node, parent_nid: str, parent_key: str | None, depth: int, pair_count: list
+    ) -> None:
         if depth > 6:
             return
         for child in obj_node.children:
@@ -9482,7 +10815,11 @@ def walk_object(obj_node, parent_nid: str, parent_key: str | None,
                 for item in val.children:
                     if item.type == "string":
                         content = item.child_by_field_name("string_content")
-                        ref = _read_text(content, source) if content else _read_text(item, source).strip('"\'')
+                        ref = (
+                            _read_text(content, source)
+                            if content
+                            else _read_text(item, source).strip("\"'")
+                        )
                         if ref:
                             ref_nid = _make_id("ref", ref)
                             if ref_nid:
@@ -9490,7 +10827,9 @@ def walk_object(obj_node, parent_nid: str, parent_key: str | None,
 
             elif val.type == "string":
                 content = val.child_by_field_name("string_content")
-                val_text = _read_text(content, source) if content else _read_text(val, source).strip('"\'')
+                val_text = (
+                    _read_text(content, source) if content else _read_text(val, source).strip("\"'")
+                )
 
                 if key == "extends" and val_text:
                     # Namespace external refs to avoid ID collision with file nodes (J-4)
@@ -10197,23 +11536,21 @@ def _extract_parallel(
     try:
         with concurrent.futures.ProcessPoolExecutor(max_workers=max_workers) as pool:
             futures = {
-                pool.submit(_extract_single_file, item): item[0] for item in work_items
+                pool.submit(_extract_single_file, item): (item[0], item[1]) for item in work_items
             }
             for future in concurrent.futures.as_completed(futures):
                 try:
                     idx, result = future.result()
                     per_file[idx] = result
                 except Exception as exc:
-                    idx = futures[future]
+                    _, path_str = futures[future]
                     print(
-                        f"  warning: worker failed for {work_items[idx][1]}: {exc}",
-                        file=sys.stderr, flush=True,
+                        f"  warning: worker failed for {path_str}: {exc}",
+                        file=sys.stderr,
+                        flush=True,
                     )
                 done_count += 1
-                if (
-                    total_files >= _PROGRESS_INTERVAL
-                    and done_count % _PROGRESS_INTERVAL == 0
-                ):
+                if total_files >= _PROGRESS_INTERVAL and done_count % _PROGRESS_INTERVAL == 0:
                     print(
                         f"  AST extraction: {done_count}/{len(uncached_work)} uncached files "
                         f"({done_count * 100 // len(uncached_work)}%) [{max_workers} workers]",
@@ -10360,10 +11697,14 @@ def extract(
         if per_file[i] is None:
             per_file[i] = {"nodes": [], "edges": []}
 
+    completed_per_file: list[dict] = [
+        result if result is not None else {"nodes": [], "edges": []} for result in per_file
+    ]
+
     all_nodes: list[dict] = []
     all_edges: list[dict] = []
     all_raw_calls: list[dict] = []
-    for result in per_file:
+    for result in completed_per_file:
         all_nodes.extend(result.get("nodes", []))
         all_edges.extend(result.get("edges", []))
         all_raw_calls.extend(result.get("raw_calls", []))
@@ -10391,30 +11732,36 @@ def extract(
             if e.get("target") in id_remap:
                 e["target"] = id_remap[e["target"]]
 
-    _merge_swift_extensions(per_file, all_nodes, all_edges)
+    _merge_swift_extensions(completed_per_file, all_nodes, all_edges)
     _disambiguate_colliding_node_ids(all_nodes, all_edges, all_raw_calls, root)
     _rewire_unique_stub_nodes(all_nodes, all_edges)
 
     # Add cross-file class-level edges (Python only - uses Python parser internally)
     py_paths = [p for p in paths if p.suffix == ".py"]
     if py_paths:
-        py_results = [r for r, p in zip(per_file, paths) if p.suffix == ".py"]
+        py_results = [r for r, p in zip(completed_per_file, paths) if p.suffix == ".py"]
         try:
             cross_file_edges = _resolve_cross_file_imports(py_results, py_paths)
             all_edges.extend(cross_file_edges)
         except Exception as exc:
             import logging
-            logging.getLogger(__name__).warning("Cross-file import resolution failed, skipping: %s", exc)
+
+            logging.getLogger(__name__).warning(
+                "Cross-file import resolution failed, skipping: %s", exc
+            )
 
     # Cross-file Java import resolution
     java_paths = [p for p in paths if p.suffix == ".java"]
     if java_paths:
-        java_results = [r for r, p in zip(per_file, paths) if p.suffix == ".java"]
+        java_results = [r for r, p in zip(completed_per_file, paths) if p.suffix == ".java"]
         try:
             all_edges.extend(_resolve_cross_file_java_imports(java_results, java_paths))
         except Exception as exc:
             import logging
-            logging.getLogger(__name__).warning("Java cross-file import resolution failed, skipping: %s", exc)
+
+            logging.getLogger(__name__).warning(
+                "Java cross-file import resolution failed, skipping: %s", exc
+            )
 
     # Cross-file call resolution for all languages
     # Each extractor saved unresolved calls in raw_calls. Now that we have all
@@ -10490,11 +11837,18 @@ def extract(
             # file the callee lives in.
             caller_file_nid = nid_to_file_nid.get(caller)
             callee_file_nid = nid_to_file_nid.get(tgt)
-            imported_symbols = file_to_symbol_imports.get(caller_file_nid, set())
-            imported_modules = file_to_module_imports.get(caller_file_nid, set())
-            has_import_evidence = (
-                tgt in imported_symbols
-                or (callee_file_nid is not None and callee_file_nid in imported_modules)
+            imported_symbols = (
+                file_to_symbol_imports.get(caller_file_nid, set())
+                if caller_file_nid is not None
+                else set()
+            )
+            imported_modules = (
+                file_to_module_imports.get(caller_file_nid, set())
+                if caller_file_nid is not None
+                else set()
+            )
+            has_import_evidence = tgt in imported_symbols or (
+                callee_file_nid is not None and callee_file_nid in imported_modules
             )
             if has_import_evidence:
                 confidence = "EXTRACTED"
@@ -10502,17 +11856,19 @@ def extract(
             else:
                 confidence = "INFERRED"
                 confidence_score = 0.8
-            all_edges.append({
-                "source": caller,
-                "target": tgt,
-                "relation": "calls",
-                "context": "call",
-                "confidence": confidence,
-                "confidence_score": confidence_score,
-                "source_file": rc.get("source_file", ""),
-                "source_location": rc.get("source_location"),
-                "weight": 1.0,
-            })
+            all_edges.append(
+                {
+                    "source": caller,
+                    "target": tgt,
+                    "relation": "calls",
+                    "context": "call",
+                    "confidence": confidence,
+                    "confidence_score": confidence_score,
+                    "source_file": rc.get("source_file", ""),
+                    "source_location": rc.get("source_location"),
+                    "weight": 1.0,
+                }
+            )
 
     # Relativize source_file fields so paths are portable across machines (#555)
     for item in all_nodes + all_edges:
@@ -10535,11 +11891,14 @@ def extract(
     }
 
 
-def collect_files(target: Path, *, follow_symlinks: bool = False, root: Path | None = None) -> list[Path]:
+def collect_files(
+    target: Path, *, follow_symlinks: bool = False, root: Path | None = None
+) -> list[Path]:
     if target.is_file():
         return [target]
     _EXTENSIONS = set(_DISPATCH.keys())
     from graphify.detect import _load_graphifyignore, _is_ignored, _is_noise_dir
+
     ignore_root = root if root is not None else target
     patterns = _load_graphifyignore(ignore_root)
 
@@ -10550,9 +11909,9 @@ def _ignored(p: Path) -> bool:
         results: list[Path] = []
         for ext in sorted(_EXTENSIONS):
             results.extend(
-                p for p in target.rglob(f"*{ext}")
-                if not any(_is_noise_dir(part) for part in p.parts)
-                and not _ignored(p)
+                p
+                for p in target.rglob(f"*{ext}")
+                if not any(_is_noise_dir(part) for part in p.parts) and not _ignored(p)
             )
         return sorted(results)
     # Walk with symlink following + cycle detection
diff --git a/graphify/global_graph.py b/graphify/global_graph.py
index c6310f94b..dde2cdaa6 100644
--- a/graphify/global_graph.py
+++ b/graphify/global_graph.py
@@ -2,6 +2,7 @@
 import json
 import hashlib
 import sys
+from contextlib import suppress
 from datetime import datetime, timezone
 from pathlib import Path
 import networkx as nx
@@ -14,10 +15,8 @@
 
 def _load_manifest() -> dict:
     if _GLOBAL_MANIFEST.exists():
-        try:
+        with suppress(Exception):
             return json.loads(_GLOBAL_MANIFEST.read_text(encoding="utf-8"))
-        except Exception:
-            pass
     return {"version": 1, "repos": {}}
 
 
@@ -29,6 +28,7 @@ def _save_manifest(manifest: dict) -> None:
 def _load_global_graph() -> nx.Graph:
     if _GLOBAL_GRAPH.exists():
         from graphify.security import check_graph_file_size_cap
+
         check_graph_file_size_cap(_GLOBAL_GRAPH)
         data = json.loads(_GLOBAL_GRAPH.read_text(encoding="utf-8"))
         if "links" not in data and "edges" in data:
@@ -83,6 +83,7 @@ def global_add(source_path: Path, repo_tag: str) -> dict:
 
     # Load source graph
     from graphify.security import check_graph_file_size_cap
+
     check_graph_file_size_cap(source_path)
     data = json.loads(source_path.read_text(encoding="utf-8"))
     if "links" not in data and "edges" in data:
diff --git a/graphify/google_workspace.py b/graphify/google_workspace.py
index e9e60d813..bdf5cb975 100644
--- a/graphify/google_workspace.py
+++ b/graphify/google_workspace.py
@@ -5,6 +5,7 @@
 document content. This module exports them to Markdown sidecars via the
 googleworkspace CLI (`gws`) so Graphify can extract their actual contents.
 """
+
 from __future__ import annotations
 
 import hashlib
@@ -91,7 +92,9 @@ def read_google_shortcut(path: Path) -> dict[str, str | None]:
     }
 
 
-def _run_gws_export(file_id: str, mime_type: str, output: Path, resource_key: str | None = None) -> None:
+def _run_gws_export(
+    file_id: str, mime_type: str, output: Path, resource_key: str | None = None
+) -> None:
     exe = shutil.which("gws")
     if not exe:
         raise RuntimeError(
@@ -107,7 +110,7 @@ def _run_gws_export(file_id: str, mime_type: str, output: Path, resource_key: st
     output = output.resolve()
     output.parent.mkdir(parents=True, exist_ok=True)
     timeout = int(os.environ.get("GRAPHIFY_GOOGLE_WORKSPACE_TIMEOUT", "120"))
-    result = subprocess.run(
+    result = subprocess.run(  # nosec B603
         [exe, "drive", "files", "export", "--params", json.dumps(params), "-o", output.name],
         capture_output=True,
         cwd=output.parent,
@@ -126,7 +129,9 @@ def _sidecar_path(path: Path, out_dir: Path) -> Path:
     return out_dir / f"{path.stem}_{name_hash}.md"
 
 
-def _with_frontmatter(path: Path, shortcut: dict[str, str | None], body: str, exported_mime_type: str) -> str:
+def _with_frontmatter(
+    path: Path, shortcut: dict[str, str | None], body: str, exported_mime_type: str
+) -> str:
     source_url = shortcut.get("url") or ""
     account = shortcut.get("account") or ""
     account_line = ""
@@ -170,20 +175,26 @@ def convert_google_workspace_file(
         with tempfile.NamedTemporaryFile("w+b", suffix=".md", delete=False, dir=out_dir) as tmp:
             tmp_path = Path(tmp.name)
         try:
-            _run_gws_export(shortcut["file_id"] or "", "text/markdown", tmp_path, shortcut.get("resource_key"))
+            _run_gws_export(
+                shortcut["file_id"] or "", "text/markdown", tmp_path, shortcut.get("resource_key")
+            )
             body = tmp_path.read_text(encoding="utf-8", errors="replace")
         finally:
             tmp_path.unlink(missing_ok=True)
         if not body.strip():
             return None
-        out_path.write_text(_with_frontmatter(path, shortcut, body, "text/markdown"), encoding="utf-8")
+        out_path.write_text(
+            _with_frontmatter(path, shortcut, body, "text/markdown"), encoding="utf-8"
+        )
         return out_path
 
     if ext == ".gslides":
         with tempfile.NamedTemporaryFile("w+b", suffix=".txt", delete=False, dir=out_dir) as tmp:
             tmp_path = Path(tmp.name)
         try:
-            _run_gws_export(shortcut["file_id"] or "", "text/plain", tmp_path, shortcut.get("resource_key"))
+            _run_gws_export(
+                shortcut["file_id"] or "", "text/plain", tmp_path, shortcut.get("resource_key")
+            )
             body = tmp_path.read_text(encoding="utf-8", errors="replace")
         finally:
             tmp_path.unlink(missing_ok=True)
@@ -194,7 +205,9 @@ def convert_google_workspace_file(
 
     if ext == ".gsheet":
         if xlsx_to_markdown is None:
-            raise RuntimeError("Google Sheets export requires the office extra: pip install graphifyy[office,google]")
+            raise RuntimeError(
+                "Google Sheets export requires the office extra: pip install graphifyy[office,google]"
+            )
         with tempfile.NamedTemporaryFile("w+b", suffix=".xlsx", delete=False, dir=out_dir) as tmp:
             tmp_path = Path(tmp.name)
         try:
diff --git a/graphify/graph_loader.py b/graphify/graph_loader.py
index 437c4024a..a7c11f744 100644
--- a/graphify/graph_loader.py
+++ b/graphify/graph_loader.py
@@ -85,9 +85,7 @@ def _require_bool_field(
         return value
     if allow_none and value is None:
         return None
-    raise TypeError(
-        f"'{field}' must be a boolean, got {type(value).__name__} ({value!r})"
-    )
+    raise TypeError(f"'{field}' must be a boolean, got {type(value).__name__} ({value!r})")
 
 
 def load_graph_file(
@@ -259,8 +257,7 @@ def _load_multigraph(data: dict) -> nx.MultiDiGraph:
         key, attrs = strip_schema_key(attrs)
         if key is not None and not isinstance(key, str):
             raise TypeError(
-                f"multigraph edge 'key' must be a string, got "
-                f"{type(key).__name__} ({key!r})"
+                f"multigraph edge 'key' must be a string, got {type(key).__name__} ({key!r})"
             )
         if key is None:
             missing_key_count += 1
diff --git a/graphify/hooks.py b/graphify/hooks.py
index bf824ebb7..ded988cdc 100644
--- a/graphify/hooks.py
+++ b/graphify/hooks.py
@@ -43,7 +43,8 @@
 fi
 """
 
-_HOOK_SCRIPT = """\
+_HOOK_SCRIPT = (
+    """\
 # graphify-hook-start
 # Auto-rebuilds the knowledge graph after each commit (code files only, no LLM needed).
 # Installed by: graphify hook install
@@ -73,7 +74,9 @@
     exit 0
 fi
 
-""" + _PYTHON_DETECT + """
+"""
+    + _PYTHON_DETECT
+    + """
 export GRAPHIFY_CHANGED="$CHANGED"
 
 # Run rebuild detached so git commit returns immediately.
@@ -112,9 +115,11 @@
 disown 2>/dev/null || true
 # graphify-hook-end
 """
+)
 
 
-_CHECKOUT_SCRIPT = """\
+_CHECKOUT_SCRIPT = (
+    """\
 # graphify-checkout-hook-start
 # Auto-rebuilds the knowledge graph (code only) when switching branches.
 # Installed by: graphify hook install
@@ -145,7 +150,9 @@
 [ -f "$GIT_DIR/MERGE_HEAD" ] && exit 0
 [ -f "$GIT_DIR/CHERRY_PICK_HEAD" ] && exit 0
 
-""" + _PYTHON_DETECT + """
+"""
+    + _PYTHON_DETECT
+    + """
 _GRAPHIFY_LOG="${HOME}/.cache/graphify-rebuild.log"
 mkdir -p "$(dirname "$_GRAPHIFY_LOG")"
 echo "[graphify] Branch switched - launching background rebuild (log: $_GRAPHIFY_LOG)"
@@ -174,6 +181,7 @@
 disown 2>/dev/null || true
 # graphify-checkout-hook-end
 """
+)
 
 
 def _git_root(path: Path) -> Path | None:
@@ -223,10 +231,12 @@ def _hooks_dir(root: Path) -> Path:
     # absolute path for worktree/external-gitdir cases, and a path relative to
     # <root> for normal repos — anchoring on root covers both.
     import subprocess as _sp
+
     try:
-        res = _sp.run(
+        res = _sp.run(  # nosec B603 B607
             ["git", "-C", str(root), "rev-parse", "--git-path", "hooks"],
-            capture_output=True, text=True,
+            capture_output=True,
+            text=True,
         )
         raw = res.stdout.strip()
         # A valid hooks path can never contain newlines or NUL. Their presence
@@ -312,7 +322,9 @@ def uninstall(path: Path = Path(".")) -> str:
 
     hooks_dir = _user_hooks_dir(_hooks_dir(root))
     commit_msg = _uninstall_hook(hooks_dir, "post-commit", _HOOK_MARKER, _HOOK_MARKER_END)
-    checkout_msg = _uninstall_hook(hooks_dir, "post-checkout", _CHECKOUT_MARKER, _CHECKOUT_MARKER_END)
+    checkout_msg = _uninstall_hook(
+        hooks_dir, "post-checkout", _CHECKOUT_MARKER, _CHECKOUT_MARKER_END
+    )
 
     return f"post-commit: {commit_msg}\npost-checkout: {checkout_msg}"
 
@@ -328,7 +340,11 @@ def _check(name: str, marker: str) -> str:
         p = hooks_dir / name
         if not p.exists():
             return "not installed"
-        return "installed" if marker in p.read_text(encoding="utf-8") else "not installed (hook exists but graphify not found)"
+        return (
+            "installed"
+            if marker in p.read_text(encoding="utf-8")
+            else "not installed (hook exists but graphify not found)"
+        )
 
     commit = _check("post-commit", _HOOK_MARKER)
     checkout = _check("post-checkout", _CHECKOUT_MARKER)
diff --git a/graphify/ingest.py b/graphify/ingest.py
index 93e69bd79..b8c5a868a 100644
--- a/graphify/ingest.py
+++ b/graphify/ingest.py
@@ -92,6 +92,7 @@ def _html_to_markdown(html: str, url: str) -> str:
     html = re.sub(r"<style[^>]*>.*?</style>", "", html, flags=re.DOTALL | re.IGNORECASE)
     try:
         from markdownify import markdownify
+
         return markdownify(html, heading_style="ATX", bullets="-", strip=["img"])
     except ImportError:
         # Fallback: basic tag strip
@@ -104,7 +105,9 @@ def _fetch_tweet(url: str, author: str | None, contributor: str | None) -> tuple
     """Fetch a tweet URL. Returns (content, filename)."""
     # Normalize to twitter.com for oEmbed
     oembed_url = url.replace("x.com", "twitter.com")
-    oembed_api = f"https://publish.twitter.com/oembed?url={urllib.parse.quote(oembed_url)}&omit_script=true"
+    oembed_api = (
+        f"https://publish.twitter.com/oembed?url={urllib.parse.quote(oembed_url)}&omit_script=true"
+    )
     try:
         data = json.loads(safe_fetch_text(oembed_api))
         tweet_text = re.sub(r"<[^>]+>", "", data.get("html", "")).strip()
@@ -120,7 +123,7 @@ def _fetch_tweet(url: str, author: str | None, contributor: str | None) -> tuple
 type: tweet
 author: "{_yaml_str(tweet_author)}"
 captured_at: {now}
-contributor: "{_yaml_str(contributor or author or 'unknown')}"
+contributor: "{_yaml_str(contributor or author or "unknown")}"
 ---
 
 # Tweet by @{tweet_author}
@@ -147,7 +150,7 @@ def _fetch_webpage(url: str, author: str | None, contributor: str | None) -> tup
 type: webpage
 title: "{_yaml_str(title)}"
 captured_at: {now}
-contributor: "{_yaml_str(contributor or author or 'unknown')}"
+contributor: "{_yaml_str(contributor or author or "unknown")}"
 ---
 
 # {title}
@@ -170,12 +173,26 @@ def _fetch_arxiv(url: str, author: str | None, contributor: str | None) -> tuple
         api_url = f"https://export.arxiv.org/abs/{arxiv_id.group(1)}"
         try:
             html = _fetch_html(api_url)
-            abstract_match = re.search(r'class="abstract[^"]*"[^>]*>(.*?)</blockquote>', html, re.DOTALL | re.IGNORECASE)
-            abstract = re.sub(r"<[^>]+>", "", abstract_match.group(1)).strip() if abstract_match else ""
-            title_match = re.search(r'class="title[^"]*"[^>]*>(.*?)</h1>', html, re.DOTALL | re.IGNORECASE)
-            title = re.sub(r"<[^>]+>", " ", title_match.group(1)).strip() if title_match else arxiv_id.group(1)
-            authors_match = re.search(r'class="authors"[^>]*>(.*?)</div>', html, re.DOTALL | re.IGNORECASE)
-            paper_authors = re.sub(r"<[^>]+>", "", authors_match.group(1)).strip() if authors_match else ""
+            abstract_match = re.search(
+                r'class="abstract[^"]*"[^>]*>(.*?)</blockquote>', html, re.DOTALL | re.IGNORECASE
+            )
+            abstract = (
+                re.sub(r"<[^>]+>", "", abstract_match.group(1)).strip() if abstract_match else ""
+            )
+            title_match = re.search(
+                r'class="title[^"]*"[^>]*>(.*?)</h1>', html, re.DOTALL | re.IGNORECASE
+            )
+            title = (
+                re.sub(r"<[^>]+>", " ", title_match.group(1)).strip()
+                if title_match
+                else arxiv_id.group(1)
+            )
+            authors_match = re.search(
+                r'class="authors"[^>]*>(.*?)</div>', html, re.DOTALL | re.IGNORECASE
+            )
+            paper_authors = (
+                re.sub(r"<[^>]+>", "", authors_match.group(1)).strip() if authors_match else ""
+            )
         except Exception:
             title, abstract, paper_authors = arxiv_id.group(1), "", ""
     else:
@@ -184,12 +201,12 @@ def _fetch_arxiv(url: str, author: str | None, contributor: str | None) -> tuple
     now = datetime.now(timezone.utc).isoformat()
     content = f"""---
 source_url: "{_yaml_str(url)}"
-arxiv_id: "{_yaml_str(arxiv_id.group(1) if arxiv_id else '')}"
+arxiv_id: "{_yaml_str(arxiv_id.group(1) if arxiv_id else "")}"
 type: paper
 title: "{_yaml_str(title)}"
 paper_authors: "{_yaml_str(paper_authors)}"
 captured_at: {now}
-contributor: "{_yaml_str(contributor or author or 'unknown')}"
+contributor: "{_yaml_str(contributor or author or "unknown")}"
 ---
 
 # {title}
@@ -203,7 +220,11 @@ def _fetch_arxiv(url: str, author: str | None, contributor: str | None) -> tuple
 
 Source: {url}
 """
-    filename = f"arxiv_{arxiv_id.group(1).replace('.', '_')}.md" if arxiv_id else _safe_filename(url, ".md")
+    filename = (
+        f"arxiv_{arxiv_id.group(1).replace('.', '_')}.md"
+        if arxiv_id
+        else _safe_filename(url, ".md")
+    )
     return content, filename
 
 
@@ -215,7 +236,9 @@ def _download_binary(url: str, suffix: str, target_dir: Path) -> Path:
     return out_path
 
 
-def ingest(url: str, target_dir: Path, author: str | None = None, contributor: str | None = None) -> Path:
+def ingest(
+    url: str, target_dir: Path, author: str | None = None, contributor: str | None = None
+) -> Path:
     """
     Fetch a URL and save it into target_dir as a graphify-ready file.
 
@@ -243,6 +266,7 @@ def ingest(url: str, target_dir: Path, author: str | None = None, contributor: s
 
         if url_type == "youtube":
             from graphify.transcribe import download_audio
+
             out = download_audio(url, target_dir)
             print(f"Downloaded audio: {out.name}")
             return out
@@ -321,9 +345,12 @@ def save_query_result(
 
 if __name__ == "__main__":
     import argparse
+
     parser = argparse.ArgumentParser(description="Fetch a URL into a graphify /raw folder")
     parser.add_argument("url", help="URL to fetch")
-    parser.add_argument("target_dir", nargs="?", default="./raw", help="Target directory (default: ./raw)")
+    parser.add_argument(
+        "target_dir", nargs="?", default="./raw", help="Target directory (default: ./raw)"
+    )
     parser.add_argument("--author", help="Your name (stored as node metadata)")
     parser.add_argument("--contributor", help="Contributor name for team graphs")
     args = parser.parse_args()
diff --git a/graphify/llm.py b/graphify/llm.py
index 730975edf..5e0bb3e24 100644
--- a/graphify/llm.py
+++ b/graphify/llm.py
@@ -6,6 +6,7 @@
 from __future__ import annotations
 
 import json
+import importlib
 import os
 import sys
 import time
@@ -156,6 +157,7 @@ def _resolve_max_tokens(default: int) -> int:
             pass
     return default
 
+
 _EXTRACTION_SYSTEM = """\
 You are a graphify semantic extraction agent. Extract a knowledge graph fragment from the files provided.
 Output ONLY valid JSON — no explanation, no markdown fences, no preamble.
@@ -476,14 +478,13 @@ def _call_openai_compat(
 def _call_claude(api_key: str, model: str, user_message: str, max_tokens: int = 8192, *, deep_mode: bool = False) -> dict:
     """Call Anthropic Claude directly (not via OpenAI compat layer)."""
     try:
-        import anthropic
+        anthropic = importlib.import_module("anthropic")
     except ImportError as exc:
         raise ImportError(
-            "Claude direct extraction requires the anthropic package. "
-            "Run: pip install anthropic"
+            "Claude direct extraction requires the anthropic package. Run: pip install anthropic"
         ) from exc
 
-    client = anthropic.Anthropic(api_key=api_key)
+    client = getattr(anthropic, "Anthropic")(api_key=api_key)
     resp = client.messages.create(
         model=model,
         max_tokens=max_tokens,
@@ -564,7 +565,7 @@ def _call_claude_cli(user_message: str, max_tokens: int = 8192, *, deep_mode: bo
     cli_model = os.environ.get("GRAPHIFY_CLAUDE_CLI_MODEL", "").strip()
     if cli_model:
         cli_args.extend(["--model", cli_model])
-    proc = subprocess.run(
+    proc = subprocess.run(  # nosec B603
         cli_args,
         input=user_message,
         capture_output=True,
@@ -574,9 +575,7 @@ def _call_claude_cli(user_message: str, max_tokens: int = 8192, *, deep_mode: bo
         check=False,
     )
     if proc.returncode != 0:
-        raise RuntimeError(
-            f"claude -p exited {proc.returncode}: {proc.stderr.strip()[:500]}"
-        )
+        raise RuntimeError(f"claude -p exited {proc.returncode}: {proc.stderr.strip()[:500]}")
 
     try:
         envelope = json.loads(proc.stdout)
@@ -685,7 +684,7 @@ def extract_files_direct(
         # Ollama ignores auth but the OpenAI client library requires a non-empty
         # string. Use a placeholder and surface a visible warning so this never
         # silently routes traffic without the user realising — see F-029.
-        ollama_url = os.environ.get("OLLAMA_BASE_URL", cfg.get("base_url", ""))
+        ollama_url = str(os.environ.get("OLLAMA_BASE_URL", cfg.get("base_url", "")))
         _validate_ollama_base_url(ollama_url)
         print(
             "[graphify] WARNING: ollama backend selected with no OLLAMA_API_KEY set; "
@@ -870,14 +869,30 @@ def _extract_with_adaptive_retry(
                 f"and cannot be split further: {exc}",
                 file=sys.stderr,
             )
-            return {"nodes": [], "edges": [], "hyperedges": [], "input_tokens": 0, "output_tokens": 0, "model": model, "finish_reason": "stop"}
+            return {
+                "nodes": [],
+                "edges": [],
+                "hyperedges": [],
+                "input_tokens": 0,
+                "output_tokens": 0,
+                "model": model,
+                "finish_reason": "stop",
+            }
         if _depth >= max_depth:
             print(
                 f"[graphify] chunk of {len(chunk)} still overflows context at "
                 f"recursion depth {_depth} (max {max_depth}) — dropping",
                 file=sys.stderr,
             )
-            return {"nodes": [], "edges": [], "hyperedges": [], "input_tokens": 0, "output_tokens": 0, "model": model, "finish_reason": "stop"}
+            return {
+                "nodes": [],
+                "edges": [],
+                "hyperedges": [],
+                "input_tokens": 0,
+                "output_tokens": 0,
+                "model": model,
+                "finish_reason": "stop",
+            }
         print(
             f"[graphify] chunk of {len(chunk)} exceeded context at depth "
             f"{_depth} ({type(exc).__name__}); splitting in half and retrying",
@@ -997,11 +1012,14 @@ def extract_corpus_parallel(
     if token_budget is not None:
         chunks = _pack_chunks_by_tokens(files, token_budget=token_budget)
     else:
-        chunks = [files[i:i + chunk_size] for i in range(0, len(files), chunk_size)]
+        chunks = [files[i : i + chunk_size] for i in range(0, len(files), chunk_size)]
 
     merged: dict = {
-        "nodes": [], "edges": [], "hyperedges": [],
-        "input_tokens": 0, "output_tokens": 0,
+        "nodes": [],
+        "edges": [],
+        "hyperedges": [],
+        "input_tokens": 0,
+        "output_tokens": 0,
         "failed_chunks": 0,  # count of chunks that raised — loud failure on chunk errors
     }
     total = len(chunks)
@@ -1030,7 +1048,10 @@ def _run_one(idx: int, chunk: list[Path]) -> tuple[int, dict | None, Exception |
         max_concurrency = 1
     # claude-cli shells out to a Claude Code session; parallel subprocesses conflict
     # over session state. Force serial unless the user explicitly opts in.
-    if backend == "claude-cli" and os.environ.get("GRAPHIFY_CLAUDE_CLI_PARALLEL", "").strip() != "1":
+    if (
+        backend == "claude-cli"
+        and os.environ.get("GRAPHIFY_CLAUDE_CLI_PARALLEL", "").strip() != "1"
+    ):
         max_concurrency = 1
     workers = max(1, min(max_concurrency, total))
     if workers == 1:
@@ -1042,7 +1063,8 @@ def _run_one(idx: int, chunk: list[Path]) -> tuple[int, dict | None, Exception |
                 print(f"[graphify] chunk {idx + 1}/{total} failed: {exc}", file=sys.stderr)
                 merged["failed_chunks"] += 1
                 continue
-            assert result is not None
+            if result is None:
+                raise RuntimeError("chunk worker completed without result or exception")
             _merge_into(merged, result)
             if callable(on_chunk_done):
                 on_chunk_done(idx, total, result)
@@ -1058,7 +1080,8 @@ def _run_one(idx: int, chunk: list[Path]) -> tuple[int, dict | None, Exception |
                     )
                     merged["failed_chunks"] += 1
                     continue
-                assert result is not None
+                if result is None:
+                    raise RuntimeError("chunk worker completed without result or exception")
                 _merge_into(merged, result)
                 if callable(on_chunk_done):
                     on_chunk_done(idx, total, result)
@@ -1101,7 +1124,7 @@ def _call_llm(prompt: str, *, backend: str, max_tokens: int = 200) -> str:
     cfg = BACKENDS[backend]
     key = _get_backend_api_key(backend)
     if not key and backend == "ollama":
-        ollama_url = os.environ.get("OLLAMA_BASE_URL", cfg.get("base_url", ""))
+        ollama_url = str(os.environ.get("OLLAMA_BASE_URL", cfg.get("base_url", "")))
         _validate_ollama_base_url(ollama_url)
         key = "ollama"
     if not key and backend not in ("bedrock", "claude-cli"):
@@ -1112,10 +1135,10 @@ def _call_llm(prompt: str, *, backend: str, max_tokens: int = 200) -> str:
 
     if backend == "claude":
         try:
-            import anthropic
+            anthropic = importlib.import_module("anthropic")
         except ImportError as exc:
             raise ImportError("anthropic package required for claude backend") from exc
-        client = anthropic.Anthropic(api_key=key)
+        client = getattr(anthropic, "Anthropic")(api_key=key)
         resp = client.messages.create(
             model=mdl,
             max_tokens=max_tokens,
@@ -1124,10 +1147,12 @@ def _call_llm(prompt: str, *, backend: str, max_tokens: int = 200) -> str:
         return resp.content[0].text if resp.content else ""
 
     if backend == "claude-cli":
-        import shutil, subprocess
+        import shutil
+        import subprocess
+
         if shutil.which("claude") is None:
             raise RuntimeError("Claude Code CLI not found on $PATH")
-        proc = subprocess.run(
+        proc = subprocess.run(  # nosec B603 B607
             ["claude", "-p", "--output-format", "json", "--no-session-persistence"],
             input=prompt,
             capture_output=True,
@@ -1201,6 +1226,7 @@ def _validate_ollama_base_url(url: str) -> None:
     """
     try:
         from urllib.parse import urlparse
+
         parsed = urlparse(url)
     except Exception:
         print(
@@ -1241,7 +1267,11 @@ def detect_backend() -> str | None:
     for backend in ("gemini", "kimi", "claude", "openai", "deepseek"):
         if _get_backend_api_key(backend):
             return backend
-    if os.environ.get("AWS_PROFILE") or os.environ.get("AWS_REGION") or os.environ.get("AWS_DEFAULT_REGION"):
+    if (
+        os.environ.get("AWS_PROFILE")
+        or os.environ.get("AWS_REGION")
+        or os.environ.get("AWS_DEFAULT_REGION")
+    ):
         return "bedrock"
     ollama_url = os.environ.get("OLLAMA_BASE_URL")
     if ollama_url:
diff --git a/graphify/mcp_ingest.py b/graphify/mcp_ingest.py
index 1879dcc78..e3b808c49 100644
--- a/graphify/mcp_ingest.py
+++ b/graphify/mcp_ingest.py
@@ -66,12 +66,14 @@
 from graphify.security import sanitize_label
 
 
-MCP_CONFIG_FILENAMES: frozenset[str] = frozenset({
-    ".mcp.json",
-    "claude_desktop_config.json",
-    "mcp.json",
-    "mcp_servers.json",
-})
+MCP_CONFIG_FILENAMES: frozenset[str] = frozenset(
+    {
+        ".mcp.json",
+        "claude_desktop_config.json",
+        "mcp.json",
+        "mcp_servers.json",
+    }
+)
 
 _MAX_BYTES = 1_048_576  # 1 MiB — same cap as extract_json
 _MAX_SERVERS_PER_FILE = 200  # generous; flags pathological configs
@@ -130,7 +132,8 @@ def extract_mcp_config(path: Path) -> dict[str, Any]:
     seen_edge_keys: set[tuple[str, str, str]] = set()
 
     _add_node(
-        nodes, seen_node_ids,
+        nodes,
+        seen_node_ids,
         nid=file_nid,
         label=path.name,
         kind="mcp_config_file",
@@ -180,7 +183,8 @@ def _emit_server(
     """Emit nodes/edges for one entry under ``mcpServers``."""
     server_nid = _make_id(file_stem, "mcp_server", server_name)
     _add_node(
-        nodes, seen_node_ids,
+        nodes,
+        seen_node_ids,
         nid=server_nid,
         label=server_name,
         kind="mcp_server",
@@ -188,7 +192,8 @@ def _emit_server(
         line=1,  # JSON doesn't expose line numbers without a parser pass
     )
     _add_edge(
-        edges, seen_edge_keys,
+        edges,
+        seen_edge_keys,
         source=file_nid,
         target=server_nid,
         relation="contains",
@@ -201,7 +206,8 @@ def _emit_server(
         cmd_label = command.strip()
         cmd_nid = _make_id("mcp_command", cmd_label)
         _add_node(
-            nodes, seen_node_ids,
+            nodes,
+            seen_node_ids,
             nid=cmd_nid,
             label=cmd_label,
             kind="mcp_command",
@@ -209,7 +215,8 @@ def _emit_server(
             line=1,
         )
         _add_edge(
-            edges, seen_edge_keys,
+            edges,
+            seen_edge_keys,
             source=server_nid,
             target=cmd_nid,
             relation="references",
@@ -224,7 +231,8 @@ def _emit_server(
         if package:
             pkg_nid = _make_id("mcp_package", package)
             _add_node(
-                nodes, seen_node_ids,
+                nodes,
+                seen_node_ids,
                 nid=pkg_nid,
                 label=package,
                 kind="mcp_package",
@@ -232,7 +240,8 @@ def _emit_server(
                 line=1,
             )
             _add_edge(
-                edges, seen_edge_keys,
+                edges,
+                seen_edge_keys,
                 source=server_nid,
                 target=pkg_nid,
                 relation="references",
@@ -249,7 +258,8 @@ def _emit_server(
                 continue
             env_nid = _make_id("env_var", env_name)
             _add_node(
-                nodes, seen_node_ids,
+                nodes,
+                seen_node_ids,
                 nid=env_nid,
                 label=env_name,
                 kind="env_var",
@@ -257,7 +267,8 @@ def _emit_server(
                 line=1,
             )
             _add_edge(
-                edges, seen_edge_keys,
+                edges,
+                seen_edge_keys,
                 source=server_nid,
                 target=env_nid,
                 relation="requires_env",
@@ -276,7 +287,9 @@ def _emit_server(
 #   ["@scoped/some-mcp"]                                          (pnpx)
 #   ["mcp-server-fetch"]                                          (uvx direct)
 _NPM_PKG_RE = re.compile(r"^@[a-z0-9][a-z0-9._-]*/[a-z0-9][a-z0-9._-]*(?:@[\w.\-+]+)?$")
-_PY_MCP_PKG_RE = re.compile(r"^[a-z0-9][a-z0-9._-]*-mcp(?:-[a-z0-9._-]+)?$|^mcp-[a-z0-9][a-z0-9._-]*$")
+_PY_MCP_PKG_RE = re.compile(
+    r"^[a-z0-9][a-z0-9._-]*-mcp(?:-[a-z0-9._-]+)?$|^mcp-[a-z0-9][a-z0-9._-]*$"
+)
 _ARG_FLAG_RE = re.compile(r"^-{1,2}\w")
 
 
@@ -329,14 +342,16 @@ def _add_node(
     if not nid or nid in seen:
         return
     seen.add(nid)
-    nodes.append({
-        "id": nid,
-        "label": sanitize_label(label),
-        "file_type": "code",
-        "source_file": source_file,
-        "source_location": f"L{line}",
-        "metadata": {"mcp_kind": kind},
-    })
+    nodes.append(
+        {
+            "id": nid,
+            "label": sanitize_label(label),
+            "file_type": "code",
+            "source_file": source_file,
+            "source_location": f"L{line}",
+            "metadata": {"mcp_kind": kind},
+        }
+    )
 
 
 def _add_edge(
diff --git a/graphify/prs.py b/graphify/prs.py
index 319892e90..d69314a53 100644
--- a/graphify/prs.py
+++ b/graphify/prs.py
@@ -16,6 +16,7 @@
 from __future__ import annotations
 
 import json
+import importlib
 import os
 import re
 import subprocess
@@ -25,27 +26,53 @@
 from dataclasses import dataclass, field
 from datetime import datetime, timezone
 from pathlib import Path
+from typing import TYPE_CHECKING
 
+if TYPE_CHECKING:
+    import networkx as nx
 
 # ── ANSI colours ─────────────────────────────────────────────────────────────
 
 _NO_COLOR = not sys.stdout.isatty() or os.environ.get("NO_COLOR")
 
+
 def _c(code: str, text: str) -> str:
     if _NO_COLOR:
         return text
     return f"\033[{code}m{text}\033[0m"
 
-def green(t: str) -> str:   return _c("32", t)
-def red(t: str) -> str:     return _c("31", t)
-def yellow(t: str) -> str:  return _c("33", t)
-def cyan(t: str) -> str:    return _c("36", t)
-def bold(t: str) -> str:    return _c("1",  t)
-def dim(t: str) -> str:     return _c("2",  t)
-def magenta(t: str) -> str: return _c("35", t)
+
+def green(t: str) -> str:
+    return _c("32", t)
+
+
+def red(t: str) -> str:
+    return _c("31", t)
+
+
+def yellow(t: str) -> str:
+    return _c("33", t)
+
+
+def cyan(t: str) -> str:
+    return _c("36", t)
+
+
+def bold(t: str) -> str:
+    return _c("1", t)
+
+
+def dim(t: str) -> str:
+    return _c("2", t)
+
+
+def magenta(t: str) -> str:
+    return _c("35", t)
+
 
 _ANSI_RE = re.compile(r"\033\[[0-9;]*m")
 
+
 def _pad(s: str, width: int) -> str:
     """Pad an ANSI-colored string to visible width (strips escape codes for length calc)."""
     visible_len = len(_ANSI_RE.sub("", s))
@@ -54,6 +81,7 @@ def _pad(s: str, width: int) -> str:
 
 # ── Data model ────────────────────────────────────────────────────────────────
 
+
 @dataclass
 class PRInfo:
     number: int
@@ -62,8 +90,8 @@ class PRInfo:
     base_branch: str
     author: str
     is_draft: bool
-    review_decision: str        # APPROVED | CHANGES_REQUESTED | ""
-    ci_status: str              # SUCCESS | FAILURE | PENDING | NONE
+    review_decision: str  # APPROVED | CHANGES_REQUESTED | ""
+    ci_status: str  # SUCCESS | FAILURE | PENDING | NONE
     updated_at: datetime
     expected_base: str = "main"  # set by fetch_prs via _detect_default_branch
     worktree_path: str | None = None
@@ -91,7 +119,16 @@ def blast_radius(self) -> str:
 
 # ── Classification ────────────────────────────────────────────────────────────
 
-_STATUS_ORDER = ["WRONG-BASE", "CI-FAIL", "CHANGES-REQ", "DRAFT", "STALE", "PENDING", "APPROVED", "READY"]
+_STATUS_ORDER = [
+    "WRONG-BASE",
+    "CI-FAIL",
+    "CHANGES-REQ",
+    "DRAFT",
+    "STALE",
+    "PENDING",
+    "APPROVED",
+    "READY",
+]
 _STALE_DAYS = 14
 
 
@@ -115,29 +152,32 @@ def _classify(pr: "PRInfo", base: str = "v8") -> str:
 
 def _status_color(status: str) -> str:
     return {
-        "READY":       green(status),
-        "APPROVED":    bold(green(status)),
-        "CI-FAIL":     red(status),
+        "READY": green(status),
+        "APPROVED": bold(green(status)),
+        "CI-FAIL": red(status),
         "CHANGES-REQ": red(status),
-        "WRONG-BASE":  dim(status),
-        "STALE":       dim(status),
-        "DRAFT":       yellow(status),
-        "PENDING":     yellow(status),
+        "WRONG-BASE": dim(status),
+        "STALE": dim(status),
+        "DRAFT": yellow(status),
+        "PENDING": yellow(status),
     }.get(status, status)
 
 
 def _ci_icon(status: str) -> str:
-    return {"SUCCESS": green("✓"), "FAILURE": red("✗"), "PENDING": yellow("…"), "NONE": dim("–")}.get(status, "?")
+    return {
+        "SUCCESS": green("✓"),
+        "FAILURE": red("✗"),
+        "PENDING": yellow("…"),
+        "NONE": dim("–"),
+    }.get(status, "?")
 
 
 # ── GitHub data fetching ──────────────────────────────────────────────────────
 
+
 def _gh(*args: str) -> list | dict | None:
     try:
-        result = subprocess.run(
-            ["gh", *args],
-            capture_output=True, text=True, timeout=30
-        )
+        result = subprocess.run(["gh", *args], capture_output=True, text=True, timeout=30)  # nosec B603 B607
         if result.returncode != 0:
             return None
         return json.loads(result.stdout)
@@ -152,13 +192,16 @@ def _detect_default_branch(repo: str | None = None) -> str:
     if repo:
         args += ["--repo", repo]
     data = _gh(*args)
-    if data and data.get("defaultBranchRef", {}).get("name"):
-        return data["defaultBranchRef"]["name"]
+    default_branch_ref = data.get("defaultBranchRef") if isinstance(data, dict) else None
+    if isinstance(default_branch_ref, dict) and default_branch_ref.get("name"):
+        return str(default_branch_ref["name"])
     # Fall back to git symbolic-ref for the current repo
     try:
-        result = subprocess.run(
+        result = subprocess.run(  # nosec B603 B607
             ["git", "symbolic-ref", "refs/remotes/origin/HEAD"],
-            capture_output=True, text=True, timeout=5
+            capture_output=True,
+            text=True,
+            timeout=5,
         )
         if result.returncode == 0:
             # refs/remotes/origin/main → main
@@ -169,7 +212,9 @@ def _detect_default_branch(repo: str | None = None) -> str:
     return "main"
 
 
-_CI_FAILURE_CONCLUSIONS = frozenset({"FAILURE", "CANCELLED", "TIMED_OUT", "ACTION_REQUIRED", "STARTUP_FAILURE"})
+_CI_FAILURE_CONCLUSIONS = frozenset(
+    {"FAILURE", "CANCELLED", "TIMED_OUT", "ACTION_REQUIRED", "STARTUP_FAILURE"}
+)
 
 
 def _parse_ci(rollup: list) -> str:
@@ -189,9 +234,15 @@ def _parse_ci(rollup: list) -> str:
 def fetch_prs(repo: str | None = None, base: str | None = None, limit: int = 50) -> list[PRInfo]:
     resolved_base = base or _detect_default_branch(repo)
     args = [
-        "pr", "list", "--state", "open", "--limit", str(limit),
-        "--json", "number,title,headRefName,baseRefName,author,isDraft,"
-                  "reviewDecision,statusCheckRollup,updatedAt",
+        "pr",
+        "list",
+        "--state",
+        "open",
+        "--limit",
+        str(limit),
+        "--json",
+        "number,title,headRefName,baseRefName,author,isDraft,"
+        "reviewDecision,statusCheckRollup,updatedAt",
     ]
     if repo:
         args += ["--repo", repo]
@@ -203,18 +254,20 @@ def fetch_prs(repo: str | None = None, base: str | None = None, limit: int = 50)
     prs = []
     for item in raw:
         updated = datetime.fromisoformat(item["updatedAt"].replace("Z", "+00:00"))
-        prs.append(PRInfo(
-            number=item["number"],
-            title=item["title"],
-            branch=item["headRefName"],
-            base_branch=item["baseRefName"],
-            author=item["author"]["login"] if item.get("author") else "?",
-            is_draft=item.get("isDraft", False),
-            review_decision=item.get("reviewDecision") or "",
-            ci_status=_parse_ci(item.get("statusCheckRollup") or []),
-            updated_at=updated,
-            expected_base=resolved_base,
-        ))
+        prs.append(
+            PRInfo(
+                number=item["number"],
+                title=item["title"],
+                branch=item["headRefName"],
+                base_branch=item["baseRefName"],
+                author=item["author"]["login"] if item.get("author") else "?",
+                is_draft=item.get("isDraft", False),
+                review_decision=item.get("reviewDecision") or "",
+                ci_status=_parse_ci(item.get("statusCheckRollup") or []),
+                updated_at=updated,
+                expected_base=resolved_base,
+            )
+        )
     return prs
 
 
@@ -223,16 +276,17 @@ def fetch_pr_files(number: int, repo: str | None = None) -> list[str]:
     if repo:
         args += ["--repo", repo]
     try:
-        result = subprocess.run(["gh", *args], capture_output=True, text=True, timeout=30)
+        result = subprocess.run(["gh", *args], capture_output=True, text=True, timeout=30)  # nosec B603 B607
         if result.returncode != 0:
             return []
-        return [l.strip() for l in result.stdout.splitlines() if l.strip()]
+        return [line.strip() for line in result.stdout.splitlines() if line.strip()]
     except (subprocess.TimeoutExpired, FileNotFoundError):
         return []
 
 
 # ── Graph-native impact (used by MCP tools — works on nx.Graph directly) ─────
 
+
 def _path_match(graph_src: str, pr_file: str) -> bool:
     """True if graph_src and pr_file refer to the same file (path-boundary safe)."""
     if graph_src == pr_file:
@@ -278,7 +332,13 @@ def format_prs_text(prs: list["PRInfo"], base: str) -> str:
     actionable = [p for p in prs if p.base_branch == base]
     wrong = len(prs) - len(actionable)
     lines = [f"Open PRs targeting {base}: {len(actionable)}  ({wrong} on wrong base, not shown)\n"]
-    for p in sorted(actionable, key=lambda x: (_STATUS_ORDER.index(x.status) if x.status in _STATUS_ORDER else 99, x.days_old)):
+    for p in sorted(
+        actionable,
+        key=lambda x: (
+            _STATUS_ORDER.index(x.status) if x.status in _STATUS_ORDER else 99,
+            x.days_old,
+        ),
+    ):
         impact = f"  blast_radius={p.blast_radius}" if p.blast_radius else ""
         lines.append(
             f"#{p.number} [{p.status}] CI={p.ci_status} review={p.review_decision or 'none'} "
@@ -289,12 +349,12 @@ def format_prs_text(prs: list["PRInfo"], base: str) -> str:
 
 # ── Worktree mapping ──────────────────────────────────────────────────────────
 
+
 def fetch_worktrees() -> dict[str, str]:
     """Returns {branch: worktree_path}."""
     try:
-        result = subprocess.run(
-            ["git", "worktree", "list", "--porcelain"],
-            capture_output=True, text=True, timeout=10
+        result = subprocess.run(  # nosec B603 B607
+            ["git", "worktree", "list", "--porcelain"], capture_output=True, text=True, timeout=10
         )
         if result.returncode != 0:
             return {}
@@ -305,7 +365,9 @@ def fetch_worktrees() -> dict[str, str]:
     current_path = None
     for line in result.stdout.splitlines():
         if not line:
-            current_path = None  # blank line = record separator; reset to avoid leaking across detached HEADs
+            # blank line = record separator; reset so the path does not leak
+            # across detached HEADs
+            current_path = None
         elif line.startswith("worktree "):
             current_path = line[9:]
         elif line.startswith("branch refs/heads/") and current_path:
@@ -315,10 +377,12 @@ def fetch_worktrees() -> dict[str, str]:
 
 # ── Graph impact analysis ─────────────────────────────────────────────────────
 
+
 def _load_graph_json(graph_path: Path) -> dict | None:
     if not graph_path.exists():
         return None
     from graphify.security import check_graph_file_size_cap
+
     try:
         check_graph_file_size_cap(graph_path)
         return json.loads(graph_path.read_text(encoding="utf-8"))
@@ -366,10 +430,7 @@ def attach_graph_impact(
     actionable = [pr for pr in prs if pr.status != "WRONG-BASE"]
     workers = min(8, len(actionable)) if actionable else 1
     with ThreadPoolExecutor(max_workers=workers) as pool:
-        future_to_pr = {
-            pool.submit(fetch_pr_files, pr.number, repo): pr
-            for pr in actionable
-        }
+        future_to_pr = {pool.submit(fetch_pr_files, pr.number, repo): pr for pr in actionable}
         for fut in as_completed(future_to_pr):
             pr = future_to_pr[fut]
             try:
@@ -395,8 +456,9 @@ def attach_graph_impact(
 
 # ── Dashboard rendering ───────────────────────────────────────────────────────
 
+
 def _truncate(s: str, n: int) -> str:
-    return s if len(s) <= n else s[:n - 1] + "…"
+    return s if len(s) <= n else s[: n - 1] + "…"
 
 
 def render_dashboard(prs: list[PRInfo], base: str = "v8", show_wrong_base: bool = False) -> None:
@@ -404,7 +466,12 @@ def render_dashboard(prs: list[PRInfo], base: str = "v8", show_wrong_base: bool
     wrong_base = [p for p in prs if p.base_branch != base]
 
     # Sort: READY first, then by status order, then by recency
-    actionable.sort(key=lambda p: (_STATUS_ORDER.index(p.status) if p.status in _STATUS_ORDER else 99, p.days_old))
+    actionable.sort(
+        key=lambda p: (
+            _STATUS_ORDER.index(p.status) if p.status in _STATUS_ORDER else 99,
+            p.days_old,
+        )
+    )
 
     print()
     print(bold(f"  graphify prs  ·  base: {base}  ·  {len(actionable)} PRs"))
@@ -415,13 +482,17 @@ def render_dashboard(prs: list[PRInfo], base: str = "v8", show_wrong_base: bool
     else:
         # Header
         print(f"  {'#':>4}  {'CI':2}  {'STATUS':13}  {'UPDATED':8}  {'IMPACT':22}  TITLE")
-        print(f"  {'─'*4}  {'─'*2}  {'─'*13}  {'─'*8}  {'─'*22}  {'─'*40}")
+        print(f"  {'─' * 4}  {'─' * 2}  {'─' * 13}  {'─' * 8}  {'─' * 22}  {'─' * 40}")
 
         for pr in actionable:
             status_str = _pad(_status_color(pr.status), 13)
             ci_str = _ci_icon(pr.ci_status)
             age = f"{pr.days_old}d" if pr.days_old > 0 else "today"
-            impact = _pad(dim(_truncate(pr.blast_radius, 22)), 22) if pr.blast_radius else _pad(dim("–"), 22)
+            impact = (
+                _pad(dim(_truncate(pr.blast_radius, 22)), 22)
+                if pr.blast_radius
+                else _pad(dim("–"), 22)
+            )
             wt = f" {cyan('⬡')}" if pr.worktree_path else "  "
             draft = dim(" [draft]") if pr.is_draft else ""
             title = _truncate(pr.title, 52)
@@ -434,13 +505,20 @@ def render_dashboard(prs: list[PRInfo], base: str = "v8", show_wrong_base: bool
         by_status[p.status] = by_status.get(p.status, 0) + 1
 
     parts = []
-    if by_status.get("READY"):      parts.append(green(f"{by_status['READY']} ready"))
-    if by_status.get("APPROVED"):   parts.append(bold(green(f"{by_status['APPROVED']} approved")))
-    if by_status.get("PENDING"):    parts.append(yellow(f"{by_status['PENDING']} pending CI"))
-    if by_status.get("CI-FAIL"):    parts.append(red(f"{by_status['CI-FAIL']} CI failing"))
-    if by_status.get("CHANGES-REQ"):parts.append(red(f"{by_status['CHANGES-REQ']} changes requested"))
-    if by_status.get("DRAFT"):      parts.append(yellow(f"{by_status['DRAFT']} draft"))
-    if by_status.get("STALE"):      parts.append(dim(f"{by_status['STALE']} stale"))
+    if by_status.get("READY"):
+        parts.append(green(f"{by_status['READY']} ready"))
+    if by_status.get("APPROVED"):
+        parts.append(bold(green(f"{by_status['APPROVED']} approved")))
+    if by_status.get("PENDING"):
+        parts.append(yellow(f"{by_status['PENDING']} pending CI"))
+    if by_status.get("CI-FAIL"):
+        parts.append(red(f"{by_status['CI-FAIL']} CI failing"))
+    if by_status.get("CHANGES-REQ"):
+        parts.append(red(f"{by_status['CHANGES-REQ']} changes requested"))
+    if by_status.get("DRAFT"):
+        parts.append(yellow(f"{by_status['DRAFT']} draft"))
+    if by_status.get("STALE"):
+        parts.append(dim(f"{by_status['STALE']} stale"))
 
     if wrong_base:
         parts.append(dim(f"{len(wrong_base)} wrong base"))
@@ -471,7 +549,9 @@ def render_worktrees(prs: list[PRInfo], worktrees: dict[str, str]) -> None:
         if pr:
             status = _status_color(pr.status)
             print(f"  {cyan(path)}")
-            print(f"    {dim('branch:')} {branch}  ->  PR {bold(f'#{pr.number}')}  [{status}]  {_truncate(pr.title, 50)}")
+            print(
+                f"    {dim('branch:')} {branch}  ->  PR {bold(f'#{pr.number}')}  [{status}]  {_truncate(pr.title, 50)}"
+            )
         else:
             print(f"  {cyan(path)}")
             print(f"    {dim('branch:')} {branch}  {dim('(no open PR)')}")
@@ -509,7 +589,9 @@ def render_conflicts(
             comm_label_str = dim("  — " + ", ".join(labels[comm]))
         print(f"  {yellow(f'Community {comm}')}{comm_label_str}  ({len(ps)} PRs overlap)")
         for pr in ps:
-            print(f"    #{pr.number:4}  {_pad(_status_color(pr.status), 13)}  {_truncate(pr.title, 55)}")
+            print(
+                f"    #{pr.number:4}  {_pad(_status_color(pr.status), 13)}  {_truncate(pr.title, 55)}"
+            )
         print()
 
 
@@ -544,7 +626,7 @@ def render_pr_detail(pr: PRInfo, repo: str | None = None) -> None:
 # Best model per backend for reasoning tasks (different from extraction defaults)
 _TRIAGE_MODEL_DEFAULTS: dict[str, str] = {
     "claude": "claude-opus-4-7",
-    "kimi":   "kimi-k2.6",
+    "kimi": "kimi-k2.6",
     "openai": "gpt-4.1-mini",
     "gemini": "gemini-3-flash-preview",
 }
@@ -556,19 +638,24 @@ def _resolve_triage_backend() -> tuple[str, str]:
 
     explicit = os.environ.get("GRAPHIFY_TRIAGE_BACKEND", "").strip()
     if explicit in BACKENDS:
-        model = (os.environ.get("GRAPHIFY_TRIAGE_MODEL")
-                 or _TRIAGE_MODEL_DEFAULTS.get(explicit)
-                 or _default_model_for_backend(explicit))
+        model = (
+            os.environ.get("GRAPHIFY_TRIAGE_MODEL")
+            or _TRIAGE_MODEL_DEFAULTS.get(explicit)
+            or _default_model_for_backend(explicit)
+        )
         return explicit, model
 
     for b in ("claude", "kimi", "openai", "gemini"):
         if _get_backend_api_key(b):
-            model = (os.environ.get("GRAPHIFY_TRIAGE_MODEL")
-                     or _TRIAGE_MODEL_DEFAULTS.get(b)
-                     or _default_model_for_backend(b))
+            model = (
+                os.environ.get("GRAPHIFY_TRIAGE_MODEL")
+                or _TRIAGE_MODEL_DEFAULTS.get(b)
+                or _default_model_for_backend(b)
+            )
             return b, model
 
     import shutil
+
     if shutil.which("claude"):
         return "claude-cli", "claude-code-plan"
 
@@ -582,7 +669,9 @@ def triage_with_opus(prs: list[PRInfo], base: str) -> None:
         print(red("  graphify.llm not available - cannot run triage."), file=sys.stderr)
         sys.exit(1)
 
-    candidates = [p for p in prs if p.base_branch == base and p.status not in ("WRONG-BASE", "STALE")]
+    candidates = [
+        p for p in prs if p.base_branch == base and p.status not in ("WRONG-BASE", "STALE")
+    ]
     if not candidates:
         print(dim("  No actionable PRs to triage."))
         return
@@ -599,8 +688,7 @@ def triage_with_opus(prs: list[PRInfo], base: str) -> None:
         "You are a senior engineer helping triage a PR review queue. "
         "Given these open PRs, rank them by review priority for the repo maintainer. "
         "For each PR give: priority number, one sentence on what action to take and why. "
-        "Be direct and specific. Format each as: #<number> — <action>.\n\n"
-        + "\n\n".join(lines)
+        "Be direct and specific. Format each as: #<number> — <action>.\n\n" + "\n\n".join(lines)
     )
 
     try:
@@ -615,10 +703,12 @@ def triage_with_opus(prs: list[PRInfo], base: str) -> None:
 
     try:
         if backend == "claude":
-            import anthropic
-            client = anthropic.Anthropic(api_key=_get_backend_api_key("claude"))
+            anthropic = importlib.import_module("anthropic")
+
+            client = anthropic.Anthropic(api_key=_get_backend_api_key("claude"))
             with client.messages.stream(
-                model=model, max_tokens=1024,
+                model=model,
+                max_tokens=1024,
                 messages=[{"role": "user", "content": prompt}],
             ) as stream:
                 print("  ", end="", flush=True)
@@ -628,11 +718,14 @@ def triage_with_opus(prs: list[PRInfo], base: str) -> None:
 
         elif backend in ("kimi", "openai", "gemini", "ollama"):
             from openai import OpenAI
+
             cfg = BACKENDS[backend]
             api_key = _get_backend_api_key(backend) or "ollama"
             client = OpenAI(api_key=api_key, base_url=cfg.get("base_url", ""))
             with client.chat.completions.create(
-                model=model, max_tokens=1024, stream=True,
+                model=model,
+                max_tokens=1024,
+                stream=True,
                 messages=[{"role": "user", "content": prompt}],
             ) as stream:
                 print("  ", end="", flush=True)
@@ -644,9 +737,13 @@ def triage_with_opus(prs: list[PRInfo], base: str) -> None:
 
         elif backend == "claude-cli":
             import subprocess as _sp
-            proc = _sp.run(
+
+            proc = _sp.run(  # nosec B603 B607
                 ["claude", "-p", "--no-session-persistence"],
-                input=prompt, capture_output=True, text=True, timeout=120,
+                input=prompt,
+                capture_output=True,
+                text=True,
+                timeout=120,
             )
             if proc.returncode != 0:
                 print(red(f"  claude -p failed: {proc.stderr.strip()[:300]}"), file=sys.stderr)
@@ -665,6 +762,7 @@ def triage_with_opus(prs: list[PRInfo], base: str) -> None:
 
 # ── Entry point ───────────────────────────────────────────────────────────────
 
+
 def cmd_prs(argv: list[str]) -> None:
     base: str | None = None  # auto-detected from repo if not given
     repo: str | None = None
@@ -687,15 +785,18 @@ def cmd_prs(argv: list[str]) -> None:
         elif arg == "--wrong-base":
             show_wrong_base = True
         elif arg in ("--base", "-b") and i + 1 < len(argv):
-            base = argv[i + 1]; i += 1
+            base = argv[i + 1]
+            i += 1
         elif arg.startswith("--base="):
             base = arg.split("=", 1)[1]
         elif arg in ("--repo", "-R") and i + 1 < len(argv):
-            repo = argv[i + 1]; i += 1
+            repo = argv[i + 1]
+            i += 1
         elif arg.startswith("--graph="):
             graph_path = Path(arg.split("=", 1)[1])
         elif arg == "--graph" and i + 1 < len(argv):
-            graph_path = Path(argv[i + 1]); i += 1
+            graph_path = Path(argv[i + 1])
+            i += 1
         elif arg.lstrip("#").isdigit():
             pr_number = int(arg.lstrip("#"))
         elif arg in ("-h", "--help"):
diff --git a/graphify/report.py b/graphify/report.py
index f0210897d..7175cbc06 100644
--- a/graphify/report.py
+++ b/graphify/report.py
@@ -7,7 +7,9 @@
 
 def _safe_community_name(label: str) -> str:
     """Mirrors export.safe_name so community hub filenames and report wikilinks always agree."""
-    cleaned = re.sub(r'[\\/*?:"<>|#^[\]]', "", label.replace("\r\n", " ").replace("\r", " ").replace("\n", " ")).strip()
+    cleaned = re.sub(
+        r'[\\/*?:"<>|#^[\]]', "", label.replace("\r\n", " ").replace("\r", " ").replace("\n", " ")
+    ).strip()
     cleaned = re.sub(r"\.(md|mdx|markdown)$", "", cleaned, flags=re.IGNORECASE)
     return cleaned or "unnamed"
 
@@ -30,7 +32,9 @@ def generate(
 
     # JSON deserialization produces string keys; normalize to int so .get(cid) works.
     if community_labels:
-        community_labels = {int(k) if isinstance(k, str) else k: v for k, v in community_labels.items()}
+        community_labels = {
+            int(k) if isinstance(k, str) else k: v for k, v in community_labels.items()
+        }
 
     confidences = [d.get("confidence", "EXTRACTED") for _, _, d in G.edges(data=True)]
     total = len(confidences) or 1
@@ -56,10 +60,13 @@ def generate(
         ]
 
     from .analyze import _is_file_node as _ifn
-    non_empty = {cid: nodes for cid, nodes in communities.items()
-                 if any(not _ifn(G, n) for n in nodes)}
+
+    non_empty = {
+        cid: nodes for cid, nodes in communities.items() if any(not _ifn(G, n) for n in nodes)
+    }
     thin_count_summary = sum(
-        1 for nodes in communities.values()
+        1
+        for nodes in communities.values()
         if 0 < sum(1 for n in nodes if not _ifn(G, n)) < min_community_size
     )
     shown_count = len(communities) - thin_count_summary
@@ -68,9 +75,17 @@ def generate(
         "",
         "## Summary",
         f"- {G.number_of_nodes()} nodes · {G.number_of_edges()} edges · {len(communities)} communities"
-        + (f" ({shown_count} shown, {thin_count_summary} thin omitted)" if thin_count_summary else ""),
+        + (
+            f" ({shown_count} shown, {thin_count_summary} thin omitted)"
+            if thin_count_summary
+            else ""
+        ),
         f"- Extraction: {ext_pct}% EXTRACTED · {inf_pct}% INFERRED · {amb_pct}% AMBIGUOUS"
-        + (f" · INFERRED: {len(inf_edges)} edges (avg confidence: {inf_avg})" if inf_avg is not None else ""),
+        + (
+            f" · INFERRED: {len(inf_edges)} edges (avg confidence: {inf_avg})"
+            if inf_avg is not None
+            else ""
+        ),
         f"- Token cost: {token_cost.get('input', 0):,} input · {token_cost.get('output', 0):,} output",
     ]
 
@@ -155,10 +170,10 @@ def generate(
         if len(real_nodes) < min_community_size:
             continue
         display = [G.nodes[n].get("label", n) for n in real_nodes[:8]]
-        suffix = f" (+{len(real_nodes)-8} more)" if len(real_nodes) > 8 else ""
+        suffix = f" (+{len(real_nodes) - 8} more)" if len(real_nodes) > 8 else ""
         lines += [
             "",
-            f"### Community {cid} - \"{label}\"",
+            f'### Community {cid} - "{label}"',
             f"Cohesion: {score:.2f}",
             f"Nodes ({len(real_nodes)}): {', '.join(display)}{suffix}",
         ]
@@ -178,14 +193,16 @@ def generate(
     from .analyze import _is_file_node, _is_concept_node
 
     isolated = [
-        n for n in G.nodes()
+        n
+        for n in G.nodes()
         if G.degree(n) <= 1
         and not _is_file_node(G, n)
         and not _is_concept_node(G, n)
         and G.nodes[n].get("file_type") != "rationale"
     ]
     thin_communities = {
-        cid: nodes for cid, nodes in communities.items()
+        cid: nodes
+        for cid, nodes in communities.items()
         if 0 < sum(1 for n in nodes if not _is_file_node(G, n)) < 3
     }
     gap_count = len(isolated) + len(thin_communities)
@@ -194,17 +211,27 @@ def generate(
         lines += ["", "## Knowledge Gaps"]
         if isolated:
             isolated_labels = [G.nodes[n].get("label", n) for n in isolated[:5]]
-            suffix = f" (+{len(isolated)-5} more)" if len(isolated) > 5 else ""
-            lines.append(f"- **{len(isolated)} isolated node(s):** {', '.join(f'`{l}`' for l in isolated_labels)}{suffix}")
-            lines.append("  These have ≤1 connection - possible missing edges or undocumented components.")
+            suffix = f" (+{len(isolated) - 5} more)" if len(isolated) > 5 else ""
+            lines.append(
+                f"- **{len(isolated)} isolated node(s):** {', '.join(f'`{label}`' for label in isolated_labels)}{suffix}"
+            )
+            lines.append(
+                "  These have ≤1 connection - possible missing edges or undocumented components."
+            )
         if thin_communities:
-            lines.append(f"- **{len(thin_communities)} thin communities (<{min_community_size} nodes) omitted from report** — run `graphify query` to explore isolated nodes.")
+            lines.append(
+                f"- **{len(thin_communities)} thin communities (<{min_community_size} nodes) omitted from report** — run `graphify query` to explore isolated nodes."
+            )
         if amb_pct > 20:
-            lines.append(f"- **High ambiguity: {amb_pct}% of edges are AMBIGUOUS.** Review the Ambiguous Edges section above.")
+            lines.append(
+                f"- **High ambiguity: {amb_pct}% of edges are AMBIGUOUS.** Review the Ambiguous Edges section above."
+            )
 
     if suggested_questions:
         lines += ["", "## Suggested Questions"]
-        no_signal = len(suggested_questions) == 1 and suggested_questions[0].get("type") == "no_signal"
+        no_signal = (
+            len(suggested_questions) == 1 and suggested_questions[0].get("type") == "no_signal"
+        )
         if no_signal:
             lines.append(f"_{suggested_questions[0]['why']}_")
         else:
diff --git a/graphify/security.py b/graphify/security.py
index 91b500f65..00e6d56a5 100644
--- a/graphify/security.py
+++ b/graphify/security.py
@@ -8,6 +8,7 @@
 import urllib.parse
 import urllib.request
 from collections.abc import Mapping
+from email.message import Message
 from pathlib import Path
 from typing import Any
 
@@ -15,14 +16,14 @@
 import socket
 
 _ALLOWED_SCHEMES = {"http", "https"}
-_MAX_FETCH_BYTES = 52_428_800   # 50 MB hard cap for binary downloads
-_MAX_TEXT_BYTES  = 10_485_760   # 10 MB hard cap for HTML / text
+_MAX_FETCH_BYTES = 52_428_800  # 50 MB hard cap for binary downloads
+_MAX_TEXT_BYTES = 10_485_760  # 10 MB hard cap for HTML / text
 
 # Graph-load memory-bomb cap: reject .json files larger than this before
 # JSON-parsing them into a dict. Without this, a multi-gigabyte (or
 # specifically crafted) graph.json can exhaust process memory during
 # json.loads + node_link_graph rehydration.
-_MAX_GRAPH_FILE_BYTES = 512 * 1024 * 1024   # 512 MiB
+_MAX_GRAPH_FILE_BYTES = 512 * 1024 * 1024  # 512 MiB
 
 # AWS metadata, link-local, and common cloud metadata endpoints
 _BLOCKED_HOSTS = {"metadata.google.internal", "metadata.google.com"}
@@ -39,6 +40,7 @@
 # URL validation
 # ---------------------------------------------------------------------------
 
+
 def validate_url(url: str) -> str:
     """Raise ValueError if *url* is not http or https, or targets a private/internal IP.
 
@@ -50,18 +52,14 @@ def validate_url(url: str) -> str:
     parsed = urllib.parse.urlparse(url)
     if parsed.scheme.lower() not in _ALLOWED_SCHEMES:
         raise ValueError(
-            f"Blocked URL scheme '{parsed.scheme}' - only http and https are allowed. "
-            f"Got: {url!r}"
+            f"Blocked URL scheme '{parsed.scheme}' - only http and https are allowed. Got: {url!r}"
         )
 
     hostname = parsed.hostname
     if hostname:
         # Block known cloud metadata hostnames
         if hostname.lower() in _BLOCKED_HOSTS:
-            raise ValueError(
-                f"Blocked cloud metadata endpoint '{hostname}'. "
-                f"Got: {url!r}"
-            )
+            raise ValueError(f"Blocked cloud metadata endpoint '{hostname}'. Got: {url!r}")
 
         # Resolve hostname and block private/reserved IP ranges
         try:
@@ -73,7 +71,13 @@ def validate_url(url: str) -> str:
                 if isinstance(ip, ipaddress.IPv6Address) and ip in _NAT64_WKP:
                     embedded = ipaddress.ip_address(int(ip) & 0xFFFFFFFF)
                     ip = embedded
-                if ip.is_private or ip.is_reserved or ip.is_loopback or ip.is_link_local or ip in _CGN_NETWORK:
+                if (
+                    ip.is_private
+                    or ip.is_reserved
+                    or ip.is_loopback
+                    or ip.is_link_local
+                    or ip in _CGN_NETWORK
+                ):
                     raise ValueError(
                         f"Blocked private/internal IP {addr} (resolved from '{hostname}'). "
                         f"Got: {url!r}"
@@ -104,10 +108,14 @@ def _guarded(host, port, *args, **kwargs):
                 ip = ipaddress.ip_address(addr)
             except ValueError:
                 continue
-            if ip.is_private or ip.is_reserved or ip.is_loopback or ip.is_link_local or ip in _CGN_NETWORK:
-                raise OSError(
-                    f"SSRF blocked: IP {addr} resolved from '{host}' is private/reserved"
-                )
+            if (
+                ip.is_private
+                or ip.is_reserved
+                or ip.is_loopback
+                or ip.is_link_local
+                or ip in _CGN_NETWORK
+            ):
+                raise OSError(f"SSRF blocked: IP {addr} resolved from '{host}' is private/reserved")
         return results
 
     socket.getaddrinfo = _guarded
@@ -125,7 +133,7 @@ class _NoFileRedirectHandler(urllib.request.HTTPRedirectHandler):
     """
 
     def redirect_request(self, req, fp, code, msg, headers, newurl):
-        validate_url(newurl)          # raises ValueError if scheme is wrong
+        validate_url(newurl)  # raises ValueError on a bad scheme or private/internal target
         return super().redirect_request(req, fp, code, msg, headers, newurl)
 
 
@@ -137,6 +145,7 @@ def _build_opener() -> urllib.request.OpenerDirector:
 # Safe fetch
 # ---------------------------------------------------------------------------
 
+
 def safe_fetch(url: str, max_bytes: int = _MAX_FETCH_BYTES, timeout: int = 30) -> bytes:
     """Fetch *url* and return raw bytes.
 
@@ -162,7 +171,7 @@ def safe_fetch(url: str, max_bytes: int = _MAX_FETCH_BYTES, timeout: int = 30) -
         # with a custom opener we check manually to be safe.
         status = getattr(resp, "status", None) or getattr(resp, "code", None)
         if status is not None and not (200 <= status < 300):
-            raise urllib.error.HTTPError(url, status, f"HTTP {status}", {}, None)
+            raise urllib.error.HTTPError(url, status, f"HTTP {status}", Message(), None)
 
         chunks: list[bytes] = []
         total = 0
@@ -194,6 +203,7 @@ def safe_fetch_text(url: str, max_bytes: int = _MAX_TEXT_BYTES, timeout: int = 1
 # Path validation
 # ---------------------------------------------------------------------------
 
+
 def validate_graph_path(path: str | Path, base: Path | None = None) -> Path:
     """Resolve *path* and verify it stays inside *base*.
 
@@ -217,8 +227,7 @@ def validate_graph_path(path: str | Path, base: Path | None = None) -> Path:
     base = base.resolve()
     if not base.exists():
         raise ValueError(
-            f"Graph base directory does not exist: {base}. "
-            "Run /graphify first to build the graph."
+            f"Graph base directory does not exist: {base}. Run /graphify first to build the graph."
         )
 
     resolved = Path(path).resolve()
@@ -254,8 +263,7 @@ def check_graph_file_size_cap(path: Path) -> None:
         return
     if size > _MAX_GRAPH_FILE_BYTES:
         raise ValueError(
-            f"graph file {path} is {size:_d} bytes, "
-            f"exceeds {_MAX_GRAPH_FILE_BYTES:_d}-byte cap"
+            f"graph file {path} is {size:_d} bytes, exceeds {_MAX_GRAPH_FILE_BYTES:_d}-byte cap"
         )
 
 
diff --git a/graphify/semantic_cleanup.py b/graphify/semantic_cleanup.py
index 6bac6b0d5..f0d96fd69 100644
--- a/graphify/semantic_cleanup.py
+++ b/graphify/semantic_cleanup.py
@@ -28,7 +28,9 @@
 MAX_SEMANTIC_FRAGMENT_HYPEREDGES = 10_000
 MAX_SEMANTIC_HYPEREDGE_NODES = 256
 MAX_SEMANTIC_ID_LENGTH = 256
-VALID_SEMANTIC_FILE_TYPES = frozenset({"code", "document", "paper", "image", "rationale", "concept"})
+VALID_SEMANTIC_FILE_TYPES = frozenset(
+    {"code", "document", "paper", "image", "rationale", "concept"}
+)
 _SEMANTIC_ID_RE = re.compile(r"^[A-Za-z0-9._:-]+$")
 
 
diff --git a/graphify/serve.py b/graphify/serve.py
index 6e5d4a1f6..9fa455bac 100644
--- a/graphify/serve.py
+++ b/graphify/serve.py
@@ -5,6 +5,7 @@
 import re
 import sys
 from pathlib import Path
+from typing import Any, cast
 import networkx as nx
 from networkx.readwrite import json_graph
 from graphify.security import sanitize_label, check_graph_file_size_cap
@@ -12,7 +13,7 @@
 
 try:
     import jieba as _jieba  # type: ignore[import-untyped]
-except ImportError:
+except (ImportError, SyntaxError):
     _jieba = None
 
 
@@ -33,12 +34,14 @@ def _load_graph(graph_path: str) -> nx.Graph:
             return json_graph.node_link_graph(data, edges="links")
         except TypeError:
             return json_graph.node_link_graph(data)
+    except json.JSONDecodeError as exc:
+        print(
+            f"error: graph.json is corrupted ({exc}). Re-run /graphify to rebuild.", file=sys.stderr
+        )
+        sys.exit(1)
     except (ValueError, FileNotFoundError) as exc:
         print(f"error: {exc}", file=sys.stderr)
         sys.exit(1)
-    except json.JSONDecodeError as exc:
-        print(f"error: graph.json is corrupted ({exc}). Re-run /graphify to rebuild.", file=sys.stderr)
-        sys.exit(1)
 
 
 def _communities_from_graph(G: nx.Graph) -> dict[int, list[str]]:
@@ -53,6 +56,7 @@ def _communities_from_graph(G: nx.Graph) -> dict[int, list[str]]:
 
 def _strip_diacritics(text: str) -> str:
     import unicodedata
+
     nfkd = unicodedata.normalize("NFKD", text)
     return "".join(c for c in nfkd if not unicodedata.combining(c))
 
@@ -71,7 +75,7 @@ def _segment_chinese(text: str) -> list[str]:
     if _jieba is not None:
         segments = [w for w in _jieba.cut(text) if len(w.strip()) > 0]
     else:
-        segments = [text[i:i + 2] for i in range(len(text) - 1)] or [text]
+        segments = [text[i : i + 2] for i in range(len(text) - 1)] or [text]
     if len(text) > 1 and text not in segments:
         segments.append(text)
     return segments
@@ -158,7 +162,9 @@ def _score_nodes(G: nx.Graph, terms: list[str]) -> list[tuple[float, str]]:
     return sorted(scored, reverse=True)
 
 
-def _pick_seeds(scored: list[tuple[float, str]], max_k: int = 3, gap_ratio: float = 0.2) -> list[str]:
+def _pick_seeds(
+    scored: list[tuple[float, str]], max_k: int = 3, gap_ratio: float = 0.2
+) -> list[str]:
     """Select BFS seed nodes, stopping when score drops too far below the top.
 
     Prevents high-frequency noise terms (error, exception) from stealing seed
@@ -253,7 +259,9 @@ def _infer_context_filters(question: str) -> list[str]:
     return inferred
 
 
-def _resolve_context_filters(question: str, explicit_filters: list[str] | None = None) -> tuple[list[str], str | None]:
+def _resolve_context_filters(
+    question: str, explicit_filters: list[str] | None = None
+) -> tuple[list[str], str | None]:
     normalized = _normalize_context_filters(explicit_filters)
     if normalized:
         return normalized, "explicit"
@@ -336,7 +344,14 @@ def _dfs(G: nx.Graph, start_nodes: list[str], depth: int) -> tuple[set[str], lis
     return visited, edges_seen
 
 
-def _subgraph_to_text(G: nx.Graph, nodes: set[str], edges: list[tuple], token_budget: int = 2000, *, seeds: list[str] | None = None) -> str:
+def _subgraph_to_text(
+    G: nx.Graph,
+    nodes: set[str],
+    edges: list[tuple],
+    token_budget: int = 2000,
+    *,
+    seeds: list[str] | None = None,
+) -> str:
     """Render subgraph as text, cutting at token_budget (approx 3 chars/token).
 
     seeds: exact-match nodes rendered first before the degree-sorted expansion,
@@ -345,8 +360,9 @@ def _subgraph_to_text(G: nx.Graph, nodes: set[str], edges: list[tuple], token_bu
     char_budget = token_budget * 3
     lines = []
     seed_set = set(seeds or [])
-    ordered = [n for n in (seeds or []) if n in nodes] + \
-              sorted(nodes - seed_set, key=lambda n: G.degree(n), reverse=True)
+    ordered = [n for n in (seeds or []) if n in nodes] + sorted(
+        nodes - seed_set, key=lambda n: G.degree(n), reverse=True
+    )
     for nid in ordered:
         d = G.nodes[nid]
         # Every LLM-derived field passes through sanitize_label before being
@@ -364,7 +380,11 @@ def _subgraph_to_text(G: nx.Graph, nodes: set[str], edges: list[tuple], token_bu
     for u, v in edges:
         if u in nodes and v in nodes:
             raw = G[u][v]
-            d = next(iter(raw.values()), {}) if isinstance(G, (nx.MultiGraph, nx.MultiDiGraph)) else raw
+            d = (
+                next(iter(raw.values()), {})
+                if isinstance(G, (nx.MultiGraph, nx.MultiDiGraph))
+                else raw
+            )
             context = d.get("context")
             context_suffix = f" context={sanitize_label(str(context))}" if context else ""
             line = (
@@ -378,7 +398,7 @@ def _subgraph_to_text(G: nx.Graph, nodes: set[str], edges: list[tuple], token_bu
     if len(output) > char_budget:
         cut_at = output[:char_budget].rfind("\n")
         cut_at = cut_at if cut_at > 0 else char_budget
-        total_nodes = sum(1 for l in lines if l.startswith("NODE "))
+        total_nodes = sum(1 for line in lines if line.startswith("NODE "))
         shown_nodes = output[:cut_at].count("\nNODE ") + (1 if output.startswith("NODE ") else 0)
         cut_count = total_nodes - shown_nodes
         output = (
@@ -405,7 +425,11 @@ def _query_graph_text(
         return "No matching nodes found."
     resolved_filters, filter_source = _resolve_context_filters(question, context_filters)
     traversal_graph = _filter_graph_by_context(G, resolved_filters)
-    nodes, edges = _dfs(traversal_graph, start_nodes, depth) if mode == "dfs" else _bfs(traversal_graph, start_nodes, depth)
+    nodes, edges = (
+        _dfs(traversal_graph, start_nodes, depth)
+        if mode == "dfs"
+        else _bfs(traversal_graph, start_nodes, depth)
+    )
     header_parts = [
         f"Traversal: {mode.upper()} depth={depth}",
         f"Start: {[G.nodes[n].get('label', n) for n in start_nodes]}",
@@ -435,7 +459,9 @@ def _find_node(G: nx.Graph, label: str) -> list[str]:
         nid_lower = nid.lower()
         if term == norm_label or term == bare_label or term == nid_lower:
             exact.append(nid)
-        elif norm_label.startswith(term) or bare_label.startswith(term) or nid_lower.startswith(term):
+        elif (
+            norm_label.startswith(term) or bare_label.startswith(term) or nid_lower.startswith(term)
+        ):
             prefix.append(nid)
         elif term in norm_label:
             substring.append(nid)
@@ -463,8 +489,8 @@ def _relay() -> None:
                     if line.strip():
                         dst.write(line)
                         dst.flush()
-        except Exception:
-            pass
+        except Exception as exc:
+            print(f"[graphify] warning: stdin relay stopped: {exc}", file=sys.stderr)
 
     threading.Thread(target=_relay, daemon=True).start()
     os.dup2(r_fd, sys.stdin.fileno())
@@ -480,7 +506,7 @@ def serve(graph_path: str = "graphify-out/graph.json") -> None:
         from mcp.server import Server
         from mcp.server.stdio import stdio_server
         from mcp import types
-        from mcp.types import AnyUrl
+        from pydantic.networks import AnyUrl
     except ImportError as e:
         raise ImportError('mcp not installed. Run: pip install "graphifyy[mcp]"') from e
 
@@ -533,11 +559,26 @@ async def list_tools() -> list[types.Tool]:
                 inputSchema={
                     "type": "object",
                     "properties": {
-                        "question": {"type": "string", "description": "Natural language question or keyword search"},
-                        "mode": {"type": "string", "enum": ["bfs", "dfs"], "default": "bfs",
-                                 "description": "bfs=broad context, dfs=trace a specific path"},
-                        "depth": {"type": "integer", "default": 3, "description": "Traversal depth (1-6)"},
-                        "token_budget": {"type": "integer", "default": 2000, "description": "Max output tokens"},
+                        "question": {
+                            "type": "string",
+                            "description": "Natural language question or keyword search",
+                        },
+                        "mode": {
+                            "type": "string",
+                            "enum": ["bfs", "dfs"],
+                            "default": "bfs",
+                            "description": "bfs=broad context, dfs=trace a specific path",
+                        },
+                        "depth": {
+                            "type": "integer",
+                            "default": 3,
+                            "description": "Traversal depth (1-6)",
+                        },
+                        "token_budget": {
+                            "type": "integer",
+                            "default": 2000,
+                            "description": "Max output tokens",
+                        },
                         "context_filter": {
                             "type": "array",
                             "items": {"type": "string"},
@@ -552,7 +593,9 @@ async def list_tools() -> list[types.Tool]:
                 description="Get full details for a specific node by label or ID.",
                 inputSchema={
                     "type": "object",
-                    "properties": {"label": {"type": "string", "description": "Node label or ID to look up"}},
+                    "properties": {
+                        "label": {"type": "string", "description": "Node label or ID to look up"}
+                    },
                     "required": ["label"],
                 },
             ),
@@ -563,7 +606,10 @@ async def list_tools() -> list[types.Tool]:
                     "type": "object",
                     "properties": {
                         "label": {"type": "string"},
-                        "relation_filter": {"type": "string", "description": "Optional: filter by relation type"},
+                        "relation_filter": {
+                            "type": "string",
+                            "description": "Optional: filter by relation type",
+                        },
                     },
                     "required": ["label"],
                 },
@@ -573,14 +619,22 @@ async def list_tools() -> list[types.Tool]:
                 description="Get all nodes in a community by community ID.",
                 inputSchema={
                     "type": "object",
-                    "properties": {"community_id": {"type": "integer", "description": "Community ID (0-indexed by size)"}},
+                    "properties": {
+                        "community_id": {
+                            "type": "integer",
+                            "description": "Community ID (0-indexed by size)",
+                        }
+                    },
                     "required": ["community_id"],
                 },
             ),
             types.Tool(
                 name="god_nodes",
                 description="Return the most connected nodes - the core abstractions of the knowledge graph.",
-                inputSchema={"type": "object", "properties": {"top_n": {"type": "integer", "default": 10}}},
+                inputSchema={
+                    "type": "object",
+                    "properties": {"top_n": {"type": "integer", "default": 10}},
+                },
             ),
             types.Tool(
                 name="graph_stats",
@@ -593,9 +647,19 @@ async def list_tools() -> list[types.Tool]:
                 inputSchema={
                     "type": "object",
                     "properties": {
-                        "source": {"type": "string", "description": "Source concept label or keyword"},
-                        "target": {"type": "string", "description": "Target concept label or keyword"},
-                        "max_hops": {"type": "integer", "default": 8, "description": "Maximum hops to consider"},
+                        "source": {
+                            "type": "string",
+                            "description": "Source concept label or keyword",
+                        },
+                        "target": {
+                            "type": "string",
+                            "description": "Target concept label or keyword",
+                        },
+                        "max_hops": {
+                            "type": "integer",
+                            "default": 8,
+                            "description": "Maximum hops to consider",
+                        },
                     },
                     "required": ["source", "target"],
                 },
@@ -610,8 +674,14 @@ async def list_tools() -> list[types.Tool]:
                 inputSchema={
                     "type": "object",
                     "properties": {
-                        "base": {"type": "string", "description": "Base branch to filter PRs by (auto-detected if omitted)"},
-                        "repo": {"type": "string", "description": "GitHub repo (owner/repo). Defaults to current repo."},
+                        "base": {
+                            "type": "string",
+                            "description": "Base branch to filter PRs by (auto-detected if omitted)",
+                        },
+                        "repo": {
+                            "type": "string",
+                            "description": "GitHub repo (owner/repo). Defaults to current repo.",
+                        },
                     },
                 },
             ),
@@ -626,7 +696,10 @@ async def list_tools() -> list[types.Tool]:
                     "type": "object",
                     "properties": {
                         "pr_number": {"type": "integer", "description": "PR number to analyse"},
-                        "repo": {"type": "string", "description": "GitHub repo (owner/repo). Defaults to current repo."},
+                        "repo": {
+                            "type": "string",
+                            "description": "GitHub repo (owner/repo). Defaults to current repo.",
+                        },
                     },
                     "required": ["pr_number"],
                 },
@@ -641,8 +714,14 @@ async def list_tools() -> list[types.Tool]:
                 inputSchema={
                     "type": "object",
                     "properties": {
-                        "base": {"type": "string", "description": "Base branch to filter PRs by (auto-detected if omitted)"},
-                        "repo": {"type": "string", "description": "GitHub repo (owner/repo). Defaults to current repo."},
+                        "base": {
+                            "type": "string",
+                            "description": "Base branch to filter PRs by (auto-detected if omitted)",
+                        },
+                        "repo": {
+                            "type": "string",
+                            "description": "GitHub repo (owner/repo). Defaults to current repo.",
+                        },
                     },
                 },
             ),
@@ -665,20 +744,25 @@ def _tool_query_graph(arguments: dict) -> str:
 
     def _tool_get_node(arguments: dict) -> str:
         label = arguments["label"].lower()
-        matches = [(nid, d) for nid, d in G.nodes(data=True)
-                   if label in (d.get("label") or "").lower() or label == nid.lower()]
+        matches = [
+            (nid, d)
+            for nid, d in G.nodes(data=True)
+            if label in (d.get("label") or "").lower() or label == nid.lower()
+        ]
         if not matches:
             return f"No node matching '{label}' found."
         nid, d = matches[0]
         # Sanitise every LLM-derived field before concatenation (F-010).
-        return "\n".join([
-            f"Node: {sanitize_label(d.get('label', nid))}",
-            f"  ID: {sanitize_label(nid)}",
-            f"  Source: {sanitize_label(str(d.get('source_file', '')))} {sanitize_label(str(d.get('source_location', '')))}",
-            f"  Type: {sanitize_label(str(d.get('file_type', '')))}",
-            f"  Community: {sanitize_label(str(d.get('community', '')))}",
-            f"  Degree: {G.degree(nid)}",
-        ])
+        return "\n".join(
+            [
+                f"Node: {sanitize_label(d.get('label', nid))}",
+                f"  ID: {sanitize_label(nid)}",
+                f"  Source: {sanitize_label(str(d.get('source_file', '')))} {sanitize_label(str(d.get('source_location', '')))}",
+                f"  Type: {sanitize_label(str(d.get('file_type', '')))}",
+                f"  Community: {sanitize_label(str(d.get('community', '')))}",
+                f"  Degree: {G.degree(nid)}",
+            ]
+        )
 
     def _tool_get_neighbors(arguments: dict) -> str:
         label = arguments["label"].lower()
@@ -688,7 +772,8 @@ def _tool_get_neighbors(arguments: dict) -> str:
             return f"No node matching '{label}' found."
         nid = matches[0]
         lines = [f"Neighbors of {sanitize_label(G.nodes[nid].get('label', nid))}:"]
-        for nb in G.successors(nid):
+        directed_graph = cast(Any, G)
+        for nb in directed_graph.successors(nid):
             d = edge_data(G, nid, nb)
             rel = d.get("relation", "")
             if rel_filter and rel_filter not in rel.lower():
@@ -697,7 +782,7 @@ def _tool_get_neighbors(arguments: dict) -> str:
                 f"  --> {sanitize_label(G.nodes[nb].get('label', nb))} "
                 f"[{sanitize_label(str(rel))}] [{sanitize_label(str(d.get('confidence', '')))}]"
             )
-        for nb in G.predecessors(nid):
+        for nb in directed_graph.predecessors(nid):
             d = edge_data(G, nb, nid)
             rel = d.get("relation", "")
             if rel_filter and rel_filter not in rel.lower():
@@ -725,6 +810,7 @@ def _tool_get_community(arguments: dict) -> str:
 
     def _tool_god_nodes(arguments: dict) -> str:
         from graphify.analyze import god_nodes as _god_nodes
+
         nodes = _god_nodes(G, top_n=int(arguments.get("top_n", 10)))
         lines = ["God nodes (most connected):"]
         lines += [f"  {i}. {n['label']} - {n['degree']} edges" for i, n in enumerate(nodes, 1)]
@@ -737,9 +823,9 @@ def _tool_graph_stats(_: dict) -> str:
             f"Nodes: {G.number_of_nodes()}\n"
             f"Edges: {G.number_of_edges()}\n"
             f"Communities: {len(communities)}\n"
-            f"EXTRACTED: {round(confs.count('EXTRACTED')/total*100)}%\n"
-            f"INFERRED: {round(confs.count('INFERRED')/total*100)}%\n"
-            f"AMBIGUOUS: {round(confs.count('AMBIGUOUS')/total*100)}%\n"
+            f"EXTRACTED: {round(confs.count('EXTRACTED') / total * 100)}%\n"
+            f"INFERRED: {round(confs.count('INFERRED') / total * 100)}%\n"
+            f"AMBIGUOUS: {round(confs.count('AMBIGUOUS') / total * 100)}%\n"
         )
 
     def _tool_shortest_path(arguments: dict) -> str:
@@ -799,6 +885,7 @@ def _tool_shortest_path(arguments: dict) -> str:
 
     def _tool_list_prs(arguments: dict) -> str:
         from graphify.prs import fetch_prs, fetch_worktrees, format_prs_text, _detect_default_branch
+
         repo = arguments.get("repo") or None
         base = arguments.get("base") or _detect_default_branch(repo)
         try:
@@ -812,15 +899,21 @@ def _tool_list_prs(arguments: dict) -> str:
 
     def _tool_get_pr_impact(arguments: dict) -> str:
         from graphify.prs import fetch_pr_files, compute_pr_impact, _gh, _parse_ci
+
         number = int(arguments["pr_number"])
         repo = arguments.get("repo") or None
         # Use gh pr view directly — works for any base branch, not just the default
-        view_args = ["pr", "view", str(number), "--json",
-                     "title,headRefName,baseRefName,author,isDraft,reviewDecision,statusCheckRollup,updatedAt"]
+        view_args = [
+            "pr",
+            "view",
+            str(number),
+            "--json",
+            "title,headRefName,baseRefName,author,isDraft,reviewDecision,statusCheckRollup,updatedAt",
+        ]
         if repo:
             view_args += ["--repo", repo]
         pr_data = _gh(*view_args)
-        if pr_data is None:
+        if not isinstance(pr_data, dict):
             return f"PR #{number} not found or gh not authenticated."
         files = fetch_pr_files(number, repo)
         if not files:
@@ -842,7 +935,15 @@ def _tool_get_pr_impact(arguments: dict) -> str:
 
     def _tool_triage_prs(arguments: dict) -> str:
         from concurrent.futures import ThreadPoolExecutor, as_completed
-        from graphify.prs import fetch_prs, fetch_worktrees, fetch_pr_files, compute_pr_impact, _STATUS_ORDER, _detect_default_branch
+        from graphify.prs import (
+            fetch_prs,
+            fetch_worktrees,
+            fetch_pr_files,
+            compute_pr_impact,
+            _STATUS_ORDER,
+            _detect_default_branch,
+        )
+
         repo = arguments.get("repo") or None
         base = arguments.get("base") or _detect_default_branch(repo)
         try:
@@ -852,7 +953,9 @@ def _tool_triage_prs(arguments: dict) -> str:
         worktrees = fetch_worktrees()
         for pr in prs:
             pr.worktree_path = worktrees.get(pr.branch)
-        actionable = [p for p in prs if p.base_branch == base and p.status not in ("WRONG-BASE", "STALE")]
+        actionable = [
+            p for p in prs if p.base_branch == base and p.status not in ("WRONG-BASE", "STALE")
+        ]
         if not actionable:
             return f"No actionable PRs targeting {base}."
         # Fetch diffs concurrently then compute graph impact using in-memory G
@@ -873,7 +976,10 @@ def _tool_triage_prs(arguments: dict) -> str:
             "Rank these by review priority. Higher blast_radius = more graph communities affected = higher merge risk.\n"
         )
         lines = [header]
-        for p in sorted(actionable, key=lambda x: (_STATUS_ORDER.index(x.status) if x.status in _STATUS_ORDER else 99)):
+        for p in sorted(
+            actionable,
+            key=lambda x: _STATUS_ORDER.index(x.status) if x.status in _STATUS_ORDER else 99,
+        ):
             impact = f"  blast_radius={p.blast_radius}" if p.blast_radius else ""
             wt = f"  worktree={p.worktree_path}" if p.worktree_path else ""
             lines.append(
@@ -899,20 +1005,55 @@ def _load_community_labels() -> dict[int, str]:
         labels_path = Path(graph_path).parent / ".graphify_labels.json"
         if labels_path.exists():
             try:
-                return {int(k): v for k, v in json.loads(labels_path.read_text(encoding="utf-8")).items()}
-            except Exception:
-                pass
+                return {
+                    int(k): v
+                    for k, v in json.loads(labels_path.read_text(encoding="utf-8")).items()
+                }
+            except Exception as exc:
+                print(
+                    f"[graphify] warning: could not load community labels: {exc}", file=sys.stderr
+                )
         return {cid: f"Community {cid}" for cid in communities}
 
     @server.list_resources()
     async def list_resources() -> list[types.Resource]:
         return [
-            types.Resource(uri=AnyUrl("graphify://report"), name="Graph Report", description="Full GRAPH_REPORT.md", mimeType="text/markdown"),
-            types.Resource(uri=AnyUrl("graphify://stats"), name="Graph Stats", description="Node/edge/community counts and confidence breakdown", mimeType="text/plain"),
-            types.Resource(uri=AnyUrl("graphify://god-nodes"), name="God Nodes", description="Top 10 most-connected nodes", mimeType="text/plain"),
-            types.Resource(uri=AnyUrl("graphify://surprises"), name="Surprising Connections", description="Cross-community surprising connections", mimeType="text/plain"),
-            types.Resource(uri=AnyUrl("graphify://audit"), name="Confidence Audit", description="EXTRACTED/INFERRED/AMBIGUOUS edge breakdown", mimeType="text/plain"),
-            types.Resource(uri=AnyUrl("graphify://questions"), name="Suggested Questions", description="Suggested questions for this codebase", mimeType="text/plain"),
+            types.Resource(
+                uri=AnyUrl("graphify://report"),
+                name="Graph Report",
+                description="Full GRAPH_REPORT.md",
+                mimeType="text/markdown",
+            ),
+            types.Resource(
+                uri=AnyUrl("graphify://stats"),
+                name="Graph Stats",
+                description="Node/edge/community counts and confidence breakdown",
+                mimeType="text/plain",
+            ),
+            types.Resource(
+                uri=AnyUrl("graphify://god-nodes"),
+                name="God Nodes",
+                description="Top 10 most-connected nodes",
+                mimeType="text/plain",
+            ),
+            types.Resource(
+                uri=AnyUrl("graphify://surprises"),
+                name="Surprising Connections",
+                description="Cross-community surprising connections",
+                mimeType="text/plain",
+            ),
+            types.Resource(
+                uri=AnyUrl("graphify://audit"),
+                name="Confidence Audit",
+                description="EXTRACTED/INFERRED/AMBIGUOUS edge breakdown",
+                mimeType="text/plain",
+            ),
+            types.Resource(
+                uri=AnyUrl("graphify://questions"),
+                name="Suggested Questions",
+                description="Suggested questions for this codebase",
+                mimeType="text/plain",
+            ),
         ]
 
     @server.read_resource()
@@ -931,12 +1072,15 @@ async def read_resource(uri: AnyUrl) -> str:
         if uri_str == "graphify://surprises":
             try:
                 from graphify.analyze import surprising_connections
+
                 surprises = surprising_connections(G, communities, top_n=10)
                 if not surprises:
                     return "No surprising connections found."
                 lines = ["Surprising cross-community connections:"]
                 for s in surprises:
-                    lines.append(f"  {s.get('source', '')} <-> {s.get('target', '')} [{s.get('relation', '')}]")
+                    lines.append(
+                        f"  {s.get('source', '')} <-> {s.get('target', '')} [{s.get('relation', '')}]"
+                    )
                 return "\n".join(lines)
             except Exception as exc:
                 return f"Could not compute surprising connections: {exc}"
@@ -945,13 +1089,14 @@ async def read_resource(uri: AnyUrl) -> str:
             total = len(confs) or 1
             return (
                 f"Total edges: {total}\n"
-                f"EXTRACTED: {confs.count('EXTRACTED')} ({round(confs.count('EXTRACTED')/total*100)}%)\n"
-                f"INFERRED: {confs.count('INFERRED')} ({round(confs.count('INFERRED')/total*100)}%)\n"
-                f"AMBIGUOUS: {confs.count('AMBIGUOUS')} ({round(confs.count('AMBIGUOUS')/total*100)}%)\n"
+                f"EXTRACTED: {confs.count('EXTRACTED')} ({round(confs.count('EXTRACTED') / total * 100)}%)\n"
+                f"INFERRED: {confs.count('INFERRED')} ({round(confs.count('INFERRED') / total * 100)}%)\n"
+                f"AMBIGUOUS: {confs.count('AMBIGUOUS')} ({round(confs.count('AMBIGUOUS') / total * 100)}%)\n"
             )
         if uri_str == "graphify://questions":
             try:
                 from graphify.analyze import suggest_questions
+
                 community_labels = _load_community_labels()
                 questions = suggest_questions(G, communities, community_labels, top_n=10)
                 if not questions:
diff --git a/graphify/symbol_resolution.py b/graphify/symbol_resolution.py
index 5cb0dad15..a19a11dda 100644
--- a/graphify/symbol_resolution.py
+++ b/graphify/symbol_resolution.py
@@ -13,7 +13,6 @@
 from graphify.security import sanitize_metadata
 
 
-
 @dataclass(frozen=True)
 class ImportedSymbol:
     """A Python imported name that can be used as deterministic resolution evidence."""
@@ -243,7 +242,9 @@ def resolve_python_import_guided_calls(
         if path.suffix != ".py":
             continue
         slot: Any = per_file[index] if index < len(per_file) else None
-        result_by_file[str(path)] = slot if isinstance(slot, dict) else {"nodes": [], "edges": []}  # empty fragment for missing/non-dict slots
+        result_by_file[str(path)] = (
+            slot if isinstance(slot, dict) else {"nodes": [], "edges": []}
+        )  # empty fragment for missing/non-dict slots
     resolved_edges: list[dict[str, Any]] = []
 
     for path in paths:
@@ -289,13 +290,15 @@ def resolve_python_import_guided_calls(
                     "source_file": raw_call.get("source_file", source_file),
                     "source_location": raw_call.get("source_location") or imported.source_location,
                     "weight": 1.0,
-                    "metadata": sanitize_metadata({
-                        "resolver": "python_import_guided",
-                        "local_name": imported.local_name,
-                        "imported_name": imported.imported_name,
-                        "module_stem": imported.module_stem,
-                        "import_source_location": imported.source_location,
-                    }),
+                    "metadata": sanitize_metadata(
+                        {
+                            "resolver": "python_import_guided",
+                            "local_name": imported.local_name,
+                            "imported_name": imported.imported_name,
+                            "module_stem": imported.module_stem,
+                            "import_source_location": imported.source_location,
+                        }
+                    ),
                 }
             )
 
@@ -402,7 +405,9 @@ def resolve_bash_source_edges(
           Anything else is silently skipped.
     """
     path_by_index = [Path(p).resolve() for p in paths]
-    file_nid_by_path = {p: _file_node_id_for_path(p, root) for p in path_by_index}  # resolved paths only
+    file_nid_by_path = {
+        p: _file_node_id_for_path(p, root) for p in path_by_index
+    }  # resolved paths only
 
     functions_by_file: dict[str, dict[str, str]] = {}
     for result, path in zip(per_file, path_by_index):
diff --git a/graphify/transcribe.py b/graphify/transcribe.py
index 6d21038f3..2a1ca1fb9 100644
--- a/graphify/transcribe.py
+++ b/graphify/transcribe.py
@@ -4,10 +4,11 @@
 
 import os
 from pathlib import Path
+from typing import Any, cast
 
 
-VIDEO_EXTENSIONS = {'.mp4', '.mov', '.webm', '.mkv', '.avi', '.m4v', '.mp3', '.wav', '.m4a', '.ogg'}
-URL_PREFIXES = ('http://', 'https://', 'www.')
+VIDEO_EXTENSIONS = {".mp4", ".mov", ".webm", ".mkv", ".avi", ".m4v", ".mp3", ".wav", ".m4a", ".ogg"}
+URL_PREFIXES = ("http://", "https://", "www.")
 
 _DEFAULT_MODEL = "base"
 _TRANSCRIPTS_DIR = "graphify-out/transcripts"
@@ -21,22 +22,22 @@ def _model_name() -> str:
 def _get_whisper():
     try:
         from faster_whisper import WhisperModel
+
         return WhisperModel
     except ImportError as exc:
         raise ImportError(
-            "Video transcription requires faster-whisper. "
-            "Run: pip install 'graphifyy[video]'"
+            "Video transcription requires faster-whisper. Run: pip install 'graphifyy[video]'"
         ) from exc
 
 
 def _get_yt_dlp():
     try:
         import yt_dlp
+
         return yt_dlp
     except ImportError as exc:
         raise ImportError(
-            "YouTube/URL download requires yt-dlp. "
-            "Run: pip install 'graphifyy[video]'"
+            "YouTube/URL download requires yt-dlp. Run: pip install 'graphifyy[video]'"
         ) from exc
 
 
@@ -52,35 +53,37 @@ def download_audio(url: str, output_dir: Path) -> Path:
     Uses cached file if already downloaded.
     """
     from graphify.security import validate_url
+
     validate_url(url)  # blocks private IPs, bad schemes before yt-dlp runs
     yt_dlp = _get_yt_dlp()
     output_dir.mkdir(parents=True, exist_ok=True)
 
     # yt-dlp uses %(title)s which can be long/weird — use a stable name based on URL hash
     import hashlib
+
     url_hash = hashlib.sha1(url.encode(), usedforsecurity=False).hexdigest()[:12]
     out_template = str(output_dir / f"yt_{url_hash}.%(ext)s")
 
     # Check for already-downloaded file
-    for ext in ('.m4a', '.opus', '.mp3', '.ogg', '.wav', '.webm'):
+    for ext in (".m4a", ".opus", ".mp3", ".ogg", ".wav", ".webm"):
         candidate = output_dir / f"yt_{url_hash}{ext}"
         if candidate.exists():
             print(f"  cached audio: {candidate.name}")
             return candidate
 
     ydl_opts = {
-        'format': 'bestaudio[ext=m4a]/bestaudio/best',
-        'outtmpl': out_template,
-        'quiet': True,
-        'no_warnings': True,
-        'noplaylist': True,
-        'postprocessors': [],  # no ffmpeg needed — use native audio
+        "format": "bestaudio[ext=m4a]/bestaudio/best",
+        "outtmpl": out_template,
+        "quiet": True,
+        "no_warnings": True,
+        "noplaylist": True,
+        "postprocessors": [],  # no ffmpeg needed — use native audio
     }
 
     print(f"  downloading audio: {url[:80]} ...", flush=True)
-    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
+    with yt_dlp.YoutubeDL(cast(Any, ydl_opts)) as ydl:
         info = ydl.extract_info(url, download=True)
-        ext = info.get('ext', 'm4a')
+        ext = info.get("ext", "m4a")
         downloaded = output_dir / f"yt_{url_hash}.{ext}"
         if not downloaded.exists():
             # yt-dlp may have picked a different extension
diff --git a/graphify/tree_html.py b/graphify/tree_html.py
index 3d825add4..83317ad5d 100644
--- a/graphify/tree_html.py
+++ b/graphify/tree_html.py
@@ -95,7 +95,9 @@ def build_tree(
     dir_index: Dict[str, Dict[str, Any]] = {}
     label_root = project_label or root_path.name or root or "/"
     root_node: Dict[str, Any] = {
-        "name": label_root, "total_count": 0, "children": [],
+        "name": label_root,
+        "total_count": 0,
+        "children": [],
     }
     dir_index[str(root_path)] = root_node
 
@@ -105,8 +107,7 @@ def _ensure_dir(abs_path: Path) -> Dict[str, Any]:
             return dir_index[key]
         if abs_path == abs_path.parent:
             return root_node
-        parent = (_ensure_dir(abs_path.parent)
-                  if abs_path.parent != abs_path else root_node)
+        parent = _ensure_dir(abs_path.parent) if abs_path.parent != abs_path else root_node
         node = {"name": abs_path.name, "total_count": 0, "children": []}
         dir_index[key] = node
         parent["children"].append(node)
@@ -128,16 +129,20 @@ def _ensure_dir(abs_path: Path) -> Dict[str, Any]:
             # Skip the redundant file-name node graphify emits.
             if label == src_path.name and n.get("file_type") == "code":
                 continue
-            sym_children.append({
-                "name": label,
-                "total_count": 1,
-                "children": [],
-            })
+            sym_children.append(
+                {
+                    "name": label,
+                    "total_count": 1,
+                    "children": [],
+                }
+            )
         # Sort: code symbols first by name, then anything else.
-        sym_children.sort(key=lambda c: (
-            c["name"].startswith("_"),
-            c["name"].lower(),
-        ))
+        sym_children.sort(
+            key=lambda c: (
+                c["name"].startswith("_"),
+                c["name"].lower(),
+            )
+        )
         if len(sym_children) > max_children:
             extra = len(sym_children) - max_children
             sym_children = sym_children[:max_children] + [
@@ -153,10 +158,12 @@ def _ensure_dir(abs_path: Path) -> Dict[str, Any]:
     # Sort each dir's children + propagate total_count up.
     def _finalise(d: Dict[str, Any]) -> int:
         kids = d.get("children") or []
-        kids.sort(key=lambda c: (
-            0 if (c.get("children") and len(c["children"]) > 0) else 1,
-            c["name"].lower(),
-        ))
+        kids.sort(
+            key=lambda c: (
+                0 if (c.get("children") and len(c["children"]) > 0) else 1,
+                c["name"].lower(),
+            )
+        )
         if not kids:
             return d.get("total_count") or 1
         n = 0
@@ -570,10 +577,10 @@ def write_tree_html(
     top_k_edges: int = 0,
 ) -> Path:
     from graphify.security import check_graph_file_size_cap
+
     check_graph_file_size_cap(graph_path)
     graph = json.loads(graph_path.read_text(encoding="utf-8"))
-    tree = build_tree(graph, root=root, max_children=max_children,
-                      project_label=project_label)
+    tree = build_tree(graph, root=root, max_children=max_children, project_label=project_label)
     title = f"{tree['name']} — graphify tree viewer"
     header = f"{tree['name']} — Knowledge Graph"
     html = emit_html(tree, title=title, header=header)
diff --git a/graphify/watch.py b/graphify/watch.py
index 8a2c75279..50647f967 100644
--- a/graphify/watch.py
+++ b/graphify/watch.py
@@ -8,6 +8,15 @@
 import time
 from pathlib import Path
 
+from graphify.detect import (
+    CODE_EXTENSIONS,
+    DOC_EXTENSIONS,
+    IMAGE_EXTENSIONS,
+    PAPER_EXTENSIONS,
+    _is_ignored,
+    _load_graphifyignore,
+)
+
 _GRAPHIFY_OUT = os.environ.get("GRAPHIFY_OUT", "graphify-out")
 _PENDING_FILENAME = ".pending_changes"
 _PENDING_DRAIN_MAX_PASSES = 20
@@ -168,6 +177,7 @@ def _apply_resource_limits() -> None:
         return
     try:
         import resource
+
         which = resource.RLIMIT_DATA if sys.platform == "darwin" else resource.RLIMIT_AS
         soft, hard = resource.getrlimit(which)
         new_hard = hard if hard != resource.RLIM_INFINITY and hard < limit else limit
@@ -179,22 +189,14 @@ def _apply_resource_limits() -> None:
 def _git_head() -> str | None:
     """Return current git HEAD commit hash, or None outside a repo."""
     import subprocess as _sp
+
     try:
-        r = _sp.run(["git", "rev-parse", "HEAD"], capture_output=True, text=True, timeout=3)
+        r = _sp.run(["git", "rev-parse", "HEAD"], capture_output=True, text=True, timeout=3)  # nosec B603 B607
         return r.stdout.strip() if r.returncode == 0 else None
     except Exception:
         return None
 
 
-from graphify.detect import (
-    CODE_EXTENSIONS,
-    DOC_EXTENSIONS,
-    PAPER_EXTENSIONS,
-    IMAGE_EXTENSIONS,
-    _load_graphifyignore,
-    _is_ignored,
-)
-
 _WATCHED_EXTENSIONS = CODE_EXTENSIONS | DOC_EXTENSIONS | PAPER_EXTENSIONS | IMAGE_EXTENSIONS
 _CODE_EXTENSIONS = CODE_EXTENSIONS
 
@@ -309,6 +311,7 @@ def _canonical_topology_for_compare(graph_data: dict) -> dict:
 
 def _topology_from_graph(G) -> dict:
     from networkx.readwrite import json_graph
+
     try:
         data = json_graph.node_link_data(G, edges="links")
     except TypeError:
@@ -369,6 +372,7 @@ def _rebuild_code(
     follow_symlinks: bool = False,
     force: bool = False,
     no_cluster: bool = False,
+    no_viz: bool = False,
     acquire_lock: bool = True,
     block_on_lock: bool = False,
 ) -> bool:
@@ -393,6 +397,9 @@ def _rebuild_code(
     ``no_cluster`` skips community detection and writes raw merged extraction
     JSON to graphify-out/graph.json (mirrors ``extract --no-cluster``).
 
+    ``no_viz`` skips graph.html generation (removing any stale graph.html)
+    while still refreshing graph.json and GRAPH_REPORT.md.
+
     Returns True on success, False on error or skipped-due-to-lock.
     """
     out = watch_path / _GRAPHIFY_OUT
@@ -426,6 +433,7 @@ def _rebuild_code(
                 follow_symlinks=follow_symlinks,
                 force=force,
                 no_cluster=no_cluster,
+                no_viz=no_viz,
                 acquire_lock=False,
             )
             # Late-arrival drain: another hook may have queued work while we
@@ -461,10 +469,10 @@ def _rebuild_code(
         from graphify.security import check_graph_file_size_cap
 
         detected = detect(watch_path, follow_symlinks=follow_symlinks)
-        code_files = [Path(f) for f in detected['files']['code']]
+        code_files = [Path(f) for f in detected["files"]["code"]]
 
         # Include document files that have AST extractors (e.g. .md, .mdx, .qmd)
-        for doc_file in detected['files'].get('document', []):
+        for doc_file in detected["files"].get("document", []):
             p = Path(doc_file)
             if _get_extractor(p) is not None:
                 code_files.append(p)
@@ -497,10 +505,17 @@ def _rebuild_code(
             extract_targets = code_files
 
         commit = _git_head()
-        result = extract(extract_targets, cache_root=watch_root) if extract_targets else {
-            "nodes": [], "edges": [], "hyperedges": [],
-            "input_tokens": 0, "output_tokens": 0,
-        }
+        result = (
+            extract(extract_targets, cache_root=watch_root)
+            if extract_targets
+            else {
+                "nodes": [],
+                "edges": [],
+                "hyperedges": [],
+                "input_tokens": 0,
+                "output_tokens": 0,
+            }
+        )
 
         # Preserve semantic nodes/edges from a previous full run.
         # AST-only rebuild replaces nodes for changed files; everything else is kept.
@@ -536,21 +551,24 @@ def _rebuild_code(
                         sf = n.get("source_file")
                         if not sf:
                             continue
-                        if Path(sf).suffix.lower() not in _CODE_EXTENSIONS:
+                        source_file = str(sf)
+                        if Path(source_file).suffix.lower() not in _CODE_EXTENSIONS:
                             continue
-                        norm = _nsf(sf, _root_str)
+                        norm = _nsf(source_file, _root_str) or source_file
                         if norm not in current_sources:
-                            evict_sources.add(sf)
+                            evict_sources.add(source_file)
                             evict_sources.add(norm)
                             deleted_paths.add(norm)
                 preserved_nodes = [
-                    n for n in existing.get("nodes", [])
+                    n
+                    for n in existing.get("nodes", [])
                     if n["id"] not in new_ast_ids
                     and (not evict_sources or n.get("source_file") not in evict_sources)
                 ]
                 all_ids = new_ast_ids | {n["id"] for n in preserved_nodes}
                 preserved_edges = [
-                    e for e in existing.get("links", existing.get("edges", []))
+                    e
+                    for e in existing.get("links", existing.get("edges", []))
                     if e.get("source") in all_ids and e.get("target") in all_ids
                 ]
                 result = {
@@ -560,8 +578,11 @@ def _rebuild_code(
                     "input_tokens": 0,
                     "output_tokens": 0,
                 }
-            except Exception:
-                pass  # corrupt graph.json - proceed with AST-only
+            except Exception as exc:
+                print(
+                    f"[graphify] warning: could not preserve existing graph data: {exc}",
+                    file=sys.stderr,
+                )
 
         _relativize_source_files(result, project_root)
         out.mkdir(exist_ok=True)
@@ -579,15 +600,22 @@ def _rebuild_code(
                 try:
                     check_graph_file_size_cap(existing_graph)
                     existing_payload = json.loads(existing_graph.read_text(encoding="utf-8"))
-                    same_graph = (
-                        json.dumps(_canonical_graph_for_compare(existing_payload), sort_keys=True, ensure_ascii=False)
-                        == json.dumps(_canonical_graph_for_compare(candidate_graph_data), sort_keys=True, ensure_ascii=False)
+                    same_graph = json.dumps(
+                        _canonical_graph_for_compare(existing_payload),
+                        sort_keys=True,
+                        ensure_ascii=False,
+                    ) == json.dumps(
+                        _canonical_graph_for_compare(candidate_graph_data),
+                        sort_keys=True,
+                        ensure_ascii=False,
                     )
                 except Exception:
                     same_graph = False
             if not same_graph:
                 if not _check_shrink(
-                    force, existing_graph_data, candidate_graph_data,
+                    force,
+                    existing_graph_data,
+                    candidate_graph_data,
                     had_explicit_deletions=bool(deleted_paths),
                 ):
                     return False
@@ -595,9 +623,10 @@ def _rebuild_code(
 
             try:
                 from graphify.detect import save_manifest
+
                 save_manifest(detected["files"], kind="ast")
-            except Exception:
-                pass
+            except Exception as exc:
+                print(f"[graphify] warning: could not save AST manifest: {exc}", file=sys.stderr)
 
             # clear stale needs_update flag if present
             flag = out / "needs_update"
@@ -605,7 +634,9 @@ def _rebuild_code(
                 flag.unlink()
 
             if same_graph:
-                print("[graphify watch] No code-graph changes detected (--no-cluster); outputs left untouched.")
+                print(
+                    "[graphify watch] No code-graph changes detected (--no-cluster); outputs left untouched."
+                )
             else:
                 print(
                     "[graphify watch] Rebuilt (no clustering): "
@@ -615,7 +646,12 @@ def _rebuild_code(
             return True
 
         detection = {
-            "files": {"code": [str(f) for f in code_files], "document": [], "paper": [], "image": []},
+            "files": {
+                "code": [str(f) for f in code_files],
+                "document": [],
+                "paper": [],
+                "image": [],
+            },
             "total_files": len(code_files),
             "total_words": detected.get("total_words", 0),
         }
@@ -624,22 +660,32 @@ def _rebuild_code(
         candidate_topology = _topology_from_graph(G)
         if existing_graph_data:
             try:
-                same_topology = (
-                    json.dumps(_canonical_topology_for_compare(existing_graph_data), sort_keys=True, ensure_ascii=False)
-                    == json.dumps(_canonical_topology_for_compare(candidate_topology), sort_keys=True, ensure_ascii=False)
+                same_topology = json.dumps(
+                    _canonical_topology_for_compare(existing_graph_data),
+                    sort_keys=True,
+                    ensure_ascii=False,
+                ) == json.dumps(
+                    _canonical_topology_for_compare(candidate_topology),
+                    sort_keys=True,
+                    ensure_ascii=False,
                 )
             except Exception:
                 same_topology = False
             if same_topology:
                 try:
                     from graphify.detect import save_manifest
+
                     save_manifest(detected["files"], kind="ast")
-                except Exception:
-                    pass
+                except Exception as exc:
+                    print(
+                        f"[graphify] warning: could not save AST manifest: {exc}", file=sys.stderr
+                    )
                 flag = out / "needs_update"
                 if flag.exists():
                     flag.unlink()
-                print("[graphify watch] No code-graph topology changes detected; outputs left untouched.")
+                print(
+                    "[graphify watch] No code-graph topology changes detected; outputs left untouched."
+                )
                 return True
 
         communities = cluster(G)
@@ -651,7 +697,9 @@ def _rebuild_code(
         surprises = surprising_connections(G, communities)
         labels_file = out / ".graphify_labels.json"
         try:
-            raw = json.loads(labels_file.read_text(encoding="utf-8")) if labels_file.exists() else {}
+            raw = (
+                json.loads(labels_file.read_text(encoding="utf-8")) if labels_file.exists() else {}
+            )
             labels = {int(k): v for k, v in raw.items() if int(k) in communities}
         except Exception:
             raw = {}
@@ -660,11 +708,24 @@ def _rebuild_code(
             if cid not in labels:
                 labels[cid] = "Community " + str(cid)
         questions = suggest_questions(G, communities, labels)
-        report = generate(G, communities, cohesion, labels, gods, surprises, detection,
-                          {"input": 0, "output": 0}, report_root, suggested_questions=questions,
-                          built_at_commit=commit)
+        report = generate(
+            G,
+            communities,
+            cohesion,
+            labels,
+            gods,
+            surprises,
+            detection,
+            {"input": 0, "output": 0},
+            report_root,
+            suggested_questions=questions,
+            built_at_commit=commit,
+        )
         report_path = out / "GRAPH_REPORT.md"
-        labels_json = json.dumps({str(k): v for k, v in sorted(labels.items())}, ensure_ascii=False, indent=2) + "\n"
+        labels_json = (
+            json.dumps({str(k): v for k, v in sorted(labels.items())}, ensure_ascii=False, indent=2)
+            + "\n"
+        )
         graph_tmp = out / ".graph.tmp.json"
         json_written = to_json(G, communities, str(graph_tmp), force=True, built_at_commit=commit)
         if not json_written:
@@ -676,9 +737,14 @@ def _rebuild_code(
             try:
                 check_graph_file_size_cap(existing_graph)
                 existing_payload = json.loads(existing_graph.read_text(encoding="utf-8"))
-                same_graph = (
-                    json.dumps(_canonical_graph_for_compare(existing_payload), sort_keys=True, ensure_ascii=False)
-                    == json.dumps(_canonical_graph_for_compare(candidate_graph_data), sort_keys=True, ensure_ascii=False)
+                same_graph = json.dumps(
+                    _canonical_graph_for_compare(existing_payload),
+                    sort_keys=True,
+                    ensure_ascii=False,
+                ) == json.dumps(
+                    _canonical_graph_for_compare(candidate_graph_data),
+                    sort_keys=True,
+                    ensure_ascii=False,
                 )
             except Exception:
                 same_graph = False
@@ -688,15 +754,20 @@ def _rebuild_code(
         no_change = same_graph and same_report
         if no_change:
             graph_tmp.unlink(missing_ok=True)
-            print("[graphify watch] No code-graph changes detected; graph.json/GRAPH_REPORT.md left untouched.")
+            print(
+                "[graphify watch] No code-graph changes detected; graph.json/GRAPH_REPORT.md left untouched."
+            )
         else:
             if not _check_shrink(
-                force, existing_graph_data, candidate_graph_data,
+                force,
+                existing_graph_data,
+                candidate_graph_data,
                 tmp=graph_tmp,
                 had_explicit_deletions=bool(deleted_paths),
             ):
                 return False
             from graphify.export import backup_if_protected as _backup
+
             _backup(out)
             graph_tmp.replace(existing_graph)
             report_path.write_text(report, encoding="utf-8")
@@ -704,14 +775,19 @@ def _rebuild_code(
 
         try:
             from graphify.detect import save_manifest
+
             save_manifest(detected["files"], kind="ast")
-        except Exception:
-            pass
+        except Exception as exc:
+            print(f"[graphify] warning: could not save AST manifest: {exc}", file=sys.stderr)
 
         # to_html raises ValueError for graphs > MAX_NODES_FOR_VIZ (5000).
         # Wrap so core outputs (graph.json + GRAPH_REPORT.md) always land.
         html_written = False
-        if not no_change:
+        if no_viz:
+            stale = out / "graph.html"
+            if stale.exists():
+                stale.unlink()
+        elif not no_change:
             try:
                 to_html(G, communities, str(out / "graph.html"), community_labels=labels or None)
                 html_written = True
@@ -727,6 +803,7 @@ def _rebuild_code(
         if callflow_files and not no_change:
             try:
                 from graphify.callflow_html import write_callflow_html
+
                 for cf in callflow_files:
                     write_callflow_html(
                         graph=out / "graph.json",
@@ -744,9 +821,13 @@ def _rebuild_code(
             flag.unlink()
 
         if not no_change:
-            print(f"[graphify watch] Rebuilt: {G.number_of_nodes()} nodes, "
-                  f"{G.number_of_edges()} edges, {len(communities)} communities")
-            products = "graph.json" + (", graph.html" if html_written else "") + " and GRAPH_REPORT.md"
+            print(
+                f"[graphify watch] Rebuilt: {G.number_of_nodes()} nodes, "
+                f"{G.number_of_edges()} edges, {len(communities)} communities"
+            )
+            products = (
+                "graph.json" + (", graph.html" if html_written else "") + " and GRAPH_REPORT.md"
+            )
             if callflow_files:
                 products += f", {len(callflow_files)} callflow HTML"
             print(f"[graphify watch] {products} updated in {out}")
@@ -823,7 +904,7 @@ def on_any_event(self, event):
             nonlocal last_trigger, pending
             if event.is_directory:
                 return
-            path = Path(event.src_path)
+            path = Path(os.fsdecode(event.src_path))
             # Check .graphifyignore BEFORE the extension/dotfile/out filters so
             # the cheapest short-circuit for users with broad ignore patterns
             # (node_modules/, .venv/, build/, …) fires first. _is_ignored
@@ -848,8 +929,10 @@ def on_any_event(self, event):
     observer.start()
 
     print(f"[graphify watch] Watching {watch_path.resolve()} - press Ctrl+C to stop")
-    print(f"[graphify watch] Code changes rebuild graph automatically. "
-          f"Doc/image changes require /graphify --update.")
+    print(
+        "[graphify watch] Code changes rebuild graph automatically. "
+        "Doc/image changes require /graphify --update."
+    )
     print(f"[graphify watch] Debounce: {debounce}s")
 
     try:
@@ -875,9 +958,16 @@ def on_any_event(self, event):
 
 if __name__ == "__main__":
     import argparse
-    parser = argparse.ArgumentParser(description="Watch a folder and auto-update the graphify graph")
+
+    parser = argparse.ArgumentParser(
+        description="Watch a folder and auto-update the graphify graph"
+    )
     parser.add_argument("path", nargs="?", default=".", help="Folder to watch (default: .)")
-    parser.add_argument("--debounce", type=float, default=3.0,
-                        help="Seconds to wait after last change before updating (default: 3)")
+    parser.add_argument(
+        "--debounce",
+        type=float,
+        default=3.0,
+        help="Seconds to wait after last change before updating (default: 3)",
+    )
     args = parser.parse_args()
     watch(Path(args.path), debounce=args.debounce)
diff --git a/graphify/wiki.py b/graphify/wiki.py
index eb662317f..2b5158104 100644
--- a/graphify/wiki.py
+++ b/graphify/wiki.py
@@ -17,13 +17,20 @@ def _safe_filename(name: str) -> str:
     chars to stay well under common filesystem limits.
     """
     import re
+
     s = name.replace("/", "-").replace(" ", "_").replace(":", "-")
-    s = re.sub(r'[<>:"/\\|?*]', '_', s)
-    s = s.strip('. ')
-    return s[:200] if s else 'unnamed'
+    s = re.sub(r'[<>:"/\\|?*]', "_", s)
+    s = s.strip(". ")
+    return s[:200] if s else "unnamed"
 
 
-def _cross_community_links(G: nx.Graph, nodes: list[str], own_cid: int, labels: dict[int, str], node_community: dict[str, int]) -> list[tuple[str, int]]:
+def _cross_community_links(
+    G: nx.Graph,
+    nodes: list[str],
+    own_cid: int,
+    labels: dict[int, str],
+    node_community: dict[str, int],
+) -> list[tuple[str, int]]:
     """Return (community_label, edge_count) pairs for cross-community connections, sorted descending."""
     counts: dict[str, int] = Counter()
     for nid in nodes:
@@ -102,7 +109,9 @@ def _community_article(
     return "\n".join(lines)
 
 
-def _god_node_article(G: nx.Graph, nid: str, labels: dict[int, str], node_community: dict[str, int] | None = None) -> str:
+def _god_node_article(
+    G: nx.Graph, nid: str, labels: dict[int, str], node_community: dict[str, int] | None = None
+) -> str:
     d = G.nodes[nid]
     node_label = d.get("label", nid)
     src = d.get("source_file", "")
@@ -209,6 +218,7 @@ def to_wiki(
     # NetworkX 3.x returns DegreeView({}) for missing nodes instead of raising,
     # which crashes sorted() with TypeError; G.neighbors()/G.nodes[] also raise.
     import sys as _sys
+
     _g_nodes = set(G.nodes)
     _orig_total = sum(len(ns) for ns in communities.values())
     communities = {cid: [n for n in nodes if n in _g_nodes] for cid, nodes in communities.items()}
@@ -259,7 +269,9 @@ def _unique_slug(base: str) -> str:
     # Community articles
     for cid, nodes in communities.items():
         label = labels.get(cid, f"Community {cid}")
-        article = _community_article(G, cid, nodes, label, labels, cohesion.get(cid), node_community)
+        article = _community_article(
+            G, cid, nodes, label, labels, cohesion.get(cid), node_community
+        )
         slug = _unique_slug(_safe_filename(label))
         (out / f"{slug}.md").write_text(article, encoding="utf-8")
         count += 1
@@ -269,7 +281,7 @@ def _unique_slug(base: str) -> str:
         nid = node_data.get("id")
         if nid and nid in G:
             article = _god_node_article(G, nid, labels, node_community)
-            slug = _unique_slug(_safe_filename(node_data['label']))
+            slug = _unique_slug(_safe_filename(node_data["label"]))
             (out / f"{slug}.md").write_text(article, encoding="utf-8")
             count += 1
 
diff --git a/pyproject.toml b/pyproject.toml
index 1e3b2d170..7acee8e66 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -49,7 +49,7 @@ Repository = "https://github.com/safishamsi/graphify"
 Issues = "https://github.com/safishamsi/graphify/issues"
 
 [project.optional-dependencies]
-mcp = ["mcp"]
+mcp = ["mcp", "starlette>=1.0.1"]
 neo4j = ["neo4j"]
 pdf = ["pypdf", "markdownify"]
 watch = ["watchdog"]
@@ -65,7 +65,7 @@ gemini = ["openai", "tiktoken"]
 openai = ["openai", "tiktoken"]
 chinese = ["jieba"]
 sql = ["tree-sitter-sql"]
-all = ["mcp", "neo4j", "pypdf", "markdownify", "watchdog", "graspologic; python_version < '3.13'", "python-docx", "openpyxl", "faster-whisper; python_version >= '3.11'", "yt-dlp", "matplotlib", "openai", "tiktoken", "boto3", "tree-sitter-sql", "jieba"]
+all = ["mcp", "starlette>=1.0.1", "neo4j", "pypdf", "markdownify", "watchdog", "graspologic; python_version < '3.13'", "python-docx", "openpyxl", "faster-whisper; python_version >= '3.11'", "yt-dlp", "matplotlib", "openai", "tiktoken", "boto3", "tree-sitter-sql", "jieba"]
 
 [project.scripts]
 graphify = "graphify.__main__:main"
diff --git a/tests/bench_extract.py b/tests/bench_extract.py
index b618d3d78..e63765e15 100644
--- a/tests/bench_extract.py
+++ b/tests/bench_extract.py
@@ -31,8 +31,8 @@
 if str(_project_root) not in sys.path:
     sys.path.insert(0, str(_project_root))
 
-from graphify.extract import extract, collect_files
-from graphify.cache import clear_cache
+from graphify.extract import extract, collect_files  # noqa: E402
+from graphify.cache import clear_cache  # noqa: E402
 
 
 def _count_by_ext(paths: list[Path]) -> dict[str, int]:
@@ -102,9 +102,7 @@ def _run_extraction(
     """Run extraction, return (elapsed_seconds, node_count, edge_count)."""
     clear_cache(cache_root)
     t0 = time.perf_counter()
-    result = extract(
-        paths, cache_root=cache_root, parallel=parallel, max_workers=max_workers
-    )
+    result = extract(paths, cache_root=cache_root, parallel=parallel, max_workers=max_workers)
     elapsed = time.perf_counter() - t0
     nodes = len(result.get("nodes", []))
     edges = len(result.get("edges", []))
@@ -149,9 +147,7 @@ def main() -> None:
     par_time, par_nodes, par_edges = _run_extraction(
         paths, cache_root, parallel=True, max_workers=workers
     )
-    print(
-        f"Parallel ({workers}): {par_time:.2f}s ({par_nodes:,} nodes, {par_edges:,} edges)"
-    )
+    print(f"Parallel ({workers}): {par_time:.2f}s ({par_nodes:,} nodes, {par_edges:,} edges)")
 
     # Results
     print()
diff --git a/tests/test_affected_cli.py b/tests/test_affected_cli.py
index 65a3be8ca..7fb0214f0 100644
--- a/tests/test_affected_cli.py
+++ b/tests/test_affected_cli.py
@@ -12,13 +12,21 @@ def _write_graph(tmp_path):
     graph = nx.DiGraph()
     graph.add_node("target", label="Foo", source_file="pkg/foo.py", source_location="L1")
     graph.add_node("caller", label="X()", source_file="app.py", source_location="L4")
-    graph.add_node("barrel", label="__init__.py", source_file="pkg/__init__.py", source_location=None)
+    graph.add_node(
+        "barrel", label="__init__.py", source_file="pkg/__init__.py", source_location=None
+    )
     graph.add_node("consumer", label="app.py", source_file="app.py", source_location=None)
     graph.add_edge("caller", "target", relation="calls", context="call", confidence="EXTRACTED")
-    graph.add_edge("barrel", "target", relation="re_exports", context="export", confidence="EXTRACTED")
-    graph.add_edge("consumer", "target", relation="imports", context="import", confidence="EXTRACTED")
+    graph.add_edge(
+        "barrel", "target", relation="re_exports", context="export", confidence="EXTRACTED"
+    )
+    graph.add_edge(
+        "consumer", "target", relation="imports", context="import", confidence="EXTRACTED"
+    )
     graph_path = tmp_path / "graph.json"
-    graph_path.write_text(json.dumps(json_graph.node_link_data(graph, edges="links")), encoding="utf-8")
+    graph_path.write_text(
+        json.dumps(json_graph.node_link_data(graph, edges="links")), encoding="utf-8"
+    )
     return graph_path
 
 
diff --git a/tests/test_analyze.py b/tests/test_analyze.py
index ecf1555d3..64d5b768b 100644
--- a/tests/test_analyze.py
+++ b/tests/test_analyze.py
@@ -1,11 +1,21 @@
 """Tests for analyze.py."""
+
 import json
 import networkx as nx
 import pytest
 from pathlib import Path
 from graphify.build import build_from_json
 from graphify.cluster import cluster
-from graphify.analyze import god_nodes, surprising_connections, _is_concept_node, graph_diff, _surprise_score, _file_category, _is_json_key_node, find_import_cycles
+from graphify.analyze import (
+    god_nodes,
+    surprising_connections,
+    _is_concept_node,
+    graph_diff,
+    _surprise_score,
+    _file_category,
+    _is_json_key_node,
+    find_import_cycles,
+)
 from graphify.extract import _make_id
 
 FIXTURES = Path(__file__).parent / "fixtures"
@@ -52,8 +62,14 @@ def test_surprising_connections_excludes_concept_nodes():
     G = make_graph()
     # Add a concept node with empty source_file
     G.add_node("concept_x", label="Abstract Concept", file_type="document", source_file="")
-    G.add_edge("n_transformer", "concept_x", relation="relates_to",
-               confidence="INFERRED", source_file="", weight=0.5)
+    G.add_edge(
+        "n_transformer",
+        "concept_x",
+        relation="relates_to",
+        confidence="INFERRED",
+        source_file="",
+        weight=0.5,
+    )
     communities = cluster(G)
     surprises = surprising_connections(G, communities)
     labels = [s["source"] for s in surprises] + [s["target"] for s in surprises]
@@ -65,21 +81,49 @@ def test_surprising_connections_single_file_uses_community_bridges():
     G = nx.Graph()
     # Build a graph with 2 clear communities + 1 bridge edge
     for i in range(5):
-        G.add_node(f"a{i}", label=f"A{i}", file_type="code", source_file="single.py",
-                   source_location=f"L{i}")
+        G.add_node(
+            f"a{i}",
+            label=f"A{i}",
+            file_type="code",
+            source_file="single.py",
+            source_location=f"L{i}",
+        )
     for i in range(5):
-        G.add_node(f"b{i}", label=f"B{i}", file_type="code", source_file="single.py",
-                   source_location=f"L{i+10}")
+        G.add_node(
+            f"b{i}",
+            label=f"B{i}",
+            file_type="code",
+            source_file="single.py",
+            source_location=f"L{i + 10}",
+        )
     # Dense intra-community edges
     for i in range(4):
-        G.add_edge(f"a{i}", f"a{i+1}", relation="calls", confidence="EXTRACTED",
-                   source_file="single.py", weight=1.0)
+        G.add_edge(
+            f"a{i}",
+            f"a{i + 1}",
+            relation="calls",
+            confidence="EXTRACTED",
+            source_file="single.py",
+            weight=1.0,
+        )
     for i in range(4):
-        G.add_edge(f"b{i}", f"b{i+1}", relation="calls", confidence="EXTRACTED",
-                   source_file="single.py", weight=1.0)
+        G.add_edge(
+            f"b{i}",
+            f"b{i + 1}",
+            relation="calls",
+            confidence="EXTRACTED",
+            source_file="single.py",
+            weight=1.0,
+        )
     # One cross-community bridge
-    G.add_edge("a4", "b0", relation="references", confidence="INFERRED",
-               source_file="single.py", weight=0.5)
+    G.add_edge(
+        "a4",
+        "b0",
+        relation="references",
+        confidence="INFERRED",
+        source_file="single.py",
+        weight=0.5,
+    )
 
     communities = cluster(G)
     surprises = surprising_connections(G, communities)
@@ -97,12 +141,19 @@ def test_surprising_connections_ambiguous_scores_higher_than_extracted():
         ("d", "Delta", "repo2/eval.py"),
     ]:
         G.add_node(nid, label=label, source_file=src, file_type="code")
-    G.add_edge("a", "b", relation="calls", confidence="AMBIGUOUS", weight=1.0, source_file="repo1/model.py")
-    G.add_edge("c", "d", relation="calls", confidence="EXTRACTED", weight=1.0, source_file="repo1/data.py")
-    communities = {0: ["a", "c"], 1: ["b", "d"]}
+    G.add_edge(
+        "a", "b", relation="calls", confidence="AMBIGUOUS", weight=1.0, source_file="repo1/model.py"
+    )
+    G.add_edge(
+        "c", "d", relation="calls", confidence="EXTRACTED", weight=1.0, source_file="repo1/data.py"
+    )
     nc = {"a": 0, "c": 0, "b": 1, "d": 1}
-    score_amb, _ = _surprise_score(G, "a", "b", G.edges["a", "b"], nc, "repo1/model.py", "repo2/train.py")
-    score_ext, _ = _surprise_score(G, "c", "d", G.edges["c", "d"], nc, "repo1/data.py", "repo2/eval.py")
+    score_amb, _ = _surprise_score(
+        G, "a", "b", G.edges["a", "b"], nc, "repo1/model.py", "repo2/train.py"
+    )
+    score_ext, _ = _surprise_score(
+        G, "c", "d", G.edges["c", "d"], nc, "repo1/data.py", "repo2/eval.py"
+    )
     assert score_amb > score_ext
 
 
@@ -137,11 +188,24 @@ def test_surprising_connections_cross_type_scores_higher():
         ("d", "Dataset", "code/data.py"),
     ]:
         G.add_node(nid, label=label, source_file=src, file_type="code")
-    G.add_edge("a", "b", relation="references", confidence="EXTRACTED", weight=1.0, source_file="code/model.py")
-    G.add_edge("c", "d", relation="calls", confidence="EXTRACTED", weight=1.0, source_file="code/train.py")
+    G.add_edge(
+        "a",
+        "b",
+        relation="references",
+        confidence="EXTRACTED",
+        weight=1.0,
+        source_file="code/model.py",
+    )
+    G.add_edge(
+        "c", "d", relation="calls", confidence="EXTRACTED", weight=1.0, source_file="code/train.py"
+    )
     nc = {"a": 0, "b": 1, "c": 0, "d": 0}
-    score_cross, reasons_cross = _surprise_score(G, "a", "b", G.edges["a", "b"], nc, "code/model.py", "papers/flash.pdf")
-    score_same, _ = _surprise_score(G, "c", "d", G.edges["c", "d"], nc, "code/train.py", "code/data.py")
+    score_cross, reasons_cross = _surprise_score(
+        G, "a", "b", G.edges["a", "b"], nc, "code/model.py", "papers/flash.pdf"
+    )
+    score_same, _ = _surprise_score(
+        G, "c", "d", G.edges["c", "d"], nc, "code/train.py", "code/data.py"
+    )
     assert score_cross > score_same
     assert any("code" in r and "paper" in r for r in reasons_cross)
 
@@ -159,51 +223,105 @@ def _make_cross_lang_graph():
 def test_cross_language_inferred_calls_suppressed():
     """Cross-language INFERRED calls edge should score lower than same-language EXTRACTED."""
     G = _make_cross_lang_graph()
-    G.add_edge("py_auth", "ts_member", relation="calls", confidence="INFERRED",
-               weight=0.8, source_file="backend/auth.py")
-    G.add_edge("py_a", "py_b", relation="calls", confidence="EXTRACTED",
-               weight=1.0, source_file="backend/service.py")
+    G.add_edge(
+        "py_auth",
+        "ts_member",
+        relation="calls",
+        confidence="INFERRED",
+        weight=0.8,
+        source_file="backend/auth.py",
+    )
+    G.add_edge(
+        "py_a",
+        "py_b",
+        relation="calls",
+        confidence="EXTRACTED",
+        weight=1.0,
+        source_file="backend/service.py",
+    )
     nc = {"py_auth": 0, "ts_member": 1, "py_a": 0, "py_b": 0}
-    score_cross, _ = _surprise_score(G, "py_auth", "ts_member",
-                                      G.edges["py_auth", "ts_member"], nc,
-                                      "backend/auth.py", "frontend/types.ts")
-    score_same, _ = _surprise_score(G, "py_a", "py_b",
-                                     G.edges["py_a", "py_b"], nc,
-                                     "backend/service.py", "backend/utils.py")
+    score_cross, _ = _surprise_score(
+        G,
+        "py_auth",
+        "ts_member",
+        G.edges["py_auth", "ts_member"],
+        nc,
+        "backend/auth.py",
+        "frontend/types.ts",
+    )
+    score_same, _ = _surprise_score(
+        G, "py_a", "py_b", G.edges["py_a", "py_b"], nc, "backend/service.py", "backend/utils.py"
+    )
     assert score_cross <= score_same
 
 
 def test_cross_language_inferred_uses_suppressed():
     """Cross-language INFERRED uses edge (the exact rsl-siege-manager false positive) should be suppressed."""
     G = _make_cross_lang_graph()
-    G.add_edge("py_auth", "ts_member", relation="uses", confidence="INFERRED",
-               weight=0.8, source_file="backend/auth.py")
-    G.add_edge("py_a", "py_b", relation="calls", confidence="EXTRACTED",
-               weight=1.0, source_file="backend/service.py")
+    G.add_edge(
+        "py_auth",
+        "ts_member",
+        relation="uses",
+        confidence="INFERRED",
+        weight=0.8,
+        source_file="backend/auth.py",
+    )
+    G.add_edge(
+        "py_a",
+        "py_b",
+        relation="calls",
+        confidence="EXTRACTED",
+        weight=1.0,
+        source_file="backend/service.py",
+    )
     nc = {"py_auth": 0, "ts_member": 1, "py_a": 0, "py_b": 0}
-    score_cross, _ = _surprise_score(G, "py_auth", "ts_member",
-                                      G.edges["py_auth", "ts_member"], nc,
-                                      "backend/auth.py", "frontend/types.ts")
-    score_same, _ = _surprise_score(G, "py_a", "py_b",
-                                     G.edges["py_a", "py_b"], nc,
-                                     "backend/service.py", "backend/utils.py")
+    score_cross, _ = _surprise_score(
+        G,
+        "py_auth",
+        "ts_member",
+        G.edges["py_auth", "ts_member"],
+        nc,
+        "backend/auth.py",
+        "frontend/types.ts",
+    )
+    score_same, _ = _surprise_score(
+        G, "py_a", "py_b", G.edges["py_a", "py_b"], nc, "backend/service.py", "backend/utils.py"
+    )
     assert score_cross <= score_same
 
 
 def test_cross_language_semantically_similar_not_suppressed():
     """`semantically_similar_to` across languages is a genuine insight — must not be suppressed."""
     G = _make_cross_lang_graph()
-    G.add_edge("py_auth", "ts_member", relation="semantically_similar_to",
-               confidence="INFERRED", weight=0.85, source_file="backend/auth.py")
-    G.add_edge("py_a", "py_b", relation="calls", confidence="EXTRACTED",
-               weight=1.0, source_file="backend/service.py")
+    G.add_edge(
+        "py_auth",
+        "ts_member",
+        relation="semantically_similar_to",
+        confidence="INFERRED",
+        weight=0.85,
+        source_file="backend/auth.py",
+    )
+    G.add_edge(
+        "py_a",
+        "py_b",
+        relation="calls",
+        confidence="EXTRACTED",
+        weight=1.0,
+        source_file="backend/service.py",
+    )
     nc = {"py_auth": 0, "ts_member": 1, "py_a": 0, "py_b": 0}
-    score_sem, _ = _surprise_score(G, "py_auth", "ts_member",
-                                    G.edges["py_auth", "ts_member"], nc,
-                                    "backend/auth.py", "frontend/types.ts")
-    score_same, _ = _surprise_score(G, "py_a", "py_b",
-                                     G.edges["py_a", "py_b"], nc,
-                                     "backend/service.py", "backend/utils.py")
+    score_sem, _ = _surprise_score(
+        G,
+        "py_auth",
+        "ts_member",
+        G.edges["py_auth", "ts_member"],
+        nc,
+        "backend/auth.py",
+        "frontend/types.ts",
+    )
+    score_same, _ = _surprise_score(
+        G, "py_a", "py_b", G.edges["py_a", "py_b"], nc, "backend/service.py", "backend/utils.py"
+    )
     assert score_sem > score_same
 
 
@@ -214,27 +332,43 @@ def test_same_language_inferred_calls_not_suppressed():
     G.add_node("py_b", label="ModuleB", source_file="src/b.py", file_type="code")
     G.add_node("py_c", label="ModuleC", source_file="src/c.py", file_type="code")
     G.add_node("py_d", label="ModuleD", source_file="src/d.py", file_type="code")
-    G.add_edge("py_a", "py_b", relation="calls", confidence="INFERRED",
-               weight=0.8, source_file="src/a.py")
-    G.add_edge("py_c", "py_d", relation="calls", confidence="EXTRACTED",
-               weight=1.0, source_file="src/c.py")
+    G.add_edge(
+        "py_a", "py_b", relation="calls", confidence="INFERRED", weight=0.8, source_file="src/a.py"
+    )
+    G.add_edge(
+        "py_c", "py_d", relation="calls", confidence="EXTRACTED", weight=1.0, source_file="src/c.py"
+    )
     nc = {"py_a": 0, "py_b": 1, "py_c": 0, "py_d": 1}
-    score_inf, _ = _surprise_score(G, "py_a", "py_b", G.edges["py_a", "py_b"], nc,
-                                    "src/a.py", "src/b.py")
-    score_ext, _ = _surprise_score(G, "py_c", "py_d", G.edges["py_c", "py_d"], nc,
-                                    "src/c.py", "src/d.py")
+    score_inf, _ = _surprise_score(
+        G, "py_a", "py_b", G.edges["py_a", "py_b"], nc, "src/a.py", "src/b.py"
+    )
+    score_ext, _ = _surprise_score(
+        G, "py_c", "py_d", G.edges["py_c", "py_d"], nc, "src/c.py", "src/d.py"
+    )
     assert score_inf > score_ext
 
 
 def test_cross_language_extracted_calls_not_suppressed():
     """EXTRACTED cross-language edges are real structural facts — must not be penalised."""
     G = _make_cross_lang_graph()
-    G.add_edge("py_auth", "ts_member", relation="calls", confidence="EXTRACTED",
-               weight=1.0, source_file="backend/auth.py")
+    G.add_edge(
+        "py_auth",
+        "ts_member",
+        relation="calls",
+        confidence="EXTRACTED",
+        weight=1.0,
+        source_file="backend/auth.py",
+    )
     nc = {"py_auth": 0, "ts_member": 1}
-    score, _ = _surprise_score(G, "py_auth", "ts_member",
-                                G.edges["py_auth", "ts_member"], nc,
-                                "backend/auth.py", "frontend/types.ts")
+    score, _ = _surprise_score(
+        G,
+        "py_auth",
+        "ts_member",
+        G.edges["py_auth", "ts_member"],
+        nc,
+        "backend/auth.py",
+        "frontend/types.ts",
+    )
     assert score >= 1
 
 
@@ -287,6 +421,7 @@ def test_surprising_connections_have_required_keys():
 
 # --- graph_diff tests ---
 
+
 def _make_simple_graph(nodes, edges):
     """Helper: build a small nx.Graph from node/edge specs."""
     G = nx.Graph()
@@ -349,6 +484,7 @@ def test_graph_diff_empty_diff():
 
 # --- code↔doc INFERRED suppression tests ---
 
+
 def _make_code_doc_graph():
     G = nx.Graph()
     G.add_node("py_fn", label="ProcessData", source_file="src/processor.py", file_type="code")
@@ -361,63 +497,105 @@ def _make_code_doc_graph():
 def test_code_doc_inferred_calls_suppressed():
     """Code→doc INFERRED calls edge should score lower than same-language EXTRACTED."""
     G = _make_code_doc_graph()
-    G.add_edge("py_fn", "md_doc", relation="calls", confidence="INFERRED",
-               weight=0.8, source_file="src/processor.py")
-    G.add_edge("py_a", "py_b", relation="calls", confidence="EXTRACTED",
-               weight=1.0, source_file="src/service.py")
+    G.add_edge(
+        "py_fn",
+        "md_doc",
+        relation="calls",
+        confidence="INFERRED",
+        weight=0.8,
+        source_file="src/processor.py",
+    )
+    G.add_edge(
+        "py_a",
+        "py_b",
+        relation="calls",
+        confidence="EXTRACTED",
+        weight=1.0,
+        source_file="src/service.py",
+    )
     nc = {"py_fn": 0, "md_doc": 1, "py_a": 0, "py_b": 0}
-    score_noise, _ = _surprise_score(G, "py_fn", "md_doc",
-                                     G.edges["py_fn", "md_doc"], nc,
-                                     "src/processor.py", "docs/readme.md")
-    score_real, _ = _surprise_score(G, "py_a", "py_b",
-                                    G.edges["py_a", "py_b"], nc,
-                                    "src/service.py", "src/utils.py")
+    score_noise, _ = _surprise_score(
+        G, "py_fn", "md_doc", G.edges["py_fn", "md_doc"], nc, "src/processor.py", "docs/readme.md"
+    )
+    score_real, _ = _surprise_score(
+        G, "py_a", "py_b", G.edges["py_a", "py_b"], nc, "src/service.py", "src/utils.py"
+    )
     assert score_noise <= score_real
 
 
 def test_code_doc_inferred_uses_suppressed():
     """Code→doc INFERRED uses edge should score lower than same-language EXTRACTED."""
     G = _make_code_doc_graph()
-    G.add_edge("py_fn", "md_doc", relation="uses", confidence="INFERRED",
-               weight=0.8, source_file="src/processor.py")
-    G.add_edge("py_a", "py_b", relation="calls", confidence="EXTRACTED",
-               weight=1.0, source_file="src/service.py")
+    G.add_edge(
+        "py_fn",
+        "md_doc",
+        relation="uses",
+        confidence="INFERRED",
+        weight=0.8,
+        source_file="src/processor.py",
+    )
+    G.add_edge(
+        "py_a",
+        "py_b",
+        relation="calls",
+        confidence="EXTRACTED",
+        weight=1.0,
+        source_file="src/service.py",
+    )
     nc = {"py_fn": 0, "md_doc": 1, "py_a": 0, "py_b": 0}
-    score_noise, _ = _surprise_score(G, "py_fn", "md_doc",
-                                     G.edges["py_fn", "md_doc"], nc,
-                                     "src/processor.py", "docs/readme.md")
-    score_real, _ = _surprise_score(G, "py_a", "py_b",
-                                    G.edges["py_a", "py_b"], nc,
-                                    "src/service.py", "src/utils.py")
+    score_noise, _ = _surprise_score(
+        G, "py_fn", "md_doc", G.edges["py_fn", "md_doc"], nc, "src/processor.py", "docs/readme.md"
+    )
+    score_real, _ = _surprise_score(
+        G, "py_a", "py_b", G.edges["py_a", "py_b"], nc, "src/service.py", "src/utils.py"
+    )
     assert score_noise <= score_real
 
 
 def test_code_doc_extracted_calls_not_suppressed():
     """EXTRACTED code↔doc edges are real facts — must not be penalised."""
     G = _make_code_doc_graph()
-    G.add_edge("py_fn", "md_doc", relation="calls", confidence="EXTRACTED",
-               weight=1.0, source_file="src/processor.py")
+    G.add_edge(
+        "py_fn",
+        "md_doc",
+        relation="calls",
+        confidence="EXTRACTED",
+        weight=1.0,
+        source_file="src/processor.py",
+    )
     nc = {"py_fn": 0, "md_doc": 1}
-    score, _ = _surprise_score(G, "py_fn", "md_doc",
-                               G.edges["py_fn", "md_doc"], nc,
-                               "src/processor.py", "docs/readme.md")
+    score, _ = _surprise_score(
+        G, "py_fn", "md_doc", G.edges["py_fn", "md_doc"], nc, "src/processor.py", "docs/readme.md"
+    )
     assert score >= 1
 
 
 def test_code_doc_inferred_semantically_similar_not_suppressed():
     """`semantically_similar_to` across code↔doc is explicit LLM insight — must not be suppressed."""
     G = _make_code_doc_graph()
-    G.add_edge("py_fn", "md_doc", relation="semantically_similar_to",
-               confidence="INFERRED", weight=0.85, source_file="src/processor.py")
-    G.add_edge("py_a", "py_b", relation="calls", confidence="EXTRACTED",
-               weight=1.0, source_file="src/service.py")
+    G.add_edge(
+        "py_fn",
+        "md_doc",
+        relation="semantically_similar_to",
+        confidence="INFERRED",
+        weight=0.85,
+        source_file="src/processor.py",
+    )
+    G.add_edge(
+        "py_a",
+        "py_b",
+        relation="calls",
+        confidence="EXTRACTED",
+        weight=1.0,
+        source_file="src/service.py",
+    )
     nc = {"py_fn": 0, "md_doc": 1, "py_a": 0, "py_b": 0}
-    score_sem, _ = _surprise_score(G, "py_fn", "md_doc",
-                                   G.edges["py_fn", "md_doc"], nc,
-                                   "src/processor.py", "docs/readme.md")
-    score_same, _ = _surprise_score(G, "py_a", "py_b",
-                                    G.edges["py_a", "py_b"], nc,
-                                    "src/service.py", "src/utils.py")
+    score_sem, _ = _surprise_score(
+        G, "py_fn", "md_doc", G.edges["py_fn", "md_doc"], nc, "src/processor.py", "docs/readme.md"
+    )
+    score_same, _ = _surprise_score(
+        G, "py_a", "py_b", G.edges["py_a", "py_b"], nc, "src/service.py", "src/utils.py"
+    )
     assert score_sem > score_same
 
 
@@ -430,17 +608,24 @@ def test_code_unknown_extension_inferred_calls_suppressed():
     G.add_node("unk", label="Handler", source_file="vendor/unknown.xyz", file_type="document")
     G.add_node("py_a", label="A", source_file="src/a.py", file_type="code")
     G.add_node("py_b", label="B", source_file="src/b.py", file_type="code")
-    G.add_edge("py_fn", "unk", relation="calls", confidence="INFERRED",
-               weight=0.8, source_file="src/handler.py")
-    G.add_edge("py_a", "py_b", relation="calls", confidence="EXTRACTED",
-               weight=1.0, source_file="src/a.py")
+    G.add_edge(
+        "py_fn",
+        "unk",
+        relation="calls",
+        confidence="INFERRED",
+        weight=0.8,
+        source_file="src/handler.py",
+    )
+    G.add_edge(
+        "py_a", "py_b", relation="calls", confidence="EXTRACTED", weight=1.0, source_file="src/a.py"
+    )
     nc = {"py_fn": 0, "unk": 1, "py_a": 0, "py_b": 0}
-    score_unk, _ = _surprise_score(G, "py_fn", "unk",
-                                   G.edges["py_fn", "unk"], nc,
-                                   "src/handler.py", "vendor/unknown.xyz")
-    score_same, _ = _surprise_score(G, "py_a", "py_b",
-                                    G.edges["py_a", "py_b"], nc,
-                                    "src/a.py", "src/b.py")
+    score_unk, _ = _surprise_score(
+        G, "py_fn", "unk", G.edges["py_fn", "unk"], nc, "src/handler.py", "vendor/unknown.xyz"
+    )
+    score_same, _ = _surprise_score(
+        G, "py_a", "py_b", G.edges["py_a", "py_b"], nc, "src/a.py", "src/b.py"
+    )
     assert score_unk <= score_same
 
 
@@ -448,26 +633,49 @@ def test_code_paper_inferred_calls_not_suppressed():
     """Code↔paper INFERRED calls should still surface — it is a meaningful link."""
     G = nx.Graph()
     G.add_node("py_model", label="Transformer", source_file="src/model.py", file_type="code")
-    G.add_node("pdf_paper", label="Attention Is All You Need", source_file="papers/vaswani.pdf",
-               file_type="paper")
+    G.add_node(
+        "pdf_paper",
+        label="Attention Is All You Need",
+        source_file="papers/vaswani.pdf",
+        file_type="paper",
+    )
     G.add_node("py_a", label="ServiceA", source_file="src/service.py", file_type="code")
     G.add_node("py_b", label="ServiceB", source_file="src/utils.py", file_type="code")
-    G.add_edge("py_model", "pdf_paper", relation="calls", confidence="INFERRED",
-               weight=0.8, source_file="src/model.py")
-    G.add_edge("py_a", "py_b", relation="calls", confidence="EXTRACTED",
-               weight=1.0, source_file="src/service.py")
+    G.add_edge(
+        "py_model",
+        "pdf_paper",
+        relation="calls",
+        confidence="INFERRED",
+        weight=0.8,
+        source_file="src/model.py",
+    )
+    G.add_edge(
+        "py_a",
+        "py_b",
+        relation="calls",
+        confidence="EXTRACTED",
+        weight=1.0,
+        source_file="src/service.py",
+    )
     nc = {"py_model": 0, "pdf_paper": 1, "py_a": 0, "py_b": 1}
-    score_cross, _ = _surprise_score(G, "py_model", "pdf_paper",
-                                     G.edges["py_model", "pdf_paper"], nc,
-                                     "src/model.py", "papers/vaswani.pdf")
-    score_same, _ = _surprise_score(G, "py_a", "py_b",
-                                    G.edges["py_a", "py_b"], nc,
-                                    "src/service.py", "src/utils.py")
+    score_cross, _ = _surprise_score(
+        G,
+        "py_model",
+        "pdf_paper",
+        G.edges["py_model", "pdf_paper"],
+        nc,
+        "src/model.py",
+        "papers/vaswani.pdf",
+    )
+    score_same, _ = _surprise_score(
+        G, "py_a", "py_b", G.edges["py_a", "py_b"], nc, "src/service.py", "src/utils.py"
+    )
     assert score_cross > score_same
 
 
 # --- JSON key node filtering tests ---
 
+
 def test_is_json_key_node_noise_label():
     G = nx.Graph()
     G.add_node("j1", label="name", source_file="schema.json")
@@ -482,13 +690,17 @@ def test_is_json_key_node_non_json_file():
 
 # --- npm dep-block key god-node filtering tests ---
 
-@pytest.mark.parametrize("dep_key", [
-    "dependencies",
-    "devDependencies",
-    "peerDependencies",
-    "optionalDependencies",
-    "bundledDependencies",
-])
+
+@pytest.mark.parametrize(
+    "dep_key",
+    [
+        "dependencies",
+        "devDependencies",
+        "peerDependencies",
+        "optionalDependencies",
+        "bundledDependencies",
+    ],
+)
 def test_god_nodes_excludes_npm_dep_block_keys(dep_key: str) -> None:
     """npm package.json dep-block keys must be filtered from god_nodes output.
 
@@ -554,8 +766,7 @@ def test_god_nodes_excludes_npm_dep_block_keys(dep_key: str) -> None:
         f"but it appeared in the result: {result}"
     )
     assert "real_node" in result_ids, (
-        f"god_nodes() should include real-domain node 'AuthService' "
-        f"but it was absent: {result}"
+        f"god_nodes() should include real-domain node 'AuthService' but it was absent: {result}"
     )
 
 
diff --git a/tests/test_astro_extraction.py b/tests/test_astro_extraction.py
index c21e66c24..8df0cf8b3 100644
--- a/tests/test_astro_extraction.py
+++ b/tests/test_astro_extraction.py
@@ -7,6 +7,7 @@
 recovers nothing. The :func:`extract_astro` regex pass salvages imports from the
 frontmatter and any `<script>` blocks — same strategy as :func:`extract_svelte`.
 """
+
 from __future__ import annotations
 
 from pathlib import Path
diff --git a/tests/test_benchmark.py b/tests/test_benchmark.py
index b5751adcc..d480ed17b 100644
--- a/tests/test_benchmark.py
+++ b/tests/test_benchmark.py
@@ -1,19 +1,30 @@
 """Tests for graphify/benchmark.py."""
+
 from __future__ import annotations
 import json
 import pytest
 import networkx as nx
 from networkx.readwrite import json_graph
 
-from graphify.benchmark import run_benchmark, print_benchmark, _query_subgraph_tokens, _SAMPLE_QUESTIONS, _safe, _hr
+from graphify.benchmark import (
+    run_benchmark,
+    print_benchmark,
+    _query_subgraph_tokens,
+    _safe,
+    _hr,
+)
 
 
 def _make_graph() -> nx.Graph:
     G = nx.Graph()
-    G.add_node("n1", label="authentication", source_file="auth.py", source_location="L1", community=0)
+    G.add_node(
+        "n1", label="authentication", source_file="auth.py", source_location="L1", community=0
+    )
     G.add_node("n2", label="api_handler", source_file="api.py", source_location="L5", community=0)
     G.add_node("n3", label="main_entry", source_file="main.py", source_location="L1", community=1)
-    G.add_node("n4", label="error_handler", source_file="errors.py", source_location="L1", community=1)
+    G.add_node(
+        "n4", label="error_handler", source_file="errors.py", source_location="L1", community=1
+    )
     G.add_node("n5", label="database_layer", source_file="db.py", source_location="L1", community=2)
     G.add_edge("n1", "n2", relation="calls", confidence="INFERRED")
     G.add_edge("n2", "n3", relation="imports", confidence="EXTRACTED")
@@ -29,16 +40,19 @@ def _write_graph(G: nx.Graph, path) -> None:
 
 # --- _query_subgraph_tokens ---
 
+
 def test_query_returns_positive_for_matching_question():
     G = _make_graph()
     tokens = _query_subgraph_tokens(G, "how does authentication work")
     assert tokens > 0
 
+
 def test_query_returns_zero_for_no_match():
     G = _make_graph()
     tokens = _query_subgraph_tokens(G, "xyzzy plugh zorkmid")
     assert tokens == 0
 
+
 def test_query_bfs_expands_neighbors():
     G = _make_graph()
     # "authentication" matches n1, BFS depth=3 should reach n2, n3, n4
@@ -49,13 +63,16 @@ def test_query_bfs_expands_neighbors():
 
 def test_query_keeps_short_non_english_terms():
     G = nx.Graph()
-    G.add_node("frontend", label="前端", source_file="docs/前端.md", source_location="L1", community=0)
+    G.add_node(
+        "frontend", label="前端", source_file="docs/前端.md", source_location="L1", community=0
+    )
     tokens = _query_subgraph_tokens(G, "前端", depth=1)
     assert tokens > 0
 
 
 # --- run_benchmark ---
 
+
 def test_run_benchmark_returns_reduction(tmp_path):
     G = _make_graph()
     graph_file = tmp_path / "graph.json"
@@ -64,6 +81,7 @@ def test_run_benchmark_returns_reduction(tmp_path):
     assert "reduction_ratio" in result
     assert result["reduction_ratio"] > 1.0
 
+
 def test_run_benchmark_corpus_tokens_proportional(tmp_path):
     G = _make_graph()
     graph_file = tmp_path / "graph.json"
@@ -73,18 +91,23 @@ def test_run_benchmark_corpus_tokens_proportional(tmp_path):
     # corpus_tokens scales linearly with corpus_words (within integer-division rounding)
     assert abs(r2["corpus_tokens"] - r1["corpus_tokens"] * 10) <= r1["corpus_tokens"]
 
+
 def test_run_benchmark_per_question_list(tmp_path):
     G = _make_graph()
     graph_file = tmp_path / "graph.json"
     _write_graph(G, graph_file)
-    result = run_benchmark(str(graph_file), corpus_words=5_000,
-                           questions=["how does authentication work", "what is the main entry"])
+    result = run_benchmark(
+        str(graph_file),
+        corpus_words=5_000,
+        questions=["how does authentication work", "what is the main entry"],
+    )
     assert len(result["per_question"]) >= 1
     for p in result["per_question"]:
         assert "question" in p
         assert "query_tokens" in p
         assert "reduction" in p
 
+
 def test_run_benchmark_estimates_corpus_if_no_words(tmp_path):
     G = _make_graph()
     graph_file = tmp_path / "graph.json"
@@ -92,6 +115,7 @@ def test_run_benchmark_estimates_corpus_if_no_words(tmp_path):
     result = run_benchmark(str(graph_file), corpus_words=None)
     assert result["corpus_words"] > 0
 
+
 def test_run_benchmark_error_on_empty_graph(tmp_path):
     G = nx.Graph()
     graph_file = tmp_path / "empty.json"
@@ -99,6 +123,7 @@ def test_run_benchmark_error_on_empty_graph(tmp_path):
     result = run_benchmark(str(graph_file), corpus_words=1_000)
     assert "error" in result
 
+
 def test_run_benchmark_includes_node_edge_counts(tmp_path):
     G = _make_graph()
     graph_file = tmp_path / "graph.json"
@@ -110,6 +135,7 @@ def test_run_benchmark_includes_node_edge_counts(tmp_path):
 
 # --- print_benchmark ---
 
+
 def test_print_benchmark_no_crash(tmp_path, capsys):
     G = _make_graph()
     graph_file = tmp_path / "graph.json"
@@ -120,6 +146,7 @@ def test_print_benchmark_no_crash(tmp_path, capsys):
     assert "reduction" in out.lower()
     assert "x" in out
 
+
 def test_print_benchmark_error_message(capsys):
     print_benchmark({"error": "test error message"})
     out = capsys.readouterr().out
@@ -131,8 +158,11 @@ def test_print_benchmark_error_message(capsys):
 # unconditionally printed U+2500 and U+2192. _safe() falls back to ASCII when
 # stdout cannot encode the glyph.
 
+
 def test_safe_returns_unicode_when_encodable():
-    import io, sys
+    import io
+    import sys
+
     real_stdout = sys.stdout
     try:
         sys.stdout = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
@@ -141,8 +171,11 @@ def test_safe_returns_unicode_when_encodable():
     finally:
         sys.stdout = real_stdout
 
+
 def test_safe_falls_back_when_unencodable():
-    import io, sys
+    import io
+    import sys
+
     real_stdout = sys.stdout
     try:
         sys.stdout = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
@@ -151,9 +184,12 @@ def test_safe_falls_back_when_unencodable():
     finally:
         sys.stdout = real_stdout
 
+
 def test_print_benchmark_survives_cp1252_stdout(tmp_path, monkeypatch, capsys):
     """Regression: U+2500 / U+2192 used to crash with UnicodeEncodeError on cp1252."""
-    import io, sys
+    import io
+    import sys
+
     G = _make_graph()
     graph_file = tmp_path / "graph.json"
     _write_graph(G, graph_file)
diff --git a/tests/test_build.py b/tests/test_build.py
index 19bd65199..3bce5ddb0 100644
--- a/tests/test_build.py
+++ b/tests/test_build.py
@@ -490,13 +490,27 @@ def test_build_merge_prune_absolute_paths_match_relative_nodes(tmp_path):
     graph_path = tmp_path / "graph.json"
 
     # Simulate a graph with relative source_file paths (as built normally)
-    chunk = {"nodes": [
-        {"id": "n1", "label": "login", "file_type": "code", "source_file": "module_a/auth.py"},
-        {"id": "n2", "label": "format_date", "file_type": "code", "source_file": "module_b/utils.py"},
-    ], "edges": [
-        {"source": "n1", "target": "n2", "relation": "calls", "confidence": "EXTRACTED",
-         "source_file": "module_b/utils.py", "weight": 1.0},
-    ]}
+    chunk = {
+        "nodes": [
+            {"id": "n1", "label": "login", "file_type": "code", "source_file": "module_a/auth.py"},
+            {
+                "id": "n2",
+                "label": "format_date",
+                "file_type": "code",
+                "source_file": "module_b/utils.py",
+            },
+        ],
+        "edges": [
+            {
+                "source": "n1",
+                "target": "n2",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "module_b/utils.py",
+                "weight": 1.0,
+            },
+        ],
+    }
     G0 = build([chunk], dedup=False)
     graph_path.write_text(json.dumps(nx.node_link_data(G0, edges="edges")), encoding="utf-8")
 
@@ -519,9 +533,17 @@ def test_build_merge_prune_windows_backslash_paths(tmp_path):
     root.mkdir()
     graph_path = tmp_path / "graph.json"
 
-    chunk = {"nodes": [
-        {"id": "n1", "label": "parse_date", "file_type": "code", "source_file": "module_b/utils.py"},
-    ], "edges": []}
+    chunk = {
+        "nodes": [
+            {
+                "id": "n1",
+                "label": "parse_date",
+                "file_type": "code",
+                "source_file": "module_b/utils.py",
+            },
+        ],
+        "edges": [],
+    }
     G0 = build([chunk], dedup=False)
     graph_path.write_text(json.dumps(nx.node_link_data(G0, edges="edges")), encoding="utf-8")
 
@@ -1136,24 +1158,36 @@ def test_build_merge_directed_override_warns(tmp_path, capsys):
 def test_build_merge_rejects_non_bool_multigraph_in_saved_graph(tmp_path):
     """A saved graph.json with a non-bool 'multigraph' value must be rejected."""
     import json as _json
+
     graph_path = tmp_path / "graph.json"
-    graph_path.write_text(_json.dumps({
-        "directed": True, "multigraph": "false",
-        "nodes": [{"id": "a"}, {"id": "b"}],
-        "links": [{"source": "a", "target": "b", "relation": "calls"}],
-    }))
+    graph_path.write_text(
+        _json.dumps(
+            {
+                "directed": True,
+                "multigraph": "false",
+                "nodes": [{"id": "a"}, {"id": "b"}],
+                "links": [{"source": "a", "target": "b", "relation": "calls"}],
+            }
+        )
+    )
     with pytest.raises(TypeError, match="'multigraph' in .* must be a boolean"):
         build_merge([], graph_path=graph_path)
 
 
 def test_build_merge_rejects_non_bool_directed_in_saved_graph(tmp_path):
     import json as _json
+
     graph_path = tmp_path / "graph.json"
-    graph_path.write_text(_json.dumps({
-        "directed": "true", "multigraph": False,
-        "nodes": [{"id": "a"}, {"id": "b"}],
-        "links": [{"source": "a", "target": "b", "relation": "calls"}],
-    }))
+    graph_path.write_text(
+        _json.dumps(
+            {
+                "directed": "true",
+                "multigraph": False,
+                "nodes": [{"id": "a"}, {"id": "b"}],
+                "links": [{"source": "a", "target": "b", "relation": "calls"}],
+            }
+        )
+    )
     with pytest.raises(TypeError, match="'directed' in .* must be a boolean"):
         build_merge([], graph_path=graph_path)
 
@@ -1232,19 +1266,26 @@ def test_multigraph_preserves_first_explicit_key_in_collision_group():
         ],
         "edges": [
             {
-                "source": "a", "target": "b",
+                "source": "a",
+                "target": "b",
                 "key": "user-key",
-                "relation": "calls", "confidence": "EXTRACTED",
-                "source_file": "a.py", "context": "first",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "a.py",
+                "context": "first",
             },
             {
-                "source": "a", "target": "b",
+                "source": "a",
+                "target": "b",
                 "key": "user-key",
-                "relation": "calls", "confidence": "EXTRACTED",
-                "source_file": "a.py", "context": "second",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "a.py",
+                "context": "second",
             },
         ],
-        "input_tokens": 0, "output_tokens": 0,
+        "input_tokens": 0,
+        "output_tokens": 0,
     }
     G = build_from_json(extraction, multigraph=True)
     keys = set(G["a"]["b"].keys())
diff --git a/tests/test_cache.py b/tests/test_cache.py
index c3f19dd69..1adb1b5d5 100644
--- a/tests/test_cache.py
+++ b/tests/test_cache.py
@@ -1,7 +1,14 @@
 """Tests for graphify/cache.py."""
+
 import pytest
-from pathlib import Path
-from graphify.cache import file_hash, cache_dir, load_cached, save_cached, cached_files, clear_cache, _body_content
+from graphify.cache import (
+    file_hash,
+    load_cached,
+    save_cached,
+    cached_files,
+    clear_cache,
+    _body_content,
+)
 
 
 @pytest.fixture
diff --git a/tests/test_callflow_html.py b/tests/test_callflow_html.py
index 9605c9ba1..d0fcd39d1 100644
--- a/tests/test_callflow_html.py
+++ b/tests/test_callflow_html.py
@@ -14,15 +14,57 @@ def _make_graphify_out(tmp_path: Path) -> Path:
         "multigraph": False,
         "graph": {},
         "nodes": [
-            {"id": "api", "label": "ApiClient", "source_file": "src/api.py", "file_type": "code", "community": 0},
-            {"id": "run", "label": "run()", "source_file": "src/main.py", "file_type": "code", "community": 0},
-            {"id": "export", "label": "write_html()", "source_file": "src/export.py", "file_type": "code", "community": 1},
-            {"id": "evil", "label": "<script>alert(1)</script>", "source_file": "src/evil.py", "file_type": "code", "community": 1},
+            {
+                "id": "api",
+                "label": "ApiClient",
+                "source_file": "src/api.py",
+                "file_type": "code",
+                "community": 0,
+            },
+            {
+                "id": "run",
+                "label": "run()",
+                "source_file": "src/main.py",
+                "file_type": "code",
+                "community": 0,
+            },
+            {
+                "id": "export",
+                "label": "write_html()",
+                "source_file": "src/export.py",
+                "file_type": "code",
+                "community": 1,
+            },
+            {
+                "id": "evil",
+                "label": "<script>alert(1)</script>",
+                "source_file": "src/evil.py",
+                "file_type": "code",
+                "community": 1,
+            },
         ],
         "links": [
-            {"source": "run", "target": "api", "relation": "calls", "confidence": "EXTRACTED", "confidence_score": 1.0},
-            {"source": "api", "target": "export", "relation": "uses", "confidence": "EXTRACTED", "confidence_score": 1.0},
-            {"source": "export", "target": "evil", "relation": "calls", "confidence": "EXTRACTED", "confidence_score": 1.0},
+            {
+                "source": "run",
+                "target": "api",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+            },
+            {
+                "source": "api",
+                "target": "export",
+                "relation": "uses",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+            },
+            {
+                "source": "export",
+                "target": "evil",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+            },
         ],
         "hyperedges": [],
         "built_at_commit": "abcdef123456",
@@ -103,16 +145,36 @@ def test_export_callflow_html_cli_accepts_positional_graph_path(tmp_path):
         "multigraph": False,
         "graph": {},
         "nodes": [
-            {"id": "external", "label": "ExternalOnly", "source_file": "src/external.py", "file_type": "code", "community": 0},
-            {"id": "writer", "label": "write_external()", "source_file": "src/writer.py", "file_type": "code", "community": 1},
+            {
+                "id": "external",
+                "label": "ExternalOnly",
+                "source_file": "src/external.py",
+                "file_type": "code",
+                "community": 0,
+            },
+            {
+                "id": "writer",
+                "label": "write_external()",
+                "source_file": "src/writer.py",
+                "file_type": "code",
+                "community": 1,
+            },
         ],
         "links": [
-            {"source": "external", "target": "writer", "relation": "calls", "confidence": "EXTRACTED", "confidence_score": 1.0},
+            {
+                "source": "external",
+                "target": "writer",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+            },
         ],
         "hyperedges": [],
     }
     (external_out / "graph.json").write_text(json.dumps(graph), encoding="utf-8")
-    (external_out / ".graphify_labels.json").write_text(json.dumps({"0": "External Runtime", "1": "External Export"}), encoding="utf-8")
+    (external_out / ".graphify_labels.json").write_text(
+        json.dumps({"0": "External Runtime", "1": "External Export"}), encoding="utf-8"
+    )
     (external_out / "GRAPH_REPORT.md").write_text(
         "\n".join(
             [
@@ -156,10 +218,25 @@ def test_export_callflow_html_cli_accepts_positional_graph_path(tmp_path):
 
 def test_derive_sections_groups_by_architecture_keywords():
     nodes = [
-        {"id": "extract_py", "label": "extract_python", "source_file": "graphify/extract.py", "community": 0},
-        {"id": "extract_js", "label": "extract_js", "source_file": "graphify/extract.py", "community": 0},
+        {
+            "id": "extract_py",
+            "label": "extract_python",
+            "source_file": "graphify/extract.py",
+            "community": 0,
+        },
+        {
+            "id": "extract_js",
+            "label": "extract_js",
+            "source_file": "graphify/extract.py",
+            "community": 0,
+        },
         {"id": "to_html", "label": "to_html", "source_file": "graphify/export.py", "community": 1},
-        {"id": "test_html", "label": "test_export_html", "source_file": "tests/test_export.py", "community": 2},
+        {
+            "id": "test_html",
+            "label": "test_export_html",
+            "source_file": "tests/test_export.py",
+            "community": 2,
+        },
     ]
 
     sections = derive_sections_from_communities(nodes, {}, "en", 6)
diff --git a/tests/test_charmap_encoding.py b/tests/test_charmap_encoding.py
index f255dbbb6..f9d24c1a8 100644
--- a/tests/test_charmap_encoding.py
+++ b/tests/test_charmap_encoding.py
@@ -11,15 +11,12 @@
   b) Assert that extract_corpus_parallel reports loud failure (non-zero exit
      or summary block) when ≥1 chunk fails.
 """
+
 from __future__ import annotations
 
 import json
-import sys
-from io import StringIO
-from pathlib import Path
-from unittest.mock import MagicMock, call, patch
+from unittest.mock import MagicMock, patch
 
-import pytest
 
 from graphify import llm
 
@@ -31,14 +28,15 @@
     "type": "result",
     "subtype": "success",
     "is_error": False,
-    "result": json.dumps({
-        "nodes": [{"id": "n1", "label": "N1", "file_type": "document",
-                   "source_file": "u.md"}],
-        "edges": [],
-        "hyperedges": [],
-        "input_tokens": 0,
-        "output_tokens": 0,
-    }),
+    "result": json.dumps(
+        {
+            "nodes": [{"id": "n1", "label": "N1", "file_type": "document", "source_file": "u.md"}],
+            "edges": [],
+            "hyperedges": [],
+            "input_tokens": 0,
+            "output_tokens": 0,
+        }
+    ),
     "stop_reason": "end_turn",
     "usage": {
         "input_tokens": 1,
@@ -54,6 +52,7 @@
 
 # ── Test A: subprocess encoding ───────────────────────────────────────────────
 
+
 class TestSubprocessEncoding:
     """_call_claude_cli must pass encoding="utf-8" to subprocess.run so that
     non-ASCII content in chunk messages does not raise UnicodeEncodeError on
@@ -68,8 +67,10 @@ def test_subprocess_called_with_utf8_encoding(self, monkeypatch):
         """subprocess.run must be invoked with encoding='utf-8'."""
         completed = self._make_completed()
         monkeypatch.setattr(llm, "_response_is_hollow", lambda raw, parsed: False)
-        with patch("shutil.which", return_value="/fake/bin/claude"), \
-             patch("subprocess.run", return_value=completed) as mock_run:
+        with (
+            patch("shutil.which", return_value="/fake/bin/claude"),
+            patch("subprocess.run", return_value=completed) as mock_run,
+        ):
             llm._call_claude_cli(_UNICODE_CONTENT, max_tokens=8192)
         _args, kwargs = mock_run.call_args
         assert kwargs.get("encoding") == "utf-8", (
@@ -85,8 +86,10 @@ def test_subprocess_does_not_use_text_true_without_encoding(self, monkeypatch):
         """
         completed = self._make_completed()
         monkeypatch.setattr(llm, "_response_is_hollow", lambda raw, parsed: False)
-        with patch("shutil.which", return_value="/fake/bin/claude"), \
-             patch("subprocess.run", return_value=completed) as mock_run:
+        with (
+            patch("shutil.which", return_value="/fake/bin/claude"),
+            patch("subprocess.run", return_value=completed) as mock_run,
+        ):
             llm._call_claude_cli(_UNICODE_CONTENT, max_tokens=8192)
         _args, kwargs = mock_run.call_args
         # If text=True is present, encoding must also be set to 'utf-8'.
@@ -111,12 +114,12 @@ def test_unicode_chars_survive_subprocess_roundtrip(self, monkeypatch, tmp_path)
 
         completed = self._make_completed()
         monkeypatch.setattr(llm, "_response_is_hollow", lambda raw, parsed: False)
-        with patch("shutil.which", return_value="/fake/bin/claude"), \
-             patch("subprocess.run", return_value=completed):
+        with (
+            patch("shutil.which", return_value="/fake/bin/claude"),
+            patch("subprocess.run", return_value=completed),
+        ):
             # Should not raise
-            result = llm.extract_files_direct(
-                files=[f], backend="claude-cli", root=tmp_path
-            )
+            result = llm.extract_files_direct(files=[f], backend="claude-cli", root=tmp_path)
         assert len(result["nodes"]) >= 1
 
     def test_call_llm_claude_cli_subprocess_encoding(self, monkeypatch):
@@ -126,8 +129,10 @@ def test_call_llm_claude_cli_subprocess_encoding(self, monkeypatch):
             stdout=json.dumps({"result": "ok", "stop_reason": "end_turn"}),
             stderr="",
         )
-        with patch("shutil.which", return_value="/fake/bin/claude"), \
-             patch("subprocess.run", return_value=completed) as mock_run:
+        with (
+            patch("shutil.which", return_value="/fake/bin/claude"),
+            patch("subprocess.run", return_value=completed) as mock_run,
+        ):
             llm._call_llm(_UNICODE_CONTENT, backend="claude-cli", max_tokens=200)
         _args, kwargs = mock_run.call_args
         assert kwargs.get("encoding") == "utf-8", (
@@ -138,6 +143,7 @@ def test_call_llm_claude_cli_subprocess_encoding(self, monkeypatch):
 
 # ── Test B: loud failure on chunk error ────────────────────────────────────────
 
+
 class TestLoudChunkFailure:
     """extract_corpus_parallel must surface chunk failures loudly — either via
     non-zero exit (exception raised from the function) or a printed summary
@@ -196,10 +202,11 @@ def test_no_false_alarm_when_all_chunks_succeed(self, monkeypatch, tmp_path, cap
         f.write_text("z = 1\n", encoding="utf-8")
 
         good_result = {
-            "nodes": [{"id": "n1", "label": "N1", "file_type": "code",
-                       "source_file": str(f)}],
-            "edges": [], "hyperedges": [],
-            "input_tokens": 1, "output_tokens": 1,
+            "nodes": [{"id": "n1", "label": "N1", "file_type": "code", "source_file": str(f)}],
+            "edges": [],
+            "hyperedges": [],
+            "input_tokens": 1,
+            "output_tokens": 1,
             "elapsed_seconds": 0.1,
         }
         monkeypatch.setattr(
@@ -217,6 +224,7 @@ def test_no_false_alarm_when_all_chunks_succeed(self, monkeypatch, tmp_path, cap
 
 # ── Substitution validation (rsl-siege-manager path via Python) ────────────────
 
+
 class TestSubstitutionValidation:
     """Exercises the same code path as the rsl-siege-manager reproduction
     without requiring the `claude` CLI or its auth.
@@ -262,9 +270,7 @@ def test_cp1252_would_fail_but_utf8_succeeds(self, tmp_path):
         try:
             prompt.encode("utf-8")
         except UnicodeEncodeError as e:
-            raise AssertionError(
-                f"UTF-8 encode must succeed but failed: {e}"
-            ) from e
+            raise AssertionError(f"UTF-8 encode must succeed but failed: {e}") from e
 
         # cp1252 must fail (confirming these chars are the failing surface)
         try:
@@ -279,9 +285,7 @@ def test_cp1252_would_fail_but_utf8_succeeds(self, tmp_path):
         except UnicodeEncodeError:
             pass  # Expected — confirms these chars hit the pre-fix failure surface
 
-    def test_subprocess_encoding_kwarg_in_extract_files_direct(
-        self, monkeypatch, tmp_path
-    ):
+    def test_subprocess_encoding_kwarg_in_extract_files_direct(self, monkeypatch, tmp_path):
         """End-to-end path: write unicode file → extract_files_direct → subprocess.
 
         Subprocess must receive encoding='utf-8', not the locale default.
@@ -290,39 +294,49 @@ def test_subprocess_encoding_kwarg_in_extract_files_direct(
         f.write_text(self._UNICODE_CHARS, encoding="utf-8")
 
         _ENVELOPE_SIMPLE = {
-            "type": "result", "subtype": "success", "is_error": False,
-            "result": json.dumps({
-                "nodes": [{"id": "u_chunk", "label": "Unicode Chunk",
-                           "file_type": "document",
-                           "source_file": "unicode_chunk.md"}],
-                "edges": [], "hyperedges": [],
-                "input_tokens": 1, "output_tokens": 1,
-            }),
+            "type": "result",
+            "subtype": "success",
+            "is_error": False,
+            "result": json.dumps(
+                {
+                    "nodes": [
+                        {
+                            "id": "u_chunk",
+                            "label": "Unicode Chunk",
+                            "file_type": "document",
+                            "source_file": "unicode_chunk.md",
+                        }
+                    ],
+                    "edges": [],
+                    "hyperedges": [],
+                    "input_tokens": 1,
+                    "output_tokens": 1,
+                }
+            ),
             "stop_reason": "end_turn",
             "usage": {
-                "input_tokens": 1, "output_tokens": 1,
-                "cache_read_input_tokens": 0, "cache_creation_input_tokens": 0,
+                "input_tokens": 1,
+                "output_tokens": 1,
+                "cache_read_input_tokens": 0,
+                "cache_creation_input_tokens": 0,
             },
             "modelUsage": {
                 "claude-opus-4-7": {"inputTokens": 1, "outputTokens": 1},
             },
         }
-        completed = MagicMock(
-            returncode=0, stdout=json.dumps(_ENVELOPE_SIMPLE), stderr=""
-        )
+        completed = MagicMock(returncode=0, stdout=json.dumps(_ENVELOPE_SIMPLE), stderr="")
         monkeypatch.setattr(llm, "_response_is_hollow", lambda raw, parsed: False)
 
-        with patch("shutil.which", return_value="/fake/bin/claude"), \
-             patch("subprocess.run", return_value=completed) as mock_run:
-            result = llm.extract_files_direct(
-                files=[f], backend="claude-cli", root=tmp_path
-            )
+        with (
+            patch("shutil.which", return_value="/fake/bin/claude"),
+            patch("subprocess.run", return_value=completed) as mock_run,
+        ):
+            result = llm.extract_files_direct(files=[f], backend="claude-cli", root=tmp_path)
 
         assert mock_run.called
         _args, kwargs = mock_run.call_args
         assert kwargs.get("encoding") == "utf-8", (
-            "subprocess.run must be called with encoding='utf-8'; "
-            f"got {kwargs.get('encoding')!r}"
+            f"subprocess.run must be called with encoding='utf-8'; got {kwargs.get('encoding')!r}"
         )
         # Confirm the unicode content was in the input (not truncated/replaced)
         inp = kwargs.get("input", "")
diff --git a/tests/test_chunking.py b/tests/test_chunking.py
index 087464ab8..f037349cf 100644
--- a/tests/test_chunking.py
+++ b/tests/test_chunking.py
@@ -1,6 +1,6 @@
 """Tests for token-aware chunking and parallel chunk execution in graphify.llm."""
+
 import time
-from pathlib import Path
 from unittest.mock import patch
 
 import pytest
@@ -13,12 +13,14 @@ def no_tokenizer():
     compresses repeated/synthetic content heavily, which would make pack-size
     assertions tied to specific input sizes flaky."""
     from graphify import llm
+
     with patch.object(llm, "_TOKENIZER", None):
         yield
 
 
 # ---- Token-aware packing -----------------------------------------------------
 
+
 def test_pack_chunks_packs_small_files_together(tmp_path):
     """Many small files should land in a single chunk, not one chunk per file."""
     from graphify.llm import _pack_chunks_by_tokens
@@ -64,10 +66,14 @@ def test_pack_chunks_groups_by_directory(tmp_path):
     dir_a.mkdir()
     dir_b.mkdir()
 
-    a1 = dir_a / "x.py"; a1.write_text("a")
-    a2 = dir_a / "y.py"; a2.write_text("a")
-    b1 = dir_b / "x.py"; b1.write_text("b")
-    b2 = dir_b / "y.py"; b2.write_text("b")
+    a1 = dir_a / "x.py"
+    a1.write_text("a")
+    a2 = dir_a / "y.py"
+    a2.write_text("a")
+    b1 = dir_b / "x.py"
+    b1.write_text("b")
+    b2 = dir_b / "y.py"
+    b2.write_text("b")
 
     # Big budget — everything fits in one chunk in principle, but the order
     # within the chunk should keep dir_a's files contiguous and dir_b's
@@ -87,8 +93,10 @@ def test_pack_chunks_oversized_file_gets_its_own_chunk(tmp_path, no_tokenizer):
     """A file larger than the budget can't be split — it goes alone in a chunk."""
     from graphify.llm import _pack_chunks_by_tokens
 
-    big = tmp_path / "big.py"; big.write_text("x" * 200_000)  # ~50k tokens (cap-bound)
-    small = tmp_path / "small.py"; small.write_text("x")
+    big = tmp_path / "big.py"
+    big.write_text("x" * 200_000)  # ~50k tokens (cap-bound)
+    small = tmp_path / "small.py"
+    small.write_text("x")
 
     chunks = _pack_chunks_by_tokens([big, small], token_budget=1_000)
     sizes = [len(c) for c in chunks]
@@ -100,13 +108,15 @@ def test_pack_chunks_oversized_file_gets_its_own_chunk(tmp_path, no_tokenizer):
 def test_pack_chunks_rejects_non_positive_budget(tmp_path):
     from graphify.llm import _pack_chunks_by_tokens
 
-    f = tmp_path / "x.py"; f.write_text("a")
+    f = tmp_path / "x.py"
+    f.write_text("a")
     with pytest.raises(ValueError):
         _pack_chunks_by_tokens([f], token_budget=0)
 
 
 # ---- Tokenizer fallback ------------------------------------------------------
 
+
 def test_estimate_file_tokens_uses_tiktoken_when_available(tmp_path):
     """When tiktoken is installed, the estimator should call into it for
     accurate counts rather than the chars/4 heuristic."""
@@ -139,6 +149,7 @@ def test_estimate_file_tokens_falls_back_to_chars_when_no_tokenizer(tmp_path):
 
 # ---- Parallel execution ------------------------------------------------------
 
+
 def _stub_chunk_result(file_count: int, idx: int) -> dict:
     """Build a deterministic fake extraction result for a chunk."""
     return {
@@ -157,7 +168,8 @@ def test_corpus_parallel_runs_chunks_concurrently(tmp_path):
 
     files = []
     for i in range(8):
-        f = tmp_path / f"f{i}.py"; f.write_text("x")
+        f = tmp_path / f"f{i}.py"
+        f.write_text("x")
         files.append(f)
 
     def slow_extract(chunk, **kwargs):
@@ -183,7 +195,8 @@ def test_corpus_parallel_sequential_when_max_concurrency_is_one(tmp_path):
 
     files = []
     for i in range(3):
-        f = tmp_path / f"f{i}.py"; f.write_text("x")
+        f = tmp_path / f"f{i}.py"
+        f.write_text("x")
         files.append(f)
 
     call_order = []
@@ -208,7 +221,8 @@ def test_corpus_parallel_continues_after_chunk_failure(tmp_path, capsys):
 
     files = []
     for i in range(4):
-        f = tmp_path / f"f{i}.py"; f.write_text("x")
+        f = tmp_path / f"f{i}.py"
+        f.write_text("x")
         files.append(f)
 
     call_count = {"n": 0}
@@ -236,7 +250,8 @@ def test_corpus_parallel_legacy_mode_when_token_budget_is_none(tmp_path):
 
     files = []
     for i in range(45):
-        f = tmp_path / f"f{i}.py"; f.write_text("x")
+        f = tmp_path / f"f{i}.py"
+        f.write_text("x")
         files.append(f)
 
     chunks_seen = []
@@ -260,7 +275,8 @@ def test_corpus_parallel_token_budget_default_packs_files(tmp_path):
 
     files = []
     for i in range(50):
-        f = tmp_path / f"f{i}.py"; f.write_text("x = 1\n")
+        f = tmp_path / f"f{i}.py"
+        f.write_text("x = 1\n")
         files.append(f)
 
     chunks_seen = []
@@ -279,6 +295,7 @@ def record(chunk, **kwargs):
 
 # ---- Adaptive retry on truncation -------------------------------------------
 
+
 def _stub_with_finish(file_count: int, finish_reason: str = "stop") -> dict:
     """Build a stub extraction result with a controllable finish_reason."""
     return {
@@ -398,7 +415,8 @@ def test_adaptive_retry_single_file_truncation_does_not_recurse(tmp_path, capsys
     warning and return what we got. No infinite loop."""
     from graphify.llm import _extract_with_adaptive_retry
 
-    f = tmp_path / "huge.py"; f.write_text("x")
+    f = tmp_path / "huge.py"
+    f.write_text("x")
 
     calls = []
 
diff --git a/tests/test_claude_cli_backend.py b/tests/test_claude_cli_backend.py
index eeb6fd27b..1b2c1fa28 100644
--- a/tests/test_claude_cli_backend.py
+++ b/tests/test_claude_cli_backend.py
@@ -3,6 +3,7 @@
 Mocks subprocess.run + shutil.which so the suite runs on CI without
 the `claude` binary or a live network call.
 """
+
 from __future__ import annotations
 
 import json
@@ -16,19 +17,31 @@
     "type": "result",
     "subtype": "success",
     "is_error": False,
-    "result": json.dumps({
-        "nodes": [
-            {"id": "foo_module", "label": "Foo", "file_type": "document", "source_file": "foo.md"},
-            {"id": "foo_greet", "label": "greet", "file_type": "code", "source_file": "foo.md"},
-        ],
-        "edges": [
-            {"source": "foo_module", "target": "foo_greet",
-             "relation": "references", "confidence": "EXTRACTED", "confidence_score": 1.0},
-        ],
-        "hyperedges": [],
-        "input_tokens": 0,
-        "output_tokens": 0,
-    }),
+    "result": json.dumps(
+        {
+            "nodes": [
+                {
+                    "id": "foo_module",
+                    "label": "Foo",
+                    "file_type": "document",
+                    "source_file": "foo.md",
+                },
+                {"id": "foo_greet", "label": "greet", "file_type": "code", "source_file": "foo.md"},
+            ],
+            "edges": [
+                {
+                    "source": "foo_module",
+                    "target": "foo_greet",
+                    "relation": "references",
+                    "confidence": "EXTRACTED",
+                    "confidence_score": 1.0,
+                },
+            ],
+            "hyperedges": [],
+            "input_tokens": 0,
+            "output_tokens": 0,
+        }
+    ),
     "stop_reason": "end_turn",
     "usage": {
         "input_tokens": 6,
@@ -44,8 +57,10 @@
 def fake_claude(monkeypatch):
     completed = MagicMock(returncode=0, stdout=json.dumps(_ENVELOPE), stderr="")
     monkeypatch.setattr(llm, "_response_is_hollow", lambda raw, parsed: False)
-    with patch("shutil.which", return_value="/fake/bin/claude"), \
-         patch("subprocess.run", return_value=completed) as run:
+    with (
+        patch("shutil.which", return_value="/fake/bin/claude"),
+        patch("subprocess.run", return_value=completed) as run,
+    ):
         yield run
 
 
@@ -67,8 +82,10 @@ def test_finish_reason_length_on_max_tokens(monkeypatch):
     envelope = dict(_ENVELOPE, stop_reason="max_tokens")
     completed = MagicMock(returncode=0, stdout=json.dumps(envelope), stderr="")
     monkeypatch.setattr(llm, "_response_is_hollow", lambda raw, parsed: False)
-    with patch("shutil.which", return_value="/fake/bin/claude"), \
-         patch("subprocess.run", return_value=completed):
+    with (
+        patch("shutil.which", return_value="/fake/bin/claude"),
+        patch("subprocess.run", return_value=completed),
+    ):
         result = llm._call_claude_cli("dummy", max_tokens=8192)
     assert result["finish_reason"] == "length"
 
@@ -81,16 +98,20 @@ def test_raises_when_cli_missing():
 
 def test_raises_on_nonzero_exit():
     completed = MagicMock(returncode=2, stdout="", stderr="auth failed")
-    with patch("shutil.which", return_value="/fake/bin/claude"), \
-         patch("subprocess.run", return_value=completed):
+    with (
+        patch("shutil.which", return_value="/fake/bin/claude"),
+        patch("subprocess.run", return_value=completed),
+    ):
         with pytest.raises(RuntimeError, match="exited 2"):
             llm._call_claude_cli("dummy", max_tokens=8192)
 
 
 def test_raises_on_garbage_envelope():
     completed = MagicMock(returncode=0, stdout="not json", stderr="")
-    with patch("shutil.which", return_value="/fake/bin/claude"), \
-         patch("subprocess.run", return_value=completed):
+    with (
+        patch("shutil.which", return_value="/fake/bin/claude"),
+        patch("subprocess.run", return_value=completed),
+    ):
         with pytest.raises(RuntimeError, match="unparseable JSON envelope"):
             llm._call_claude_cli("dummy", max_tokens=8192)
 
diff --git a/tests/test_claude_md.py b/tests/test_claude_md.py
index f81f10dd3..3198b797e 100644
--- a/tests/test_claude_md.py
+++ b/tests/test_claude_md.py
@@ -1,13 +1,13 @@
 """Tests for graphify claude install / uninstall commands."""
-from pathlib import Path
-import pytest
-from graphify.__main__ import claude_install, claude_uninstall, _CLAUDE_MD_MARKER, _CLAUDE_MD_SECTION
+
+from graphify.__main__ import claude_install, claude_uninstall, _CLAUDE_MD_MARKER
 
 
 # ---------------------------------------------------------------------------
 # install
 # ---------------------------------------------------------------------------
 
+
 def test_install_creates_claude_md(tmp_path):
     """Creates CLAUDE.md when none exists."""
     claude_install(tmp_path)
@@ -58,6 +58,7 @@ def test_install_idempotent_message(tmp_path, capsys):
 # uninstall
 # ---------------------------------------------------------------------------
 
+
 def test_uninstall_removes_section(tmp_path):
     """Removes the graphify section after it was installed."""
     claude_install(tmp_path)
@@ -101,9 +102,11 @@ def test_uninstall_no_op_when_no_file(tmp_path, capsys):
 # settings.json PreToolUse hook
 # ---------------------------------------------------------------------------
 
+
 def test_install_creates_settings_json(tmp_path):
     """claude_install also writes .claude/settings.json with PreToolUse hook."""
     import json
+
     claude_install(tmp_path)
     settings_path = tmp_path / ".claude" / "settings.json"
     assert settings_path.exists()
@@ -115,6 +118,7 @@ def test_install_creates_settings_json(tmp_path):
 def test_install_settings_json_idempotent(tmp_path):
     """Running claude_install twice does not duplicate the PreToolUse hook."""
     import json
+
     claude_install(tmp_path)
     claude_install(tmp_path)
     settings_path = tmp_path / ".claude" / "settings.json"
@@ -127,6 +131,7 @@ def test_install_settings_json_idempotent(tmp_path):
 def test_uninstall_removes_settings_hook(tmp_path):
     """claude_uninstall removes the PreToolUse hook from settings.json."""
     import json
+
     claude_install(tmp_path)
     claude_uninstall(tmp_path)
     settings_path = tmp_path / ".claude" / "settings.json"
diff --git a/tests/test_cli_export.py b/tests/test_cli_export.py
index 942dbcf26..8909b8ef0 100644
--- a/tests/test_cli_export.py
+++ b/tests/test_cli_export.py
@@ -3,6 +3,7 @@
 Each test builds a minimal graph in a temp dir, runs the CLI command as a subprocess,
 and asserts the expected output file exists and is non-empty / valid.
 """
+
 from __future__ import annotations
 import json
 import os
@@ -10,13 +11,14 @@
 import sys
 from pathlib import Path
 
-import pytest
 
 PYTHON = sys.executable
 FIXTURES = Path(__file__).parent / "fixtures"
 
 
-def _run(args: list[str], cwd: Path, env: dict[str, str] | None = None) -> subprocess.CompletedProcess:
+def _run(
+    args: list[str], cwd: Path, env: dict[str, str] | None = None
+) -> subprocess.CompletedProcess:
     return subprocess.run(
         [PYTHON, "-m", "graphify"] + args,
         cwd=cwd,
@@ -53,14 +55,13 @@ def _make_graph(tmp_path: Path) -> Path:
         "surprises": surprises,
     }
     (out / ".graphify_analysis.json").write_text(json.dumps(analysis))
-    (out / ".graphify_labels.json").write_text(
-        json.dumps({str(k): v for k, v in labels.items()})
-    )
+    (out / ".graphify_labels.json").write_text(json.dumps({str(k): v for k, v in labels.items()}))
     return out
 
 
 # ── graphify export html ─────────────────────────────────────────────────────
 
+
 def test_export_html_creates_file(tmp_path):
     _make_graph(tmp_path)
     r = _run(["export", "html"], tmp_path)
@@ -83,8 +84,24 @@ def test_export_html_error_without_graph(tmp_path):
     assert r.returncode != 0
 
 
+def test_update_accepts_no_viz_and_removes_stale_html(tmp_path):
+    (tmp_path / "app.py").write_text("def alpha():\n    return 1\n", encoding="utf-8")
+    out = tmp_path / "graphify-out"
+    out.mkdir()
+    stale_html = out / "graph.html"
+    stale_html.write_text("<html/>", encoding="utf-8")
+
+    env = os.environ | {"GRAPHIFY_NO_TIPS": "1"}
+    r = _run(["update", ".", "--force", "--no-viz"], tmp_path, env=env)
+
+    assert r.returncode == 0, r.stderr
+    assert not stale_html.exists()
+    assert "Skipped graph.html" not in r.stdout
+
+
 # ── graphify export obsidian ─────────────────────────────────────────────────
 
+
 def test_export_obsidian_creates_vault(tmp_path):
     _make_graph(tmp_path)
     r = _run(["export", "obsidian"], tmp_path)
@@ -106,6 +123,7 @@ def test_export_obsidian_custom_dir(tmp_path):
 
 # ── graphify export wiki ─────────────────────────────────────────────────────
 
+
 def test_export_wiki_creates_articles(tmp_path):
     _make_graph(tmp_path)
     r = _run(["export", "wiki"], tmp_path)
@@ -130,6 +148,7 @@ def test_export_wiki_accepts_edges_only_graph_json(tmp_path):
 
 # ── graphify export graphml ──────────────────────────────────────────────────
 
+
 def test_export_graphml_creates_file(tmp_path):
     _make_graph(tmp_path)
     r = _run(["export", "graphml"], tmp_path)
@@ -143,6 +162,7 @@ def test_export_graphml_creates_file(tmp_path):
 
 # ── graphify export neo4j (cypher) ───────────────────────────────────────────
 
+
 def test_export_neo4j_creates_cypher(tmp_path):
     _make_graph(tmp_path)
     r = _run(["export", "neo4j"], tmp_path)
@@ -156,6 +176,7 @@ def test_export_neo4j_creates_cypher(tmp_path):
 
 # ── graphify query ───────────────────────────────────────────────────────────
 
+
 def test_query_returns_output(tmp_path):
     _make_graph(tmp_path)
     r = _run(["query", "test"], tmp_path)
@@ -195,6 +216,7 @@ def test_query_uses_graphify_out_env(tmp_path):
 
 # ── graphify path ────────────────────────────────────────────────────────────
 
+
 def test_path_runs_without_error(tmp_path):
     _make_graph(tmp_path)
     r = _run(["path", "Transformer", "LayerNorm"], tmp_path)
@@ -221,6 +243,7 @@ def test_path_uses_graphify_out_env(tmp_path):
 
 # ── graphify explain ─────────────────────────────────────────────────────────
 
+
 def test_explain_runs_without_error(tmp_path):
     _make_graph(tmp_path)
     r = _run(["explain", "test"], tmp_path)
@@ -246,6 +269,7 @@ def test_explain_uses_graphify_out_env(tmp_path):
 
 # ── graphify export unknown format ───────────────────────────────────────────
 
+
 def test_export_unknown_format_fails(tmp_path):
     r = _run(["export", "pdf"], tmp_path)
     assert r.returncode != 0
@@ -267,6 +291,7 @@ def test_update_no_cluster_writes_raw_graph(tmp_path):
 
 # Regression test for #934 - cluster-only crashes when graphify-out/ doesn't exist
 
+
 def test_cluster_only_creates_output_dir_when_missing(tmp_path):
     """cluster-only must not crash with FileNotFoundError when graphify-out/ is absent (#934)."""
     # Build graph.json somewhere other than the default graphify-out/ location
@@ -278,6 +303,7 @@ def test_cluster_only_creates_output_dir_when_missing(tmp_path):
     graph_json = out_dir / "graph.json"
     # Simulate user archiving the output dir before re-clustering
     import shutil
+
     shutil.copy(graph_json, graph_src)
     shutil.rmtree(out_dir)
 
@@ -290,6 +316,7 @@ def test_cluster_only_creates_output_dir_when_missing(tmp_path):
 
 # Regression test for #1027 - cluster-only must remap labels via node overlap
 
+
 def test_cluster_only_remaps_labels_to_previous_cids(tmp_path):
     """cluster-only must invoke remap_communities_to_previous so the existing
     .graphify_labels.json keeps tracking the same conceptual communities after
@@ -350,6 +377,7 @@ def test_cluster_only_remaps_labels_to_previous_cids(tmp_path):
 # silently bails or generates a degraded artifact whenever the sidecar is
 # missing, even though the data is right there.
 
+
 def test_export_html_falls_back_to_node_community_attribute(tmp_path):
     """When .graphify_analysis.json is absent, export html should reconstruct
     communities from the per-node attribute in graph.json rather than bailing
@@ -385,8 +413,7 @@ def test_export_html_fallback_recovers_multiple_communities(tmp_path):
     # And the count we'd reconstruct from graph.json's node attributes
     graph = json.loads((out / "graph.json").read_text(encoding="utf-8"))
     reconstructed_cids = {
-        n["community"] for n in graph.get("nodes", [])
-        if n.get("community") is not None
+        n["community"] for n in graph.get("nodes", []) if n.get("community") is not None
     }
     assert len(reconstructed_cids) == expected_count, (
         f"reconstruction would lose communities: sidecar={expected_count} vs "
diff --git a/tests/test_cluster.py b/tests/test_cluster.py
index 21fd2ca3a..514e4aea0 100644
--- a/tests/test_cluster.py
+++ b/tests/test_cluster.py
@@ -1,5 +1,4 @@
 import json
-import sys
 import networkx as nx
 from pathlib import Path
 from graphify.build import build_from_json
@@ -7,38 +6,45 @@
 
 FIXTURES = Path(__file__).parent / "fixtures"
 
+
 def make_graph():
     return build_from_json(json.loads((FIXTURES / "extraction.json").read_text()))
 
+
 def test_cluster_returns_dict():
     G = make_graph()
     communities = cluster(G)
     assert isinstance(communities, dict)
 
+
 def test_cluster_covers_all_nodes():
     G = make_graph()
     communities = cluster(G)
     all_nodes = {n for nodes in communities.values() for n in nodes}
     assert all_nodes == set(G.nodes)
 
+
 def test_cohesion_score_complete_graph():
     G = nx.complete_graph(4)
     G = nx.relabel_nodes(G, {i: str(i) for i in G.nodes})
     score = cohesion_score(G, list(G.nodes))
     assert score == 1.0
 
+
 def test_cohesion_score_single_node():
     G = nx.Graph()
     G.add_node("a")
     score = cohesion_score(G, ["a"])
     assert score == 1.0
 
+
 def test_cohesion_score_disconnected():
     G = nx.Graph()
     G.add_nodes_from(["a", "b", "c"])
     score = cohesion_score(G, ["a", "b", "c"])
     assert score == 0.0
 
+
 def test_cohesion_score_range():
     G = make_graph()
     communities = cluster(G)
@@ -46,6 +52,7 @@ def test_cohesion_score_range():
         score = cohesion_score(G, nodes)
         assert 0.0 <= score <= 1.0
 
+
 def test_score_all_keys_match_communities():
     G = make_graph()
     communities = cluster(G)
diff --git a/tests/test_confidence.py b/tests/test_confidence.py
index 299548aca..4e6af31f1 100644
--- a/tests/test_confidence.py
+++ b/tests/test_confidence.py
@@ -1,9 +1,9 @@
 """Tests for confidence_score on edges."""
+
 import json
 import tempfile
 from pathlib import Path
 
-import networkx as nx
 
 from graphify.build import build_from_json
 from graphify.cluster import cluster, score_all
@@ -24,12 +24,33 @@ def _make_extraction(**edge_overrides):
             {"id": "n_d", "label": "D", "file_type": "document", "source_file": "d.md"},
         ],
         "edges": [
-            {"source": "n_a", "target": "n_b", "relation": "calls", "confidence": "EXTRACTED",
-             "confidence_score": 1.0, "source_file": "a.py", "weight": 1.0},
-            {"source": "n_b", "target": "n_c", "relation": "implements", "confidence": "INFERRED",
-             "confidence_score": 0.75, "source_file": "b.py", "weight": 0.8},
-            {"source": "n_c", "target": "n_d", "relation": "references", "confidence": "AMBIGUOUS",
-             "confidence_score": 0.2, "source_file": "c.md", "weight": 0.5},
+            {
+                "source": "n_a",
+                "target": "n_b",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "a.py",
+                "weight": 1.0,
+            },
+            {
+                "source": "n_b",
+                "target": "n_c",
+                "relation": "implements",
+                "confidence": "INFERRED",
+                "confidence_score": 0.75,
+                "source_file": "b.py",
+                "weight": 0.8,
+            },
+            {
+                "source": "n_c",
+                "target": "n_d",
+                "relation": "references",
+                "confidence": "AMBIGUOUS",
+                "confidence_score": 0.2,
+                "source_file": "c.md",
+                "weight": 0.5,
+            },
         ],
         "input_tokens": 100,
         "output_tokens": 50,
@@ -108,10 +129,22 @@ def test_to_json_defaults_missing_confidence_score():
         ],
         "edges": [
             # No confidence_score field on any of these
-            {"source": "n_x", "target": "n_y", "relation": "calls",
-             "confidence": "EXTRACTED", "source_file": "x.py", "weight": 1.0},
-            {"source": "n_y", "target": "n_z", "relation": "depends_on",
-             "confidence": "INFERRED", "source_file": "y.py", "weight": 1.0},
+            {
+                "source": "n_x",
+                "target": "n_y",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "x.py",
+                "weight": 1.0,
+            },
+            {
+                "source": "n_y",
+                "target": "n_z",
+                "relation": "depends_on",
+                "confidence": "INFERRED",
+                "source_file": "y.py",
+                "weight": 1.0,
+            },
         ],
         "input_tokens": 0,
         "output_tokens": 0,
@@ -148,7 +181,7 @@ def test_report_shows_avg_confidence_for_inferred():
     report = generate(G, communities, cohesion, labels, gods, surprises, detection, tokens, ".")
     assert "avg confidence" in report, "Report should show avg confidence for INFERRED edges"
     # The fixture has one INFERRED edge with score 0.75, so avg should be 0.75
-    assert "0.75" in report, f"Expected avg confidence 0.75 in report"
+    assert "0.75" in report, "Expected avg confidence 0.75 in report"
 
 
 def test_report_inferred_tag_with_score():
@@ -160,9 +193,15 @@ def test_report_inferred_tag_with_score():
             {"id": "n_q", "label": "Renderer", "file_type": "code", "source_file": "renderer.py"},
         ],
         "edges": [
-            {"source": "n_p", "target": "n_q", "relation": "feeds",
-             "confidence": "INFERRED", "confidence_score": 0.82,
-             "source_file": "parser.py", "weight": 1.0},
+            {
+                "source": "n_p",
+                "target": "n_q",
+                "relation": "feeds",
+                "confidence": "INFERRED",
+                "confidence_score": 0.82,
+                "source_file": "parser.py",
+                "weight": 1.0,
+            },
         ],
         "input_tokens": 0,
         "output_tokens": 0,
diff --git a/tests/test_dedup.py b/tests/test_dedup.py
index 293d2a8fe..b5f90b307 100644
--- a/tests/test_dedup.py
+++ b/tests/test_dedup.py
@@ -1,29 +1,34 @@
 """Tests for graphify/dedup.py entity deduplication pipeline."""
+
 from __future__ import annotations
-import pytest
 from graphify.dedup import deduplicate_entities, _entropy, _shingles
 
 
 # ── entropy gate ─────────────────────────────────────────────────────────────
 
+
 def test_entropy_short_label_low():
     assert _entropy("AI") < 2.5
 
+
 def test_entropy_normal_label_high():
     assert _entropy("AuthenticationManager") >= 2.5
 
+
 def test_entropy_empty_string():
     assert _entropy("") == 0.0
 
 
 # ── shingles ─────────────────────────────────────────────────────────────────
 
+
 def test_shingles_produces_trigrams():
     s = _shingles("hello")
     assert "hel" in s
     assert "ell" in s
     assert "llo" in s
 
+
 def test_shingles_short_string():
     # strings shorter than 3 chars return single shingle of the string itself
     assert _shingles("ab") == {"ab"}
@@ -31,8 +36,13 @@ def test_shingles_short_string():
 
 # ── full pipeline ─────────────────────────────────────────────────────────────
 
+
 def _make_nodes(*labels):
-    return [{"id": label.lower().replace(" ", "_"), "label": label, "source_file": "test.md"} for label in labels]
+    return [
+        {"id": label.lower().replace(" ", "_"), "label": label, "source_file": "test.md"}
+        for label in labels
+    ]
+
 
 def _make_edges(src, tgt, relation="relates_to"):
     return [{"source": src, "target": tgt, "relation": relation}]
@@ -122,9 +132,11 @@ def test_dedup_llm_flag_accepted():
 
 # ── build integration ─────────────────────────────────────────────────────────
 
+
 def test_build_calls_dedup():
     """build() should deduplicate near-identical nodes across extractions."""
     from graphify.build import build
+
     chunk1 = {
         "nodes": [{"id": "graphextractor", "label": "GraphExtractor", "source_file": "a.py"}],
         "edges": [],
@@ -139,6 +151,7 @@ def test_build_calls_dedup():
 
 # --- #878: fuzzy dedup false merges on short/variant labels ---
 
+
 def test_dedup_does_not_merge_numeric_variants(tmp_path):
     """Chip SKU variants (ASR1603 vs ASR1605) must not be merged (#878)."""
     nodes = _make_nodes("ASR1603", "ASR1605")
@@ -164,6 +177,7 @@ def test_dedup_still_merges_real_typos():
     """Genuine same-length single-char typos should still merge (#878 non-regression)."""
     from graphify.dedup import _is_variant_pair, _short_label_blocked
     from rapidfuzz.distance import JaroWinkler
+
     a, b = "graphextractor", "graphextractar"
     score = JaroWinkler.normalized_similarity(a, b) * 100
     assert not _is_variant_pair(a, b), "not a variant pair"
@@ -173,6 +187,7 @@ def test_dedup_still_merges_real_typos():
 def test_variant_pair_helper():
     """_is_variant_pair correctly identifies chip-model variant pairs (#878)."""
     from graphify.dedup import _is_variant_pair
+
     assert _is_variant_pair("asr1603", "asr1605")
     assert _is_variant_pair("cortex a55", "cortex a55x")
     assert not _is_variant_pair("graphextractor", "graphextracter")
diff --git a/tests/test_detect.py b/tests/test_detect.py
index 900851802..0357bab57 100644
--- a/tests/test_detect.py
+++ b/tests/test_detect.py
@@ -1,52 +1,74 @@
 from pathlib import Path
-from graphify.detect import classify_file, count_words, detect, detect_incremental, save_manifest, FileType, _looks_like_paper, _is_ignored, _load_graphifyignore, _is_sensitive
+from graphify.detect import (
+    classify_file,
+    count_words,
+    detect,
+    detect_incremental,
+    save_manifest,
+    FileType,
+    _is_ignored,
+    _load_graphifyignore,
+    _is_sensitive,
+)
 
 FIXTURES = Path(__file__).parent / "fixtures"
 
+
 def test_classify_python():
     assert classify_file(Path("foo.py")) == FileType.CODE
 
+
 def test_classify_typescript():
     assert classify_file(Path("bar.ts")) == FileType.CODE
 
+
 def test_classify_markdown():
     assert classify_file(Path("README.md")) == FileType.DOCUMENT
 
+
 def test_classify_pdf():
     assert classify_file(Path("paper.pdf")) == FileType.PAPER
 
+
 def test_classify_pdf_in_xcassets_skipped():
     # PDFs inside Xcode asset catalogs are vector icons, not papers
     asset_pdf = Path("MyApp/Images.xcassets/icon.imageset/icon.pdf")
     assert classify_file(asset_pdf) is None
 
+
 def test_classify_pdf_in_xcassets_root_skipped():
     asset_pdf = Path("Pods/HXPHPicker/Assets.xcassets/photo.pdf")
     assert classify_file(asset_pdf) is None
 
+
 def test_classify_unknown_returns_none():
     assert classify_file(Path("archive.zip")) is None
 
+
 def test_classify_image():
     assert classify_file(Path("screenshot.png")) == FileType.IMAGE
     assert classify_file(Path("design.jpg")) == FileType.IMAGE
     assert classify_file(Path("diagram.webp")) == FileType.IMAGE
 
+
 def test_count_words_sample_md():
     words = count_words(FIXTURES / "sample.md")
     assert words > 5
 
+
 def test_detect_finds_fixtures():
     result = detect(FIXTURES)
     assert result["total_files"] >= 2
     assert "code" in result["files"]
     assert "document" in result["files"]
 
+
 def test_detect_warns_small_corpus():
     result = detect(FIXTURES)
     assert result["needs_graph"] is False
     assert result["warning"] is not None
 
+
 def test_detect_skips_noise_dot_dirs():
     """Noise dot dirs (.next, .nuxt, .graphify cache, …) are skipped (#873).
     Non-noise dot dirs (.github, .claude, …) are now allowed through."""
@@ -301,6 +323,7 @@ def test_detect_incremental_propagates_follow_symlinks(tmp_path, monkeypatch):
 def test_classify_video_extensions():
     """Video and audio file extensions should classify as VIDEO."""
     from graphify.detect import FileType
+
     assert classify_file(Path("lecture.mp4")) == FileType.VIDEO
     assert classify_file(Path("podcast.mp3")) == FileType.VIDEO
     assert classify_file(Path("talk.mov")) == FileType.VIDEO
@@ -399,7 +422,9 @@ def test_detect_skips_visual_tests_dir(tmp_path):
 def test_detect_skips_snapshots_dir(tmp_path):
     """__snapshots__/ and snapshots/ are jest/vitest artefacts — must be excluded."""
     (tmp_path / "__snapshots__").mkdir()
-    (tmp_path / "__snapshots__" / "app.test.ts.snap").write_text("// Jest Snapshot\nexports[`test 1`] = `<div/>`")
+    (tmp_path / "__snapshots__" / "app.test.ts.snap").write_text(
+        "// Jest Snapshot\nexports[`test 1`] = `<div/>`"
+    )
     (tmp_path / "app.ts").write_text("export function greet() { return 'hi'; }")
     result = detect(tmp_path)
     all_files = [f for files in result["files"].values() for f in files]
@@ -422,6 +447,7 @@ def test_detect_skips_storybook_static_dir(tmp_path):
 
 # --- #873: dot dirs allowed, framework caches blocked ---
 
+
 def test_detect_allows_github_dir(tmp_path):
     """Files inside .github/ (workflows etc.) are now indexed (#873)."""
     gh = tmp_path / ".github" / "workflows"
@@ -430,7 +456,9 @@ def test_detect_allows_github_dir(tmp_path):
     (tmp_path / "main.py").write_text("def run(): pass")
     result = detect(tmp_path)
     all_files = [f for files in result["files"].values() for f in files]
-    assert any(".github" in f for f in all_files), "expected .github/workflows/ci.yml to be detected"
+    assert any(".github" in f for f in all_files), (
+        "expected .github/workflows/ci.yml to be detected"
+    )
 
 
 def test_detect_skips_next_cache(tmp_path):
@@ -461,9 +489,9 @@ def test_detect_skips_graphify_own_cache(tmp_path):
 
 # --- #882: gitignore parent-exclusion rule for ! re-includes ---
 
+
 def test_negation_cannot_rescue_file_under_excluded_dir(tmp_path):
     """A ! re-include cannot un-ignore a file whose parent dir is excluded (#882)."""
-    from graphify.detect import _is_ignored, _load_graphifyignore
     android = tmp_path / "android" / "app" / "src"
     android.mkdir(parents=True)
     victim = android / "Main.kt"
@@ -478,7 +506,6 @@ def test_negation_cannot_rescue_file_under_excluded_dir(tmp_path):
 
 def test_negation_works_when_no_ancestor_excluded(tmp_path):
     """A ! re-include must still un-ignore a file when no ancestor is excluded (#882)."""
-    from graphify.detect import _is_ignored, _load_graphifyignore
     src = tmp_path / "src"
     src.mkdir()
     keep = src / "keep.py"
@@ -492,7 +519,6 @@ def test_negation_works_when_no_ancestor_excluded(tmp_path):
 
 def test_negation_ancestor_itself_reincluded(tmp_path):
     """If the ancestor dir itself is re-included, its children should not be blocked (#882)."""
-    from graphify.detect import _is_ignored, _load_graphifyignore
     vendor = tmp_path / "vendor" / "lib"
     vendor.mkdir(parents=True)
     f = vendor / "utils.py"
@@ -588,34 +614,44 @@ def test_anchored_multi_segment_pattern(tmp_path):
 def test_sensitive_flags_api_token_txt():
     assert _is_sensitive(Path("api_token.txt"))
 
+
 def test_sensitive_flags_oauth_token_json():
     assert _is_sensitive(Path("oauth_token.json"))
 
+
 def test_sensitive_flags_underscore_secret():
     assert _is_sensitive(Path("app_secret.yaml"))
 
+
 def test_sensitive_does_not_flag_tokenizer_py():
     assert not _is_sensitive(Path("tokenizer.py"))
 
+
 def test_sensitive_does_not_flag_tokenize_py():
     assert not _is_sensitive(Path("tokenize.py"))
 
+
 def test_sensitive_flags_passwords_py():
     # passwords.py is just as likely a secret store as passwords.txt — code ext is no excuse
     assert _is_sensitive(Path("passwords.py"))
 
+
 def test_sensitive_flags_ssh_dir():
     assert _is_sensitive(Path("/home/user/.ssh/id_rsa"))
 
+
 def test_sensitive_flags_secrets_dir():
     assert _is_sensitive(Path("config/secrets/db.json"))
 
+
 def test_sensitive_flags_token_txt():
     assert _is_sensitive(Path("token.txt"))
 
+
 def test_sensitive_flags_credentials_json():
     assert _is_sensitive(Path("credentials.json"))
 
+
 def test_sensitive_does_not_flag_root_file_named_credentials():
     # A root-level file called "credentials" (no parent dir named credentials)
     # must NOT be flagged by Stage 1; Stage 2 name-pattern check catches it instead.
@@ -628,11 +664,13 @@ def test_sensitive_does_not_flag_root_file_named_credentials():
     # Verify the whole function still returns True (via name pattern, not dir check).
     assert _is_sensitive(p)
 
+
 def test_sensitive_secret_handler_txt():
     # Both patterns now use (?![a-zA-Z]) so underscore after keyword is allowed.
     # "secret_handler.txt": "secret" followed by "_" (not alpha) → flagged.
     assert _is_sensitive(Path("secret_handler.txt"))
 
+
 def test_sensitive_token_config_yaml():
     # "token_config.yaml": "token" followed by "_" (not alpha) → flagged.
     assert _is_sensitive(Path("token_config.yaml"))
@@ -640,6 +678,7 @@ def test_sensitive_token_config_yaml():
 
 # ── Issue #933: failed-chunk files must not be frozen in manifest ─────────────
 
+
 def test_save_manifest_skips_semantic_hash_for_files_without_cache(tmp_path):
     """Files in failed chunks have no semantic cache entry; save_manifest must
     leave their semantic_hash empty so detect_incremental re-queues them (#933)."""
@@ -653,7 +692,12 @@ def test_save_manifest_skips_semantic_hash_for_files_without_cache(tmp_path):
     doc2.write_text("# B\n\ncontent b")
 
     # Simulate: doc1's chunk succeeded (has a cache entry), doc2's chunk failed (no entry).
-    save_cached(doc1, {"nodes": [{"id": "a", "source_file": str(doc1)}], "edges": [], "hyperedges": []}, root=tmp_path, kind="semantic")
+    save_cached(
+        doc1,
+        {"nodes": [{"id": "a", "source_file": str(doc1)}], "edges": [], "hyperedges": []},
+        root=tmp_path,
+        kind="semantic",
+    )
     # doc2: no cache entry written
 
     files = {"document": [str(doc1), str(doc2)]}
@@ -674,7 +718,6 @@ def test_save_manifest_skips_semantic_hash_for_files_without_cache(tmp_path):
     assert str(doc2) not in manifest, "failed-chunk file must be absent from manifest"
 
 
-
 def test_save_manifest_without_filter_unchanged_for_code(tmp_path):
     """Code files must be stamped in the manifest regardless of semantic cache."""
     import json
@@ -689,8 +732,11 @@ def test_save_manifest_without_filter_unchanged_for_code(tmp_path):
     manifest = json.loads(Path(manifest_path).read_text())
     assert str(py) in manifest
     assert manifest[str(py)]["ast_hash"] != ""
+
+
 # Regression tests for #945 - .gitignore fallback when no .graphifyignore exists
 
+
 def test_gitignore_fallback_when_no_graphifyignore(tmp_path):
     """When no .graphifyignore exists, .gitignore patterns are honored (#945)."""
     (tmp_path / ".git").mkdir()
@@ -719,12 +765,13 @@ def test_graphifyignore_takes_precedence_over_gitignore(tmp_path):
 
     result = detect(tmp_path)
     code = result["files"]["code"]
-    assert any("main.py" in f for f in code)       # gitignore NOT applied
+    assert any("main.py" in f for f in code)  # gitignore NOT applied
     assert not any("other.py" in f for f in code)  # graphifyignore IS applied
 
 
 # Regression tests for #947 - .worktrees/ skipped and --exclude flag
 
+
 def test_detect_skips_worktrees_dir(tmp_path):
     """Files inside .worktrees/ are never indexed (#947)."""
     wt = tmp_path / ".worktrees" / "feature-branch"
@@ -770,9 +817,11 @@ def test_detect_extra_excludes_pattern(tmp_path):
 # Shebang interpreter parsing
 # ---------------------------------------------------------------------------
 
+
 def test_shebang_interpreter_plain(tmp_path):
     """Plain shebang returns the interpreter basename."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "plain"
     script.write_bytes(b"#!/usr/bin/python3\nprint('x')\n")
     assert _shebang_interpreter(script) == "python3"
@@ -781,6 +830,7 @@ def test_shebang_interpreter_plain(tmp_path):
 def test_shebang_interpreter_env_single_arg(tmp_path):
     """`#!/usr/bin/env python3` returns the interpreter, not 'env'."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_single"
     script.write_bytes(b"#!/usr/bin/env python3\nprint('x')\n")
     assert _shebang_interpreter(script) == "python3"
@@ -789,6 +839,7 @@ def test_shebang_interpreter_env_single_arg(tmp_path):
 def test_shebang_interpreter_env_dash_s(tmp_path):
     """`#!/usr/bin/env -S python3 -u` (-S split-args form) recovers the interpreter."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_dashs"
     script.write_bytes(b"#!/usr/bin/env -S python3 -u\nprint('x')\n")
     assert _shebang_interpreter(script) == "python3"
@@ -797,6 +848,7 @@ def test_shebang_interpreter_env_dash_s(tmp_path):
 def test_shebang_interpreter_env_with_flags(tmp_path):
     """`#!/usr/bin/env -i bash` skips env flags and resolves to the interpreter."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_flags"
     script.write_bytes(b"#!/usr/bin/env -i bash\necho hi\n")
     assert _shebang_interpreter(script) == "bash"
@@ -805,6 +857,7 @@ def test_shebang_interpreter_env_with_flags(tmp_path):
 def test_shebang_interpreter_env_with_assignment(tmp_path):
     """`#!/usr/bin/env DEBUG=1 python3` skips var=value assignments."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_assign"
     script.write_bytes(b"#!/usr/bin/env DEBUG=1 python3\nprint('x')\n")
     assert _shebang_interpreter(script) == "python3"
@@ -813,6 +866,7 @@ def test_shebang_interpreter_env_with_assignment(tmp_path):
 def test_shebang_interpreter_no_shebang(tmp_path):
     """File without shebang returns None."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "no_shebang"
     script.write_bytes(b"print('x')\n")
     assert _shebang_interpreter(script) is None
@@ -821,6 +875,7 @@ def test_shebang_interpreter_no_shebang(tmp_path):
 def test_shebang_interpreter_quoted_path(tmp_path):
     """Quoted interpreter path with spaces parses correctly via shlex."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "quoted"
     # Note: actual `#!` on disk wouldn't permit a quoted path on most kernels,
     # but shlex must not crash and should produce a reasonable answer
@@ -839,6 +894,7 @@ def test_shebang_file_type_classifies_via_interpreter(tmp_path):
 def test_shebang_interpreter_unreadable_returns_none(tmp_path):
     """Unreadable / nonexistent files return None, never raise."""
     from graphify.detect import _shebang_interpreter
+
     missing = tmp_path / "does_not_exist"
     assert _shebang_interpreter(missing) is None
 
@@ -846,6 +902,7 @@ def test_shebang_interpreter_unreadable_returns_none(tmp_path):
 def test_shebang_interpreter_env_unset_with_operand(tmp_path):
     """`env -u VAR python3` skips both -u and its required operand."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_unset"
     script.write_bytes(b"#!/usr/bin/env -u PYTHONPATH python3\nprint('x')\n")
     assert _shebang_interpreter(script) == "python3"
@@ -855,6 +912,7 @@ def test_shebang_interpreter_env_unset_with_operand(tmp_path):
 def test_shebang_interpreter_env_chdir_with_operand(tmp_path):
     """`env -C /tmp python3` skips both -C and its workdir operand."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_chdir"
     script.write_bytes(b"#!/usr/bin/env -C /tmp python3\nprint('x')\n")
     assert _shebang_interpreter(script) == "python3"
@@ -864,6 +922,7 @@ def test_shebang_interpreter_env_chdir_with_operand(tmp_path):
 def test_shebang_interpreter_env_path_with_operand(tmp_path):
     """`env -P /bin python3` skips both -P and its utilpath operand."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_path"
     script.write_bytes(b"#!/usr/bin/env -P /bin python3\nprint('x')\n")
     assert _shebang_interpreter(script) == "python3"
@@ -873,6 +932,7 @@ def test_shebang_interpreter_env_path_with_operand(tmp_path):
 def test_shebang_interpreter_env_dash_s_after_flag(tmp_path):
     """`env -i -S "python3 -u"` handles -S after another env flag."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_flag_dash_s"
     script.write_bytes(b'#!/usr/bin/env -i -S "python3 -u"\nprint("x")\n')
     assert _shebang_interpreter(script) == "python3"
@@ -882,6 +942,7 @@ def test_shebang_interpreter_env_dash_s_after_flag(tmp_path):
 def test_shebang_interpreter_env_clumped_u_operand(tmp_path):
     """Clumped `-uPYTHONPATH` form (no space between flag and operand) is one arg."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_clumped"
     script.write_bytes(b"#!/usr/bin/env -uPYTHONPATH python3\nprint('x')\n")
     assert _shebang_interpreter(script) == "python3"
@@ -891,6 +952,7 @@ def test_shebang_interpreter_env_clumped_u_operand(tmp_path):
 def test_shebang_interpreter_env_missing_operand_returns_none(tmp_path):
     """`env -u` with no operand → not a valid command, return None."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_missing_op"
     script.write_bytes(b"#!/usr/bin/env -u\n")
     assert _shebang_interpreter(script) is None
@@ -899,6 +961,7 @@ def test_shebang_interpreter_env_missing_operand_returns_none(tmp_path):
 def test_shebang_interpreter_env_gnu_split_string_equals(tmp_path):
     """GNU `--split-string='python3 -u'` (with `=` operand) → python3."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_split_eq"
     script.write_bytes(b"#!/usr/bin/env --split-string='python3 -u'\nprint('x')\n")
     assert _shebang_interpreter(script) == "python3"
@@ -908,6 +971,7 @@ def test_shebang_interpreter_env_gnu_split_string_equals(tmp_path):
 def test_shebang_interpreter_env_gnu_split_string_separate(tmp_path):
     """GNU `--split-string "python3 -u"` (separate operand) → python3."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_split_sep"
     script.write_bytes(b'#!/usr/bin/env --split-string "python3 -u"\nprint("x")\n')
     assert _shebang_interpreter(script) == "python3"
@@ -917,6 +981,7 @@ def test_shebang_interpreter_env_gnu_split_string_separate(tmp_path):
 def test_shebang_interpreter_env_gnu_argv0_operand(tmp_path):
     """GNU `-a alias python3` skips both -a and its argv0 operand."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_argv0"
     script.write_bytes(b"#!/usr/bin/env -a alias python3\nprint('x')\n")
     assert _shebang_interpreter(script) == "python3"
@@ -926,6 +991,7 @@ def test_shebang_interpreter_env_gnu_argv0_operand(tmp_path):
 def test_shebang_interpreter_env_compact_dash_s(tmp_path):
     """Compact `-Spython3 -u` form (no space between -S and packed string)."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_compact_dash_s"
     script.write_bytes(b"#!/usr/bin/env -Spython3 -u\nprint('x')\n")
     assert _shebang_interpreter(script) == "python3"
@@ -935,6 +1001,7 @@ def test_shebang_interpreter_env_compact_dash_s(tmp_path):
 def test_shebang_interpreter_env_compact_v_then_s(tmp_path):
     """Compact `-vSpython3` (-v plus compact -S)."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_compact_vs"
     script.write_bytes(b"#!/usr/bin/env -vSpython3 -u\nprint('x')\n")
     assert _shebang_interpreter(script) == "python3"
@@ -944,6 +1011,7 @@ def test_shebang_interpreter_env_compact_v_then_s(tmp_path):
 def test_shebang_interpreter_env_long_unset_separate_operand(tmp_path):
     """GNU `--unset PYTHONPATH python3` (separate operand)."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_long_unset"
     script.write_bytes(b"#!/usr/bin/env --unset PYTHONPATH python3\nprint('x')\n")
     assert _shebang_interpreter(script) == "python3"
@@ -953,6 +1021,7 @@ def test_shebang_interpreter_env_long_unset_separate_operand(tmp_path):
 def test_shebang_interpreter_env_long_unset_equals(tmp_path):
     """GNU `--unset=PYTHONPATH python3` (`=` operand form)."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_long_unset_eq"
     script.write_bytes(b"#!/usr/bin/env --unset=PYTHONPATH python3\nprint('x')\n")
     assert _shebang_interpreter(script) == "python3"
@@ -962,6 +1031,7 @@ def test_shebang_interpreter_env_long_unset_equals(tmp_path):
 def test_shebang_interpreter_env_long_chdir_separate_operand(tmp_path):
     """GNU `--chdir /tmp python3` (separate operand)."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_long_chdir"
     script.write_bytes(b"#!/usr/bin/env --chdir /tmp python3\nprint('x')\n")
     assert _shebang_interpreter(script) == "python3"
@@ -971,6 +1041,7 @@ def test_shebang_interpreter_env_long_chdir_separate_operand(tmp_path):
 def test_shebang_interpreter_env_long_chdir_equals(tmp_path):
     """GNU `--chdir=/tmp python3` (`=` operand form)."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_long_chdir_eq"
     script.write_bytes(b"#!/usr/bin/env --chdir=/tmp python3\nprint('x')\n")
     assert _shebang_interpreter(script) == "python3"
@@ -980,6 +1051,7 @@ def test_shebang_interpreter_env_long_chdir_equals(tmp_path):
 def test_shebang_interpreter_env_signal_flags(tmp_path):
     """GNU signal-handling flags skip transparently."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_signal"
     script.write_bytes(b"#!/usr/bin/env --default-signal=TERM --ignore-signal=PIPE python3\n")
     assert _shebang_interpreter(script) == "python3"
@@ -989,6 +1061,7 @@ def test_shebang_interpreter_env_signal_flags(tmp_path):
 def test_shebang_interpreter_env_unknown_option_returns_none(tmp_path):
     """Unknown hyphen-prefixed env option → return None rather than guessing."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_unknown"
     script.write_bytes(b"#!/usr/bin/env --no-such-flag python3\n")
     # Must refuse to guess: if we can't classify the option, we can't trust
@@ -999,10 +1072,10 @@ def test_shebang_interpreter_env_unknown_option_returns_none(tmp_path):
 def test_shebang_interpreter_env_dash_s_assignment_before_interpreter(tmp_path):
     """`-S` payload may carry NAME=value assignments before the interpreter."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_s_assignment"
     script.write_bytes(
-        b"#!/usr/bin/env -S PYTHONPATH=/opt/custom:${PYTHONPATH} python3\n"
-        b"print('x')\n"
+        b"#!/usr/bin/env -S PYTHONPATH=/opt/custom:${PYTHONPATH} python3\nprint('x')\n"
     )
     assert _shebang_interpreter(script) == "python3"
     assert classify_file(script) == FileType.CODE
@@ -1011,6 +1084,7 @@ def test_shebang_interpreter_env_dash_s_assignment_before_interpreter(tmp_path):
 def test_shebang_interpreter_env_dash_s_flag_before_interpreter(tmp_path):
     """`-S` payload may carry env flags (e.g. -i) before the interpreter."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_s_flag"
     script.write_bytes(b"#!/usr/bin/env -S -i OLDUSER=${USER} python3\nprint('x')\n")
     assert _shebang_interpreter(script) == "python3"
@@ -1020,6 +1094,7 @@ def test_shebang_interpreter_env_dash_s_flag_before_interpreter(tmp_path):
 def test_shebang_interpreter_env_long_split_assignment_before_interpreter(tmp_path):
     """`--split-string=` payload may carry assignments before the interpreter."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_long_split_assignment"
     script.write_bytes(
         b"#!/usr/bin/env --split-string='PYTHONPATH=/opt/custom:${PYTHONPATH} python3 -u'\n"
@@ -1032,6 +1107,7 @@ def test_shebang_interpreter_env_long_split_assignment_before_interpreter(tmp_pa
 def test_shebang_interpreter_env_long_split_flag_before_interpreter(tmp_path):
     """`--split-string=` payload may carry env flags before the interpreter."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_long_split_flag"
     script.write_bytes(b"#!/usr/bin/env --split-string='-i python3 -u'\nprint('x')\n")
     assert _shebang_interpreter(script) == "python3"
@@ -1043,6 +1119,7 @@ def test_shebang_interpreter_env_nested_split_string_rejected(tmp_path):
     on the recursive call bounds the recursion depth at one). Without this guard,
     a malicious or strange shebang could spin the parser indefinitely."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_nested_split"
     # Outer -S splits into ["-S", "python3", "-u"]; inner -S is treated as an
     # unknown option in the recursed pass, so we get None (refuse to guess).
@@ -1053,6 +1130,7 @@ def test_shebang_interpreter_env_nested_split_string_rejected(tmp_path):
 def test_shebang_interpreter_env_vs_assignment_before_interpreter(tmp_path):
     """`-vS` packed payload also re-parses for leading assignments."""
     from graphify.detect import _shebang_interpreter
+
     script = tmp_path / "env_vs_assignment"
     script.write_bytes(b"#!/usr/bin/env -vS DEBUG=1 python3 -u\nprint('x')\n")
     assert _shebang_interpreter(script) == "python3"
diff --git a/tests/test_devin.py b/tests/test_devin.py
index a3bca5d05..1f94bb58d 100644
--- a/tests/test_devin.py
+++ b/tests/test_devin.py
@@ -1,24 +1,28 @@
 """Tests for graphify devin install / uninstall commands."""
+
 from pathlib import Path
 import sys
 from unittest.mock import patch
-import pytest
 
 
 # ---------------------------------------------------------------------------
 # Helpers
 # ---------------------------------------------------------------------------
 
+
 def _devin_install_user(tmp_path):
     from graphify.__main__ import install
+
     old_cwd = Path.cwd()
     try:
         import os
+
         os.chdir(tmp_path)
         with patch("graphify.__main__.Path.home", return_value=tmp_path):
             install(platform="devin")
     finally:
         import os
+
         os.chdir(old_cwd)
 
 
@@ -38,6 +42,7 @@ def _rules_path(project_dir):
 # User-scope install (graphify install --platform devin / graphify devin install)
 # ---------------------------------------------------------------------------
 
+
 def test_devin_install_user_creates_skill_file(tmp_path):
     """User-scope install copies skill to ~/.config/devin/skills/graphify/SKILL.md."""
     _devin_install_user(tmp_path)
@@ -71,9 +76,11 @@ def test_devin_install_user_does_not_write_rules(tmp_path):
 # Project-scope install (graphify devin install --project)
 # ---------------------------------------------------------------------------
 
+
 def test_devin_install_project_creates_skill_file(tmp_path, monkeypatch):
     """Project-scope install copies skill to .devin/skills/graphify/SKILL.md."""
     from graphify.__main__ import main
+
     home = tmp_path / "home"
     project = tmp_path / "project"
     project.mkdir()
@@ -88,6 +95,7 @@ def test_devin_install_project_creates_skill_file(tmp_path, monkeypatch):
 def test_devin_install_project_creates_rules_file(tmp_path, monkeypatch):
     """Project-scope install writes .windsurf/rules/graphify.md."""
     from graphify.__main__ import main
+
     home = tmp_path / "home"
     project = tmp_path / "project"
     project.mkdir()
@@ -104,6 +112,7 @@ def test_devin_install_project_creates_rules_file(tmp_path, monkeypatch):
 def test_devin_rules_content_recommends_graphify_query(tmp_path):
     """The rules file installed by devin must use query-first policy."""
     from graphify.__main__ import _devin_rules_install
+
     _devin_rules_install(tmp_path)
     content = _rules_path(tmp_path).read_text()
     assert "graphify query" in content
@@ -112,6 +121,7 @@ def test_devin_rules_content_recommends_graphify_query(tmp_path):
 def test_devin_rules_install_idempotent(tmp_path, capsys):
     """Installing rules twice does not change content and prints 'no change'."""
     from graphify.__main__ import _devin_rules_install
+
     _devin_rules_install(tmp_path)
     content_first = _rules_path(tmp_path).read_text()
     _devin_rules_install(tmp_path)
@@ -123,6 +133,7 @@ def test_devin_rules_install_idempotent(tmp_path, capsys):
 def test_devin_install_project_hints_git_add(tmp_path, monkeypatch, capsys):
     """Project-scope install prints a git add hint covering .devin/ and .windsurf/."""
     from graphify.__main__ import main
+
     home = tmp_path / "home"
     project = tmp_path / "project"
     project.mkdir()
@@ -138,6 +149,7 @@ def test_devin_install_project_hints_git_add(tmp_path, monkeypatch, capsys):
 # Uninstall — user scope
 # ---------------------------------------------------------------------------
 
+
 def test_devin_uninstall_user_removes_skill_file(tmp_path):
     """User-scope uninstall removes the skill file."""
     _devin_install_user(tmp_path)
@@ -145,6 +157,7 @@ def test_devin_uninstall_user_removes_skill_file(tmp_path):
     assert skill.exists()
 
     from graphify.__main__ import _remove_skill_file
+
     with patch("graphify.__main__.Path.home", return_value=tmp_path):
         _remove_skill_file("devin")
     assert not skill.exists()
@@ -154,6 +167,7 @@ def test_devin_uninstall_user_noop_when_not_installed(tmp_path, capsys):
     """User-scope uninstall prints an appropriate message when nothing is installed."""
     from graphify.__main__ import main
     import os
+
     old_cwd = Path.cwd()
     try:
         os.chdir(tmp_path)
@@ -170,9 +184,11 @@ def test_devin_uninstall_user_noop_when_not_installed(tmp_path, capsys):
 # Uninstall — project scope
 # ---------------------------------------------------------------------------
 
+
 def test_devin_uninstall_project_removes_skill_file(tmp_path, monkeypatch):
     """Project-scope uninstall removes .devin/skills/graphify/SKILL.md."""
     from graphify.__main__ import main
+
     home = tmp_path / "home"
     project = tmp_path / "project"
     project.mkdir()
@@ -188,6 +204,7 @@ def test_devin_uninstall_project_removes_skill_file(tmp_path, monkeypatch):
 def test_devin_uninstall_project_removes_rules_file(tmp_path, monkeypatch):
     """Project-scope uninstall removes .windsurf/rules/graphify.md."""
     from graphify.__main__ import main
+
     home = tmp_path / "home"
     project = tmp_path / "project"
     project.mkdir()
@@ -203,6 +220,7 @@ def test_devin_uninstall_project_removes_rules_file(tmp_path, monkeypatch):
 def test_devin_uninstall_project_does_not_touch_user_scope(tmp_path, monkeypatch):
     """Project-scope uninstall must not remove the user-scope skill file."""
     from graphify.__main__ import main
+
     home = tmp_path / "home"
     project = tmp_path / "project"
     project.mkdir()
@@ -222,6 +240,7 @@ def test_devin_uninstall_project_does_not_touch_user_scope(tmp_path, monkeypatch
 def test_devin_rules_uninstall_noop_when_not_installed(tmp_path):
     """_devin_rules_uninstall does nothing if the rules file was never written."""
     from graphify.__main__ import _devin_rules_uninstall
+
     _devin_rules_uninstall(tmp_path)  # should not raise
 
 
@@ -229,9 +248,11 @@ def test_devin_rules_uninstall_noop_when_not_installed(tmp_path):
 # Skill file content
 # ---------------------------------------------------------------------------
 
+
 def test_devin_skill_file_exists_in_package():
     """skill-devin.md must be present in the installed package."""
     import graphify
+
     skill = Path(graphify.__file__).parent / "skill-devin.md"
     assert skill.exists(), "skill-devin.md missing from package"
 
@@ -244,6 +265,7 @@ def test_devin_skill_file_uses_python_c_syntax():
     ``python -c "..."`` so they work in pipx / venv environments.
     """
     import graphify
+
     skill = (Path(graphify.__file__).parent / "skill-devin.md").read_text()
     assert '.graphify_python) -c "' in skill, (
         "skill-devin.md must use the interpreter-detection pattern "
@@ -255,6 +277,7 @@ def test_devin_skill_file_uses_python_c_syntax():
 def test_devin_skill_file_frontmatter_has_triggers():
     """Devin skill frontmatter must list triggers for model-invocable activation."""
     import graphify
+
     skill = (Path(graphify.__file__).parent / "skill-devin.md").read_text()
     assert "triggers:" in skill
     assert "model" in skill
@@ -264,9 +287,11 @@ def test_devin_skill_file_frontmatter_has_triggers():
 # Platform config sanity
 # ---------------------------------------------------------------------------
 
+
 def test_devin_in_platform_config():
     """devin must be registered in _PLATFORM_CONFIG."""
     from graphify.__main__ import _PLATFORM_CONFIG
+
     assert "devin" in _PLATFORM_CONFIG
     assert _PLATFORM_CONFIG["devin"]["skill_file"] == "skill-devin.md"
     assert _PLATFORM_CONFIG["devin"]["claude_md"] is False
@@ -275,6 +300,7 @@ def test_devin_in_platform_config():
 def test_devin_platform_skill_destination_user_scope(tmp_path):
     """User-scope destination must be ~/.config/devin/skills/graphify/SKILL.md."""
     from graphify.__main__ import _platform_skill_destination
+
     with patch("graphify.__main__.Path.home", return_value=tmp_path):
         dst = _platform_skill_destination("devin", project=False)
     assert dst == tmp_path / ".config" / "devin" / "skills" / "graphify" / "SKILL.md"
@@ -283,6 +309,7 @@ def test_devin_platform_skill_destination_user_scope(tmp_path):
 def test_devin_in_main_help_text(capsys, monkeypatch):
     """`graphify --help` must list devin in the platform list and in the per-platform section."""
     from graphify.__main__ import main
+
     monkeypatch.setattr(sys, "argv", ["graphify", "--help"])
     main()
     captured = capsys.readouterr().out
@@ -305,5 +332,6 @@ def test_devin_in_main_help_text(capsys, monkeypatch):
 def test_devin_platform_skill_destination_project_scope(tmp_path):
     """Project-scope destination must be <project>/.devin/skills/graphify/SKILL.md."""
     from graphify.__main__ import _platform_skill_destination
+
     dst = _platform_skill_destination("devin", project=True, project_dir=tmp_path)
     assert dst == tmp_path / ".devin" / "skills" / "graphify" / "SKILL.md"
diff --git a/tests/test_dotnet.py b/tests/test_dotnet.py
index 17a146073..4bc6faf30 100644
--- a/tests/test_dotnet.py
+++ b/tests/test_dotnet.py
@@ -1,7 +1,7 @@
 """Tests for .NET project file extraction (.sln, .csproj, .razor)."""
+
 from pathlib import Path
 import tempfile
-import pytest
 from graphify.extract import extract_sln, extract_csproj, extract_razor
 
 FIXTURES = Path(__file__).parent / "fixtures"
@@ -17,6 +17,7 @@ def _relations(r):
 
 # ── .sln ─────────────────────────────────────────────────────────────────────
 
+
 def test_sln_extracts_projects():
     r = extract_sln(FIXTURES / "sample.sln")
     assert "error" not in r
@@ -39,13 +40,14 @@ def test_sln_project_dependency():
 
 # ── .csproj ──────────────────────────────────────────────────────────────────
 
+
 def test_csproj_packages():
     r = extract_csproj(FIXTURES / "sample.csproj")
     assert "error" not in r
     labels = _labels(r)
-    assert any("MediatR" in l for l in labels)
-    assert any("FluentValidation" in l for l in labels)
-    assert any("Swashbuckle" in l for l in labels)
+    assert any("MediatR" in label for label in labels)
+    assert any("FluentValidation" in label for label in labels)
+    assert any("Swashbuckle" in label for label in labels)
 
 
 def test_csproj_project_references():
@@ -74,6 +76,7 @@ def test_csproj_invalid_xml():
 
 # ── .razor ───────────────────────────────────────────────────────────────────
 
+
 def test_razor_using_and_inject():
     r = extract_razor(FIXTURES / "sample.razor")
     assert "error" not in r
@@ -91,7 +94,7 @@ def test_razor_components():
 
 def test_razor_page_route():
     r = extract_razor(FIXTURES / "sample.razor")
-    assert any("/counter" in l for l in _labels(r))
+    assert any("/counter" in label for label in _labels(r))
 
 
 def test_razor_inherits():
@@ -113,13 +116,16 @@ def test_razor_missing_file():
 
 # ── dispatch & detect integration ────────────────────────────────────────────
 
+
 def test_dispatch_table():
     from graphify.extract import _get_extractor
+
     for ext in (".sln", ".csproj", ".fsproj", ".vbproj", ".razor", ".cshtml"):
         assert _get_extractor(Path(f"foo{ext}")) is not None, f"{ext} not in dispatch"
 
 
 def test_code_extensions():
     from graphify.detect import CODE_EXTENSIONS
+
     for ext in (".sln", ".csproj", ".fsproj", ".vbproj", ".razor", ".cshtml"):
         assert ext in CODE_EXTENSIONS, f"{ext} missing"
diff --git a/tests/test_explain_cli.py b/tests/test_explain_cli.py
index 1d00955f0..f96b896cd 100644
--- a/tests/test_explain_cli.py
+++ b/tests/test_explain_cli.py
@@ -1,4 +1,5 @@
 """Regression tests for `graphify explain` arrow direction (#853)."""
+
 from __future__ import annotations
 import json
 import graphify.__main__ as mainmod
@@ -6,24 +7,54 @@
 
 def _write_graph(tmp_path):
     graph_data = {
-        "directed": False, "multigraph": False, "graph": {},
+        "directed": False,
+        "multigraph": False,
+        "graph": {},
         "nodes": [
-            {"id": "validate", "label": "validateSanitySession()",
-             "source_file": "server/sanity-validate-session.ts", "community": 0},
-            {"id": "create_patch", "label": "createPatchHandler()",
-             "source_file": "server/create-patch-handler.ts", "community": 0},
-            {"id": "create_edit", "label": "createEditHandler()",
-             "source_file": "server/create-edit-handler.ts", "community": 0},
-            {"id": "stable_stringify", "label": "stableStringify()",
-             "source_file": "shared/stringify.ts", "community": 0},
+            {
+                "id": "validate",
+                "label": "validateSanitySession()",
+                "source_file": "server/sanity-validate-session.ts",
+                "community": 0,
+            },
+            {
+                "id": "create_patch",
+                "label": "createPatchHandler()",
+                "source_file": "server/create-patch-handler.ts",
+                "community": 0,
+            },
+            {
+                "id": "create_edit",
+                "label": "createEditHandler()",
+                "source_file": "server/create-edit-handler.ts",
+                "community": 0,
+            },
+            {
+                "id": "stable_stringify",
+                "label": "stableStringify()",
+                "source_file": "shared/stringify.ts",
+                "community": 0,
+            },
         ],
         "links": [
-            {"source": "create_patch", "target": "validate",
-             "relation": "calls", "confidence": "EXTRACTED"},
-            {"source": "create_edit", "target": "validate",
-             "relation": "calls", "confidence": "EXTRACTED"},
-            {"source": "validate", "target": "stable_stringify",
-             "relation": "calls", "confidence": "EXTRACTED"},
+            {
+                "source": "create_patch",
+                "target": "validate",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+            },
+            {
+                "source": "create_edit",
+                "target": "validate",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+            },
+            {
+                "source": "validate",
+                "target": "stable_stringify",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+            },
         ],
     }
     p = tmp_path / "graph.json"
@@ -33,8 +64,9 @@ def _write_graph(tmp_path):
 
 def _run(monkeypatch, graph_path, label, capsys):
     monkeypatch.setattr(mainmod, "_check_skill_version", lambda _: None)
-    monkeypatch.setattr(mainmod.sys, "argv",
-        ["graphify", "explain", label, "--graph", str(graph_path)])
+    monkeypatch.setattr(
+        mainmod.sys, "argv", ["graphify", "explain", label, "--graph", str(graph_path)]
+    )
     mainmod.main()
     return capsys.readouterr().out
 
diff --git a/tests/test_export.py b/tests/test_export.py
index 65964d24e..78a1a36c3 100644
--- a/tests/test_export.py
+++ b/tests/test_export.py
@@ -7,9 +7,11 @@
 
 FIXTURES = Path(__file__).parent / "fixtures"
 
+
 def make_graph():
     return build_from_json(json.loads((FIXTURES / "extraction.json").read_text()))
 
+
 def test_to_json_creates_file():
     G = make_graph()
     communities = cluster(G)
@@ -18,6 +20,7 @@ def test_to_json_creates_file():
         to_json(G, communities, str(out))
         assert out.exists()
 
+
 def test_to_json_valid_json():
     G = make_graph()
     communities = cluster(G)
@@ -28,6 +31,7 @@ def test_to_json_valid_json():
         assert "nodes" in data
         assert "links" in data
 
+
 def test_to_json_nodes_have_community():
     G = make_graph()
     communities = cluster(G)
@@ -38,6 +42,7 @@ def test_to_json_nodes_have_community():
         for node in data["nodes"]:
             assert "community" in node
 
+
 def test_to_cypher_creates_file():
     G = make_graph()
     with tempfile.TemporaryDirectory() as tmp:
@@ -45,6 +50,7 @@ def test_to_cypher_creates_file():
         to_cypher(G, str(out))
         assert out.exists()
 
+
 def test_to_cypher_contains_merge_statements():
     G = make_graph()
     with tempfile.TemporaryDirectory() as tmp:
@@ -53,6 +59,7 @@ def test_to_cypher_contains_merge_statements():
         content = out.read_text()
         assert "MERGE" in content
 
+
 def test_to_graphml_creates_file():
     G = make_graph()
     communities = cluster(G)
@@ -61,6 +68,7 @@ def test_to_graphml_creates_file():
         to_graphml(G, communities, str(out))
         assert out.exists()
 
+
 def test_to_graphml_valid_xml():
     G = make_graph()
     communities = cluster(G)
@@ -71,6 +79,7 @@ def test_to_graphml_valid_xml():
         assert "<graphml" in content
         assert "<node" in content
 
+
 def test_to_graphml_has_community_attribute():
     G = make_graph()
     communities = cluster(G)
@@ -80,6 +89,7 @@ def test_to_graphml_has_community_attribute():
         content = out.read_text()
         assert "community" in content
 
+
 def test_to_html_creates_file():
     G = make_graph()
     communities = cluster(G)
@@ -88,6 +98,7 @@ def test_to_html_creates_file():
         to_html(G, communities, str(out))
         assert out.exists()
 
+
 def test_to_html_contains_visjs():
     G = make_graph()
     communities = cluster(G)
@@ -119,11 +130,15 @@ def test_to_html_pins_visjs_version_with_sri():
     assert "https://unpkg.com/vis-network/standalone" not in content
 
     # SRI integrity attribute pinning the known-good hash.
-    assert 'integrity="sha384-Ux6phic9PEHJ38YtrijhkzyJ8yQlH8i/+buBR8s3mAZOJrP1gwyvAcIYl3GWtpX1"' in content
+    assert (
+        'integrity="sha384-Ux6phic9PEHJ38YtrijhkzyJ8yQlH8i/+buBR8s3mAZOJrP1gwyvAcIYl3GWtpX1"'
+        in content
+    )
 
     # crossorigin="anonymous" is required for SRI on cross-origin scripts.
     assert 'crossorigin="anonymous"' in content
 
+
 def test_to_html_contains_search():
     G = make_graph()
     communities = cluster(G)
@@ -133,6 +148,7 @@ def test_to_html_contains_search():
         content = out.read_text()
         assert "search" in content.lower()
 
+
 def test_to_html_contains_legend_with_labels():
     G = make_graph()
     communities = cluster(G)
@@ -143,6 +159,7 @@ def test_to_html_contains_legend_with_labels():
         content = out.read_text()
         assert "Group 0" in content
 
+
 def test_to_html_contains_nodes_and_edges():
     G = make_graph()
     communities = cluster(G)
@@ -182,15 +199,18 @@ def test_to_canvas_file_paths_relative_to_vault():
 
 # ── Issue #834: backup_if_protected ──────────────────────────────────────────
 
+
 def test_backup_no_graph_json(tmp_path):
     """No graph.json → no backup."""
     from graphify.export import backup_if_protected
+
     assert backup_if_protected(tmp_path) is None
 
 
 def test_backup_no_markers(tmp_path):
     """graph.json present but no sentinel and no curated labels → no backup."""
     from graphify.export import backup_if_protected
+
     (tmp_path / "graph.json").write_text('{"nodes":[],"links":[]}')
     assert backup_if_protected(tmp_path) is None
 
@@ -198,6 +218,7 @@ def test_backup_no_markers(tmp_path):
 def test_backup_semantic_marker(tmp_path):
     """graph.json + .graphify_semantic_marker → backup taken."""
     from graphify.export import backup_if_protected
+
     (tmp_path / "graph.json").write_text('{"nodes":[],"links":[]}')
     (tmp_path / "GRAPH_REPORT.md").write_text("# Report")
     (tmp_path / ".graphify_semantic_marker").write_text('{"output_tokens": 1234}')
@@ -213,8 +234,11 @@ def test_backup_curated_labels(tmp_path):
     """graph.json + non-default label in .graphify_labels.json → backup taken."""
     import json
     from graphify.export import backup_if_protected
+
     (tmp_path / "graph.json").write_text('{"nodes":[],"links":[]}')
-    (tmp_path / ".graphify_labels.json").write_text(json.dumps({"0": "Auth Pipeline", "1": "Community 1"}))
+    (tmp_path / ".graphify_labels.json").write_text(
+        json.dumps({"0": "Auth Pipeline", "1": "Community 1"})
+    )
     result = backup_if_protected(tmp_path)
     assert result is not None
 
@@ -223,8 +247,11 @@ def test_backup_default_labels_only(tmp_path):
     """All-default labels → no backup (not curated)."""
     import json
     from graphify.export import backup_if_protected
+
     (tmp_path / "graph.json").write_text('{"nodes":[],"links":[]}')
-    (tmp_path / ".graphify_labels.json").write_text(json.dumps({"0": "Community 0", "1": "Community 1"}))
+    (tmp_path / ".graphify_labels.json").write_text(
+        json.dumps({"0": "Community 0", "1": "Community 1"})
+    )
     assert backup_if_protected(tmp_path) is None
 
 
@@ -232,6 +259,7 @@ def test_backup_same_day_no_accumulation(tmp_path):
     """Same content on same day returns existing backup dir without re-copying."""
     from graphify.export import backup_if_protected
     from datetime import date
+
     (tmp_path / "graph.json").write_text('{"nodes":[],"links":[]}')
     (tmp_path / ".graphify_semantic_marker").write_text("{}")
     b1 = backup_if_protected(tmp_path)
@@ -244,12 +272,13 @@ def test_backup_same_day_no_accumulation(tmp_path):
 def test_backup_same_day_changed_content(tmp_path):
     """Changed graph.json on same day overwrites the existing backup in place."""
     from graphify.export import backup_if_protected
-    from datetime import date
+
     (tmp_path / "graph.json").write_text('{"nodes":[],"links":[]}')
     (tmp_path / ".graphify_semantic_marker").write_text("{}")
     b1 = backup_if_protected(tmp_path)
     (tmp_path / "graph.json").write_text('{"nodes":[{"id":"x"}],"links":[]}')
     b2 = backup_if_protected(tmp_path)
+    assert b2 is not None
     assert b1 == b2  # still one folder per day
     assert (b2 / "graph.json").read_text() == '{"nodes":[{"id":"x"}],"links":[]}'
 
@@ -257,6 +286,7 @@ def test_backup_same_day_changed_content(tmp_path):
 def test_backup_env_disable(tmp_path, monkeypatch):
     """GRAPHIFY_NO_BACKUP=1 disables backup entirely."""
     from graphify.export import backup_if_protected
+
     monkeypatch.setenv("GRAPHIFY_NO_BACKUP", "1")
     (tmp_path / "graph.json").write_text('{"nodes":[],"links":[]}')
     (tmp_path / ".graphify_semantic_marker").write_text("{}")
diff --git a/tests/test_extract.py b/tests/test_extract.py
index 0d5db2c5a..80a68602b 100644
--- a/tests/test_extract.py
+++ b/tests/test_extract.py
@@ -1,5 +1,13 @@
 from pathlib import Path
-from graphify.extract import extract_python, extract, collect_files, _make_id, extract_bash, extract_json, _DISPATCH
+from graphify.extract import (
+    extract_python,
+    extract,
+    collect_files,
+    _make_id,
+    extract_bash,
+    extract_json,
+    _DISPATCH,
+)
 
 FIXTURES = Path(__file__).parent / "fixtures"
 
@@ -29,7 +37,7 @@ def test_extract_python_finds_class():
 def test_extract_python_finds_methods():
     result = extract_python(FIXTURES / "sample.py")
     labels = [n["label"] for n in result["nodes"]]
-    assert any("__init__" in l or "forward" in l for l in labels)
+    assert any("__init__" in label or "forward" in label for label in labels)
 
 
 def test_extract_python_no_dangling_edges():
@@ -66,7 +74,8 @@ def test_extract_disambiguates_duplicate_symbol_ids_by_source_path(tmp_path):
 
     result = extract([first, second], cache_root=tmp_path)
     program_nodes = [
-        node for node in result["nodes"]
+        node
+        for node in result["nodes"]
         if node["label"] == "Program" and node.get("source_file", "").endswith("Program.cs")
     ]
 
@@ -76,14 +85,13 @@ def test_extract_disambiguates_duplicate_symbol_ids_by_source_path(tmp_path):
     node_ids = {node["id"] for node in result["nodes"]}
     program_by_source = {node["source_file"]: node["id"] for node in program_nodes}
     file_nodes_by_source = {
-        node["source_file"]: node["id"]
-        for node in result["nodes"]
-        if node["label"] == "Program.cs"
+        node["source_file"]: node["id"] for node in result["nodes"] if node["label"] == "Program.cs"
     }
 
     assert set(program_by_source) == set(file_nodes_by_source)
     contains_edges = [
-        edge for edge in result["edges"]
+        edge
+        for edge in result["edges"]
         if edge["relation"] == "contains" and edge["source_file"] in program_by_source
     ]
     assert len(contains_edges) == 2
@@ -129,14 +137,16 @@ def test_extract_rewires_unique_inheritance_stub_to_real_definition(tmp_path):
     inherits_edges = [edge for edge in result["edges"] if edge["relation"] == "inherits"]
 
     matching = [
-        edge for edge in inherits_edges
+        edge
+        for edge in inherits_edges
         if node_by_id[edge["source"]]["label"] == "SqliteBookStore"
         and node_by_id[edge["target"]]["label"] == "BookStore"
     ]
 
     assert matching
     assert matching[0]["target"] == next(
-        node["id"] for node in result["nodes"]
+        node["id"]
+        for node in result["nodes"]
         if node["label"] == "BookStore" and node.get("source_file") == "interfaces.py"
     )
     assert all(
@@ -158,7 +168,8 @@ def test_extract_keeps_stub_when_multiple_real_definitions_match(tmp_path):
 
     result = extract([first, second, implementation], cache_root=tmp_path)
     stubs = [
-        node for node in result["nodes"]
+        node
+        for node in result["nodes"]
         if node["label"] == "BookStore" and not node.get("source_file")
     ]
 
@@ -177,8 +188,7 @@ def test_extract_does_not_rewire_inheritance_stub_to_same_named_function(tmp_pat
     inherits_edges = [edge for edge in result["edges"] if edge["relation"] == "inherits"]
 
     assert any(
-        node["label"] == "BookStore" and not node.get("source_file")
-        for node in result["nodes"]
+        node["label"] == "BookStore" and not node.get("source_file") for node in result["nodes"]
     )
     assert not any(
         node_by_id[edge["source"]]["label"] == "SqliteBookStore"
@@ -190,27 +200,20 @@ def test_extract_does_not_rewire_inheritance_stub_to_same_named_function(tmp_pat
 def test_extract_does_not_rewire_constructor_method_to_same_named_class(tmp_path):
     source = tmp_path / "Sample.java"
     source.write_text(
-        "class DataProcessor {\n"
-        "    public DataProcessor() {}\n"
-        "}\n",
+        "class DataProcessor {\n    public DataProcessor() {}\n}\n",
         encoding="utf-8",
     )
 
     result = extract([source], cache_root=tmp_path)
 
-    constructor_nodes = [
-        node for node in result["nodes"]
-        if node["label"] == ".DataProcessor()"
-    ]
+    constructor_nodes = [node for node in result["nodes"] if node["label"] == ".DataProcessor()"]
     assert constructor_nodes
-    assert not any(
-        edge["source"] == edge["target"]
-        for edge in result["edges"]
-    )
+    assert not any(edge["source"] == edge["target"] for edge in result["edges"])
 
 
 def test_collect_files_from_dir():
     from graphify.extract import _DISPATCH
+
     files = collect_files(FIXTURES)
     supported = set(_DISPATCH.keys())
     assert all(f.suffix in supported for f in files)
@@ -339,8 +342,7 @@ def test_cross_file_calls_skip_ambiguous_duplicate_labels(tmp_path):
     result = extract([caller, helper_a, helper_b], cache_root=tmp_path)
     nodes = {n["id"]: n for n in result["nodes"]}
     calls = [
-        e for e in result["edges"]
-        if e["relation"] == "calls" and e["confidence"] == "INFERRED"
+        e for e in result["edges"] if e["relation"] == "calls" and e["confidence"] == "INFERRED"
     ]
 
     assert not any(
@@ -362,15 +364,17 @@ def test_extract_generic_surfaces_tree_sitter_version_mismatch_hint(monkeypatch)
     # this is exactly what users see when an older tree-sitter is paired
     # with a newer language binding.
     fake_ts = types.ModuleType("tree_sitter")
+
     def _raise(*args, **kwargs):
         raise TypeError("missing 1 required positional argument: 'name'")
-    fake_ts.Language = _raise
-    fake_ts.Parser = None
+
+    setattr(fake_ts, "Language", _raise)
+    setattr(fake_ts, "Parser", None)
     monkeypatch.setitem(sys.modules, "tree_sitter", fake_ts)
 
     # Stub the language module so import_module returns something with .language
     fake_lang_mod = types.ModuleType("fake_ts_lang")
-    fake_lang_mod.language = lambda: object()
+    setattr(fake_lang_mod, "language", lambda: object())
     monkeypatch.setitem(sys.modules, "fake_ts_lang", fake_lang_mod)
 
     config = LanguageConfig(ts_module="fake_ts_lang", ts_language_fn="language")
@@ -384,6 +388,7 @@ def _raise(*args, **kwargs):
 def test_extract_js_destructured_require_imports_from():
     """`const { foo } = require('./mod')` must emit imports_from to the resolved module path."""
     from graphify.extract import extract_js
+
     result = extract_js(FIXTURES / "cjs_require.js")
     imports_from = [e for e in result["edges"] if e["relation"] == "imports_from"]
     targets = [e["target"] for e in imports_from]
@@ -398,6 +403,7 @@ def test_extract_js_destructured_require_imports_from():
 def test_extract_js_destructured_require_named_symbols():
     """Destructured CJS requires must emit symbol-level `imports` edges per binder."""
     from graphify.extract import extract_js, _make_id, _file_stem
+
     result = extract_js(FIXTURES / "cjs_require.js")
     sym_targets = [e["target"] for e in result["edges"] if e["relation"] == "imports"]
     foundation_stem = _file_stem(FIXTURES / "foundation.js")
@@ -408,6 +414,7 @@ def test_extract_js_destructured_require_named_symbols():
 def test_extract_js_member_require_emits_property_symbol():
     """`const x = require('./m').y` must emit symbol edge for `y`."""
     from graphify.extract import extract_js, _make_id, _file_stem
+
     result = extract_js(FIXTURES / "cjs_require.js")
     sym_targets = [e["target"] for e in result["edges"] if e["relation"] == "imports"]
     helpers_stem = _file_stem(FIXTURES / "helpers.js")
@@ -417,6 +424,7 @@ def test_extract_js_member_require_emits_property_symbol():
 def test_extract_js_arrow_function_still_extracted():
     """Regression: arrow functions in lexical_declaration must still produce nodes."""
     from graphify.extract import extract_js
+
     arrow_fixture = FIXTURES / "_arrow_only.js"
     arrow_fixture.write_text("const greet = () => console.log('hi');\n")
     try:
@@ -432,18 +440,13 @@ def test_cross_file_call_promoted_to_extracted_with_import_evidence(tmp_path):
     an `imports` or `imports_from` edge linking it to the callee."""
     caller = tmp_path / "caller.js"
     callee = tmp_path / "lib.js"
-    caller.write_text(
-        "const { doWork } = require('./lib');\n"
-        "function run() { doWork(); }\n"
-    )
-    callee.write_text(
-        "function doWork() { return 1; }\n"
-        "module.exports = { doWork };\n"
-    )
+    caller.write_text("const { doWork } = require('./lib');\nfunction run() { doWork(); }\n")
+    callee.write_text("function doWork() { return 1; }\nmodule.exports = { doWork };\n")
     result = extract([caller, callee], cache_root=tmp_path)
     nodes = {n["id"]: n for n in result["nodes"]}
     call_edges = [
-        e for e in result["edges"]
+        e
+        for e in result["edges"]
         if e["relation"] == "calls"
         and nodes[e["source"]]["label"] == "run()"
         and nodes[e["target"]]["label"] == "doWork()"
@@ -460,14 +463,12 @@ def test_cross_file_call_remains_inferred_without_import_evidence(tmp_path):
     callee = tmp_path / "lib.js"
     # Caller does NOT require lib — same-name function happens to exist elsewhere
     caller.write_text("function run() { doUnique(); }\n")
-    callee.write_text(
-        "function doUnique() { return 1; }\n"
-        "module.exports = { doUnique };\n"
-    )
+    callee.write_text("function doUnique() { return 1; }\nmodule.exports = { doUnique };\n")
     result = extract([caller, callee], cache_root=tmp_path)
     nodes = {n["id"]: n for n in result["nodes"]}
     call_edges = [
-        e for e in result["edges"]
+        e
+        for e in result["edges"]
         if e["relation"] == "calls"
         and nodes[e["source"]]["label"] == "run()"
         and nodes[e["target"]]["label"] == "doUnique()"
@@ -481,14 +482,16 @@ def test_cross_file_call_remains_inferred_without_import_evidence(tmp_path):
 # `language_typescript` grammar. Parsing JSX with the wrong grammar produces
 # silent ERROR nodes and drops every function/call inside JSX trees.
 
+
 def test_extract_tsx_finds_helpers_and_component():
     """Functions defined alongside a JSX-returning component must be captured."""
     from graphify.extract import extract_js
+
     result = extract_js(FIXTURES / "sample.tsx")
     labels = [n["label"] for n in result["nodes"]]
-    assert any("fmtDate" in l for l in labels), f"fmtDate missing from {labels}"
-    assert any("fmtCount" in l for l in labels), f"fmtCount missing from {labels}"
-    assert any("App" in l for l in labels), f"App missing from {labels}"
+    assert any("fmtDate" in label for label in labels), f"fmtDate missing from {labels}"
+    assert any("fmtCount" in label for label in labels), f"fmtCount missing from {labels}"
+    assert any("App" in label for label in labels), f"App missing from {labels}"
 
 
 def test_extract_tsx_jsx_expression_calls_resolve():
@@ -498,6 +501,7 @@ def test_extract_tsx_jsx_expression_calls_resolve():
     JSX is parsed as ERROR nodes and these call_expressions disappear.
     """
     from graphify.extract import extract_js
+
     result = extract_js(FIXTURES / "sample.tsx")
     nodes_by_id = {n["id"]: n for n in result["nodes"]}
     call_targets = {
@@ -516,6 +520,7 @@ def test_extract_tsx_jsx_expression_calls_resolve():
 def test_extract_tsx_uses_tsx_grammar():
     """Wiring check: the .tsx config must use tree-sitter's `language_tsx`."""
     from graphify.extract import _TSX_CONFIG, _TS_CONFIG
+
     assert _TSX_CONFIG.ts_language_fn == "language_tsx"
     assert _TS_CONFIG.ts_language_fn == "language_typescript"
 
@@ -526,6 +531,7 @@ def test_extract_tsx_uses_tsx_grammar():
 # detect this, warn, and fall back to sequential extraction rather than
 # propagating a 290-line traceback.
 
+
 def test_extract_falls_back_to_sequential_when_parallel_returns_false(tmp_path, monkeypatch):
     """extract() must run sequential when _extract_parallel signals failure (returns False)."""
     from graphify import extract as extract_mod
@@ -561,15 +567,19 @@ def test_extract_parallel_returns_false_on_broken_pool(tmp_path, monkeypatch, ca
     from graphify import extract as extract_mod
 
     class FakePool:
-        def __init__(self, *a, **kw): pass
-        def __enter__(self): return self
-        def __exit__(self, *a): return False
+        def __init__(self, *a, **kw):
+            pass
+
+        def __enter__(self):
+            return self
+
+        def __exit__(self, *a):
+            return False
+
         def submit(self, *a, **kw):
             raise BrokenProcessPool("simulated spawn failure")
 
-    monkeypatch.setattr(
-        concurrent.futures, "ProcessPoolExecutor", lambda *a, **kw: FakePool()
-    )
+    monkeypatch.setattr(concurrent.futures, "ProcessPoolExecutor", lambda *a, **kw: FakePool())
 
     uncached = [(0, FIXTURES / "sample.py")]
     per_file: list = [None]
@@ -580,10 +590,46 @@ def submit(self, *a, **kw):
     assert "__main__" in out, "warning must hint at the Windows __main__ guard idiom"
 
 
+def test_extract_parallel_worker_warning_handles_sparse_file_indexes(tmp_path, monkeypatch, capsys):
+    """Worker-failure warnings must not index work_items by original file index."""
+    import concurrent.futures
+    from graphify import extract as extract_mod
+
+    class FakePool:
+        def __init__(self, *a, **kw):
+            pass
+
+        def __enter__(self):
+            return self
+
+        def __exit__(self, *a):
+            return False
+
+        def submit(self, fn, item):
+            future: concurrent.futures.Future = concurrent.futures.Future()
+            future.set_exception(RuntimeError("simulated worker failure"))
+            return future
+
+    monkeypatch.setattr(concurrent.futures, "ProcessPoolExecutor", lambda *a, **kw: FakePool())
+
+    source = tmp_path / "late.py"
+    source.write_text("x = 1\n", encoding="utf-8")
+    uncached = [(3, source)]
+    per_file: list = [None, None, None, None]
+
+    ok = extract_mod._extract_parallel(uncached, per_file, tmp_path, 2, 4)
+
+    assert ok is True
+    err = capsys.readouterr().err
+    assert "late.py" in err
+    assert "simulated worker failure" in err
+
+
 # ---------------------------------------------------------------------------
 # Bash extractor tests (#866)
 # ---------------------------------------------------------------------------
 
+
 def test_dispatch_includes_sh_and_json():
     assert ".sh" in _DISPATCH
     assert ".bash" in _DISPATCH
@@ -626,7 +672,7 @@ def test_extract_bash_emits_source_imports_from(tmp_path):
     helpers = tmp_path / "helpers.sh"
     helpers.write_text("# helper\n")
     script = tmp_path / "deploy.sh"
-    script.write_text(f"#!/bin/bash\nsource ./helpers.sh\nfoo() {{ echo hi; }}\n")
+    script.write_text("#!/bin/bash\nsource ./helpers.sh\nfoo() { echo hi; }\n")
     result = extract_bash(script)
     import_edges = [e for e in result["edges"] if e["relation"] == "imports_from"]
     assert len(import_edges) >= 1
@@ -661,6 +707,7 @@ def test_extract_bash_missing_grammar_returns_error():
     """extract_bash returns error dict when tree-sitter-bash not installed (mocked)."""
     import unittest.mock as mock
     import builtins
+
     real_import = builtins.__import__
 
     def patched(name, *args, **kwargs):
@@ -677,11 +724,7 @@ def patched(name, *args, **kwargs):
 def test_extract_bash_rejects_command_substitution_as_call(tmp_path):
     """`$(build)` must not be recorded as a call edge to build()."""
     script = tmp_path / "command_substitution.sh"
-    script.write_text(
-        "#!/usr/bin/env bash\n"
-        "build() { echo build; }\n"
-        "$(build)\n"
-    )
+    script.write_text("#!/usr/bin/env bash\nbuild() { echo build; }\n$(build)\n")
     result = extract_bash(script)
     labels = {n["id"]: n["label"] for n in result["nodes"]}
     call_pairs = [
@@ -695,11 +738,7 @@ def test_extract_bash_rejects_command_substitution_as_call(tmp_path):
 def test_extract_bash_process_substitution_not_recorded(tmp_path):
     """`<(helper)` (process substitution) must not be recorded as a call edge."""
     script = tmp_path / "process_substitution.sh"
-    script.write_text(
-        "#!/usr/bin/env bash\n"
-        "helper() { echo h; }\n"
-        "diff <(helper) <(helper)\n"
-    )
+    script.write_text("#!/usr/bin/env bash\nhelper() { echo h; }\ndiff <(helper) <(helper)\n")
     result = extract_bash(script)
     labels = {n["id"]: n["label"] for n in result["nodes"]}
     call_pairs = [
@@ -713,11 +752,7 @@ def test_extract_bash_process_substitution_not_recorded(tmp_path):
 def test_extract_bash_shadowing_function_is_recorded(tmp_path):
     """User-defined function shadowing an external command (install/find/etc.) must still produce a call edge."""
     script = tmp_path / "shadowing.sh"
-    script.write_text(
-        "#!/usr/bin/env bash\n"
-        "install() { echo install; }\n"
-        "deploy() { install; }\n"
-    )
+    script.write_text("#!/usr/bin/env bash\ninstall() { echo install; }\ndeploy() { install; }\n")
     result = extract_bash(script)
     labels = {n["id"]: n["label"] for n in result["nodes"]}
     call_pairs = [
@@ -739,10 +774,15 @@ def test_extract_bash_creates_entrypoint_node(tmp_path):
     assert "bash_entrypoint" in kinds, f"No bash_entrypoint node; kinds={kinds}"
     assert "file" in kinds, f"No file node; kinds={kinds}"
     file_node = next(n for n in result["nodes"] if n.get("metadata", {}).get("kind") == "file")
-    entry_node = next(n for n in result["nodes"] if n.get("metadata", {}).get("kind") == "bash_entrypoint")
+    entry_node = next(
+        n for n in result["nodes"] if n.get("metadata", {}).get("kind") == "bash_entrypoint"
+    )
     contains_edges = [
-        e for e in result["edges"]
-        if e["relation"] == "contains" and e["source"] == file_node["id"] and e["target"] == entry_node["id"]
+        e
+        for e in result["edges"]
+        if e["relation"] == "contains"
+        and e["source"] == file_node["id"]
+        and e["target"] == entry_node["id"]
     ]
     assert contains_edges, "Missing contains edge from file → bash_entrypoint"
 
@@ -750,23 +790,19 @@ def test_extract_bash_creates_entrypoint_node(tmp_path):
 def test_extract_bash_top_level_call_attributes_to_entrypoint(tmp_path):
     """Top-level function call attaches to the entrypoint node, not orphaned."""
     script = tmp_path / "top_level_call.sh"
-    script.write_text(
-        "#!/usr/bin/env bash\n"
-        "build() { echo build; }\n"
-        "build\n"
-    )
+    script.write_text("#!/usr/bin/env bash\nbuild() { echo build; }\nbuild\n")
     result = extract_bash(script)
     entry_node = next(
         (n for n in result["nodes"] if n.get("metadata", {}).get("kind") == "bash_entrypoint"),
         None,
     )
     assert entry_node is not None, "No entrypoint node created"
-    call_pairs = [
-        (e["source"], e["target"])
-        for e in result["edges"]
-        if e["relation"] == "calls"
-    ]
-    target_ids = {tgt for _, tgt in call_pairs if any(n["id"] == tgt and n["label"] == "build()" for n in result["nodes"])}
+    call_pairs = [(e["source"], e["target"]) for e in result["edges"] if e["relation"] == "calls"]
+    target_ids = {
+        tgt
+        for _, tgt in call_pairs
+        if any(n["id"] == tgt and n["label"] == "build()" for n in result["nodes"])
+    }
     source_ids_to_build = {src for src, tgt in call_pairs if tgt in target_ids}
     assert entry_node["id"] in source_ids_to_build, (
         f"Top-level call to build not attributed to entrypoint; calls={call_pairs}"
@@ -788,8 +824,12 @@ def test_extract_bash_entrypoint_no_collision_with_function_named_script(tmp_pat
     script = tmp_path / "deploy.sh"
     script.write_text("#!/usr/bin/env bash\nfunction script() { echo hi; }\n")
     result = extract_bash(script)
-    entry_nodes = [n for n in result["nodes"] if n.get("metadata", {}).get("kind") == "bash_entrypoint"]
-    func_nodes = [n for n in result["nodes"] if n.get("metadata", {}).get("kind") == "bash_function"]
+    entry_nodes = [
+        n for n in result["nodes"] if n.get("metadata", {}).get("kind") == "bash_entrypoint"
+    ]
+    func_nodes = [
+        n for n in result["nodes"] if n.get("metadata", {}).get("kind") == "bash_function"
+    ]
     assert entry_nodes, "Must have a bash_entrypoint node"
     assert func_nodes, "Must have a bash_function node for 'script'"
     entry_id = entry_nodes[0]["id"]
@@ -814,8 +854,12 @@ def test_extract_bash_nested_function_calls_recorded(tmp_path):
     )
     result = extract_bash(script)
     node_id_by_label = {n["label"].rstrip("()"): n["id"] for n in result["nodes"]}
-    assert "inner" in node_id_by_label, f"inner function must be discovered; labels={list(node_id_by_label)}"
-    assert "do_work" in node_id_by_label, f"do_work function must be discovered; labels={list(node_id_by_label)}"
+    assert "inner" in node_id_by_label, (
+        f"inner function must be discovered; labels={list(node_id_by_label)}"
+    )
+    assert "do_work" in node_id_by_label, (
+        f"do_work function must be discovered; labels={list(node_id_by_label)}"
+    )
     calls = {(e["source"], e["target"]) for e in result["edges"] if e.get("relation") == "calls"}
     inner_id = node_id_by_label["inner"]
     do_work_id = node_id_by_label["do_work"]
@@ -832,9 +876,7 @@ def test_extract_bash_source_user_defined_emits_calls_not_imports_from(tmp_path)
     helpers.write_text("#!/bin/bash\n")
     script = tmp_path / "run.sh"
     script.write_text(
-        "#!/usr/bin/env bash\n"
-        "function source() { echo 'custom source'; }\n"
-        "source ./helpers.sh\n"
+        "#!/usr/bin/env bash\nfunction source() { echo 'custom source'; }\nsource ./helpers.sh\n"
     )
     result = extract_bash(script)
     import_edges = [e for e in result["edges"] if e.get("relation") == "imports_from"]
@@ -847,6 +889,7 @@ def test_extract_bash_source_user_defined_emits_calls_not_imports_from(tmp_path)
 # JSON extractor tests (#866)
 # ---------------------------------------------------------------------------
 
+
 def test_extract_json_top_level_keys():
     result = extract_json(FIXTURES / "sample.json")
     assert "error" not in result
@@ -907,12 +950,14 @@ def test_extract_json_no_self_loops():
 
 def test_extract_bash_via_dispatch():
     from graphify.extract import _get_extractor
+
     assert _get_extractor(Path("foo.sh")) is extract_bash
     assert _get_extractor(Path("foo.bash")) is extract_bash
 
 
 def test_extract_json_via_dispatch():
     from graphify.extract import _get_extractor
+
     assert _get_extractor(Path("foo.json")) is extract_json
 
 
@@ -938,6 +983,7 @@ def test_extract_bash_node_metadata_is_sanitized():
 def test_barrel_reexport_emits_re_exports_edges():
     """export { X } from './mod' must emit re_exports edges for each named specifier."""
     from graphify.extract import extract_js
+
     result = extract_js(FIXTURES / "barrel_reexport.ts")
     reexports = [e for e in result["edges"] if e["relation"] == "re_exports"]
     targets = [e["target"] for e in reexports]
@@ -952,6 +998,7 @@ def test_barrel_reexport_emits_re_exports_edges():
 def test_barrel_reexport_emits_imports_from():
     """Barrel file must emit file-level imports_from edges to source modules."""
     from graphify.extract import extract_js
+
     result = extract_js(FIXTURES / "barrel_reexport.ts")
     imports_from = [e for e in result["edges"] if e["relation"] == "imports_from"]
     targets = [e["target"] for e in imports_from]
@@ -963,6 +1010,7 @@ def test_barrel_reexport_emits_imports_from():
 def test_barrel_reexport_context_tagged():
     """re_exports edges should have context='re-export'."""
     from graphify.extract import extract_js
+
     result = extract_js(FIXTURES / "barrel_reexport.ts")
     reexports = [e for e in result["edges"] if e["relation"] == "re_exports"]
     for e in reexports:
@@ -972,6 +1020,7 @@ def test_barrel_reexport_context_tagged():
 def test_barrel_local_exports_still_extracted():
     """export function/const in a barrel file must still create nodes."""
     from graphify.extract import extract_js
+
     result = extract_js(FIXTURES / "barrel_reexport.ts")
     labels = [n["label"] for n in result["nodes"]]
     assert "localHelper()" in labels or "localHelper" in labels
@@ -982,6 +1031,7 @@ def test_barrel_local_exports_still_extracted():
 def test_barrel_reexport_confidence_extracted():
     """All re_exports edges should have confidence=EXTRACTED."""
     from graphify.extract import extract_js
+
     result = extract_js(FIXTURES / "barrel_reexport.ts")
     reexports = [e for e in result["edges"] if e["relation"] == "re_exports"]
     for e in reexports:
@@ -1015,6 +1065,7 @@ def test_pure_export_no_from_not_treated_as_reexport():
     """export { localVar } without 'from' should NOT create re_exports edges."""
     from graphify.extract import extract_js
     import tempfile
+
     code = b"const x = 1;\nexport { x };\n"
     with tempfile.NamedTemporaryFile(suffix=".ts", delete=False) as f:
         f.write(code)
@@ -1035,8 +1086,8 @@ def test_dart_child_node_ids_are_stem_based(tmp_path):
     result = extract_dart(src_file)
 
     stem = _file_stem(src_file)  # -> "mydir.sample"
-    expected_class_nid = _make_id(stem, "MyClass")   # -> "mydir_sample_myclass"
-    expected_func_nid  = _make_id(stem, "myFunc")    # -> "mydir_sample_myfunc"
+    expected_class_nid = _make_id(stem, "MyClass")  # -> "mydir_sample_myclass"
+    expected_func_nid = _make_id(stem, "myFunc")  # -> "mydir_sample_myfunc"
 
     node_ids = {n["id"] for n in result["nodes"]}
 
@@ -1054,9 +1105,9 @@ def test_dart_child_node_ids_are_stem_based(tmp_path):
     for node in result["nodes"]:
         if node["id"] == file_nid:
             continue
-        assert "_" + stem.replace(".", "_") in node["id"] or node["id"].startswith(stem.replace(".", "_")), (
+        assert "_" + stem.replace(".", "_") in node["id"] or node["id"].startswith(
+            stem.replace(".", "_")
+        ), (
             f"Child node ID '{node['id']}' does not start with the expected stem prefix '{stem}'. "
             "This suggests an absolute path is still leaking into the ID."
         )
-
-
diff --git a/tests/test_extract_cli.py b/tests/test_extract_cli.py
index 6998bcce9..a3ca1e4ed 100644
--- a/tests/test_extract_cli.py
+++ b/tests/test_extract_cli.py
@@ -1,4 +1,5 @@
 """Tests for `graphify extract` CLI dispatch path in graphify.__main__."""
+
 from __future__ import annotations
 
 import pytest
@@ -17,9 +18,7 @@ def _make_corpus(tmp_path):
     return tmp_path
 
 
-def test_extract_exits_nonzero_when_all_semantic_chunks_fail(
-    monkeypatch, tmp_path, capsys
-):
+def test_extract_exits_nonzero_when_all_semantic_chunks_fail(monkeypatch, tmp_path, capsys):
     """When every semantic chunk errors (e.g. backend SDK not installed),
     the CLI must exit non-zero instead of silently writing an AST-only graph.
 
@@ -48,23 +47,19 @@ def _all_chunks_failed(paths, **kwargs):
             "output_tokens": 0,
         }
 
-    monkeypatch.setattr(
-        "graphify.llm.extract_corpus_parallel", _all_chunks_failed
-    )
+    monkeypatch.setattr("graphify.llm.extract_corpus_parallel", _all_chunks_failed)
     monkeypatch.setattr(mainmod, "_check_skill_version", lambda _: None)
     monkeypatch.setattr(
         mainmod.sys,
         "argv",
-        ["graphify", "extract", str(corpus), "--backend", "claude",
-         "--out", str(out_dir)],
+        ["graphify", "extract", str(corpus), "--backend", "claude", "--out", str(out_dir)],
     )
 
     with pytest.raises(SystemExit) as exc_info:
         mainmod.main()
 
     assert exc_info.value.code == 1, (
-        f"expected exit code 1 when all semantic chunks fail, "
-        f"got {exc_info.value.code}"
+        f"expected exit code 1 when all semantic chunks fail, got {exc_info.value.code}"
     )
 
     stderr = capsys.readouterr().err
@@ -78,9 +73,7 @@ def _all_chunks_failed(paths, **kwargs):
     )
 
 
-def test_extract_succeeds_when_at_least_one_chunk_completes(
-    monkeypatch, tmp_path
-):
+def test_extract_succeeds_when_at_least_one_chunk_completes(monkeypatch, tmp_path):
     """Sanity counter-test: a successful chunk run keeps exit 0. Confirms the
     new guard only fires on the all-failed path, not on every extract."""
     corpus = _make_corpus(tmp_path)
@@ -99,15 +92,12 @@ def _one_chunk_succeeded(paths, **kwargs):
             "output_tokens": 50,
         }
 
-    monkeypatch.setattr(
-        "graphify.llm.extract_corpus_parallel", _one_chunk_succeeded
-    )
+    monkeypatch.setattr("graphify.llm.extract_corpus_parallel", _one_chunk_succeeded)
     monkeypatch.setattr(mainmod, "_check_skill_version", lambda _: None)
     monkeypatch.setattr(
         mainmod.sys,
         "argv",
-        ["graphify", "extract", str(corpus), "--backend", "claude",
-         "--out", str(out_dir)],
+        ["graphify", "extract", str(corpus), "--backend", "claude", "--out", str(out_dir)],
     )
 
     # extract may still raise SystemExit at the end (clean exit code 0)
diff --git a/tests/test_global_graph.py b/tests/test_global_graph.py
index f40d9c6d5..c09b457f1 100644
--- a/tests/test_global_graph.py
+++ b/tests/test_global_graph.py
@@ -1,6 +1,7 @@
 """Tests for the global graph infrastructure (graphify/global_graph.py),
 prefix/prune helpers in graphify/build.py, and the cross-repo guard in
 graphify/dedup.py."""
+
 from __future__ import annotations
 
 import json
@@ -11,13 +12,14 @@
 
 # ── helpers ──────────────────────────────────────────────────────────────────
 
+
 def _make_graph(nodes, edges=None):
     """Build a simple nx.Graph from node dicts."""
     G = nx.Graph()
     for n in nodes:
         nid = n["id"]
         G.add_node(nid, **{k: v for k, v in n.items() if k != "id"})
-    for e in (edges or []):
+    for e in edges or []:
         G.add_edge(
             e["source"],
             e["target"],
@@ -28,6 +30,7 @@ def _make_graph(nodes, edges=None):
 
 def _graph_to_json(G, path):
     from networkx.readwrite import json_graph as jg
+
     try:
         data = jg.node_link_data(G, edges="links")
     except TypeError:
@@ -37,8 +40,10 @@ def _graph_to_json(G, path):
 
 # ── build.py helpers ──────────────────────────────────────────────────────────
 
+
 def test_prefix_graph_preserves_label():
     from graphify.build import prefix_graph_for_global
+
     G = _make_graph([{"id": "userservice", "label": "UserService", "source_file": "src/user.py"}])
     H = prefix_graph_for_global(G, "repoA")
     assert "repoA::userservice" in H.nodes
@@ -48,6 +53,7 @@ def test_prefix_graph_preserves_label():
 
 def test_prefix_graph_sets_repo_and_local_id():
     from graphify.build import prefix_graph_for_global
+
     G = _make_graph([{"id": "userservice", "label": "UserService"}])
     H = prefix_graph_for_global(G, "repoA")
     data = H.nodes["repoA::userservice"]
@@ -57,6 +63,7 @@ def test_prefix_graph_sets_repo_and_local_id():
 
 def test_prefix_graph_rewrites_edges():
     from graphify.build import prefix_graph_for_global
+
     G = _make_graph(
         [{"id": "a", "label": "A"}, {"id": "b", "label": "B"}],
         [{"source": "a", "target": "b"}],
@@ -68,6 +75,7 @@ def test_prefix_graph_rewrites_edges():
 
 def test_prune_repo_removes_correct_nodes():
     from graphify.build import prune_repo_from_graph
+
     G = nx.Graph()
     G.add_node("repoA::userservice", repo="repoA", label="UserService")
     G.add_node("repoB::userservice", repo="repoB", label="UserService")
@@ -81,6 +89,7 @@ def test_prune_repo_removes_correct_nodes():
 
 def test_prune_repo_returns_zero_if_not_present():
     from graphify.build import prune_repo_from_graph
+
     G = nx.Graph()
     G.add_node("repoA::x", repo="repoA")
     removed = prune_repo_from_graph(G, "repoB")
@@ -90,16 +99,20 @@ def test_prune_repo_returns_zero_if_not_present():
 
 # ── global_graph.py ───────────────────────────────────────────────────────────
 
+
 def test_global_add_creates_global_graph(tmp_path):
     src_graph = tmp_path / "graph.json"
     G = _make_graph([{"id": "userservice", "label": "UserService", "source_file": "src/user.py"}])
     _graph_to_json(G, src_graph)
 
     global_dir = tmp_path / ".graphify"
-    with patch("graphify.global_graph._GLOBAL_DIR", global_dir), \
-         patch("graphify.global_graph._GLOBAL_GRAPH", global_dir / "global-graph.json"), \
-         patch("graphify.global_graph._GLOBAL_MANIFEST", global_dir / "global-manifest.json"):
+    with (
+        patch("graphify.global_graph._GLOBAL_DIR", global_dir),
+        patch("graphify.global_graph._GLOBAL_GRAPH", global_dir / "global-graph.json"),
+        patch("graphify.global_graph._GLOBAL_MANIFEST", global_dir / "global-manifest.json"),
+    ):
         from graphify.global_graph import global_add
+
         result = global_add(src_graph, "repoA")
 
     assert result["skipped"] is False
@@ -116,10 +129,13 @@ def test_global_add_skip_on_unchanged_hash(tmp_path):
     _graph_to_json(G, src_graph)
 
     global_dir = tmp_path / ".graphify"
-    with patch("graphify.global_graph._GLOBAL_DIR", global_dir), \
-         patch("graphify.global_graph._GLOBAL_GRAPH", global_dir / "global-graph.json"), \
-         patch("graphify.global_graph._GLOBAL_MANIFEST", global_dir / "global-manifest.json"):
+    with (
+        patch("graphify.global_graph._GLOBAL_DIR", global_dir),
+        patch("graphify.global_graph._GLOBAL_GRAPH", global_dir / "global-graph.json"),
+        patch("graphify.global_graph._GLOBAL_MANIFEST", global_dir / "global-manifest.json"),
+    ):
         from graphify.global_graph import global_add
+
         global_add(src_graph, "repoA")
         result2 = global_add(src_graph, "repoA")
 
@@ -137,10 +153,13 @@ def test_global_add_two_repos_no_collision(tmp_path):
     global_dir = tmp_path / ".graphify"
     global_graph_path = global_dir / "global-graph.json"
     global_manifest_path = global_dir / "global-manifest.json"
-    with patch("graphify.global_graph._GLOBAL_DIR", global_dir), \
-         patch("graphify.global_graph._GLOBAL_GRAPH", global_graph_path), \
-         patch("graphify.global_graph._GLOBAL_MANIFEST", global_manifest_path):
+    with (
+        patch("graphify.global_graph._GLOBAL_DIR", global_dir),
+        patch("graphify.global_graph._GLOBAL_GRAPH", global_graph_path),
+        patch("graphify.global_graph._GLOBAL_MANIFEST", global_manifest_path),
+    ):
         from graphify.global_graph import global_add, _load_global_graph
+
         global_add(g1, "repoA")
         global_add(g2, "repoB")
         G = _load_global_graph()
@@ -156,30 +175,39 @@ def test_global_remove(tmp_path):
     _graph_to_json(G, src_graph)
 
     global_dir = tmp_path / ".graphify"
-    with patch("graphify.global_graph._GLOBAL_DIR", global_dir), \
-         patch("graphify.global_graph._GLOBAL_GRAPH", global_dir / "global-graph.json"), \
-         patch("graphify.global_graph._GLOBAL_MANIFEST", global_dir / "global-manifest.json"):
+    with (
+        patch("graphify.global_graph._GLOBAL_DIR", global_dir),
+        patch("graphify.global_graph._GLOBAL_GRAPH", global_dir / "global-graph.json"),
+        patch("graphify.global_graph._GLOBAL_MANIFEST", global_dir / "global-manifest.json"),
+    ):
         from graphify.global_graph import global_add, global_remove
+
         global_add(src_graph, "repoA")
         removed = global_remove("repoA")
 
     assert removed > 0
     # manifest should no longer list repoA - need to re-patch for list call
     global_dir2 = global_dir  # same dir
-    with patch("graphify.global_graph._GLOBAL_DIR", global_dir2), \
-         patch("graphify.global_graph._GLOBAL_GRAPH", global_dir2 / "global-graph.json"), \
-         patch("graphify.global_graph._GLOBAL_MANIFEST", global_dir2 / "global-manifest.json"):
+    with (
+        patch("graphify.global_graph._GLOBAL_DIR", global_dir2),
+        patch("graphify.global_graph._GLOBAL_GRAPH", global_dir2 / "global-graph.json"),
+        patch("graphify.global_graph._GLOBAL_MANIFEST", global_dir2 / "global-manifest.json"),
+    ):
         from graphify.global_graph import global_list
+
         repos = global_list()
     assert "repoA" not in repos
 
 
 def test_global_remove_unknown_tag_raises(tmp_path):
     global_dir = tmp_path / ".graphify"
-    with patch("graphify.global_graph._GLOBAL_DIR", global_dir), \
-         patch("graphify.global_graph._GLOBAL_GRAPH", global_dir / "global-graph.json"), \
-         patch("graphify.global_graph._GLOBAL_MANIFEST", global_dir / "global-manifest.json"):
+    with (
+        patch("graphify.global_graph._GLOBAL_DIR", global_dir),
+        patch("graphify.global_graph._GLOBAL_GRAPH", global_dir / "global-graph.json"),
+        patch("graphify.global_graph._GLOBAL_MANIFEST", global_dir / "global-manifest.json"),
+    ):
         from graphify.global_graph import global_remove
+
         with pytest.raises(KeyError):
             global_remove("nonexistent")
 
@@ -192,10 +220,13 @@ def test_global_add_collision_warning(tmp_path, capsys):
     _graph_to_json(G, g2)
 
     global_dir = tmp_path / ".graphify"
-    with patch("graphify.global_graph._GLOBAL_DIR", global_dir), \
-         patch("graphify.global_graph._GLOBAL_GRAPH", global_dir / "global-graph.json"), \
-         patch("graphify.global_graph._GLOBAL_MANIFEST", global_dir / "global-manifest.json"):
+    with (
+        patch("graphify.global_graph._GLOBAL_DIR", global_dir),
+        patch("graphify.global_graph._GLOBAL_GRAPH", global_dir / "global-graph.json"),
+        patch("graphify.global_graph._GLOBAL_MANIFEST", global_dir / "global-manifest.json"),
+    ):
         from graphify.global_graph import global_add
+
         global_add(g1, "myrepo")
         global_add(g2, "myrepo")  # different source path, same tag
 
@@ -205,8 +236,10 @@ def test_global_add_collision_warning(tmp_path, capsys):
 
 # ── dedup guard ───────────────────────────────────────────────────────────────
 
+
 def test_dedup_raises_on_cross_repo_nodes():
     from graphify.dedup import deduplicate_entities
+
     nodes = [
         {"id": "repoA::userservice", "label": "UserService", "repo": "repoA"},
         {"id": "repoB::userservice", "label": "UserService", "repo": "repoB"},
@@ -217,6 +250,7 @@ def test_dedup_raises_on_cross_repo_nodes():
 
 def test_dedup_ok_with_single_repo():
     from graphify.dedup import deduplicate_entities
+
     nodes = [
         {"id": "repoA::userservice", "label": "UserService", "repo": "repoA"},
         {"id": "repoA::auth", "label": "Auth", "repo": "repoA"},
@@ -227,6 +261,7 @@ def test_dedup_ok_with_single_repo():
 
 def test_dedup_ok_with_no_repo_attr():
     from graphify.dedup import deduplicate_entities
+
     nodes = [
         {"id": "userservice", "label": "UserService"},
         {"id": "auth", "label": "Auth"},
@@ -237,6 +272,7 @@ def test_dedup_ok_with_no_repo_attr():
 
 # ── merge-graphs prefix ───────────────────────────────────────────────────────
 
+
 def test_merge_graphs_prefixes_ids(tmp_path):
     """merge-graphs should prefix node IDs with repo name to avoid silent collision."""
     from graphify.build import prefix_graph_for_global
@@ -290,9 +326,12 @@ def test_global_add_rejects_oversized_source_graph(monkeypatch, tmp_path):
 
     global_dir = tmp_path / ".graphify"
     monkeypatch.setattr("graphify.security._MAX_GRAPH_FILE_BYTES", 8)
-    with patch("graphify.global_graph._GLOBAL_DIR", global_dir), \
-         patch("graphify.global_graph._GLOBAL_GRAPH", global_dir / "global-graph.json"), \
-         patch("graphify.global_graph._GLOBAL_MANIFEST", global_dir / "global-manifest.json"):
+    with (
+        patch("graphify.global_graph._GLOBAL_DIR", global_dir),
+        patch("graphify.global_graph._GLOBAL_GRAPH", global_dir / "global-graph.json"),
+        patch("graphify.global_graph._GLOBAL_MANIFEST", global_dir / "global-manifest.json"),
+    ):
         from graphify.global_graph import global_add
+
         with pytest.raises(ValueError, match="exceeds"):
             global_add(src_graph, "repoA")
diff --git a/tests/test_google_workspace.py b/tests/test_google_workspace.py
index 9d8cbfa4b..c23913304 100644
--- a/tests/test_google_workspace.py
+++ b/tests/test_google_workspace.py
@@ -1,4 +1,3 @@
-from pathlib import Path
 import json
 
 import graphify.google_workspace as gw
diff --git a/tests/test_hooks.py b/tests/test_hooks.py
index 873b2028c..7f43b2ab1 100644
--- a/tests/test_hooks.py
+++ b/tests/test_hooks.py
@@ -1,4 +1,5 @@
 """Tests for hooks.py - git hook install/uninstall."""
+
 import os
 import subprocess
 from types import SimpleNamespace
@@ -120,7 +121,6 @@ def test_status_shows_both_hooks(tmp_path):
     assert result.count("installed") >= 2
 
 
-
 def test_hooks_dir_resolves_relative_git_hooks_path(tmp_path, monkeypatch):
     repo = _make_git_repo(tmp_path)
 
@@ -155,15 +155,18 @@ def fake_run(*args, **kwargs):
 
     assert _hooks_dir(repo) == hooks.resolve()
 
+
 def test_hook_skips_head_on_exe():
     """Hook script must skip shebang extraction for .exe binaries (Windows)."""
     from graphify.hooks import _PYTHON_DETECT
-    assert "*.exe) _SHEBANG=" in _PYTHON_DETECT or '*.exe)' in _PYTHON_DETECT
+
+    assert "*.exe) _SHEBANG=" in _PYTHON_DETECT or "*.exe)" in _PYTHON_DETECT
 
 
 def test_hook_check_no_additionalContext(tmp_path):
     """graphify hook-check must not emit additionalContext — Codex Desktop rejects it."""
     import sys
+
     out = tmp_path / "graphify-out"
     out.mkdir()
     (out / "graph.json").write_text("{}", encoding="utf-8")
diff --git a/tests/test_hypergraph.py b/tests/test_hypergraph.py
index dda8ac793..f82d36816 100644
--- a/tests/test_hypergraph.py
+++ b/tests/test_hypergraph.py
@@ -1,11 +1,11 @@
 """Tests for hyperedge support in graphify."""
+
 from __future__ import annotations
 import json
 import tempfile
 from pathlib import Path
 
 import networkx as nx
-import pytest
 
 from graphify.build import build_from_json
 from graphify.export import attach_hyperedges, to_json
@@ -22,10 +22,22 @@
         {"id": "DigestAuth", "label": "DigestAuth", "file_type": "code", "source_file": "auth.py"},
         {"id": "Request", "label": "Request", "file_type": "code", "source_file": "http.py"},
         {"id": "Response", "label": "Response", "file_type": "code", "source_file": "http.py"},
-        {"id": "BaseClient", "label": "BaseClient", "file_type": "code", "source_file": "client.py"},
+        {
+            "id": "BaseClient",
+            "label": "BaseClient",
+            "file_type": "code",
+            "source_file": "client.py",
+        },
     ],
     "edges": [
-        {"source": "BasicAuth", "target": "Request", "relation": "uses", "confidence": "EXTRACTED", "confidence_score": 1.0, "source_file": "auth.py"},
+        {
+            "source": "BasicAuth",
+            "target": "Request",
+            "relation": "uses",
+            "confidence": "EXTRACTED",
+            "confidence_score": 1.0,
+            "source_file": "auth.py",
+        },
     ],
     "hyperedges": [
         {
@@ -55,6 +67,7 @@
 # 1. Hyperedges survive build_from_json round-trip
 # ---------------------------------------------------------------------------
 
+
 def test_build_from_json_stores_hyperedges():
     G = build_from_json(SAMPLE_EXTRACTION)
     assert "hyperedges" in G.graph
@@ -78,6 +91,7 @@ def test_build_from_json_missing_hyperedges_key():
 # 2. attach_hyperedges deduplicates by id
 # ---------------------------------------------------------------------------
 
+
 def test_attach_hyperedges_adds_new():
     G = nx.Graph()
     attach_hyperedges(G, [{"id": "auth_flow", "label": "Auth Flow", "nodes": ["A", "B", "C"]}])
@@ -94,10 +108,13 @@ def test_attach_hyperedges_deduplicates():
 
 def test_attach_hyperedges_multiple_different_ids():
     G = nx.Graph()
-    attach_hyperedges(G, [
-        {"id": "flow_a", "label": "Flow A", "nodes": ["A", "B", "C"]},
-        {"id": "flow_b", "label": "Flow B", "nodes": ["D", "E", "F"]},
-    ])
+    attach_hyperedges(
+        G,
+        [
+            {"id": "flow_a", "label": "Flow A", "nodes": ["A", "B", "C"]},
+            {"id": "flow_b", "label": "Flow B", "nodes": ["D", "E", "F"]},
+        ],
+    )
     assert len(G.graph["hyperedges"]) == 2
 
 
@@ -111,6 +128,7 @@ def test_attach_hyperedges_skips_entry_without_id():
 # 3. to_json includes hyperedges key
 # ---------------------------------------------------------------------------
 
+
 def test_to_json_includes_hyperedges():
     G = build_from_json(SAMPLE_EXTRACTION)
     communities = {0: list(G.nodes())}
@@ -139,6 +157,7 @@ def test_to_json_hyperedges_empty_when_none():
 # 4. Hyperedges loaded from graph.json via build_from_json
 # ---------------------------------------------------------------------------
 
+
 def test_hyperedges_roundtrip_via_json_file():
     """Write graph.json then reload it - hyperedges must survive."""
     G = build_from_json(SAMPLE_EXTRACTION)
@@ -149,11 +168,22 @@ def test_hyperedges_roundtrip_via_json_file():
 
     # Reload the JSON as if build_from_json were called on it
     data = json.loads(Path(path).read_text())
-    G2 = build_from_json({
-        "nodes": [{"id": n["id"], **{k: v for k, v in n.items() if k != "id"}} for n in data["nodes"]],
-        "edges": [{"source": e["source"], "target": e["target"], **{k: v for k, v in e.items() if k not in ("source", "target")}} for e in data.get("links", [])],
-        "hyperedges": data.get("hyperedges", []),
-    })
+    G2 = build_from_json(
+        {
+            "nodes": [
+                {"id": n["id"], **{k: v for k, v in n.items() if k != "id"}} for n in data["nodes"]
+            ],
+            "edges": [
+                {
+                    "source": e["source"],
+                    "target": e["target"],
+                    **{k: v for k, v in e.items() if k not in ("source", "target")},
+                }
+                for e in data.get("links", [])
+            ],
+            "hyperedges": data.get("hyperedges", []),
+        }
+    )
     assert G2.graph.get("hyperedges", []) != []
     assert G2.graph["hyperedges"][0]["id"] == "auth_flow"
 
@@ -162,13 +192,24 @@ def test_hyperedges_roundtrip_via_json_file():
 # 5. Report includes hyperedges section when hyperedges present
 # ---------------------------------------------------------------------------
 
+
 def _make_report(G):
     communities = {0: list(G.nodes())}
     cohesion = {0: 1.0}
     labels = {0: "All"}
     gods = [{"label": "BasicAuth", "degree": 2}]
     surprises = []
-    return generate(G, communities, cohesion, labels, gods, surprises, SAMPLE_DETECTION, {"input": 10, "output": 5}, ".")
+    return generate(
+        G,
+        communities,
+        cohesion,
+        labels,
+        gods,
+        surprises,
+        SAMPLE_DETECTION,
+        {"input": 10, "output": 5},
+        ".",
+    )
 
 
 def test_report_includes_hyperedges_section():
@@ -191,6 +232,7 @@ def test_report_includes_hyperedge_node_list():
 # 6. Report skips hyperedges section when none present
 # ---------------------------------------------------------------------------
 
+
 def test_report_skips_hyperedges_section_when_empty():
     extraction = {**SAMPLE_EXTRACTION, "hyperedges": []}
     G = build_from_json(extraction)
diff --git a/tests/test_import_extension_resolution.py b/tests/test_import_extension_resolution.py
index 0d1222c0a..930c2fcaa 100644
--- a/tests/test_import_extension_resolution.py
+++ b/tests/test_import_extension_resolution.py
@@ -25,8 +25,11 @@ def _write(path: Path, body: str) -> Path:
 
 
 def _import_targets(result: dict) -> set[str]:
-    return {str(e.get("target") or "") for e in result["edges"]
-            if e.get("relation") in ("imports", "imports_from")}
+    return {
+        str(e.get("target") or "")
+        for e in result["edges"]
+        if e.get("relation") in ("imports", "imports_from")
+    }
 
 
 # ── _resolve_js_module_path unit tests ──────────────────────────────────────
@@ -94,13 +97,10 @@ def test_resolve_svelte_to_svelte_ts_for_rune_files(tmp_path):
     """Svelte 5: `from './foo.svelte'` may actually point at `foo.svelte.ts`
     (a rune-only TypeScript file with no .svelte file). The resolver must
     APPEND .ts to the full filename, not swap suffixes."""
-    target = _write(tmp_path / "is-mobile.svelte.ts",
-                    "export const isMobile = () => true")
+    target = _write(tmp_path / "is-mobile.svelte.ts", "export const isMobile = () => true")
     written_as = tmp_path / "is-mobile.svelte"
     resolved = _resolve_js_module_path(written_as)
-    assert resolved == target, (
-        f"Expected resolution to is-mobile.svelte.ts; got {resolved}"
-    )
+    assert resolved == target, f"Expected resolution to is-mobile.svelte.ts; got {resolved}"
 
 
 def test_resolve_svelte_to_svelte_js_for_javascript_rune_files(tmp_path):
@@ -111,8 +111,7 @@ def test_resolve_svelte_to_svelte_js_for_javascript_rune_files(tmp_path):
     Same code path as the .svelte.ts case — the generalized resolver tries
     every extension in priority order, so JS-only and TS-only projects
     both work without special-casing."""
-    target = _write(tmp_path / "store.svelte.js",
-                    "export const count = $state(0)")
+    target = _write(tmp_path / "store.svelte.js", "export const count = $state(0)")
     written_as = tmp_path / "store.svelte"
     resolved = _resolve_js_module_path(written_as)
     assert resolved == target
@@ -128,10 +127,8 @@ def test_resolve_svelte_prefers_svelte_ts_over_svelte_js(tmp_path):
     expect tooling to read the `.svelte.ts` source. graphify is a source-
     code tool, not a runtime resolver, so source-first ordering is correct
     for our use case."""
-    ts_target = _write(tmp_path / "store.svelte.ts",
-                       "export const count = $state(0)")
-    _write(tmp_path / "store.svelte.js",
-           "export const count = 0  // build artifact")
+    ts_target = _write(tmp_path / "store.svelte.ts", "export const count = $state(0)")
+    _write(tmp_path / "store.svelte.js", "export const count = 0  // build artifact")
     written_as = tmp_path / "store.svelte"
     resolved = _resolve_js_module_path(written_as)
     assert resolved == ts_target
@@ -142,8 +139,10 @@ def test_resolve_real_svelte_file_wins_over_svelte_ts_sibling(tmp_path):
     must resolve to that — not get hijacked to a sibling `foo.svelte.ts`
     rune file. The existence-check short-circuits before any append."""
     real = _write(tmp_path / "Card.svelte", "<div>card markup</div>")
-    _write(tmp_path / "Card.svelte.ts",
-           "export const helpers = {}  // rune sibling, not the import target")
+    _write(
+        tmp_path / "Card.svelte.ts",
+        "export const helpers = {}  // rune sibling, not the import target",
+    )
     resolved = _resolve_js_module_path(real)
     assert resolved == real
 
@@ -180,10 +179,8 @@ def test_resolve_real_js_stays_js_when_ts_does_not_exist(tmp_path):
 
 def test_bare_path_import_resolves_in_ts_file(tmp_path):
     """The #716 reproducer: TS file imports a sibling without an extension."""
-    target = _write(tmp_path / "type-helpers.ts",
-                    "export type GetNestedType<T> = T")
-    importer = _write(tmp_path / "page.ts",
-                      "import type { GetNestedType } from './type-helpers'\n")
+    target = _write(tmp_path / "type-helpers.ts", "export type GetNestedType<T> = T")
+    importer = _write(tmp_path / "page.ts", "import type { GetNestedType } from './type-helpers'\n")
     result = extract_js(importer)
     expected = _make_id(str(target))
     assert expected in _import_targets(result), (
@@ -194,10 +191,8 @@ def test_bare_path_import_resolves_in_ts_file(tmp_path):
 
 def test_directory_import_resolves_to_index_ts(tmp_path):
     """`from './queue'` must resolve to `./queue/index.ts`."""
-    target = _write(tmp_path / "queue" / "index.ts",
-                    "export const enqueue = () => {}")
-    importer = _write(tmp_path / "page.ts",
-                      "import { enqueue } from './queue'\n")
+    target = _write(tmp_path / "queue" / "index.ts", "export const enqueue = () => {}")
+    importer = _write(tmp_path / "page.ts", "import { enqueue } from './queue'\n")
     result = extract_js(importer)
     expected = _make_id(str(target))
     assert expected in _import_targets(result), (
@@ -211,10 +206,8 @@ def test_directory_import_resolves_to_index_ts(tmp_path):
 
 def test_dot_svelte_import_resolves_to_dot_svelte_ts(tmp_path):
     """Svelte 5 rune file: import written as .svelte, real file is .svelte.ts."""
-    target = _write(tmp_path / "is-mobile.svelte.ts",
-                    "export const isMobile = () => true")
-    importer = _write(tmp_path / "page.ts",
-                      "import { isMobile } from './is-mobile.svelte'\n")
+    target = _write(tmp_path / "is-mobile.svelte.ts", "export const isMobile = () => true")
+    importer = _write(tmp_path / "page.ts", "import { isMobile } from './is-mobile.svelte'\n")
     result = extract_js(importer)
     expected = _make_id(str(target))
     assert expected in _import_targets(result), (
@@ -230,8 +223,7 @@ def test_explicit_ts_import_still_works(tmp_path):
     """The most common case — import with explicit .ts extension — must
     continue to work after the resolver change."""
     target = _write(tmp_path / "foo.ts", "export const x = 1")
-    importer = _write(tmp_path / "page.ts",
-                      "import { x } from './foo.ts'\n")
+    importer = _write(tmp_path / "page.ts", "import { x } from './foo.ts'\n")
     result = extract_js(importer)
     expected = _make_id(str(target))
     assert expected in _import_targets(result), (
@@ -244,8 +236,7 @@ def test_explicit_svelte_import_still_works(tmp_path):
     """Real .svelte file imports must still resolve when the .svelte file
     exists (i.e. don't accidentally redirect to a non-existent .svelte.ts)."""
     target = _write(tmp_path / "Card.svelte", "<div></div>")
-    importer = _write(tmp_path / "page.ts",
-                      "import Card from './Card.svelte'\n")
+    importer = _write(tmp_path / "page.ts", "import Card from './Card.svelte'\n")
     result = extract_js(importer)
     expected = _make_id(str(target))
     assert expected in _import_targets(result), (
@@ -259,14 +250,12 @@ def test_external_module_unchanged(tmp_path):
     """Bare module specifiers (no leading dot, no alias match) must still
     fall through to the external/last-segment path — don't accidentally
     treat 'lodash' as a relative path."""
-    importer = _write(tmp_path / "page.ts",
-                      "import _ from 'lodash-es'\n")
+    importer = _write(tmp_path / "page.ts", "import _ from 'lodash-es'\n")
     result = extract_js(importer)
     targets = _import_targets(result)
     # The target should be the bare module name, not a resolved file path
     assert "lodash_es" in targets or any("lodash" in t for t in targets), (
-        f"External module specifier should still produce an external "
-        f"reference; got {targets}"
+        f"External module specifier should still produce an external reference; got {targets}"
     )
 
 
@@ -276,19 +265,17 @@ def test_external_module_unchanged(tmp_path):
 def test_alias_import_with_bare_path_resolves(tmp_path):
     """`$lib/foo` (alias + bare path) — both layers must work together."""
     src = tmp_path / "src"
-    target = _write(src / "lib" / "type-helpers.ts",
-                    "export type X = string")
-    _write(tmp_path / "tsconfig.json",
-           '{"compilerOptions":{"paths":{"$lib":["./src/lib"],'
-           '"$lib/*":["./src/lib/*"]}}}')
+    target = _write(src / "lib" / "type-helpers.ts", "export type X = string")
+    _write(
+        tmp_path / "tsconfig.json",
+        '{"compilerOptions":{"paths":{"$lib":["./src/lib"],"$lib/*":["./src/lib/*"]}}}',
+    )
     importer_dir = src / "routes"
-    importer = _write(importer_dir / "page.ts",
-                      "import type { X } from '$lib/type-helpers'\n")
+    importer = _write(importer_dir / "page.ts", "import type { X } from '$lib/type-helpers'\n")
     result = extract_js(importer)
     expected = _make_id(str(target))
     assert expected in _import_targets(result), (
-        f"Alias + bare-path resolution failed; "
-        f"expected {expected}; got {_import_targets(result)}"
+        f"Alias + bare-path resolution failed; expected {expected}; got {_import_targets(result)}"
     )
 
 
@@ -299,10 +286,8 @@ def test_type_only_import_with_bare_path_resolves(tmp_path):
     """`import type { X } from './foo'` — type-only imports must go through
     the same resolution path as regular imports. Common in TS codebases
     that separate types into their own module."""
-    target = _write(tmp_path / "type-helpers.ts",
-                    "export type GetNestedType<T> = T")
-    importer = _write(tmp_path / "page.ts",
-                      "import type { GetNestedType } from './type-helpers'\n")
+    target = _write(tmp_path / "type-helpers.ts", "export type GetNestedType<T> = T")
+    importer = _write(tmp_path / "page.ts", "import type { GetNestedType } from './type-helpers'\n")
     result = extract_js(importer)
     expected = _make_id(str(target))
     assert expected in _import_targets(result), (
@@ -317,8 +302,7 @@ def test_named_imports_emit_symbol_edges_after_resolution(tmp_path):
     `imports_from`. The symbol-edge target_stem comes from _file_stem(resolved),
     which depends on resolution succeeding first."""
     _write(tmp_path / "utils.ts", "export const foo = 1\nexport const bar = 2")
-    importer = _write(tmp_path / "page.ts",
-                      "import { foo, bar } from './utils'\n")
+    importer = _write(tmp_path / "page.ts", "import { foo, bar } from './utils'\n")
     result = extract_js(importer)
     sym_edges = [e for e in result["edges"] if e.get("relation") == "imports"]
     targets = {str(e.get("target") or "") for e in sym_edges}
@@ -334,18 +318,16 @@ def test_named_imports_emit_symbol_edges_after_resolution(tmp_path):
 def test_alias_directory_import_resolves_to_index_ts(tmp_path):
     """`from '$lib/queue'` where queue/ is a directory under src/lib/."""
     src = tmp_path / "src"
-    target = _write(src / "lib" / "queue" / "index.ts",
-                    "export const enqueue = () => {}")
-    _write(tmp_path / "tsconfig.json",
-           '{"compilerOptions":{"paths":{"$lib":["./src/lib"],'
-           '"$lib/*":["./src/lib/*"]}}}')
-    importer = _write(src / "routes" / "page.ts",
-                      "import { enqueue } from '$lib/queue'\n")
+    target = _write(src / "lib" / "queue" / "index.ts", "export const enqueue = () => {}")
+    _write(
+        tmp_path / "tsconfig.json",
+        '{"compilerOptions":{"paths":{"$lib":["./src/lib"],"$lib/*":["./src/lib/*"]}}}',
+    )
+    importer = _write(src / "routes" / "page.ts", "import { enqueue } from '$lib/queue'\n")
     result = extract_js(importer)
     expected = _make_id(str(target))
     assert expected in _import_targets(result), (
-        f"Alias + directory resolution failed; "
-        f"expected {expected}; got {_import_targets(result)}"
+        f"Alias + directory resolution failed; expected {expected}; got {_import_targets(result)}"
     )
 
 
@@ -358,9 +340,7 @@ def test_resolve_does_not_match_partial_directory_name(tmp_path):
     bare = tmp_path / "foo"
     resolved = _resolve_js_module_path(bare)
     # Not a real file → nothing matches → returns input unchanged
-    assert resolved == bare, (
-        f"Partial-name match must not happen; got {resolved}"
-    )
+    assert resolved == bare, f"Partial-name match must not happen; got {resolved}"
 
 
 def test_resolve_directory_without_index_returns_unchanged(tmp_path):
@@ -369,16 +349,13 @@ def test_resolve_directory_without_index_returns_unchanged(tmp_path):
     pkg = tmp_path / "pkg"
     _write(pkg / "not-index.ts", "export const x = 1")
     resolved = _resolve_js_module_path(pkg)
-    assert resolved == pkg, (
-        f"Directory without index.* must return unchanged; got {resolved}"
-    )
+    assert resolved == pkg, f"Directory without index.* must return unchanged; got {resolved}"
 
 
 def test_resolve_handles_subpath_into_directory_with_index(tmp_path):
     """`./foo/sub` where ./foo/sub/index.ts exists — nested subpath.
     Common pattern for sub-modules inside a package."""
-    target = _write(tmp_path / "foo" / "sub" / "index.ts",
-                    "export const x = 1")
+    target = _write(tmp_path / "foo" / "sub" / "index.ts", "export const x = 1")
     sub = tmp_path / "foo" / "sub"
     assert _resolve_js_module_path(sub) == target
 
@@ -387,8 +364,7 @@ def test_resolve_does_not_treat_dotfile_as_extension(tmp_path):
     """Edge case: `.eslintrc` and similar dotfiles. Path('.eslintrc').suffix
     returns '' on Python 3.x for files starting with `.`. Make sure we
     don't accidentally treat a real file as bare and try to append .ts."""
-    target = _write(tmp_path / ".env-types.ts",
-                    "export const x = 1")
+    target = _write(tmp_path / ".env-types.ts", "export const x = 1")
     # Path('.env-types.ts').suffix is '.ts' — not a problem
     assert _resolve_js_module_path(target) == target
 
@@ -401,8 +377,7 @@ def test_resolve_multi_dot_helper_file(tmp_path):
 
     Before this rule, .suffix was '.shared' so neither the bare-path branch
     nor the .js/.jsx branches matched, and the import dropped to a phantom."""
-    target = _write(tmp_path / "tag-action.shared.ts",
-                    "export const apply = () => {}")
+    target = _write(tmp_path / "tag-action.shared.ts", "export const apply = () => {}")
     written_as = tmp_path / "tag-action.shared"
     assert _resolve_js_module_path(written_as) == target
 
@@ -423,15 +398,12 @@ def test_resolve_ambient_d_ts_via_bare_path(tmp_path):
 
 def test_end_to_end_multi_dot_import_resolves(tmp_path):
     """End-to-end sanity for the multi-dot pattern via the import handler."""
-    target = _write(tmp_path / "tag-action.shared.ts",
-                    "export const apply = () => {}")
-    importer = _write(tmp_path / "page.ts",
-                      "import { apply } from './tag-action.shared'\n")
+    target = _write(tmp_path / "tag-action.shared.ts", "export const apply = () => {}")
+    importer = _write(tmp_path / "page.ts", "import { apply } from './tag-action.shared'\n")
     result = extract_js(importer)
     expected = _make_id(str(target))
     assert expected in _import_targets(result), (
-        f"Multi-dot import failed end-to-end; "
-        f"expected {expected}; got {_import_targets(result)}"
+        f"Multi-dot import failed end-to-end; expected {expected}; got {_import_targets(result)}"
     )
 
 
@@ -440,13 +412,16 @@ def test_resolve_chain_alias_and_extension_compose(tmp_path):
     compose correctly: tsconfig alias maps `$lib/...` to a real dir,
     then extension resolution finds the actual file."""
     src = tmp_path / "src"
-    target = _write(src / "lib" / "hooks" / "is-mobile.svelte.ts",
-                    "export const isMobile = () => true")
-    _write(tmp_path / "tsconfig.json",
-           '{"compilerOptions":{"paths":{"$lib":["./src/lib"],'
-           '"$lib/*":["./src/lib/*"]}}}')
-    importer = _write(src / "routes" / "page.ts",
-                      "import { isMobile } from '$lib/hooks/is-mobile.svelte'\n")
+    target = _write(
+        src / "lib" / "hooks" / "is-mobile.svelte.ts", "export const isMobile = () => true"
+    )
+    _write(
+        tmp_path / "tsconfig.json",
+        '{"compilerOptions":{"paths":{"$lib":["./src/lib"],"$lib/*":["./src/lib/*"]}}}',
+    )
+    importer = _write(
+        src / "routes" / "page.ts", "import { isMobile } from '$lib/hooks/is-mobile.svelte'\n"
+    )
     result = extract_js(importer)
     expected = _make_id(str(target))
     assert expected in _import_targets(result), (
@@ -464,21 +439,25 @@ def test_ts_dynamic_import_bare_path_resolves(tmp_path):
     has its own copy of the resolution logic — distinct from the static-import
     handler and from the Svelte regex pass — and was missing the bare-path
     extension append, silently dropping every such edge."""
-    target = _write(tmp_path / "profanity.ts",
-                    "export const hasProfanity = (s: string) => false")
-    importer = _write(tmp_path / "auth-validators.ts", """\
+    target = _write(tmp_path / "profanity.ts", "export const hasProfanity = (s: string) => false")
+    importer = _write(
+        tmp_path / "auth-validators.ts",
+        """\
 export async function validate(name: string) {
     const { hasProfanity } = await import('./profanity')
     return hasProfanity(name)
 }
-""")
+""",
+    )
     result = extract_js(importer)
     expected = _make_id(str(target))
-    targets = {str(e.get("target") or "") for e in result["edges"]
-               if e.get("relation") in ("imports", "imports_from")}
+    targets = {
+        str(e.get("target") or "")
+        for e in result["edges"]
+        if e.get("relation") in ("imports", "imports_from")
+    }
     assert expected in targets, (
-        f"Bare-path TS dynamic import failed to resolve; "
-        f"expected {expected}; got {targets}"
+        f"Bare-path TS dynamic import failed to resolve; expected {expected}; got {targets}"
     )
 
 
@@ -488,22 +467,28 @@ def test_ts_dynamic_import_alias_with_bare_path_resolves(tmp_path):
     `$lib/foo.ts` after both alias substitution and extension append."""
     src = tmp_path / "src"
     target = _write(src / "lib" / "lazy-module.ts", "export const x = 1")
-    _write(tmp_path / "tsconfig.json",
-           '{"compilerOptions":{"paths":{"$lib":["./src/lib"],'
-           '"$lib/*":["./src/lib/*"]}}}')
-    importer = _write(src / "routes" / "page.ts", """\
+    _write(
+        tmp_path / "tsconfig.json",
+        '{"compilerOptions":{"paths":{"$lib":["./src/lib"],"$lib/*":["./src/lib/*"]}}}',
+    )
+    importer = _write(
+        src / "routes" / "page.ts",
+        """\
 export async function load() {
     const m = await import('$lib/lazy-module')
     return m.x
 }
-""")
+""",
+    )
     result = extract_js(importer)
     expected = _make_id(str(target))
-    targets = {str(e.get("target") or "") for e in result["edges"]
-               if e.get("relation") in ("imports", "imports_from")}
+    targets = {
+        str(e.get("target") or "")
+        for e in result["edges"]
+        if e.get("relation") in ("imports", "imports_from")
+    }
     assert expected in targets, (
-        f"Alias + bare-path dynamic import failed to resolve; "
-        f"expected {expected}; got {targets}"
+        f"Alias + bare-path dynamic import failed to resolve; expected {expected}; got {targets}"
     )
 
 
@@ -511,16 +496,19 @@ def test_dynamic_import_bare_path_resolves(tmp_path):
     """The regex pass for `import('...')` in .svelte files must also use
     the new resolver — otherwise dynamic imports of bare paths still
     produce phantom edges."""
-    target = _write(tmp_path / "Heavy.svelte.ts",
-                    "export const heavy = () => 1")
-    importer = _write(tmp_path / "page.svelte", """\
+    target = _write(tmp_path / "Heavy.svelte.ts", "export const heavy = () => 1")
+    importer = _write(
+        tmp_path / "page.svelte",
+        """\
 <script>
   const lazy = () => import('./Heavy.svelte')
 </script>
-""")
+""",
+    )
     result = extract_svelte(importer)
-    dyn_targets = {str(e.get("target") or "") for e in result["edges"]
-                   if e.get("relation") == "dynamic_import"}
+    dyn_targets = {
+        str(e.get("target") or "") for e in result["edges"] if e.get("relation") == "dynamic_import"
+    }
     expected = _make_id(str(target))
     assert expected in dyn_targets, (
         f"dynamic_import of .svelte that's actually .svelte.ts must "
diff --git a/tests/test_incremental.py b/tests/test_incremental.py
index 2e2e902e0..47006c695 100644
--- a/tests/test_incremental.py
+++ b/tests/test_incremental.py
@@ -1,11 +1,11 @@
 """Integration tests for incremental graphify extract behavior."""
+
 from __future__ import annotations
 import json
 import subprocess
 import sys
 from pathlib import Path
 
-import pytest
 
 PYTHON = sys.executable
 
diff --git a/tests/test_ingest.py b/tests/test_ingest.py
index 41128eee2..2d6774258 100644
--- a/tests/test_ingest.py
+++ b/tests/test_ingest.py
@@ -1,8 +1,6 @@
 """Tests for graphify.ingest.save_query_result"""
+
 from __future__ import annotations
-import re
-from pathlib import Path
-import pytest
 from graphify.ingest import save_query_result
 
 
@@ -49,7 +47,7 @@ def test_source_nodes_capped_at_10(tmp_path):
     out = save_query_result("q", "a", mem, source_nodes=nodes)
     content = out.read_text()
     # Only first 10 should appear in frontmatter source_nodes line
-    fm_line = [l for l in content.splitlines() if l.startswith("source_nodes:")][0]
+    fm_line = [line for line in content.splitlines() if line.startswith("source_nodes:")][0]
     assert fm_line.count('"Node') == 10
 
 
diff --git a/tests/test_install.py b/tests/test_install.py
index 5b464e8d9..23a4309e5 100644
--- a/tests/test_install.py
+++ b/tests/test_install.py
@@ -1,4 +1,5 @@
 """Tests for graphify install --platform routing."""
+
 import os
 from pathlib import Path
 import sys
@@ -20,6 +21,7 @@
 
 def _install(tmp_path, platform):
     from graphify.__main__ import install
+
     old_cwd = Path.cwd()
     try:
         os.chdir(tmp_path)
@@ -46,6 +48,7 @@ def test_install_opencode(tmp_path):
 
 def test_install_positional_platform_opencode(tmp_path, monkeypatch):
     from graphify.__main__ import main
+
     monkeypatch.chdir(tmp_path)
     monkeypatch.setattr(sys, "argv", ["graphify", "install", "opencode"])
     with patch("graphify.__main__.Path.home", return_value=tmp_path):
@@ -56,6 +59,7 @@ def test_install_positional_platform_opencode(tmp_path, monkeypatch):
 
 def test_install_project_claude_writes_project_scope(tmp_path, monkeypatch, capsys):
     from graphify.__main__ import main
+
     home = tmp_path / "home"
     project = tmp_path / "project"
     project.mkdir()
@@ -67,12 +71,15 @@ def test_install_project_claude_writes_project_scope(tmp_path, monkeypatch, caps
     assert (project / ".claude" / "CLAUDE.md").exists()
     assert not (home / ".claude" / "skills" / "graphify" / "SKILL.md").exists()
     assert ".claude/skills/graphify/SKILL.md" in (project / ".claude" / "CLAUDE.md").read_text()
-    assert "~/.claude/skills/graphify/SKILL.md" not in (project / ".claude" / "CLAUDE.md").read_text()
+    assert (
+        "~/.claude/skills/graphify/SKILL.md" not in (project / ".claude" / "CLAUDE.md").read_text()
+    )
     assert "git add .claude/" in capsys.readouterr().out
 
 
 def test_install_project_codex_writes_skill_and_agents(tmp_path, monkeypatch):
     from graphify.__main__ import main
+
     home = tmp_path / "home"
     project = tmp_path / "project"
     project.mkdir()
@@ -88,6 +95,7 @@ def test_install_project_codex_writes_skill_and_agents(tmp_path, monkeypatch):
 
 def test_claude_subcommand_project_install_and_uninstall_are_project_scoped(tmp_path, monkeypatch):
     from graphify.__main__ import main
+
     home = tmp_path / "home"
     project = tmp_path / "project"
     project.mkdir()
@@ -114,6 +122,7 @@ def test_claude_subcommand_project_install_and_uninstall_are_project_scoped(tmp_
 
 def test_codex_subcommand_project_install_and_uninstall_are_project_scoped(tmp_path, monkeypatch):
     from graphify.__main__ import main
+
     home = tmp_path / "home"
     project = tmp_path / "project"
     project.mkdir()
@@ -142,6 +151,7 @@ def test_codex_subcommand_project_install_and_uninstall_are_project_scoped(tmp_p
 
 def test_antigravity_install_project_writes_project_skill(tmp_path, monkeypatch):
     from graphify.__main__ import main
+
     home = tmp_path / "home"
     project = tmp_path / "project"
     project.mkdir()
@@ -155,6 +165,7 @@ def test_antigravity_install_project_writes_project_skill(tmp_path, monkeypatch)
 
 def test_install_help_does_not_install_default(tmp_path, monkeypatch, capsys):
     from graphify.__main__ import main
+
     monkeypatch.chdir(tmp_path)
     monkeypatch.setattr(sys, "argv", ["graphify", "install", "opencode", "--help"])
     with patch("graphify.__main__.Path.home", return_value=tmp_path):
@@ -199,6 +210,7 @@ def test_install_unknown_platform_exits(tmp_path):
 def test_codex_skill_contains_spawn_agent():
     """Codex skill file must reference spawn_agent."""
     import graphify
+
     skill = (Path(graphify.__file__).parent / "skill-codex.md").read_text()
     assert "spawn_agent" in skill
 
@@ -206,6 +218,7 @@ def test_codex_skill_contains_spawn_agent():
 def test_codex_skill_uses_graphify_with_dirty_graph_output():
     """Codex skill must keep graph-first orientation even when graph output is dirty."""
     import graphify
+
     skill = (Path(graphify.__file__).parent / "skill-codex.md").read_text()
     assert "Dirty `graphify-out/` artifacts are expected" in skill
     assert "not a reason to skip Graphify" in skill
@@ -224,6 +237,7 @@ def test_codex_agents_install_mentions_dirty_graph_output(tmp_path):
 def test_opencode_skill_contains_mention():
     """OpenCode skill file must reference @mention."""
     import graphify
+
     skill = (Path(graphify.__file__).parent / "skill-opencode.md").read_text()
     assert "@mention" in skill
 
@@ -231,6 +245,7 @@ def test_opencode_skill_contains_mention():
 def test_opencode_skill_uses_opencode_agent_guidance():
     """OpenCode skill must not reference Codex/Claude agent type names."""
     import graphify
+
     skill = (Path(graphify.__file__).parent / "skill-opencode.md").read_text()
     assert "general-purpose" not in skill
     assert 'subagent_type="general-purpose"' not in skill
@@ -245,6 +260,7 @@ def test_opencode_skill_uses_opencode_agent_guidance():
 def test_claw_skill_is_sequential():
     """OpenClaw skill file must describe sequential extraction."""
     import graphify
+
     skill = (Path(graphify.__file__).parent / "skill-claw.md").read_text()
     assert "sequential" in skill.lower()
     assert "spawn_agent" not in skill
@@ -254,8 +270,17 @@ def test_claw_skill_is_sequential():
 def test_all_skill_files_exist_in_package():
     """All installable platform skill files must be present in the installed package."""
     import graphify
+
     pkg = Path(graphify.__file__).parent
-    for name in ("skill.md", "skill-codex.md", "skill-opencode.md", "skill-claw.md", "skill-windows.md", "skill-droid.md", "skill-trae.md"):
+    for name in (
+        "skill.md",
+        "skill-codex.md",
+        "skill-opencode.md",
+        "skill-claw.md",
+        "skill-windows.md",
+        "skill-droid.md",
+        "skill-trae.md",
+    ):
         assert (pkg / name).exists(), f"Missing: {name}"
 
 
@@ -272,6 +297,7 @@ def test_codex_install_does_not_write_claude_md(tmp_path):
 
 def test_uninstall_project_removes_project_skill_only(tmp_path, monkeypatch):
     from graphify.__main__ import main
+
     home = tmp_path / "home"
     project = tmp_path / "project"
     project.mkdir()
@@ -280,9 +306,13 @@ def test_uninstall_project_removes_project_skill_only(tmp_path, monkeypatch):
     user_skill.write_text("user skill")
     monkeypatch.chdir(project)
     with patch("graphify.__main__.Path.home", return_value=home):
-        monkeypatch.setattr(sys, "argv", ["graphify", "install", "--project", "--platform", "codex"])
+        monkeypatch.setattr(
+            sys, "argv", ["graphify", "install", "--project", "--platform", "codex"]
+        )
         main()
-        monkeypatch.setattr(sys, "argv", ["graphify", "uninstall", "--project", "--platform", "codex"])
+        monkeypatch.setattr(
+            sys, "argv", ["graphify", "uninstall", "--project", "--platform", "codex"]
+        )
         main()
     assert user_skill.exists()
     assert not (project / ".agents" / "skills" / "graphify" / "SKILL.md").exists()
@@ -291,6 +321,7 @@ def test_uninstall_project_removes_project_skill_only(tmp_path, monkeypatch):
 
 def test_uninstall_project_without_platform_removes_project_installs(tmp_path, monkeypatch):
     from graphify.__main__ import main
+
     home = tmp_path / "home"
     project = tmp_path / "project"
     project.mkdir()
@@ -310,6 +341,7 @@ def test_uninstall_project_without_platform_removes_project_installs(tmp_path, m
 
 def test_antigravity_uninstall_project_removes_project_skill_only(tmp_path, monkeypatch):
     from graphify.__main__ import main
+
     home = tmp_path / "home"
     project = tmp_path / "project"
     project.mkdir()
@@ -368,13 +400,16 @@ def test_antigravity_global_uninstall_removes_gemini_config_skill(tmp_path, monk
 
 # --- always-on AGENTS.md install/uninstall tests ---
 
+
 def _agents_install(tmp_path, platform):
     from graphify.__main__ import _agents_install as _install_fn
+
     _install_fn(tmp_path, platform)
 
 
 def _agents_uninstall(tmp_path, platform=""):
     from graphify.__main__ import _agents_uninstall as _uninstall_fn
+
     _uninstall_fn(tmp_path, platform=platform)
 
 
@@ -442,6 +477,7 @@ def test_agents_uninstall_no_op_when_not_installed(tmp_path, capsys):
 
 # --- OpenCode plugin tests ---
 
+
 def test_opencode_agents_install_writes_plugin(tmp_path):
     """opencode install writes .opencode/plugins/graphify.js."""
     _agents_install(tmp_path, "opencode")
@@ -456,6 +492,7 @@ def test_opencode_agents_install_registers_plugin_in_config(tmp_path):
     config_file = tmp_path / ".opencode" / "opencode.json"
     assert config_file.exists()
     import json as _json
+
     config = _json.loads(config_file.read_text())
     assert any("graphify.js" in p for p in config.get("plugin", []))
 
@@ -463,6 +500,7 @@ def test_opencode_agents_install_registers_plugin_in_config(tmp_path):
 def test_opencode_agents_install_merges_existing_config(tmp_path):
     """opencode install preserves existing .opencode/opencode.json keys."""
     import json as _json
+
     config_file = tmp_path / ".opencode" / "opencode.json"
     config_file.parent.mkdir(parents=True, exist_ok=True)
     config_file.write_text(_json.dumps({"model": "claude-opus-4-5", "plugin": []}))
@@ -475,6 +513,7 @@ def test_opencode_agents_install_merges_existing_config(tmp_path):
 def test_opencode_agents_uninstall_removes_plugin(tmp_path):
     """opencode uninstall removes the plugin file and deregisters from opencode.json."""
     import json as _json
+
     _agents_install(tmp_path, "opencode")
     _agents_uninstall(tmp_path, platform="opencode")
     plugin = tmp_path / ".opencode" / "plugins" / "graphify.js"
@@ -487,9 +526,11 @@ def test_opencode_agents_uninstall_removes_plugin(tmp_path):
 
 # ── Cursor ────────────────────────────────────────────────────────────────────
 
+
 def test_cursor_install_writes_rule(tmp_path):
     """cursor install writes .cursor/rules/graphify.mdc."""
     from graphify.__main__ import _cursor_install
+
     _cursor_install(tmp_path)
     rule = tmp_path / ".cursor" / "rules" / "graphify.mdc"
     assert rule.exists()
@@ -501,6 +542,7 @@ def test_cursor_install_writes_rule(tmp_path):
 def test_cursor_install_idempotent(tmp_path):
     """cursor install does not overwrite an existing rule file."""
     from graphify.__main__ import _cursor_install
+
     _cursor_install(tmp_path)
     rule = tmp_path / ".cursor" / "rules" / "graphify.mdc"
     original = rule.read_text()
@@ -511,6 +553,7 @@ def test_cursor_install_idempotent(tmp_path):
 def test_cursor_uninstall_removes_rule(tmp_path):
     """cursor uninstall removes the rule file."""
     from graphify.__main__ import _cursor_install, _cursor_uninstall
+
     _cursor_install(tmp_path)
     _cursor_uninstall(tmp_path)
     rule = tmp_path / ".cursor" / "rules" / "graphify.mdc"
@@ -520,51 +563,64 @@ def test_cursor_uninstall_removes_rule(tmp_path):
 def test_cursor_uninstall_noop_if_not_installed(tmp_path):
     """cursor uninstall does nothing if rule was never written."""
     from graphify.__main__ import _cursor_uninstall
+
     _cursor_uninstall(tmp_path)  # should not raise
 
 
 # ── Gemini CLI ────────────────────────────────────────────────────────────────
 
+
 def test_gemini_install_writes_gemini_md(tmp_path):
     from graphify.__main__ import gemini_install
+
     gemini_install(tmp_path)
     md = tmp_path / "GEMINI.md"
     assert md.exists()
     assert "graphify-out/GRAPH_REPORT.md" in md.read_text()
 
+
 def test_gemini_install_writes_hook(tmp_path):
     import json as _json
     from graphify.__main__ import gemini_install
+
     gemini_install(tmp_path)
     settings = _json.loads((tmp_path / ".gemini" / "settings.json").read_text())
     hooks = settings["hooks"]["BeforeTool"]
     assert any("graphify" in str(h) for h in hooks)
 
+
 def test_gemini_install_idempotent(tmp_path):
     from graphify.__main__ import gemini_install
+
     gemini_install(tmp_path)
     gemini_install(tmp_path)
     md = tmp_path / "GEMINI.md"
     assert md.read_text().count("## graphify") == 1
 
+
 def test_gemini_install_merges_existing_gemini_md(tmp_path):
     from graphify.__main__ import gemini_install
+
     (tmp_path / "GEMINI.md").write_text("# My project rules\n")
     gemini_install(tmp_path)
     content = (tmp_path / "GEMINI.md").read_text()
     assert "# My project rules" in content
     assert "graphify-out/GRAPH_REPORT.md" in content
 
+
 def test_gemini_uninstall_removes_section(tmp_path):
     from graphify.__main__ import gemini_install, gemini_uninstall
+
     gemini_install(tmp_path)
     gemini_uninstall(tmp_path)
     md = tmp_path / "GEMINI.md"
     assert not md.exists()
 
+
 def test_gemini_uninstall_removes_hook(tmp_path):
     import json as _json
     from graphify.__main__ import gemini_install, gemini_uninstall
+
     gemini_install(tmp_path)
     gemini_uninstall(tmp_path)
     settings_path = tmp_path / ".gemini" / "settings.json"
@@ -573,6 +629,8 @@ def test_gemini_uninstall_removes_hook(tmp_path):
         hooks = settings.get("hooks", {}).get("BeforeTool", [])
         assert not any("graphify" in str(h) for h in hooks)
 
+
 def test_gemini_uninstall_noop_if_not_installed(tmp_path):
     from graphify.__main__ import gemini_uninstall
+
     gemini_uninstall(tmp_path)  # should not raise
diff --git a/tests/test_install_strings.py b/tests/test_install_strings.py
index 5e0037089..93cb2a7d8 100644
--- a/tests/test_install_strings.py
+++ b/tests/test_install_strings.py
@@ -8,6 +8,7 @@
 (issue #580). This file locks in the query-first policy so a future revert
 or partial change is caught by CI.
 """
+
 from __future__ import annotations
 import json
 
@@ -73,6 +74,7 @@ def test_no_install_surface_demands_reading_the_full_report_first():
     are legitimate platform metadata, not the bug.
     """
     import re
+
     banned = [
         # "read ... GRAPH_REPORT.md ... before"
         re.compile(r"read[^.\n]{0,80}GRAPH_REPORT\.md[^.\n]{0,80}before", re.IGNORECASE),
@@ -87,10 +89,7 @@ def test_no_install_surface_demands_reading_the_full_report_first():
             m = pattern.search(text)
             if m:
                 hits.append((name, m.group(0)))
-    assert not hits, (
-        f"banned report-first phrasing reappeared: {hits}. "
-        f"This regresses issue #580."
-    )
+    assert not hits, f"banned report-first phrasing reappeared: {hits}. This regresses issue #580."
 
 
 def test_report_is_still_referenced_as_fallback():
@@ -127,6 +126,7 @@ def test_agents_section_does_not_skip_dirty_graph_output():
 
 def test_how_it_works_clarifies_code_only_semantic_extraction():
     from pathlib import Path
+
     doc = (Path(__file__).parent.parent / "docs" / "how-it-works.md").read_text(encoding="utf-8")
     assert "Code files are not sent to the LLM semantic extractor" in doc
     assert "code files, Pass 3 is skipped entirely" in doc
diff --git a/tests/test_install_upgrade.py b/tests/test_install_upgrade.py
index 09ee3d81e..fa085dbdc 100644
--- a/tests/test_install_upgrade.py
+++ b/tests/test_install_upgrade.py
@@ -9,6 +9,7 @@
 section, run the installer, and assert that the on-disk file now contains
 the new query-first wording and does not contain the old report-first text.
 """
+
 from __future__ import annotations
 import json
 from pathlib import Path
@@ -85,9 +86,7 @@ def _assert_no_report_first(text: str, ctx: str) -> None:
 
 
 def _assert_query_first(text: str, ctx: str) -> None:
-    assert "graphify query" in text, (
-        f"{ctx}: new 'graphify query' guidance missing after upgrade"
-    )
+    assert "graphify query" in text, f"{ctx}: new 'graphify query' guidance missing after upgrade"
 
 
 def test_claude_install_upgrades_stale_section(tmp_path, monkeypatch):
@@ -95,7 +94,9 @@ def test_claude_install_upgrades_stale_section(tmp_path, monkeypatch):
     `graphify claude install` again after upgrading to a fixed package."""
     monkeypatch.chdir(tmp_path)
     claude_md = tmp_path / "CLAUDE.md"
-    claude_md.write_text("# My Project\n\nSome description.\n\n" + _OLD_CLAUDE_SECTION, encoding="utf-8")
+    claude_md.write_text(
+        "# My Project\n\nSome description.\n\n" + _OLD_CLAUDE_SECTION, encoding="utf-8"
+    )
     monkeypatch.setattr(mainmod, "_check_skill_version", lambda _: None)
 
     mainmod.claude_install(tmp_path)
@@ -125,11 +126,7 @@ def test_claude_install_upgrades_stale_hook_payload(tmp_path, monkeypatch):
                     "hooks": [
                         {
                             "type": "command",
-                            "command": (
-                                "case x in *) "
-                                + _OLD_HOOK_PAYLOAD_SNIPPET
-                                + " esac"
-                            ),
+                            "command": ("case x in *) " + _OLD_HOOK_PAYLOAD_SNIPPET + " esac"),
                         }
                     ],
                 }
@@ -142,9 +139,7 @@ def test_claude_install_upgrades_stale_hook_payload(tmp_path, monkeypatch):
     mainmod.claude_install(tmp_path)
 
     new_settings_text = settings.read_text(encoding="utf-8")
-    assert _OLD_HOOK_PAYLOAD_SNIPPET not in new_settings_text, (
-        "stale hook payload survived upgrade"
-    )
+    assert _OLD_HOOK_PAYLOAD_SNIPPET not in new_settings_text, "stale hook payload survived upgrade"
     assert "graphify query" in new_settings_text, (
         "new hook payload should route to `graphify query`"
     )
diff --git a/tests/test_js_import_resolution.py b/tests/test_js_import_resolution.py
index 414cf8d87..a1ac2ff7e 100644
--- a/tests/test_js_import_resolution.py
+++ b/tests/test_js_import_resolution.py
@@ -18,10 +18,7 @@ def _extract_for(paths: list[Path], root: Path):
 
 def _has_edge(result: dict, source: str, target: str, relation: str = "imports_from") -> bool:
     expected = (_make_id(source), _make_id(target), relation)
-    actual = {
-        (edge["source"], edge["target"], edge["relation"])
-        for edge in result["edges"]
-    }
+    actual = {(edge["source"], edge["target"], edge["relation"]) for edge in result["edges"]}
     return expected in actual
 
 
@@ -33,10 +30,7 @@ def _has_symbol_edge(
     relation: str = "imports",
 ) -> bool:
     expected = (_make_id(source), _make_id(_file_stem(Path(target_file)), symbol), relation)
-    actual = {
-        (edge["source"], edge["target"], edge["relation"])
-        for edge in result["edges"]
-    }
+    actual = {(edge["source"], edge["target"], edge["relation"]) for edge in result["edges"]}
     return expected in actual
 
 
@@ -53,10 +47,7 @@ def _has_symbol_to_symbol_edge(
         _make_id(_file_stem(Path(target_file)), target_symbol),
         relation,
     )
-    actual = {
-        (edge["source"], edge["target"], edge["relation"])
-        for edge in result["edges"]
-    }
+    actual = {(edge["source"], edge["target"], edge["relation"]) for edge in result["edges"]}
     return expected in actual
 
 
@@ -198,7 +189,9 @@ def test_ts_reexported_type_alias_resolves_imported_symbol_to_origin(tmp_path: P
 
 
 def test_ts_reexported_abstract_class_resolves_imported_symbol_to_origin(tmp_path: Path):
-    target = _write(tmp_path / "src/lib/foo.ts", "export abstract class Foo { abstract run(): void }\n")
+    target = _write(
+        tmp_path / "src/lib/foo.ts", "export abstract class Foo { abstract run(): void }\n"
+    )
     barrel = _write(tmp_path / "src/lib/index.ts", "export { Foo } from './foo'\n")
     consumer = _write(
         tmp_path / "src/routes/page.ts",
@@ -228,7 +221,9 @@ def test_ts_const_alias_reexport_resolves_imported_symbol_to_origin(tmp_path: Pa
     assert _has_symbol_edge(result, "src/routes/page.ts", "src/lib/foo.ts", "Foo")
 
 
-def test_ts_local_const_alias_then_named_reexport_resolves_imported_symbol_to_origin(tmp_path: Path):
+def test_ts_local_const_alias_then_named_reexport_resolves_imported_symbol_to_origin(
+    tmp_path: Path,
+):
     target = _write(tmp_path / "src/lib/foo.ts", "export function makeFoo() { return {} }\n")
     barrel = _write(
         tmp_path / "src/lib/index.ts",
@@ -307,7 +302,9 @@ def test_ts_import_alias_call_from_same_named_local_symbol_targets_origin(tmp_pa
 
 
 def test_svelte_rune_import_resolves_svelte_ts_file(tmp_path: Path):
-    target = _write(tmp_path / "src/lib/hooks/is-mobile.svelte.ts", "export const isMobile = true\n")
+    target = _write(
+        tmp_path / "src/lib/hooks/is-mobile.svelte.ts", "export const isMobile = true\n"
+    )
     importer = _write(
         tmp_path / "src/routes/page.ts",
         "import { isMobile } from '../lib/hooks/is-mobile.svelte'\nconsole.log(isMobile)\n",
@@ -482,8 +479,12 @@ def _norm(label: str) -> str:
         if edge.get("relation") == "references"
     }
 
-    assert _has_symbol_to_symbol_edge(result, "src/lib/impl.ts", "DataProcessor", "src/lib/base.ts", "BaseProcessor", "inherits")
-    assert _has_symbol_to_symbol_edge(result, "src/lib/impl.ts", "DataProcessor", "src/lib/base.ts", "IProcessor", "implements")
+    assert _has_symbol_to_symbol_edge(
+        result, "src/lib/impl.ts", "DataProcessor", "src/lib/base.ts", "BaseProcessor", "inherits"
+    )
+    assert _has_symbol_to_symbol_edge(
+        result, "src/lib/impl.ts", "DataProcessor", "src/lib/base.ts", "IProcessor", "implements"
+    )
     assert ("run", "Payload", "parameter_type") in reference_contexts
     assert ("run", "Result", "return_type") in reference_contexts
     assert ("run", "Payload", "generic_arg") in reference_contexts
diff --git a/tests/test_languages.py b/tests/test_languages.py
index fefc13807..1c38a9845 100644
--- a/tests/test_languages.py
+++ b/tests/test_languages.py
@@ -1,14 +1,30 @@
 """Tests for language extractors: Java, C, C++, Ruby, C#, Kotlin, Scala, PHP, Swift, Go, Julia, Fortran, JS/TS, .NET project files."""
+
 from __future__ import annotations
 from pathlib import Path
-import pytest
 from graphify.extract import (
-    extract_java, extract_c, extract_cpp, extract_ruby,
-    extract_csharp, extract_kotlin, extract_scala, extract_php,
-    extract_swift, extract_go, extract_julia, extract_js, extract_fortran,
-    extract_groovy, extract_sln, extract_csproj, extract_razor,
-    extract_dm, extract_dmi, extract_dmm, extract_dmf,
+    extract_c,
+    extract_cpp,
+    extract_csharp,
+    extract_csproj,
+    extract_dm,
+    extract_dmf,
+    extract_dmi,
+    extract_dmm,
+    extract_fortran,
+    extract_go,
+    extract_groovy,
+    extract_java,
+    extract_js,
+    extract_julia,
+    extract_kotlin,
+    extract_php,
     extract_powershell,
+    extract_razor,
+    extract_ruby,
+    extract_scala,
+    extract_sln,
+    extract_swift,
 )
 
 FIXTURES = Path(__file__).parent / "fixtures"
@@ -17,14 +33,17 @@
 def _labels(r):
     return [n["label"] for n in r["nodes"]]
 
+
 def _relations(r):
     return {e["relation"] for e in r["edges"]}
 
+
 def _calls(r):
     node_by_id = {n["id"]: n["label"] for n in r["nodes"]}
     return {
         (node_by_id.get(e["source"], e["source"]), node_by_id.get(e["target"], e["target"]))
-        for e in r["edges"] if e["relation"] == "calls"
+        for e in r["edges"]
+        if e["relation"] == "calls"
     }
 
 
@@ -36,7 +55,8 @@ def _references(r):
             node_by_id.get(e["target"], e["target"]),
             e,
         )
-        for e in r["edges"] if e["relation"] == "references"
+        for e in r["edges"]
+        if e["relation"] == "references"
     ]
 
 
@@ -63,29 +83,36 @@ def _edge_labels(result: dict, relation: str, context: str | None = None) -> set
             continue
         if context is not None and edge.get("context") != context:
             continue
-        pairs.add((labels.get(edge["source"], edge["source"]), labels.get(edge["target"], edge["target"])))
+        pairs.add(
+            (labels.get(edge["source"], edge["source"]), labels.get(edge["target"], edge["target"]))
+        )
     return pairs
 
 
 # ── Java ──────────────────────────────────────────────────────────────────────
 
+
 def test_java_no_error():
     r = extract_java(FIXTURES / "sample.java")
     assert "error" not in r
 
+
 def test_java_finds_class():
     r = extract_java(FIXTURES / "sample.java")
-    assert any("DataProcessor" in l for l in _labels(r))
+    assert any("DataProcessor" in label for label in _labels(r))
+
 
 def test_java_finds_interface():
     r = extract_java(FIXTURES / "sample.java")
-    assert any("Processor" in l for l in _labels(r))
+    assert any("Processor" in label for label in _labels(r))
+
 
 def test_java_finds_methods():
     r = extract_java(FIXTURES / "sample.java")
     labels = _labels(r)
-    assert any("addItem" in l for l in labels)
-    assert any("process" in l for l in labels)
+    assert any("addItem" in label for label in labels)
+    assert any("process" in label for label in labels)
+
 
 def test_java_finds_imports():
     r = extract_java(FIXTURES / "sample.java")
@@ -98,6 +125,7 @@ def test_java_import_edges_have_import_context():
     assert import_edges
     assert all(e.get("context") == "import" for e in import_edges)
 
+
 def test_java_no_dangling_edges():
     r = extract_java(FIXTURES / "sample.java")
     node_ids = {n["id"] for n in r["nodes"]}
@@ -107,24 +135,29 @@ def test_java_no_dangling_edges():
 
 # ── C ────────────────────────────────────────────────────────────────────────
 
+
 def test_c_no_error():
     r = extract_c(FIXTURES / "sample.c")
     assert "error" not in r
 
+
 def test_c_finds_functions():
     r = extract_c(FIXTURES / "sample.c")
     labels = _labels(r)
-    assert any("process" in l for l in labels)
-    assert any("main" in l for l in labels)
+    assert any("process" in label for label in labels)
+    assert any("main" in label for label in labels)
+
 
 def test_c_finds_includes():
     r = extract_c(FIXTURES / "sample.c")
     assert "imports" in _relations(r)
 
+
 def test_c_emits_calls():
     r = extract_c(FIXTURES / "sample.c")
     assert any(e["relation"] == "calls" for e in r["edges"])
 
+
 def test_c_calls_are_extracted():
     r = extract_c(FIXTURES / "sample.c")
     for e in r["edges"]:
@@ -154,19 +187,23 @@ def test_c_call_edges_have_call_context():
 
 # ── C++ ───────────────────────────────────────────────────────────────────────
 
+
 def test_cpp_no_error():
     r = extract_cpp(FIXTURES / "sample.cpp")
     assert "error" not in r
 
+
 def test_cpp_finds_class():
     r = extract_cpp(FIXTURES / "sample.cpp")
-    assert any("HttpClient" in l for l in _labels(r))
+    assert any("HttpClient" in label for label in _labels(r))
+
 
 def test_cpp_finds_methods():
     r = extract_cpp(FIXTURES / "sample.cpp")
     labels = _labels(r)
     # C++ extractor captures the constructor and public-visible methods
-    assert any("HttpClient" in l for l in labels)
+    assert any("HttpClient" in label for label in labels)
+
 
 def test_cpp_finds_includes():
     r = extract_cpp(FIXTURES / "sample.cpp")
@@ -200,7 +237,8 @@ def test_cpp_class_inherits_edge():
     found = any(
         "AuthedHttpClient" in node_by_id.get(e["source"], "")
         and "HttpClient" in node_by_id.get(e["target"], "")
-        for e in r["edges"] if e["relation"] == "inherits"
+        for e in r["edges"]
+        if e["relation"] == "inherits"
     )
     assert found, "AuthedHttpClient should have inherits edge to HttpClient"
 
@@ -212,67 +250,80 @@ def test_cpp_struct_inherits_edge():
     found = any(
         "RetryingHttpClient" in node_by_id.get(e["source"], "")
         and "HttpClient" in node_by_id.get(e["target"], "")
-        for e in r["edges"] if e["relation"] == "inherits"
+        for e in r["edges"]
+        if e["relation"] == "inherits"
     )
     assert found, "RetryingHttpClient (struct) should have inherits edge to HttpClient"
 
 
 # ── Ruby ─────────────────────────────────────────────────────────────────────
 
+
 def test_ruby_no_error():
     r = extract_ruby(FIXTURES / "sample.rb")
     assert "error" not in r
 
+
 def test_ruby_finds_class():
     r = extract_ruby(FIXTURES / "sample.rb")
-    assert any("ApiClient" in l for l in _labels(r))
+    assert any("ApiClient" in label for label in _labels(r))
+
 
 def test_ruby_finds_methods():
     r = extract_ruby(FIXTURES / "sample.rb")
     labels = _labels(r)
-    assert any("get" in l for l in labels)
-    assert any("post" in l for l in labels)
+    assert any("get" in label for label in labels)
+    assert any("post" in label for label in labels)
+
 
 def test_ruby_finds_function():
     r = extract_ruby(FIXTURES / "sample.rb")
-    assert any("parse_response" in l for l in _labels(r))
+    assert any("parse_response" in label for label in _labels(r))
 
 
 # ── C# ───────────────────────────────────────────────────────────────────────
 
+
 def test_csharp_no_error():
     r = extract_csharp(FIXTURES / "sample.cs")
     assert "error" not in r
 
+
 def test_csharp_finds_class():
     r = extract_csharp(FIXTURES / "sample.cs")
-    assert any("DataProcessor" in l for l in _labels(r))
+    assert any("DataProcessor" in label for label in _labels(r))
+
 
 def test_csharp_finds_interface():
     r = extract_csharp(FIXTURES / "sample.cs")
-    assert any("IProcessor" in l for l in _labels(r))
+    assert any("IProcessor" in label for label in _labels(r))
+
 
 def test_csharp_finds_methods():
     r = extract_csharp(FIXTURES / "sample.cs")
     labels = _labels(r)
-    assert any("Process" in l for l in labels)
+    assert any("Process" in label for label in labels)
+
 
 def test_csharp_finds_usings():
     r = extract_csharp(FIXTURES / "sample.cs")
     assert "imports" in _relations(r)
 
+
 def test_csharp_inherits_edge():
     r = extract_csharp(FIXTURES / "sample.cs")
     inherits = [e for e in r["edges"] if e["relation"] == "inherits"]
     assert len(inherits) >= 1
 
+
 def test_csharp_implements_iprocessor():
     r = extract_csharp(FIXTURES / "sample.cs")
     node_by_id = {n["id"]: n["label"] for n in r["nodes"]}
     found = any(
-        "DataProcessor" in node_by_id.get(e["source"], "") and
-        "IProcessor" in node_by_id.get(e["target"], "")
-        for e in r["edges"] if e["relation"] == "implements"
+        "DataProcessor" in node_by_id.get(e["source"], "")
+        and "IProcessor" in node_by_id.get(e["target"], "")
+        for e in r["edges"]
+        if e["relation"] == "implements"
     )
     assert found, "DataProcessor should have implements edge to IProcessor"
 
@@ -320,7 +371,8 @@ def test_csharp_call_edges_have_call_context():
         "Process" in node_by_id.get(e["source"], "")
         and "Validate" in node_by_id.get(e["target"], "")
         and e.get("context") == "call"
-        for e in r["edges"] if e["relation"] == "calls"
+        for e in r["edges"]
+        if e["relation"] == "calls"
     ), "C# call edges should retain call context"
 
 
@@ -333,27 +385,33 @@ def test_csharp_import_edges_have_import_context():
 
 # ── Kotlin ───────────────────────────────────────────────────────────────────
 
+
 def test_kotlin_no_error():
     r = extract_kotlin(FIXTURES / "sample.kt")
     assert "error" not in r
 
+
 def test_kotlin_finds_class():
     r = extract_kotlin(FIXTURES / "sample.kt")
-    assert any("HttpClient" in l for l in _labels(r))
+    assert any("HttpClient" in label for label in _labels(r))
+
 
 def test_kotlin_finds_data_class():
     r = extract_kotlin(FIXTURES / "sample.kt")
-    assert any("Config" in l for l in _labels(r))
+    assert any("Config" in label for label in _labels(r))
+
 
 def test_kotlin_finds_methods():
     r = extract_kotlin(FIXTURES / "sample.kt")
     labels = _labels(r)
-    assert any("get" in l for l in labels)
-    assert any("post" in l for l in labels)
+    assert any("get" in label for label in labels)
+    assert any("post" in label for label in labels)
+
 
 def test_kotlin_finds_function():
     r = extract_kotlin(FIXTURES / "sample.kt")
-    assert any("createClient" in l for l in _labels(r))
+    assert any("createClient" in label for label in _labels(r))
+
 
 def test_kotlin_emits_in_file_calls():
     """Regression test for the call-walker `simple_identifier` /
@@ -384,23 +442,27 @@ def test_kotlin_parameter_return_generic_and_field_contexts():
 
 # ── Scala ─────────────────────────────────────────────────────────────────────
 
+
 def test_scala_no_error():
     r = extract_scala(FIXTURES / "sample.scala")
     assert "error" not in r
 
+
 def test_scala_finds_class():
     r = extract_scala(FIXTURES / "sample.scala")
-    assert any("HttpClient" in l for l in _labels(r))
+    assert any("HttpClient" in label for label in _labels(r))
+
 
 def test_scala_finds_object():
     r = extract_scala(FIXTURES / "sample.scala")
-    assert any("HttpClientFactory" in l for l in _labels(r))
+    assert any("HttpClientFactory" in label for label in _labels(r))
+
 
 def test_scala_finds_methods():
     r = extract_scala(FIXTURES / "sample.scala")
     labels = _labels(r)
-    assert any("get" in l for l in labels)
-    assert any("post" in l for l in labels)
+    assert any("get" in label for label in labels)
+    assert any("post" in label for label in labels)
 
 
 def test_scala_import_edges_have_import_context():
@@ -440,23 +502,28 @@ def test_scala_call_edges_have_call_context():
 
 # ── PHP ───────────────────────────────────────────────────────────────────────
 
+
 def test_php_no_error():
     r = extract_php(FIXTURES / "sample.php")
     assert "error" not in r
 
+
 def test_php_finds_class():
     r = extract_php(FIXTURES / "sample.php")
-    assert any("ApiClient" in l for l in _labels(r))
+    assert any("ApiClient" in label for label in _labels(r))
+
 
 def test_php_finds_methods():
     r = extract_php(FIXTURES / "sample.php")
     labels = _labels(r)
-    assert any("get" in l for l in labels)
-    assert any("post" in l for l in labels)
+    assert any("get" in label for label in labels)
+    assert any("post" in label for label in labels)
+
 
 def test_php_finds_function():
     r = extract_php(FIXTURES / "sample.php")
-    assert any("parseResponse" in l for l in _labels(r))
+    assert any("parseResponse" in label for label in _labels(r))
+
 
 def test_php_finds_imports():
     r = extract_php(FIXTURES / "sample.php")
@@ -476,55 +543,67 @@ def test_php_call_edges_have_call_context():
     assert call_edges
     assert all(e.get("context") == "call" for e in call_edges)
 
+
 def test_php_finds_static_property_access():
     r = extract_php(FIXTURES / "sample_php_static_prop.php")
     assert "uses_static_prop" in _relations(r)
 
+
 def test_php_static_prop_target_is_holding_class():
     r = extract_php(FIXTURES / "sample_php_static_prop.php")
     node_by_id = {n["id"]: n["label"] for n in r["nodes"]}
     uses_prop = [
         (node_by_id.get(e["source"], e["source"]), node_by_id.get(e["target"], e["target"]))
-        for e in r["edges"] if e["relation"] == "uses_static_prop"
+        for e in r["edges"]
+        if e["relation"] == "uses_static_prop"
     ]
     assert any("DefaultPalette" in tgt for _, tgt in uses_prop)
 
+
 def test_php_finds_config_helper_call():
     r = extract_php(FIXTURES / "sample_php_config.php")
     assert "uses_config" in _relations(r)
 
+
 def test_php_config_helper_target_matches_first_segment():
     r = extract_php(FIXTURES / "sample_php_config.php")
     node_by_id = {n["id"]: n["label"] for n in r["nodes"]}
     uses_cfg = [
         (node_by_id.get(e["source"], e["source"]), node_by_id.get(e["target"], e["target"]))
-        for e in r["edges"] if e["relation"] == "uses_config"
+        for e in r["edges"]
+        if e["relation"] == "uses_config"
     ]
     assert any("Throttle" in tgt for _, tgt in uses_cfg)
 
+
 def test_php_finds_container_bind():
     r = extract_php(FIXTURES / "sample_php_container.php")
     assert "bound_to" in _relations(r)
 
+
 def test_php_container_bind_links_contract_to_implementation():
     r = extract_php(FIXTURES / "sample_php_container.php")
     node_by_id = {n["id"]: n["label"] for n in r["nodes"]}
     bound = [
         (node_by_id.get(e["source"], e["source"]), node_by_id.get(e["target"], e["target"]))
-        for e in r["edges"] if e["relation"] == "bound_to"
+        for e in r["edges"]
+        if e["relation"] == "bound_to"
     ]
     assert any("PaymentGateway" in src and "StripeGateway" in tgt for src, tgt in bound)
 
+
 def test_php_finds_event_listeners():
     r = extract_php(FIXTURES / "sample_php_listen.php")
     assert "listened_by" in _relations(r)
 
+
 def test_php_event_listener_links_event_to_listener():
     r = extract_php(FIXTURES / "sample_php_listen.php")
     node_by_id = {n["id"]: n["label"] for n in r["nodes"]}
     listened = [
         (node_by_id.get(e["source"], e["source"]), node_by_id.get(e["target"], e["target"]))
-        for e in r["edges"] if e["relation"] == "listened_by"
+        for e in r["edges"]
+        if e["relation"] == "listened_by"
     ]
     assert any("UserRegistered" in src and "SendWelcomeEmail" in tgt for src, tgt in listened)
 
@@ -545,31 +624,38 @@ def test_php_property_parameter_and_return_contexts():
 
 # ── Swift ────────────────────────────────────────────────────────────────────
 
+
 def test_swift_no_error():
     r = extract_swift(FIXTURES / "sample.swift")
     assert "error" not in r
 
+
 def test_swift_finds_class():
     r = extract_swift(FIXTURES / "sample.swift")
-    assert any("DataProcessor" in l for l in _labels(r))
+    assert any("DataProcessor" in label for label in _labels(r))
+
 
 def test_swift_finds_protocol():
     r = extract_swift(FIXTURES / "sample.swift")
-    assert any("Processor" in l for l in _labels(r))
+    assert any("Processor" in label for label in _labels(r))
+
 
 def test_swift_finds_struct():
     r = extract_swift(FIXTURES / "sample.swift")
-    assert any("Config" in l for l in _labels(r))
+    assert any("Config" in label for label in _labels(r))
+
 
 def test_swift_finds_methods():
     r = extract_swift(FIXTURES / "sample.swift")
     labels = _labels(r)
-    assert any("addItem" in l for l in labels)
-    assert any("process" in l for l in labels)
+    assert any("addItem" in label for label in labels)
+    assert any("process" in label for label in labels)
+
 
 def test_swift_finds_function():
     r = extract_swift(FIXTURES / "sample.swift")
-    assert any("createProcessor" in l for l in _labels(r))
+    assert any("createProcessor" in label for label in _labels(r))
+
 
 def test_swift_finds_imports():
     r = extract_swift(FIXTURES / "sample.swift")
@@ -582,42 +668,51 @@ def test_swift_import_edges_have_import_context():
     assert import_edges
     assert all(e.get("context") == "import" for e in import_edges)
 
+
 def test_swift_no_dangling_edges():
     r = extract_swift(FIXTURES / "sample.swift")
     node_ids = {n["id"] for n in r["nodes"]}
     for e in r["edges"]:
         assert e["source"] in node_ids
 
+
 def test_swift_finds_actor():
     r = extract_swift(FIXTURES / "sample.swift")
-    assert any("CacheManager" in l for l in _labels(r))
+    assert any("CacheManager" in label for label in _labels(r))
+
 
 def test_swift_finds_enum():
     r = extract_swift(FIXTURES / "sample.swift")
-    assert any("NetworkError" in l for l in _labels(r))
+    assert any("NetworkError" in label for label in _labels(r))
+
 
 def test_swift_finds_enum_methods():
     r = extract_swift(FIXTURES / "sample.swift")
-    assert any("describe" in l for l in _labels(r))
+    assert any("describe" in label for label in _labels(r))
+
 
 def test_swift_finds_enum_cases():
     r = extract_swift(FIXTURES / "sample.swift")
     labels = _labels(r)
-    assert any("timeout" in l for l in labels)
-    assert any("connectionFailed" in l for l in labels)
+    assert any("timeout" in label for label in labels)
+    assert any("connectionFailed" in label for label in labels)
+
 
 def test_swift_enum_cases_have_case_of_edge():
     r = extract_swift(FIXTURES / "sample.swift")
     case_edges = [e for e in r["edges"] if e["relation"] == "case_of"]
     assert len(case_edges) >= 2
 
+
 def test_swift_finds_deinit():
     r = extract_swift(FIXTURES / "sample.swift")
-    assert any("deinit" in l for l in _labels(r))
+    assert any("deinit" in label for label in _labels(r))
+
 
 def test_swift_finds_subscript():
     r = extract_swift(FIXTURES / "sample.swift")
-    assert any("subscript" in l for l in _labels(r))
+    assert any("subscript" in label for label in _labels(r))
+
 
 def test_swift_extension_methods_attach_to_type():
     r = extract_swift(FIXTURES / "sample.swift")
@@ -632,6 +727,7 @@ def test_swift_extension_methods_attach_to_type():
             break
     assert found, "extension method isValid should attach to Config"
 
+
 def test_swift_extension_does_not_duplicate_type_node():
     r = extract_swift(FIXTURES / "sample.swift")
     config_nodes = [n for n in r["nodes"] if n["label"] == "Config"]
@@ -660,11 +756,13 @@ def test_swift_parameter_return_generic_and_field_contexts():
     assert ("run", "DataProcessor") in _edge_labels(r, "references", "generic_arg")
     assert ("DataProcessor", "Result") in _edge_labels(r, "references", "field")
 
+
 def test_swift_emits_calls():
     r = extract_swift(FIXTURES / "sample.swift")
     calls = _calls(r)
     assert any("process" in src and "validate" in tgt for src, tgt in calls)
 
+
 def test_swift_call_edges_have_call_context():
     r = extract_swift(FIXTURES / "sample.swift")
     call_edges = _edges_with_relation(r, "calls")
@@ -678,36 +776,45 @@ def test_swift_extension_across_files_merges_into_canonical_type():
     node ids carry the file stem, so without a corpus-level merge each file
     would emit its own Foo."""
     from graphify.extract import extract
+
     paths = sorted((FIXTURES / "swift_cross_file").glob("*.swift"))
     r = extract(paths, cache_root=Path("/tmp/graphify-test-no-cache"))
     foo_nodes = [n for n in r["nodes"] if n["label"] == "Foo"]
-    assert len(foo_nodes) == 1, f"Foo should appear once, got {len(foo_nodes)}: {[n['id'] for n in foo_nodes]}"
+    assert len(foo_nodes) == 1, (
+        f"Foo should appear once, got {len(foo_nodes)}: {[n['id'] for n in foo_nodes]}"
+    )
     foo_id = foo_nodes[0]["id"]
     method_targets = {
-        e["target"] for e in r["edges"]
-        if e["relation"] == "method" and e["source"] == foo_id
+        e["target"] for e in r["edges"] if e["relation"] == "method" and e["source"] == foo_id
     }
     method_labels = {n["label"] for n in r["nodes"] if n["id"] in method_targets}
-    assert any("one" in l for l in method_labels), f"one() should attach to Foo, got {method_labels}"
-    assert any("two" in l for l in method_labels), f"extension method two() should attach to Foo, got {method_labels}"
+    assert any("one" in label for label in method_labels), (
+        f"one() should attach to Foo, got {method_labels}"
+    )
+    assert any("two" in label for label in method_labels), (
+        f"extension method two() should attach to Foo, got {method_labels}"
+    )
 
 
 # ── Elixir ────────────────────────────────────────────────────────────────────
 
-from graphify.extract import extract_elixir
+from graphify.extract import extract_elixir  # noqa: E402
+
 
 def test_elixir_finds_module():
     r = extract_elixir(FIXTURES / "sample.ex")
     assert "error" not in r
     labels = [n["label"] for n in r["nodes"]]
-    assert any("MyApp.Accounts.User" in l for l in labels)
+    assert any("MyApp.Accounts.User" in label for label in labels)
+
 
 def test_elixir_finds_functions():
     r = extract_elixir(FIXTURES / "sample.ex")
     labels = [n["label"] for n in r["nodes"]]
-    assert any("create" in l for l in labels)
-    assert any("find" in l for l in labels)
-    assert any("validate" in l for l in labels)
+    assert any("create" in label for label in labels)
+    assert any("find" in label for label in labels)
+    assert any("validate" in label for label in labels)
+
 
 def test_elixir_finds_imports():
     r = extract_elixir(FIXTURES / "sample.ex")
@@ -721,11 +828,14 @@ def test_elixir_import_edges_have_import_context():
     assert import_edges
     assert all(e.get("context") == "import" for e in import_edges)
 
+
 def test_elixir_finds_calls():
     r = extract_elixir(FIXTURES / "sample.ex")
     calls = {(e["source"], e["target"]) for e in r["edges"] if e["relation"] == "calls"}
     labels = {n["id"]: n["label"] for n in r["nodes"]}
-    assert any("create" in labels.get(src, "") and "validate" in labels.get(tgt, "") for src, tgt in calls)
+    assert any(
+        "create" in labels.get(src, "") and "validate" in labels.get(tgt, "") for src, tgt in calls
+    )
 
 
 def test_elixir_call_edges_have_call_context():
@@ -734,6 +844,7 @@ def test_elixir_call_edges_have_call_context():
     assert call_edges
     assert all(e.get("context") == "call" for e in call_edges)
 
+
 def test_elixir_method_edges():
     r = extract_elixir(FIXTURES / "sample.ex")
     methods = [e for e in r["edges"] if e["relation"] == "method"]
@@ -741,7 +852,7 @@ def test_elixir_method_edges():
 
 
 # ── Objective-C ──────────────────────────────────────────────────────────────
-from graphify.extract import extract_objc
+from graphify.extract import extract_objc  # noqa: E402
 
 
 def test_objc_finds_interface():
@@ -759,7 +870,7 @@ def test_objc_finds_subclass():
 def test_objc_finds_methods():
     r = extract_objc(FIXTURES / "sample.m")
     labels = [n["label"] for n in r["nodes"]]
-    assert any("speak" in l or "fetch" in l or "initWithName" in l for l in labels)
+    assert any("speak" in label or "fetch" in label or "initWithName" in label for label in labels)
 
 
 def test_objc_finds_imports():
@@ -804,6 +915,7 @@ def test_objc_no_dangling_edges():
 # Go
 # ---------------------------------------------------------------------------
 
+
 def test_go_receiver_methods_share_type_node():
     """Methods on the same receiver type must share one canonical type node."""
     r = extract_go(FIXTURES / "sample.go")
@@ -811,6 +923,7 @@ def test_go_receiver_methods_share_type_node():
     # Both Start() and Stop() are on *Server — should produce exactly one Server node
     assert len(server_nodes) == 1
 
+
 def test_go_receiver_uses_pkg_scope():
     """Type node id should be scoped to directory, not file stem."""
     r = extract_go(FIXTURES / "sample.go")
@@ -824,6 +937,7 @@ def test_go_receiver_uses_pkg_scope():
 # Julia
 # ---------------------------------------------------------------------------
 
+
 def test_julia_finds_module():
     r = extract_julia(FIXTURES / "sample.jl")
     labels = [n["label"] for n in r["nodes"]]
@@ -846,14 +960,14 @@ def test_julia_finds_abstract_type():
 def test_julia_finds_functions():
     r = extract_julia(FIXTURES / "sample.jl")
     labels = [n["label"] for n in r["nodes"]]
-    assert any("area" in l for l in labels)
-    assert any("distance" in l for l in labels)
+    assert any("area" in label for label in labels)
+    assert any("distance" in label for label in labels)
 
 
 def test_julia_finds_short_function():
     r = extract_julia(FIXTURES / "sample.jl")
     labels = [n["label"] for n in r["nodes"]]
-    assert any("perimeter" in l for l in labels)
+    assert any("perimeter" in label for label in labels)
 
 
 def test_julia_finds_imports():
@@ -910,6 +1024,7 @@ def test_julia_no_dangling_edges():
 
 # ── Fortran extractor ────────────────────────────────────────────────────────
 
+
 def test_fortran_finds_module():
     r = extract_fortran(FIXTURES / "sample.f90")
     assert "error" not in r
@@ -920,14 +1035,14 @@ def test_fortran_finds_module():
 def test_fortran_finds_subroutines():
     r = extract_fortran(FIXTURES / "sample.f90")
     labels = [n["label"] for n in r["nodes"]]
-    assert any("circle_area" in l for l in labels)
-    assert any("print_area" in l for l in labels)
+    assert any("circle_area" in label for label in labels)
+    assert any("print_area" in label for label in labels)
 
 
 def test_fortran_finds_function():
     r = extract_fortran(FIXTURES / "sample.f90")
     labels = [n["label"] for n in r["nodes"]]
-    assert any("distance" in l for l in labels)
+    assert any("distance" in label for label in labels)
 
 
 def test_fortran_finds_program():
@@ -957,7 +1072,11 @@ def test_fortran_finds_calls():
 def test_fortran_case_insensitive_names():
     r = extract_fortran(FIXTURES / "sample.f90")
     labels = [n["label"] for n in r["nodes"]]
-    assert all(l == l.lower() or "(" in l for l in labels if l.endswith(("()", "")) and not "." in l)
+    assert all(
+        label == label.lower() or "(" in label
+        for label in labels
+        if label.endswith(("()", "")) and "." not in label
+    )
     assert "geometry" in labels
     assert "main" in labels
 
@@ -986,7 +1105,7 @@ def test_fortran_capital_F_parses_preprocessed():
     assert "error" not in r
     labels = [n["label"] for n in r["nodes"]]
     assert "shapes" in labels
-    assert any("compute_volume" in l for l in labels)
+    assert any("compute_volume" in label for label in labels)
 
 
 # ── PowerShell ───────────────────────────────────────────────────────────────
@@ -1000,7 +1119,7 @@ def test_powershell_finds_class_and_method():
     r = extract_powershell(FIXTURES / "sample.ps1")
     labels = [n["label"] for n in r["nodes"]]
     assert "DataProcessor" in labels
-    assert any("Transform" in l for l in labels)
+    assert any("Transform" in label for label in labels)
 
 
 def test_powershell_property_field_type_context():
@@ -1017,10 +1136,12 @@ def test_powershell_method_parameter_and_return_type_contexts():
 
 # ── TypeScript dynamic imports ───────────────────────────────────────────────
 
+
 def test_ts_dynamic_import_no_error():
     r = extract_js(FIXTURES / "dynamic_import.ts")
     assert "error" not in r
 
+
 def test_ts_dynamic_import_extracts_edges():
     """Dynamic import() calls inside functions should produce imports_from edges."""
     r = extract_js(FIXTURES / "dynamic_import.ts")
@@ -1028,56 +1149,71 @@ def test_ts_dynamic_import_extracts_edges():
     targets = {e["target"] for e in dyn_edges}
     # Should find: static ./logger, dynamic ./mayaEngine.js, dynamic ./queue.js
     assert any("logger" in t for t in targets), f"Missing static import of logger: {targets}"
-    assert any("mayaengine" in t.lower() for t in targets), f"Missing dynamic import of mayaEngine: {targets}"
+    assert any("mayaengine" in t.lower() for t in targets), (
+        f"Missing dynamic import of mayaEngine: {targets}"
+    )
     assert any("queue" in t.lower() for t in targets), f"Missing dynamic import of queue: {targets}"
 
+
 def test_ts_dynamic_import_confidence():
     """Dynamic imports should have EXTRACTED confidence (they are deterministic string literals)."""
     r = extract_js(FIXTURES / "dynamic_import.ts")
-    dyn_edges = [e for e in r["edges"]
-                 if e["relation"] == "imports_from"
-                 and "mayaengine" in e["target"].lower()]
+    dyn_edges = [
+        e
+        for e in r["edges"]
+        if e["relation"] == "imports_from" and "mayaengine" in e["target"].lower()
+    ]
     assert len(dyn_edges) >= 1
     assert dyn_edges[0]["confidence"] == "EXTRACTED"
 
+
 def test_ts_dynamic_import_source_is_function():
     """Dynamic import edge source should be the enclosing function, not the file."""
     r = extract_js(FIXTURES / "dynamic_import.ts")
     node_labels = {n["id"]: n["label"] for n in r["nodes"]}
-    dyn_edges = [e for e in r["edges"]
-                 if e["relation"] == "imports_from"
-                 and "mayaengine" in e["target"].lower()]
+    dyn_edges = [
+        e
+        for e in r["edges"]
+        if e["relation"] == "imports_from" and "mayaengine" in e["target"].lower()
+    ]
     assert len(dyn_edges) >= 1
     src_label = node_labels.get(dyn_edges[0]["source"], "")
     assert "processInbound" in src_label, f"Expected processInbound as source, got {src_label}"
 
+
 def test_ts_no_dynamic_import_in_sync_fn():
     """Functions without dynamic imports should not get spurious imports_from edges."""
     r = extract_js(FIXTURES / "dynamic_import.ts")
     node_ids = {n["label"]: n["id"] for n in r["nodes"]}
     sync_nid = node_ids.get("syncOnly()")
     if sync_nid:
-        sync_imports = [e for e in r["edges"]
-                        if e["source"] == sync_nid and e["relation"] == "imports_from"]
+        sync_imports = [
+            e for e in r["edges"] if e["source"] == sync_nid and e["relation"] == "imports_from"
+        ]
         assert len(sync_imports) == 0
 
+
 def test_ts_dynamic_template_literal_skipped():
     """Dynamic template literals (with ${}) must not produce an imports_from edge."""
     r = extract_js(FIXTURES / "dynamic_import.ts")
     targets = {e["target"] for e in r["edges"] if e["relation"] == "imports_from"}
     # loadHandler uses `./handlers/${handlerName}` — no static path, must be absent
-    assert not any("handler" in t.lower() and "$" in t for t in targets), \
+    assert not any("handler" in t.lower() and "$" in t for t in targets), (
         f"Garbage edge from dynamic template literal found: {targets}"
+    )
     # More robust: no target should contain a brace character
-    assert not any("{" in t or "}" in t for t in targets), \
+    assert not any("{" in t or "}" in t for t in targets), (
         f"Target contains unresolved template expression: {targets}"
+    )
+
 
 def test_ts_static_template_literal_resolved():
     """Static template literals (no ${}) should resolve the same as a plain string."""
     r = extract_js(FIXTURES / "dynamic_import.ts")
     targets = {e["target"] for e in r["edges"] if e["relation"] == "imports_from"}
-    assert any("statichelper" in t.lower() for t in targets), \
+    assert any("statichelper" in t.lower() for t in targets), (
         f"Static template literal import not resolved: {targets}"
+    )
 
 
 def test_js_local_const_does_not_emit_phantom_node(tmp_path):
@@ -1151,25 +1287,29 @@ def test_ts_local_const_does_not_emit_phantom_node(tmp_path):
 
 # ── Markdown ─────────────────────────────────────────────────────────────────
 
-from graphify.extract import extract_markdown
+from graphify.extract import extract_markdown  # noqa: E402
+
 
 def test_markdown_no_error():
     r = extract_markdown(FIXTURES / "deploy_guide.md")
     assert "error" not in r
 
+
 def test_markdown_finds_headings():
     r = extract_markdown(FIXTURES / "deploy_guide.md")
     labels = _labels(r)
-    assert any("Deploy Guide" in l for l in labels)
-    assert any("Prerequisites" in l for l in labels)
-    assert any("Full Deploy" in l for l in labels)
-    assert any("Rollback" in l for l in labels)
+    assert any("Deploy Guide" in label for label in labels)
+    assert any("Prerequisites" in label for label in labels)
+    assert any("Full Deploy" in label for label in labels)
+    assert any("Rollback" in label for label in labels)
+
 
 def test_markdown_finds_nested_heading():
     """### Database Migration is nested under ## Full Deploy."""
     r = extract_markdown(FIXTURES / "deploy_guide.md")
     labels = _labels(r)
-    assert any("Database Migration" in l for l in labels)
+    assert any("Database Migration" in label for label in labels)
+
 
 def test_markdown_skips_fenced_code_blocks():
     """Fenced code blocks should NOT emit nodes (#1077).
@@ -1225,6 +1365,7 @@ def test_markdown_fenced_heading_not_parsed():
     assert not any("Not A Heading" in l for l in labels), \
         f"fenced '## Not A Heading' was incorrectly parsed as a node: {labels}"
 
+
 def test_markdown_no_dangling_edges():
     r = extract_markdown(FIXTURES / "deploy_guide.md")
     node_ids = {n["id"] for n in r["nodes"]}
@@ -1242,14 +1383,14 @@ def test_groovy_no_error():
 
 def test_groovy_finds_class():
     r = extract_groovy(FIXTURES / "sample.groovy")
-    assert any("SampleService" in l for l in _labels(r))
+    assert any("SampleService" in label for label in _labels(r))
 
 
 def test_groovy_finds_methods():
     r = extract_groovy(FIXTURES / "sample.groovy")
     labels = _labels(r)
-    assert any("process" in l for l in labels)
-    assert any("reset" in l for l in labels)
+    assert any("process" in label for label in labels)
+    assert any("reset" in label for label in labels)
 
 
 def test_groovy_finds_imports():
@@ -1273,18 +1414,18 @@ def test_groovy_no_dangling_edges():
 
 def test_groovy_spock_finds_class():
     r = extract_groovy(FIXTURES / "sample_spock.groovy")
-    assert any("SampleSpec" in l for l in _labels(r))
+    assert any("SampleSpec" in label for label in _labels(r))
 
 
 def test_groovy_spock_finds_feature_methods():
     r = extract_groovy(FIXTURES / "sample_spock.groovy")
-    feature_labels = [l for l in _labels(r) if l.startswith('"')]
+    feature_labels = [label for label in _labels(r) if label.startswith('"')]
     assert len(feature_labels) >= 2
 
 
 def test_groovy_spock_finds_method_with_apostrophe():
     r = extract_groovy(FIXTURES / "sample_spock.groovy")
-    assert any("it's" in l for l in _labels(r))
+    assert any("it's" in label for label in _labels(r))
 
 
 def test_groovy_spock_preserves_import_edges():
@@ -1308,8 +1449,8 @@ def test_dm_no_error():
 def test_dm_finds_global_proc():
     r = extract_dm(FIXTURES / "sample.dm")
     labels = _labels(r)
-    assert any(l == "log_event()" for l in labels)
-    assert any(l == "RunTest()" for l in labels)
+    assert any(label == "log_event()" for label in labels)
+    assert any(label == "RunTest()" for label in labels)
 
 def test_dm_finds_type_definition():
     r = extract_dm(FIXTURES / "sample.dm")
@@ -1389,7 +1530,7 @@ def test_dmi_no_error():
 def test_dmi_emits_state_nodes():
     r = extract_dmi(FIXTURES / "sample.dmi")
     labels = _labels(r)
-    assert any(l == '"mob"' for l in labels)
+    assert any(label == '"mob"' for label in labels)
 
 def test_dmi_state_contained_by_file():
     r = extract_dmi(FIXTURES / "sample.dmi")
@@ -1463,68 +1604,83 @@ def test_dmf_no_dangling_edges():
 
 # -- .NET project files (.sln, .csproj, .razor) -------------------------------
 
+
 def test_sln_no_error():
     r = extract_sln(FIXTURES / "sample.sln")
     assert "error" not in r
 
+
 def test_sln_finds_projects():
     r = extract_sln(FIXTURES / "sample.sln")
     labels = _labels(r)
-    assert any("WebApi" in l for l in labels)
-    assert any("Domain" in l for l in labels)
+    assert any("WebApi" in label for label in labels)
+    assert any("Domain" in label for label in labels)
+
 
 def test_sln_contains_edges():
     r = extract_sln(FIXTURES / "sample.sln")
     assert "contains" in _relations(r)
 
+
 def test_sln_project_dependency_edges():
     r = extract_sln(FIXTURES / "sample.sln")
     assert "imports" in _relations(r)
 
+
 def test_csproj_no_error():
     r = extract_csproj(FIXTURES / "sample.csproj")
     assert "error" not in r
 
+
 def test_csproj_finds_packages():
     r = extract_csproj(FIXTURES / "sample.csproj")
     labels = _labels(r)
-    assert any("MediatR" in l for l in labels)
-    assert any("FluentValidation" in l for l in labels)
+    assert any("MediatR" in label for label in labels)
+    assert any("FluentValidation" in label for label in labels)
+
 
 def test_csproj_finds_project_references():
     r = extract_csproj(FIXTURES / "sample.csproj")
     labels = _labels(r)
-    assert any("Domain.csproj" in l for l in labels)
+    assert any("Domain.csproj" in label for label in labels)
+
 
 def test_csproj_finds_target_framework():
     r = extract_csproj(FIXTURES / "sample.csproj")
-    assert any("net8.0" in l for l in _labels(r))
+    assert any("net8.0" in label for label in _labels(r))
+
 
 def test_csproj_finds_sdk():
     r = extract_csproj(FIXTURES / "sample.csproj")
-    assert any("Microsoft.NET.Sdk.Web" in l for l in _labels(r))
+    assert any("Microsoft.NET.Sdk.Web" in label for label in _labels(r))
+
 
 def test_razor_no_error():
     r = extract_razor(FIXTURES / "sample.razor")
     assert "error" not in r
 
+
 def test_razor_finds_using_directives():
     r = extract_razor(FIXTURES / "sample.razor")
     assert "imports" in _relations(r)
 
+
 def test_razor_finds_component_references():
     r = extract_razor(FIXTURES / "sample.razor")
     assert "calls" in _relations(r)
 
+
 def test_razor_finds_inherits():
     r = extract_razor(FIXTURES / "sample.razor")
     assert "inherits" in _relations(r)
 
+
 def test_razor_finds_code_block_methods():
     r = extract_razor(FIXTURES / "sample.razor")
     labels = _labels(r)
-    assert any("IncrementCount" in l for l in labels)
-    assert any("LoadData" in l for l in labels)
+    assert any("IncrementCount" in label for label in labels)
+    assert any("LoadData" in label for label in labels)
+
 
 def test_razor_no_dangling_edges():
     r = extract_razor(FIXTURES / "sample.razor")
diff --git a/tests/test_llm_backends.py b/tests/test_llm_backends.py
index d2ee058c4..7cd670e83 100644
--- a/tests/test_llm_backends.py
+++ b/tests/test_llm_backends.py
@@ -224,6 +224,7 @@ def test_response_is_hollow_accepts_real_extraction():
 
 def _fake_openai_response(content, *, finish_reason="stop", prompt_tokens=100, completion_tokens=0):
     """Build a minimal stand-in for an `openai` SDK ChatCompletion response."""
+
     class _Usage:
         def __init__(self):
             self.prompt_tokens = prompt_tokens
@@ -257,11 +258,12 @@ class _FakeOpenAI:
         def __init__(self, *_, **__):
             self.chat = self
             self.completions = self
+
         def create(self, **__):
             return fake_resp
 
     fake_module = types.ModuleType("openai")
-    fake_module.OpenAI = _FakeOpenAI
+    setattr(fake_module, "OpenAI", _FakeOpenAI)
     monkeypatch.setitem(sys.modules, "openai", fake_module)
 
 
@@ -274,8 +276,13 @@ def test_call_openai_compat_relabels_empty_content_as_length(monkeypatch):
     _install_fake_openai(monkeypatch, fake_resp)
 
     result = llm._call_openai_compat(
-        "http://localhost:11434/v1", "ollama", "qwen2.5-coder:7b",
-        "user msg", temperature=0, max_completion_tokens=8192, backend="ollama",
+        "http://localhost:11434/v1",
+        "ollama",
+        "qwen2.5-coder:7b",
+        "user msg",
+        temperature=0,
+        max_completion_tokens=8192,
+        backend="ollama",
     )
     assert result["finish_reason"] == "length", (
         "empty content from a 'successful' call must be re-labelled so the "
@@ -288,8 +295,13 @@ def test_call_openai_compat_relabels_none_content_as_length(monkeypatch):
     _install_fake_openai(monkeypatch, fake_resp)
 
     result = llm._call_openai_compat(
-        "http://localhost:11434/v1", "ollama", "qwen2.5-coder:7b",
-        "u", temperature=0, max_completion_tokens=8192, backend="ollama",
+        "http://localhost:11434/v1",
+        "ollama",
+        "qwen2.5-coder:7b",
+        "u",
+        temperature=0,
+        max_completion_tokens=8192,
+        backend="ollama",
     )
     assert result["finish_reason"] == "length"
 
@@ -298,12 +310,19 @@ def test_call_openai_compat_relabels_unparseable_json_as_length(monkeypatch):
     # A half-generated response: `{"nodes": [{"id":` parses to {} (empty
     # fragment) via _parse_llm_json's JSONDecodeError fallback. That is also
     # hollow and must trigger bisection.
-    fake_resp = _fake_openai_response('{"nodes": [{"id":', finish_reason="stop", completion_tokens=20)
+    fake_resp = _fake_openai_response(
+        '{"nodes": [{"id":', finish_reason="stop", completion_tokens=20
+    )
     _install_fake_openai(monkeypatch, fake_resp)
 
     result = llm._call_openai_compat(
-        "http://localhost:11434/v1", "ollama", "qwen2.5-coder:7b",
-        "u", temperature=0, max_completion_tokens=8192, backend="ollama",
+        "http://localhost:11434/v1",
+        "ollama",
+        "qwen2.5-coder:7b",
+        "u",
+        temperature=0,
+        max_completion_tokens=8192,
+        backend="ollama",
     )
     assert result["finish_reason"] == "length"
 
@@ -318,8 +337,13 @@ def test_call_openai_compat_preserves_real_finish_reason(monkeypatch):
     _install_fake_openai(monkeypatch, fake_resp)
 
     result = llm._call_openai_compat(
-        "http://localhost:11434/v1", "k", "m",
-        "u", temperature=0, max_completion_tokens=8192, backend="kimi",
+        "http://localhost:11434/v1",
+        "k",
+        "m",
+        "u",
+        temperature=0,
+        max_completion_tokens=8192,
+        backend="kimi",
     )
     assert result["finish_reason"] == "stop"
     assert result["nodes"] == [{"id": "a"}]
@@ -352,7 +376,7 @@ def create(self, **kwargs):
             )
 
     fake_module = types.ModuleType("openai")
-    fake_module.OpenAI = _FakeOpenAI
+    setattr(fake_module, "OpenAI", _FakeOpenAI)
     monkeypatch.setitem(sys.modules, "openai", fake_module)
     return captured
 
@@ -363,8 +387,13 @@ def test_ollama_extra_body_sets_num_ctx_and_keep_alive(monkeypatch):
     monkeypatch.delenv("GRAPHIFY_OLLAMA_KEEP_ALIVE", raising=False)
 
     llm._call_openai_compat(
-        "http://localhost:11434/v1", "ollama", "qwen2.5-coder:7b",
-        "user msg", temperature=0, max_completion_tokens=8192, backend="ollama",
+        "http://localhost:11434/v1",
+        "ollama",
+        "qwen2.5-coder:7b",
+        "user msg",
+        temperature=0,
+        max_completion_tokens=8192,
+        backend="ollama",
     )
 
     assert "extra_body" in captured, "extra_body must be sent to Ollama"
@@ -387,8 +416,13 @@ def test_ollama_num_ctx_scales_with_small_token_budget(monkeypatch):
     small_chunk_msg = "x" * 32_000
 
     llm._call_openai_compat(
-        "http://localhost:11434/v1", "ollama", "qwen2.5-coder:7b",
-        small_chunk_msg, temperature=0, max_completion_tokens=16384, backend="ollama",
+        "http://localhost:11434/v1",
+        "ollama",
+        "qwen2.5-coder:7b",
+        small_chunk_msg,
+        temperature=0,
+        max_completion_tokens=16384,
+        backend="ollama",
     )
 
     num_ctx = captured["extra_body"]["options"]["num_ctx"]
@@ -407,8 +441,13 @@ def test_ollama_num_ctx_env_override(monkeypatch):
     monkeypatch.delenv("GRAPHIFY_OLLAMA_KEEP_ALIVE", raising=False)
 
     llm._call_openai_compat(
-        "http://localhost:11434/v1", "ollama", "qwen2.5-coder:7b",
-        "u", temperature=0, max_completion_tokens=8192, backend="ollama",
+        "http://localhost:11434/v1",
+        "ollama",
+        "qwen2.5-coder:7b",
+        "u",
+        temperature=0,
+        max_completion_tokens=8192,
+        backend="ollama",
     )
 
     assert captured["extra_body"]["options"]["num_ctx"] == 65536
@@ -418,8 +457,13 @@ def test_non_ollama_backend_gets_no_num_ctx_extra_body(monkeypatch):
     captured = _install_capturing_openai(monkeypatch)
 
     llm._call_openai_compat(
-        "https://api.openai.com/v1", "sk-test", "gpt-4.1-mini",
-        "u", temperature=0, max_completion_tokens=8192, backend="openai",
+        "https://api.openai.com/v1",
+        "sk-test",
+        "gpt-4.1-mini",
+        "u",
+        temperature=0,
+        max_completion_tokens=8192,
+        backend="openai",
     )
 
     eb = captured.get("extra_body")
@@ -445,8 +489,14 @@ def fake_extract(chunk, *_, **__):
     with patch("graphify.llm.extract_files_direct", side_effect=fake_extract):
         with patch("graphify.llm.ThreadPoolExecutor") as mock_pool:
             result = llm.extract_corpus_parallel(
-                files, backend="ollama", api_key="ollama", model="qwen2.5-coder:7b",
-                root=tmp_path, token_budget=None, chunk_size=2, max_concurrency=4,
+                files,
+                backend="ollama",
+                api_key="ollama",
+                model="qwen2.5-coder:7b",
+                root=tmp_path,
+                token_budget=None,
+                chunk_size=2,
+                max_concurrency=4,
             )
 
     mock_pool.assert_not_called()
@@ -469,8 +519,14 @@ def test_extract_corpus_parallel_ollama_parallel_env_restores_concurrency(tmp_pa
             )()
             try:
                 llm.extract_corpus_parallel(
-                    files, backend="ollama", api_key="ollama", model="m",
-                    root=tmp_path, token_budget=None, chunk_size=2, max_concurrency=4,
+                    files,
+                    backend="ollama",
+                    api_key="ollama",
+                    model="m",
+                    root=tmp_path,
+                    token_budget=None,
+                    chunk_size=2,
+                    max_concurrency=4,
                 )
             except Exception:
                 pass  # mock scaffolding may not be complete; we only care about the call
@@ -496,16 +552,24 @@ def fake_extract(chunk, *_, **__):
             # Hollow response: looks successful, finish_reason already
             # rewritten to "length" by _call_openai_compat.
             return {
-                "nodes": [], "edges": [], "hyperedges": [],
-                "input_tokens": 100, "output_tokens": 0,
-                "model": "m", "finish_reason": "length",
+                "nodes": [],
+                "edges": [],
+                "hyperedges": [],
+                "input_tokens": 100,
+                "output_tokens": 0,
+                "model": "m",
+                "finish_reason": "length",
             }
         return _ok(nodes=[{"id": f.stem} for f in chunk])
 
     with patch("graphify.llm.extract_files_direct", side_effect=fake_extract):
         result = llm._extract_with_adaptive_retry(
-            files, backend="ollama", api_key="ollama", model="qwen2.5-coder:7b",
-            root=tmp_path, max_depth=3,
+            files,
+            backend="ollama",
+            api_key="ollama",
+            model="qwen2.5-coder:7b",
+            root=tmp_path,
+            max_depth=3,
         )
 
     assert len(result["nodes"]) == 4, (
diff --git a/tests/test_llm_parser.py b/tests/test_llm_parser.py
index c643807b9..ad8bd07dd 100644
--- a/tests/test_llm_parser.py
+++ b/tests/test_llm_parser.py
@@ -11,8 +11,6 @@
 import json
 from unittest.mock import patch
 
-import pytest
-
 from graphify import llm
 
 
diff --git a/tests/test_mcp_ingest.py b/tests/test_mcp_ingest.py
index 8e46b170d..2f447ac54 100644
--- a/tests/test_mcp_ingest.py
+++ b/tests/test_mcp_ingest.py
@@ -1,10 +1,10 @@
 """Tests for graphify.mcp_ingest — MCP config file extraction."""
+
 from __future__ import annotations
 
 import json
 from pathlib import Path
 
-import pytest
 
 from graphify.mcp_ingest import (
     MCP_CONFIG_FILENAMES,
@@ -29,11 +29,7 @@ def _relations(result):
 
 
 def _label_by_kind(result, kind):
-    return [
-        n["label"]
-        for n in result["nodes"]
-        if n.get("metadata", {}).get("mcp_kind") == kind
-    ]
+    return [n["label"] for n in result["nodes"] if n.get("metadata", {}).get("mcp_kind") == kind]
 
 
 def _write(tmp_path: Path, name: str, payload) -> Path:
@@ -167,13 +163,21 @@ def test_every_edge_has_confidence_score():
 
 def test_same_command_collapses_to_one_node_across_configs(tmp_path):
     # Two configs both use "npx". The mcp_command node should be shared.
-    config_a = _write(tmp_path, ".mcp.json", {
-        "mcpServers": {"a": {"command": "npx", "args": ["@scope/server-a"]}},
-    })
+    config_a = _write(
+        tmp_path,
+        ".mcp.json",
+        {
+            "mcpServers": {"a": {"command": "npx", "args": ["@scope/server-a"]}},
+        },
+    )
     (tmp_path / "subdir").mkdir()
-    config_b = _write(tmp_path / "subdir", "claude_desktop_config.json", {
-        "mcpServers": {"b": {"command": "npx", "args": ["@scope/server-b"]}},
-    })
+    config_b = _write(
+        tmp_path / "subdir",
+        "claude_desktop_config.json",
+        {
+            "mcpServers": {"b": {"command": "npx", "args": ["@scope/server-b"]}},
+        },
+    )
     r_a = extract_mcp_config(config_a)
     r_b = extract_mcp_config(config_b)
     cmd_id_a = next(n["id"] for n in r_a["nodes"] if n["metadata"]["mcp_kind"] == "mcp_command")
@@ -183,17 +187,25 @@ def test_same_command_collapses_to_one_node_across_configs(tmp_path):
 
 def test_same_env_var_collapses_to_one_node_across_configs(tmp_path):
     # Two configs both require OPENAI_API_KEY. The env_var node ID must be identical.
-    a = _write(tmp_path, ".mcp.json", {
-        "mcpServers": {
-            "x": {"command": "npx", "args": ["@scope/x"], "env": {"OPENAI_API_KEY": "v1"}},
+    a = _write(
+        tmp_path,
+        ".mcp.json",
+        {
+            "mcpServers": {
+                "x": {"command": "npx", "args": ["@scope/x"], "env": {"OPENAI_API_KEY": "v1"}},
+            },
         },
-    })
+    )
     (tmp_path / "sub").mkdir()
-    b = _write(tmp_path / "sub", "claude_desktop_config.json", {
-        "mcpServers": {
-            "y": {"command": "uvx", "args": ["mcp-server-y"], "env": {"OPENAI_API_KEY": "v2"}},
+    b = _write(
+        tmp_path / "sub",
+        "claude_desktop_config.json",
+        {
+            "mcpServers": {
+                "y": {"command": "uvx", "args": ["mcp-server-y"], "env": {"OPENAI_API_KEY": "v2"}},
+            },
         },
-    })
+    )
     r_a = extract_mcp_config(a)
     r_b = extract_mcp_config(b)
     env_id_a = next(n["id"] for n in r_a["nodes"] if n["metadata"]["mcp_kind"] == "env_var")
@@ -206,12 +218,20 @@ def test_same_server_name_in_different_dirs_does_not_collide(tmp_path):
     # The server nodes should NOT collide (stem-scoped via parent dir).
     (tmp_path / "proj_a").mkdir()
     (tmp_path / "proj_b").mkdir()
-    a = _write(tmp_path / "proj_a", ".mcp.json", {
-        "mcpServers": {"filesystem": {"command": "npx", "args": ["@scope/a"]}},
-    })
-    b = _write(tmp_path / "proj_b", ".mcp.json", {
-        "mcpServers": {"filesystem": {"command": "npx", "args": ["@scope/b"]}},
-    })
+    a = _write(
+        tmp_path / "proj_a",
+        ".mcp.json",
+        {
+            "mcpServers": {"filesystem": {"command": "npx", "args": ["@scope/a"]}},
+        },
+    )
+    b = _write(
+        tmp_path / "proj_b",
+        ".mcp.json",
+        {
+            "mcpServers": {"filesystem": {"command": "npx", "args": ["@scope/b"]}},
+        },
+    )
     r_a = extract_mcp_config(a)
     r_b = extract_mcp_config(b)
     srv_a = next(n["id"] for n in r_a["nodes"] if n["metadata"]["mcp_kind"] == "mcp_server")
@@ -232,9 +252,13 @@ def test_missing_mcp_servers_key(tmp_path):
 
 def test_nested_mcp_servers_shape(tmp_path):
     # Some tools wrap the map: {"mcp": {"servers": {...}}}
-    p = _write(tmp_path, ".mcp.json", {
-        "mcp": {"servers": {"x": {"command": "node", "args": ["dist/index.js"]}}},
-    })
+    p = _write(
+        tmp_path,
+        ".mcp.json",
+        {
+            "mcp": {"servers": {"x": {"command": "node", "args": ["dist/index.js"]}}},
+        },
+    )
     r = extract_mcp_config(p)
     assert "error" not in r
     assert "x" in _label_by_kind(r, "mcp_server")
@@ -266,12 +290,16 @@ def test_root_not_an_object(tmp_path):
 
 
 def test_non_dict_server_entry_skipped(tmp_path):
-    p = _write(tmp_path, ".mcp.json", {
-        "mcpServers": {
-            "valid": {"command": "npx", "args": ["@scope/pkg"]},
-            "broken": ["this", "is", "not", "an", "object"],
+    p = _write(
+        tmp_path,
+        ".mcp.json",
+        {
+            "mcpServers": {
+                "valid": {"command": "npx", "args": ["@scope/pkg"]},
+                "broken": ["this", "is", "not", "an", "object"],
+            },
         },
-    })
+    )
     r = extract_mcp_config(p)
     server_labels = _label_by_kind(r, "mcp_server")
     assert "valid" in server_labels
@@ -283,26 +311,38 @@ def test_non_dict_server_entry_skipped(tmp_path):
 
 def test_package_detection_skips_flags(tmp_path):
     # First arg is -y (flag); second is the package. Detection should skip the flag.
-    p = _write(tmp_path, ".mcp.json", {
-        "mcpServers": {"x": {"command": "npx", "args": ["-y", "@scope/server-x"]}},
-    })
+    p = _write(
+        tmp_path,
+        ".mcp.json",
+        {
+            "mcpServers": {"x": {"command": "npx", "args": ["-y", "@scope/server-x"]}},
+        },
+    )
     r = extract_mcp_config(p)
     assert "@scope/server-x" in _label_by_kind(r, "mcp_package")
 
 
 def test_no_package_detected_for_unknown_arg_shape(tmp_path):
     # Args don't look like any known package pattern => no package node.
-    p = _write(tmp_path, ".mcp.json", {
-        "mcpServers": {"x": {"command": "node", "args": ["./local-script.js", "--verbose"]}},
-    })
+    p = _write(
+        tmp_path,
+        ".mcp.json",
+        {
+            "mcpServers": {"x": {"command": "node", "args": ["./local-script.js", "--verbose"]}},
+        },
+    )
     r = extract_mcp_config(p)
     assert _label_by_kind(r, "mcp_package") == []
 
 
 def test_server_without_command_still_emits_server_node(tmp_path):
-    p = _write(tmp_path, ".mcp.json", {
-        "mcpServers": {"x": {"args": ["@scope/server-x"]}},
-    })
+    p = _write(
+        tmp_path,
+        ".mcp.json",
+        {
+            "mcpServers": {"x": {"args": ["@scope/server-x"]}},
+        },
+    )
     r = extract_mcp_config(p)
     assert "x" in _label_by_kind(r, "mcp_server")
     assert _label_by_kind(r, "mcp_command") == []
@@ -316,9 +356,13 @@ def test_dispatch_routes_mcp_filename_to_mcp_extractor(tmp_path):
     # extract_mcp_config, NOT extract_json.
     from graphify.extract import _get_extractor
 
-    p = _write(tmp_path, ".mcp.json", {
-        "mcpServers": {"x": {"command": "npx", "args": ["@scope/server-x"]}},
-    })
+    p = _write(
+        tmp_path,
+        ".mcp.json",
+        {
+            "mcpServers": {"x": {"command": "npx", "args": ["@scope/server-x"]}},
+        },
+    )
     extractor = _get_extractor(p)
     assert extractor is extract_mcp_config
 
diff --git a/tests/test_multilang.py b/tests/test_multilang.py
index c30b9e10c..9496a9664 100644
--- a/tests/test_multilang.py
+++ b/tests/test_multilang.py
@@ -1,6 +1,6 @@
 """Tests for multi-language AST extraction: JS/TS, Go, Rust, SQL."""
+
 from __future__ import annotations
-import shutil
 from pathlib import Path
 import pytest
 from graphify.extract import extract_js, extract_go, extract_rust, extract, extract_sql
@@ -10,16 +10,20 @@
 
 # ── helpers ──────────────────────────────────────────────────────────────────
 
+
 def _labels(result):
     return [n["label"] for n in result["nodes"]]
 
+
 def _call_pairs(result):
     node_by_id = {n["id"]: n["label"] for n in result["nodes"]}
     return {
         (node_by_id.get(e["source"], e["source"]), node_by_id.get(e["target"], e["target"]))
-        for e in result["edges"] if e["relation"] == "calls"
+        for e in result["edges"]
+        if e["relation"] == "calls"
     }
 
+
 def _confidences(result):
     return {e["confidence"] for e in result["edges"]}
 
@@ -46,20 +50,24 @@ def _edge_labels(result, relation, context=None):
 
 # ── TypeScript ────────────────────────────────────────────────────────────────
 
+
 def test_ts_finds_class():
     r = extract_js(FIXTURES / "sample.ts")
     assert "error" not in r
     assert "HttpClient" in _labels(r)
 
+
 def test_ts_finds_methods():
     r = extract_js(FIXTURES / "sample.ts")
     labels = _labels(r)
-    assert any("get" in l for l in labels)
-    assert any("post" in l for l in labels)
+    assert any("get" in label for label in labels)
+    assert any("post" in label for label in labels)
+
 
 def test_ts_finds_function():
     r = extract_js(FIXTURES / "sample.ts")
-    assert any("buildHeaders" in l for l in _labels(r))
+    assert any("buildHeaders" in label for label in _labels(r))
+
 
 def test_ts_emits_calls():
     r = extract_js(FIXTURES / "sample.ts")
@@ -67,6 +75,7 @@ def test_ts_emits_calls():
     # .post() calls .get()
     assert any("post" in src and "get" in tgt for src, tgt in calls)
 
+
 def test_ts_calls_are_extracted():
     r = extract_js(FIXTURES / "sample.ts")
     for e in r["edges"]:
@@ -87,6 +96,7 @@ def test_ts_call_edges_have_call_context():
     assert call_edges
     assert all(e.get("context") == "call" for e in call_edges)
 
+
 def test_ts_no_dangling_edges():
     r = extract_js(FIXTURES / "sample.ts")
     node_ids = {n["id"] for n in r["nodes"]}
@@ -97,26 +107,31 @@ def test_ts_no_dangling_edges():
 
 # ── Go ────────────────────────────────────────────────────────────────────────
 
+
 def test_go_finds_struct():
     r = extract_go(FIXTURES / "sample.go")
     assert "error" not in r
     assert "Server" in _labels(r)
 
+
 def test_go_finds_methods():
     r = extract_go(FIXTURES / "sample.go")
     labels = _labels(r)
-    assert any("Start" in l for l in labels)
-    assert any("Stop" in l for l in labels)
+    assert any("Start" in label for label in labels)
+    assert any("Stop" in label for label in labels)
+
 
 def test_go_finds_constructor():
     r = extract_go(FIXTURES / "sample.go")
-    assert any("NewServer" in l for l in _labels(r))
+    assert any("NewServer" in label for label in _labels(r))
+
 
 def test_go_emits_calls():
     r = extract_go(FIXTURES / "sample.go")
     # main() calls NewServer and Start
     assert len(_call_pairs(r)) > 0
 
+
 def test_go_has_extracted_calls():
     r = extract_go(FIXTURES / "sample.go")
     assert "EXTRACTED" in _confidences(r)
@@ -135,6 +150,7 @@ def test_go_call_edges_have_call_context():
     assert call_edges
     assert all(e.get("context") == "call" for e in call_edges)
 
+
 def test_go_no_dangling_edges():
     r = extract_go(FIXTURES / "sample.go")
     node_ids = {n["id"] for n in r["nodes"]}
@@ -288,26 +304,31 @@ def _is_guarded(use: ast.AST) -> bool:
 
 # ── Rust ──────────────────────────────────────────────────────────────────────
 
+
 def test_rust_finds_struct():
     r = extract_rust(FIXTURES / "sample.rs")
     assert "error" not in r
     assert "Graph" in _labels(r)
 
+
 def test_rust_finds_impl_methods():
     r = extract_rust(FIXTURES / "sample.rs")
     labels = _labels(r)
-    assert any("add_node" in l for l in labels)
-    assert any("add_edge" in l for l in labels)
+    assert any("add_node" in label for label in labels)
+    assert any("add_edge" in label for label in labels)
+
 
 def test_rust_finds_function():
     r = extract_rust(FIXTURES / "sample.rs")
-    assert any("build_graph" in l for l in _labels(r))
+    assert any("build_graph" in label for label in _labels(r))
+
 
 def test_rust_emits_calls():
     r = extract_rust(FIXTURES / "sample.rs")
     calls = _call_pairs(r)
     assert any("build_graph" in src for src, _ in calls)
 
+
 def test_rust_calls_are_extracted():
     r = extract_rust(FIXTURES / "sample.rs")
     for e in r["edges"]:
@@ -328,6 +349,7 @@ def test_rust_call_edges_have_call_context():
     assert call_edges
     assert all(e.get("context") == "call" for e in call_edges)
 
+
 def test_rust_no_dangling_edges():
     r = extract_rust(FIXTURES / "sample.rs")
     node_ids = {n["id"] for n in r["nodes"]}
@@ -363,6 +385,7 @@ def test_rust_no_cross_crate_spurious_edges():
     """Scoped calls (Type::method) and blocklisted names must not produce
     INFERRED cross-crate calls edges (#908)."""
     from graphify.extract import extract
+
     crate_a = FIXTURES / "crate_a" / "src" / "lib.rs"
     crate_b = FIXTURES / "crate_b" / "src" / "lib.rs"
     r = extract([crate_a, crate_b])
@@ -370,18 +393,16 @@ def test_rust_no_cross_crate_spurious_edges():
     node_ids_b = {n["id"] for n in r["nodes"] if "crate_b" in (n.get("source_file") or "")}
     # No calls edge should cross from crate_b into crate_a
     cross_crate_calls = [
-        e for e in r["edges"]
-        if e["relation"] == "calls"
-        and e["source"] in node_ids_b
-        and e["target"] in node_ids_a
+        e
+        for e in r["edges"]
+        if e["relation"] == "calls" and e["source"] in node_ids_b and e["target"] in node_ids_a
     ]
-    assert cross_crate_calls == [], (
-        f"Spurious cross-crate edges: {cross_crate_calls}"
-    )
+    assert cross_crate_calls == [], f"Spurious cross-crate edges: {cross_crate_calls}"
 
 
 # ── extract() dispatch ────────────────────────────────────────────────────────
 
+
 def test_extract_dispatches_all_languages():
     files = [
         FIXTURES / "sample.py",
@@ -400,6 +421,7 @@ def test_extract_dispatches_all_languages():
 
 # ── Cache ─────────────────────────────────────────────────────────────────────
 
+
 def test_cache_hit_returns_same_result(tmp_path):
     src = FIXTURES / "sample.py"
     dst = tmp_path / "sample.py"
@@ -410,20 +432,22 @@ def test_cache_hit_returns_same_result(tmp_path):
     assert len(r1["nodes"]) == len(r2["nodes"])
     assert len(r1["edges"]) == len(r2["edges"])
 
+
 def test_cache_miss_after_file_change(tmp_path):
     dst = tmp_path / "a.py"
     dst.write_text("def foo(): pass\n")
-    r1 = extract([dst])
+    extract([dst])
 
     dst.write_text("def foo(): pass\ndef bar(): pass\n")
     r2 = extract([dst])
     # bar() should appear in the second result
     labels2 = [n["label"] for n in r2["nodes"]]
-    assert any("bar" in l for l in labels2)
+    assert any("bar" in label for label in labels2)
 
 
 # ── SQL ───────────────────────────────────────────────────────────────────────
 
+
 def _extract_sql_or_skip(fixture: str = "sample.sql"):
     pytest.importorskip("tree_sitter_sql")
     return extract_sql(FIXTURES / fixture)
@@ -432,35 +456,41 @@ def _extract_sql_or_skip(fixture: str = "sample.sql"):
 def test_sql_finds_tables():
     r = _extract_sql_or_skip()
     labels = [n["label"] for n in r["nodes"]]
-    assert any("users" in l for l in labels)
-    assert any("organizations" in l for l in labels)
+    assert any("users" in label for label in labels)
+    assert any("organizations" in label for label in labels)
+
 
 def test_sql_finds_view():
     r = _extract_sql_or_skip()
     labels = [n["label"] for n in r["nodes"]]
-    assert any("active_users" in l for l in labels)
+    assert any("active_users" in label for label in labels)
+
 
 def test_sql_finds_function():
     r = _extract_sql_or_skip()
     labels = [n["label"] for n in r["nodes"]]
-    assert any("get_user" in l for l in labels)
+    assert any("get_user" in label for label in labels)
+
 
 def test_sql_emits_foreign_key_edge():
     r = _extract_sql_or_skip()
     relations = {e["relation"] for e in r["edges"]}
     assert "references" in relations
 
+
 def test_sql_emits_reads_from_edge():
     r = _extract_sql_or_skip()
     relations = {e["relation"] for e in r["edges"]}
     assert "reads_from" in relations
 
+
 def test_sql_no_dangling_edges():
     r = _extract_sql_or_skip()
     node_ids = {n["id"] for n in r["nodes"]}
     for e in r["edges"]:
         assert e["source"] in node_ids, f"dangling source: {e['source']}"
 
+
 def test_sql_alter_table_fk_edge():
     """ALTER TABLE ... FOREIGN KEY ... REFERENCES produces a references edge."""
     r = _extract_sql_or_skip("sample_alter_fk.sql")
@@ -471,12 +501,14 @@ def test_sql_alter_table_fk_edge():
         assert e["source"] in node_ids, f"dangling source: {e['source']}"
         assert e["target"] in node_ids, f"dangling target: {e['target']}"
 
+
 def test_sql_schema_qualified_names():
     """Schema-qualified table names (Schema.Table) are preserved."""
     r = _extract_sql_or_skip("sample_schema_qualified.sql")
     labels = [n["label"] for n in r["nodes"]]
-    assert any("Sales.Customer" in l for l in labels)
-    assert any("Sales.SalesOrder" in l for l in labels)
+    assert any("Sales.Customer" in label for label in labels)
+    assert any("Sales.SalesOrder" in label for label in labels)
+
 
 def test_sql_schema_qualified_alter_fk():
     """ALTER TABLE with schema-qualified names produces correct edges."""
diff --git a/tests/test_ollama.py b/tests/test_ollama.py
index a7af29a64..7336dd5fe 100644
--- a/tests/test_ollama.py
+++ b/tests/test_ollama.py
@@ -1,4 +1,5 @@
 """Tests for the Ollama backend additions in graphify/llm.py."""
+
 from __future__ import annotations
 
 from graphify.llm import detect_backend, BACKENDS
@@ -60,6 +61,7 @@ def test_ollama_api_key_sentinel(monkeypatch):
     }
     with patch("graphify.llm._call_openai_compat", return_value=fake_result) as mock_call:
         from graphify.llm import extract_files_direct
+
         with tempfile.NamedTemporaryFile(suffix=".py", mode="w", delete=False) as f:
             f.write("x = 1\n")
             tmp = Path(f.name)
@@ -68,7 +70,9 @@ def test_ollama_api_key_sentinel(monkeypatch):
             # Should have called _call_openai_compat with api_key="ollama"
             assert mock_call.called
             call_kwargs = mock_call.call_args
-            api_key_used = call_kwargs.args[1] if call_kwargs.args else call_kwargs.kwargs.get("api_key", "")
+            api_key_used = (
+                call_kwargs.args[1] if call_kwargs.args else call_kwargs.kwargs.get("api_key", "")
+            )
             assert api_key_used == "ollama"
         finally:
             tmp.unlink(missing_ok=True)
diff --git a/tests/test_pascal.py b/tests/test_pascal.py
index 36c1b8747..50b0c7412 100644
--- a/tests/test_pascal.py
+++ b/tests/test_pascal.py
@@ -1,4 +1,5 @@
 """Tests for the Pascal/Delphi extractor."""
+
 from __future__ import annotations
 from pathlib import Path
 
@@ -19,48 +20,55 @@ def _edges_with_relation(r, *relations):
 
 def test_pascal_no_error():
     from graphify.extract import extract_pascal
+
     r = extract_pascal(FIXTURES / "sample.pas")
     assert "error" not in r
 
 
 def test_pascal_finds_unit():
     from graphify.extract import extract_pascal
+
     r = extract_pascal(FIXTURES / "sample.pas")
-    assert any("SampleUnit" in l for l in _labels(r))
+    assert any("SampleUnit" in label for label in _labels(r))
 
 
 def test_pascal_finds_classes():
     from graphify.extract import extract_pascal
+
     r = extract_pascal(FIXTURES / "sample.pas")
     labels = _labels(r)
-    assert any("TBaseProcessor" in l for l in labels)
-    assert any("TDataProcessor" in l for l in labels)
+    assert any("TBaseProcessor" in label for label in labels)
+    assert any("TDataProcessor" in label for label in labels)
 
 
 def test_pascal_finds_interface():
     from graphify.extract import extract_pascal
+
     r = extract_pascal(FIXTURES / "sample.pas")
-    assert any("IProcessor" in l for l in _labels(r))
+    assert any("IProcessor" in label for label in _labels(r))
 
 
 def test_pascal_finds_methods():
     from graphify.extract import extract_pascal
+
     r = extract_pascal(FIXTURES / "sample.pas")
     labels = _labels(r)
-    assert any("Process" in l for l in labels)
-    assert any("Initialize" in l for l in labels)
-    assert any("GetCount" in l for l in labels)
-    assert any("Reset" in l for l in labels)
+    assert any("Process" in label for label in labels)
+    assert any("Initialize" in label for label in labels)
+    assert any("GetCount" in label for label in labels)
+    assert any("Reset" in label for label in labels)
 
 
 def test_pascal_finds_imports():
     from graphify.extract import extract_pascal
+
     r = extract_pascal(FIXTURES / "sample.pas")
     assert "imports" in _relations(r)
 
 
 def test_pascal_import_edges_have_import_context():
     from graphify.extract import extract_pascal
+
     r = extract_pascal(FIXTURES / "sample.pas")
     import_edges = _edges_with_relation(r, "imports")
     assert import_edges
@@ -69,30 +77,31 @@ def test_pascal_import_edges_have_import_context():
 
 def test_pascal_finds_inherits():
     from graphify.extract import extract_pascal
+
     r = extract_pascal(FIXTURES / "sample.pas")
     assert "inherits" in _relations(r)
 
 
 def test_pascal_inherits_from_base():
     from graphify.extract import extract_pascal
+
     r = extract_pascal(FIXTURES / "sample.pas")
     node_by_id = {n["id"]: n["label"] for n in r["nodes"]}
     inherits = [e for e in r["edges"] if e["relation"] == "inherits"]
-    found = any(
-        "TDataProcessor" in node_by_id.get(e["source"], "")
-        for e in inherits
-    )
+    found = any("TDataProcessor" in node_by_id.get(e["source"], "") for e in inherits)
     assert found, "TDataProcessor should have at least one inherits edge"
 
 
 def test_pascal_finds_calls():
     from graphify.extract import extract_pascal
+
     r = extract_pascal(FIXTURES / "sample.pas")
     assert "calls" in _relations(r)
 
 
 def test_pascal_call_edges_have_call_context():
     from graphify.extract import extract_pascal
+
     r = extract_pascal(FIXTURES / "sample.pas")
     call_edges = _edges_with_relation(r, "calls")
     assert call_edges
@@ -101,6 +110,7 @@ def test_pascal_call_edges_have_call_context():
 
 def test_pascal_all_edges_extracted():
     from graphify.extract import extract_pascal
+
     r = extract_pascal(FIXTURES / "sample.pas")
     structural = {"contains", "method", "inherits", "imports"}
     for e in r["edges"]:
@@ -110,6 +120,7 @@ def test_pascal_all_edges_extracted():
 
 def test_pascal_no_dangling_edges():
     from graphify.extract import extract_pascal
+
     r = extract_pascal(FIXTURES / "sample.pas")
     node_ids = {n["id"] for n in r["nodes"]}
     # imports edges are cross-file by design; only check within-file edge targets
@@ -122,6 +133,7 @@ def test_pascal_no_dangling_edges():
 
 def test_pascal_dispatch_registered():
     from graphify.extract import _DISPATCH
+
     assert ".pas" in _DISPATCH
     assert ".pp" in _DISPATCH
     assert ".dpr" in _DISPATCH
@@ -134,6 +146,7 @@ def test_pascal_dispatch_registered():
 
 def test_pascal_detect_extensions_registered():
     from graphify.detect import CODE_EXTENSIONS
+
     assert ".pas" in CODE_EXTENSIONS
     assert ".pp" in CODE_EXTENSIONS
     assert ".dpr" in CODE_EXTENSIONS
@@ -144,38 +157,44 @@ def test_pascal_detect_extensions_registered():
 
 # ── Lazarus Form (.lfm) ───────────────────────────────────────────────────────
 
+
 def test_lfm_no_error():
     from graphify.extract import extract_lazarus_form
+
     r = extract_lazarus_form(FIXTURES / "sample.lfm")
     assert "error" not in r
 
 
 def test_lfm_finds_root_form_class():
     from graphify.extract import extract_lazarus_form
+
     r = extract_lazarus_form(FIXTURES / "sample.lfm")
-    assert any("TSampleForm" in l for l in _labels(r))
+    assert any("TSampleForm" in label for label in _labels(r))
 
 
 def test_lfm_finds_component_classes():
     from graphify.extract import extract_lazarus_form
+
     r = extract_lazarus_form(FIXTURES / "sample.lfm")
     labels = _labels(r)
-    assert any("TPanel" in l for l in labels)
-    assert any("TButton" in l for l in labels)
-    assert any("TLabel" in l for l in labels)
-    assert any("TTimer" in l for l in labels)
+    assert any("TPanel" in label for label in labels)
+    assert any("TButton" in label for label in labels)
+    assert any("TLabel" in label for label in labels)
+    assert any("TTimer" in label for label in labels)
 
 
 def test_lfm_finds_event_handlers():
     from graphify.extract import extract_lazarus_form
+
     r = extract_lazarus_form(FIXTURES / "sample.lfm")
     labels = _labels(r)
-    assert any("ButtonOKClick" in l for l in labels)
-    assert any("TimerRefreshTimer" in l for l in labels)
+    assert any("ButtonOKClick" in label for label in labels)
+    assert any("TimerRefreshTimer" in label for label in labels)
 
 
 def test_lfm_event_edges_have_event_context():
     from graphify.extract import extract_lazarus_form
+
     r = extract_lazarus_form(FIXTURES / "sample.lfm")
     ref_edges = [e for e in r["edges"] if e["relation"] == "references"]
     assert ref_edges
@@ -184,12 +203,14 @@ def test_lfm_event_edges_have_event_context():
 
 def test_lfm_contains_edges_form_hierarchy():
     from graphify.extract import extract_lazarus_form
+
     r = extract_lazarus_form(FIXTURES / "sample.lfm")
     assert "contains" in _relations(r)
 
 
 def test_lfm_no_dangling_edges():
     from graphify.extract import extract_lazarus_form
+
     r = extract_lazarus_form(FIXTURES / "sample.lfm")
     node_ids = {n["id"] for n in r["nodes"]}
     for e in r["edges"]:
@@ -198,28 +219,33 @@ def test_lfm_no_dangling_edges():
 
 # ── Lazarus Package (.lpk) ───────────────────────────────────────────────────
 
+
 def test_lpk_no_error():
     from graphify.extract import extract_lazarus_package
+
     r = extract_lazarus_package(FIXTURES / "sample.lpk")
     assert "error" not in r
 
 
 def test_lpk_finds_package_name():
     from graphify.extract import extract_lazarus_package
+
     r = extract_lazarus_package(FIXTURES / "sample.lpk")
-    assert any("SamplePackage" in l for l in _labels(r))
+    assert any("SamplePackage" in label for label in _labels(r))
 
 
 def test_lpk_finds_required_packages():
     from graphify.extract import extract_lazarus_package
+
     r = extract_lazarus_package(FIXTURES / "sample.lpk")
     labels = _labels(r)
-    assert any("FCL" in l for l in labels)
-    assert any("LCL" in l for l in labels)
+    assert any("FCL" in label for label in labels)
+    assert any("LCL" in label for label in labels)
 
 
 def test_lpk_imports_edges_have_import_context():
     from graphify.extract import extract_lazarus_package
+
     r = extract_lazarus_package(FIXTURES / "sample.lpk")
     import_edges = _edges_with_relation(r, "imports")
     assert import_edges
@@ -228,14 +254,16 @@ def test_lpk_imports_edges_have_import_context():
 
 def test_lpk_contains_listed_units():
     from graphify.extract import extract_lazarus_package
+
     r = extract_lazarus_package(FIXTURES / "sample.lpk")
     labels = _labels(r)
-    assert any("sample" in l.lower() for l in labels)
-    assert any("sampleutils" in l.lower() for l in labels)
+    assert any("sample" in label.lower() for label in labels)
+    assert any("sampleutils" in label.lower() for label in labels)
 
 
 def test_lpk_no_dangling_edges():
     from graphify.extract import extract_lazarus_package
+
     r = extract_lazarus_package(FIXTURES / "sample.lpk")
     node_ids = {n["id"] for n in r["nodes"]}
     for e in r["edges"]:
@@ -244,38 +272,44 @@ def test_lpk_no_dangling_edges():
 
 # ── Delphi Form (.dfm) ───────────────────────────────────────────────────────
 
+
 def test_dfm_no_error():
     from graphify.extract import extract_delphi_form
+
     r = extract_delphi_form(FIXTURES / "sample.dfm")
     assert "error" not in r
 
 
 def test_dfm_finds_root_form_class():
     from graphify.extract import extract_delphi_form
+
     r = extract_delphi_form(FIXTURES / "sample.dfm")
-    assert any("TMainForm" in l for l in _labels(r))
+    assert any("TMainForm" in label for label in _labels(r))
 
 
 def test_dfm_finds_component_classes():
     from graphify.extract import extract_delphi_form
+
     r = extract_delphi_form(FIXTURES / "sample.dfm")
     labels = _labels(r)
-    assert any("TPanel" in l for l in labels)
-    assert any("TButton" in l for l in labels)
-    assert any("TMemo" in l for l in labels)
-    assert any("TStatusBar" in l for l in labels)
+    assert any("TPanel" in label for label in labels)
+    assert any("TButton" in label for label in labels)
+    assert any("TMemo" in label for label in labels)
+    assert any("TStatusBar" in label for label in labels)
 
 
 def test_dfm_finds_event_handlers():
     from graphify.extract import extract_delphi_form
+
     r = extract_delphi_form(FIXTURES / "sample.dfm")
     labels = _labels(r)
-    assert any("FormCreate" in l for l in labels)
-    assert any("ButtonOKClick" in l for l in labels)
+    assert any("FormCreate" in label for label in labels)
+    assert any("ButtonOKClick" in label for label in labels)
 
 
 def test_dfm_event_edges_have_event_context():
     from graphify.extract import extract_delphi_form
+
     r = extract_delphi_form(FIXTURES / "sample.dfm")
     ref_edges = [e for e in r["edges"] if e["relation"] == "references"]
     assert ref_edges
@@ -284,12 +318,14 @@ def test_dfm_event_edges_have_event_context():
 
 def test_dfm_contains_edges_form_hierarchy():
     from graphify.extract import extract_delphi_form
+
     r = extract_delphi_form(FIXTURES / "sample.dfm")
     assert "contains" in _relations(r)
 
 
 def test_dfm_no_dangling_edges():
     from graphify.extract import extract_delphi_form
+
     r = extract_delphi_form(FIXTURES / "sample.dfm")
     node_ids = {n["id"] for n in r["nodes"]}
     for e in r["edges"]:
@@ -298,7 +334,9 @@ def test_dfm_no_dangling_edges():
 
 def test_dfm_binary_returns_empty_not_crash():
     from graphify.extract import extract_delphi_form
-    import tempfile, pathlib
+    import tempfile
+    import pathlib
+
     # Write a fake binary DFM (FF 0A magic header)
     with tempfile.NamedTemporaryFile(suffix=".dfm", delete=False) as f:
         f.write(b"\xff\x0a\x00\x00some binary data")
@@ -314,9 +352,11 @@ def test_dfm_binary_returns_empty_not_crash():
 
 def test_dfm_dispatch_registered():
     from graphify.extract import _DISPATCH
+
     assert ".dfm" in _DISPATCH
 
 
 def test_dfm_detect_extension_registered():
     from graphify.detect import CODE_EXTENSIONS
+
     assert ".dfm" in CODE_EXTENSIONS
diff --git a/tests/test_path_cli.py b/tests/test_path_cli.py
index de7e8837f..57ae3ebdd 100644
--- a/tests/test_path_cli.py
+++ b/tests/test_path_cli.py
@@ -1,23 +1,36 @@
 """Regression tests for `graphify path` arrow direction (#849)."""
+
 from __future__ import annotations
 import json
-import networkx as nx
-from networkx.readwrite import json_graph
 import graphify.__main__ as mainmod
 
 
 def _write_graph(tmp_path):
     graph_data = {
-        "directed": False, "multigraph": False, "graph": {},
+        "directed": False,
+        "multigraph": False,
+        "graph": {},
         "nodes": [
-            {"id": "create_patch", "label": "createPatchHandler()",
-             "source_file": "server/create-patch-handler.ts", "community": 0},
-            {"id": "validate", "label": "validateSanitySession()",
-             "source_file": "server/sanity-validate-session.ts", "community": 0},
+            {
+                "id": "create_patch",
+                "label": "createPatchHandler()",
+                "source_file": "server/create-patch-handler.ts",
+                "community": 0,
+            },
+            {
+                "id": "validate",
+                "label": "validateSanitySession()",
+                "source_file": "server/sanity-validate-session.ts",
+                "community": 0,
+            },
         ],
         "links": [
-            {"source": "create_patch", "target": "validate",
-             "relation": "calls", "confidence": "EXTRACTED"},
+            {
+                "source": "create_patch",
+                "target": "validate",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+            },
         ],
     }
     p = tmp_path / "graph.json"
@@ -27,8 +40,9 @@ def _write_graph(tmp_path):
 
 def _run(monkeypatch, graph_path, src, tgt, capsys):
     monkeypatch.setattr(mainmod, "_check_skill_version", lambda _: None)
-    monkeypatch.setattr(mainmod.sys, "argv",
-        ["graphify", "path", src, tgt, "--graph", str(graph_path)])
+    monkeypatch.setattr(
+        mainmod.sys, "argv", ["graphify", "path", src, tgt, "--graph", str(graph_path)]
+    )
     mainmod.main()
     return capsys.readouterr().out
 
diff --git a/tests/test_pipeline.py b/tests/test_pipeline.py
index ce6055d8b..de977ad91 100644
--- a/tests/test_pipeline.py
+++ b/tests/test_pipeline.py
@@ -3,14 +3,13 @@
 Uses the existing test fixtures (code + markdown). No LLM calls - AST extraction only.
 Catches regressions in how modules connect, not just individual module behaviour.
 """
+
 import json
-import tempfile
 from pathlib import Path
 
-import pytest
 
 from graphify.detect import detect
-from graphify.extract import collect_files, extract
+from graphify.extract import extract
 from graphify.build import build_from_json
 from graphify.cluster import cluster, score_all
 from graphify.analyze import god_nodes, surprising_connections, suggest_questions
@@ -62,7 +61,18 @@ def run_pipeline(tmp_path: Path) -> dict:
 
     # Step 6: report
     tokens = {"input": 0, "output": 0}
-    report = generate(G, communities, cohesion, labels, gods, surprises, detection, tokens, str(FIXTURES), suggested_questions=questions)
+    report = generate(
+        G,
+        communities,
+        cohesion,
+        labels,
+        gods,
+        surprises,
+        detection,
+        tokens,
+        str(FIXTURES),
+        suggested_questions=questions,
+    )
     assert "God Nodes" in report
     assert "Communities" in report
     assert len(report) > 100
@@ -85,7 +95,9 @@ def run_pipeline(tmp_path: Path) -> dict:
 
     # Step 9: export - Obsidian vault
     vault_path = tmp_path / "obsidian"
-    n_notes = to_obsidian(G, communities, str(vault_path), community_labels=labels, cohesion=cohesion)
+    n_notes = to_obsidian(
+        G, communities, str(vault_path), community_labels=labels, cohesion=cohesion
+    )
     assert n_notes > 0
     assert (vault_path / ".obsidian" / "graph.json").exists()
     md_files = list(vault_path.glob("*.md"))
diff --git a/tests/test_prs.py b/tests/test_prs.py
index 61ebc140e..1f7ef1fba 100644
--- a/tests/test_prs.py
+++ b/tests/test_prs.py
@@ -1,4 +1,5 @@
 """Tests for graphify/prs.py."""
+
 from __future__ import annotations
 
 import subprocess
@@ -6,7 +7,6 @@
 from unittest.mock import patch, MagicMock
 
 import networkx as nx
-import pytest
 
 from graphify.prs import (
     PRInfo,
@@ -23,6 +23,7 @@
 
 # ── Helpers ───────────────────────────────────────────────────────────────────
 
+
 def make_pr(
     number: int = 1,
     title: str = "Test PR",
@@ -54,6 +55,7 @@ def make_pr(
 
 # ── _classify ─────────────────────────────────────────────────────────────────
 
+
 class TestClassify:
     def test_ready(self):
         pr = make_pr(ci_status="SUCCESS", review_decision="", is_draft=False)
@@ -94,6 +96,7 @@ def test_wrong_base(self):
 
 # ── _parse_ci ─────────────────────────────────────────────────────────────────
 
+
 class TestParseCi:
     def test_empty_rollup_returns_none(self):
         assert _parse_ci([]) == "NONE"
@@ -128,6 +131,7 @@ def test_mixed_success_and_failure_is_failure(self):
 
 # ── _path_match ───────────────────────────────────────────────────────────────
 
+
 class TestPathMatch:
     def test_exact_match(self):
         assert _path_match("src/auth/api.py", "src/auth/api.py") is True
@@ -150,6 +154,7 @@ def test_both_directions_work(self):
 
 # ── compute_pr_impact ─────────────────────────────────────────────────────────
 
+
 class TestComputePrImpact:
     def _make_graph(self) -> nx.Graph:
         """3 nodes across 2 communities, 2 distinct source files."""
@@ -167,9 +172,7 @@ def test_matching_files_returns_correct_communities_and_count(self):
 
     def test_matching_both_files(self):
         G = self._make_graph()
-        comms, nodes = compute_pr_impact(
-            ["src/auth/api.py", "src/utils/helpers.py"], G
-        )
+        comms, nodes = compute_pr_impact(["src/auth/api.py", "src/utils/helpers.py"], G)
         assert comms == [0, 1]
         assert nodes == 3
 
@@ -208,6 +211,7 @@ def test_no_double_counting_same_graph_file_matched_by_two_pr_files(self):
 
 # ── fetch_worktrees ───────────────────────────────────────────────────────────
 
+
 class TestFetchWorktrees:
     def test_normal_case_maps_branch_to_path(self):
         porcelain = (
@@ -279,6 +283,7 @@ def test_subprocess_failure_returns_empty_dict(self):
 
 # ── format_prs_text ───────────────────────────────────────────────────────────
 
+
 class TestFormatPrsText:
     def test_contains_pr_metadata_and_count_header(self):
         prs = [
@@ -330,6 +335,7 @@ def test_empty_pr_list(self):
 
 # ── _detect_default_branch ────────────────────────────────────────────────────
 
+
 class TestDetectDefaultBranch:
     def test_gh_returns_main(self):
         with patch(
@@ -342,8 +348,9 @@ def test_falls_back_to_git_symbolic_ref(self):
         mock_result = MagicMock()
         mock_result.returncode = 0
         mock_result.stdout = "refs/remotes/origin/develop\n"
-        with patch("graphify.prs._gh", return_value=None), patch(
-            "graphify.prs.subprocess.run", return_value=mock_result
+        with (
+            patch("graphify.prs._gh", return_value=None),
+            patch("graphify.prs.subprocess.run", return_value=mock_result),
         ):
             assert _detect_default_branch() == "develop"
 
@@ -351,8 +358,9 @@ def test_both_fail_returns_main(self):
         mock_result = MagicMock()
         mock_result.returncode = 1
         mock_result.stdout = ""
-        with patch("graphify.prs._gh", return_value=None), patch(
-            "graphify.prs.subprocess.run", return_value=mock_result
+        with (
+            patch("graphify.prs._gh", return_value=None),
+            patch("graphify.prs.subprocess.run", return_value=mock_result),
         ):
             assert _detect_default_branch() == "main"
 
@@ -361,27 +369,32 @@ def test_gh_returns_empty_dict_falls_back(self):
         mock_result = MagicMock()
         mock_result.returncode = 0
         mock_result.stdout = "refs/remotes/origin/trunk\n"
-        with patch("graphify.prs._gh", return_value={}), patch(
-            "graphify.prs.subprocess.run", return_value=mock_result
+        with (
+            patch("graphify.prs._gh", return_value={}),
+            patch("graphify.prs.subprocess.run", return_value=mock_result),
         ):
             assert _detect_default_branch() == "trunk"
 
     def test_git_timeout_returns_main(self):
-        with patch("graphify.prs._gh", return_value=None), patch(
-            "graphify.prs.subprocess.run",
-            side_effect=subprocess.TimeoutExpired("git", 5),
+        with (
+            patch("graphify.prs._gh", return_value=None),
+            patch(
+                "graphify.prs.subprocess.run",
+                side_effect=subprocess.TimeoutExpired("git", 5),
+            ),
         ):
             assert _detect_default_branch() == "main"
 
 
 # ── build_community_labels ─────────────────────────────────────────────────────
 
+
 class TestBuildCommunityLabels:
     def test_basic_grouping(self):
         data = {
             "nodes": [
                 {"id": "a", "label": "Alpha", "community": 0},
-                {"id": "b", "label": "Beta",  "community": 0},
+                {"id": "b", "label": "Beta", "community": 0},
                 {"id": "c", "label": "Gamma", "community": 1},
             ]
         }
diff --git a/tests/test_python_import_resolution.py b/tests/test_python_import_resolution.py
index 2a517aaea..fb333eac4 100644
--- a/tests/test_python_import_resolution.py
+++ b/tests/test_python_import_resolution.py
@@ -23,9 +23,7 @@ def _node_id(result: dict, label: str, source_file: str) -> str:
 
 def _has_edge(result: dict, source: str, target: str, relation: str) -> bool:
     return any(
-        edge["source"] == source
-        and edge["target"] == target
-        and edge["relation"] == relation
+        edge["source"] == source and edge["target"] == target and edge["relation"] == relation
         for edge in result["edges"]
     )
 
@@ -35,9 +33,7 @@ def test_python_package_reexport_resolves_import_and_call_to_origin_symbol(tmp_p
     barrel = _write(tmp_path / "pkg/__init__.py", "from .foo import Foo as PublicFoo\n")
     consumer = _write(
         tmp_path / "app.py",
-        "from pkg import PublicFoo\n\n"
-        "def X():\n"
-        "    return PublicFoo()\n",
+        "from pkg import PublicFoo\n\ndef X():\n    return PublicFoo()\n",
     )
 
     result = extract([origin, barrel, consumer], cache_root=tmp_path)
@@ -57,10 +53,7 @@ def test_python_parameter_return_and_generic_contexts(tmp_path: Path):
     model = tmp_path / "pkg" / "model.py"
     model.parent.mkdir(parents=True)
     model.write_text(
-        "class Payload:\n"
-        "    pass\n\n"
-        "class Result:\n"
-        "    pass\n",
+        "class Payload:\n    pass\n\nclass Result:\n    pass\n",
         encoding="utf-8",
     )
     service = tmp_path / "pkg" / "service.py"
@@ -77,7 +70,11 @@ def test_python_parameter_return_and_generic_contexts(tmp_path: Path):
     labels = {node["id"]: node["label"] for node in result["nodes"]}
     edges = [edge for edge in result["edges"] if edge.get("relation") == "references"]
     pairs = {
-        (labels.get(e["source"], e["source"]), labels.get(e["target"], e["target"]), e.get("context"))
+        (
+            labels.get(e["source"], e["source"]),
+            labels.get(e["target"], e["target"]),
+            e.get("context"),
+        )
         for e in edges
     }
 
diff --git a/tests/test_query_cli.py b/tests/test_query_cli.py
index cf8eb6e56..ef3ebbe46 100644
--- a/tests/test_query_cli.py
+++ b/tests/test_query_cli.py
@@ -1,4 +1,5 @@
 """Tests for graphify query CLI context filtering."""
+
 from __future__ import annotations
 
 import json
diff --git a/tests/test_rationale.py b/tests/test_rationale.py
index b52aa3909..8ab29d157 100644
--- a/tests/test_rationale.py
+++ b/tests/test_rationale.py
@@ -1,7 +1,7 @@
 """Tests for rationale/docstring extraction in extract.py."""
+
 import textwrap
 from pathlib import Path
-import pytest
 from graphify.extract import extract_python
 from graphify.build import build_from_json
 
@@ -13,10 +13,13 @@ def _write_py(tmp_path: Path, code: str) -> Path:
 
 
 def test_module_docstring_extracted(tmp_path):
-    path = _write_py(tmp_path, '''
+    path = _write_py(
+        tmp_path,
+        '''
         """This module handles authentication because legacy sessions were insecure."""
         def login(): pass
-    ''')
+    ''',
+    )
     result = extract_python(path)
     rationale = [n for n in result["nodes"] if n.get("file_type") == "rationale"]
     assert len(rationale) >= 1
@@ -24,45 +27,57 @@ def login(): pass
 
 
 def test_function_docstring_extracted(tmp_path):
-    path = _write_py(tmp_path, '''
+    path = _write_py(
+        tmp_path,
+        '''
         def process():
             """We use chunked processing here because the full dataset exceeds RAM."""
             pass
-    ''')
+    ''',
+    )
     result = extract_python(path)
     rationale = [n for n in result["nodes"] if n.get("file_type") == "rationale"]
     assert any("chunked" in n["label"] for n in rationale)
 
 
 def test_class_docstring_extracted(tmp_path):
-    path = _write_py(tmp_path, '''
+    path = _write_py(
+        tmp_path,
+        '''
         class Cache:
             """Chosen over Redis because we need zero external dependencies in the test env."""
             pass
-    ''')
+    ''',
+    )
     result = extract_python(path)
     rationale = [n for n in result["nodes"] if n.get("file_type") == "rationale"]
     assert any("Redis" in n["label"] for n in rationale)
 
 
 def test_rationale_comment_extracted(tmp_path):
-    path = _write_py(tmp_path, '''
+    path = _write_py(
+        tmp_path,
+        """
         def build():
             # NOTE: must run before compile() or linker will fail
             pass
-    ''')
+    """,
+    )
     result = extract_python(path)
     rationale = [n for n in result["nodes"] if n.get("file_type") == "rationale"]
     assert any("NOTE" in n["label"] for n in rationale)
 
 
 def test_rationale_for_edges_present(tmp_path):
-    path = _write_py(tmp_path, '''
+    path = _write_py(
+        tmp_path,
+        '''
         """Module docstring explaining the why."""
         def foo():
             """Function docstring with rationale."""
             pass
-    ''')
+    ''',
+    )
     result = extract_python(path)
     rationale_edges = [e for e in result["edges"] if e.get("relation") == "rationale_for"]
     assert len(rationale_edges) >= 1
@@ -70,28 +85,36 @@ def foo():
 
 def test_short_docstring_ignored(tmp_path):
     """Trivial docstrings under 20 chars should not become rationale nodes."""
-    path = _write_py(tmp_path, '''
+    path = _write_py(
+        tmp_path,
+        '''
         def foo():
             """Constructor."""
             pass
-    ''')
+    ''',
+    )
     result = extract_python(path)
     rationale = [n for n in result["nodes"] if n.get("file_type") == "rationale"]
     assert len(rationale) == 0
 
 
 def test_rationale_confidence_is_extracted(tmp_path):
-    path = _write_py(tmp_path, '''
+    path = _write_py(
+        tmp_path,
+        '''
         """This module exists because we needed a standalone parser."""
         def parse(): pass
-    ''')
+    ''',
+    )
     result = extract_python(path)
     rationale_edges = [e for e in result["edges"] if e.get("relation") == "rationale_for"]
     assert all(e.get("confidence") == "EXTRACTED" for e in rationale_edges)
 
 
 def test_alembic_module_docstring_suppressed(tmp_path):
-    path = _write_py(tmp_path, '''
+    path = _write_py(
+        tmp_path,
+        '''
         """initial schema
 
         Revision ID: 0001abcd
@@ -107,7 +130,8 @@ def upgrade():
 
         def downgrade():
             pass
-    ''')
+    ''',
+    )
     result = extract_python(path)
     rationale = [n for n in result["nodes"] if n.get("file_type") == "rationale"]
     assert not any("Revision ID" in n["label"] for n in rationale)
@@ -115,7 +139,9 @@ def downgrade():
 
 def test_alembic_function_docstrings_still_extracted(tmp_path):
     """Function docstrings inside upgrade/downgrade should still be captured."""
-    path = _write_py(tmp_path, '''
+    path = _write_py(
+        tmp_path,
+        '''
         """Revision ID: 0002 Revises: 0001"""
         revision = "0002"
         down_revision = "0001"
@@ -126,7 +152,8 @@ def upgrade():
 
         def downgrade():
             pass
-    ''')
+    ''',
+    )
     result = extract_python(path)
     rationale = [n for n in result["nodes"] if n.get("file_type") == "rationale"]
     # module docstring suppressed
@@ -137,39 +164,48 @@ def downgrade():
 
 def test_non_migration_revision_var_not_suppressed(tmp_path):
     """A file with a `revision` variable but no Alembic markers keeps its docstring."""
-    path = _write_py(tmp_path, '''
+    path = _write_py(
+        tmp_path,
+        '''
         """This module tracks document revisions because we need audit history."""
         revision = 42
 
         def get_revision(): pass
-    ''')
+    ''',
+    )
     result = extract_python(path)
     rationale = [n for n in result["nodes"] if n.get("file_type") == "rationale"]
     assert any("audit history" in n["label"] for n in rationale)
 
 
 def test_django_migration_module_docstring_suppressed(tmp_path):
-    path = _write_py(tmp_path, '''
+    path = _write_py(
+        tmp_path,
+        '''
         """Add post_priority_config table."""
         from django.db import migrations
 
         class Migration(migrations.Migration):
             dependencies = [("myapp", "0001_initial")]
             operations = []
-    ''')
+    ''',
+    )
     result = extract_python(path)
     rationale = [n for n in result["nodes"] if n.get("file_type") == "rationale"]
     assert not any("post_priority" in n["label"] for n in rationale)
 
 
 def test_generated_file_module_docstring_suppressed(tmp_path):
-    path = _write_py(tmp_path, '''
+    path = _write_py(
+        tmp_path,
+        '''
         """Generated by the protocol buffer compiler. DO NOT EDIT!"""
         from google.protobuf import descriptor as _descriptor
 
         class UserMessage:
             pass
-    ''')
+    ''',
+    )
     result = extract_python(path)
     rationale = [n for n in result["nodes"] if n.get("file_type") == "rationale"]
     assert not any("protocol buffer" in n["label"].lower() for n in rationale)
diff --git a/tests/test_report.py b/tests/test_report.py
index a5b3916a1..d9b9253d6 100644
--- a/tests/test_report.py
+++ b/tests/test_report.py
@@ -7,6 +7,7 @@
 
 FIXTURES = Path(__file__).parent / "fixtures"
 
+
 def make_inputs():
     extraction = json.loads((FIXTURES / "extraction.json").read_text())
     G = build_from_json(extraction)
@@ -19,45 +20,78 @@ def make_inputs():
     tokens = {"input": extraction["input_tokens"], "output": extraction["output_tokens"]}
     return G, communities, cohesion, labels, gods, surprises, detection, tokens
 
+
 def test_report_contains_header():
     G, communities, cohesion, labels, gods, surprises, detection, tokens = make_inputs()
-    report = generate(G, communities, cohesion, labels, gods, surprises, detection, tokens, "./project")
+    report = generate(
+        G, communities, cohesion, labels, gods, surprises, detection, tokens, "./project"
+    )
     assert "# Graph Report" in report
 
+
 def test_report_contains_corpus_check():
     G, communities, cohesion, labels, gods, surprises, detection, tokens = make_inputs()
-    report = generate(G, communities, cohesion, labels, gods, surprises, detection, tokens, "./project")
+    report = generate(
+        G, communities, cohesion, labels, gods, surprises, detection, tokens, "./project"
+    )
     assert "## Corpus Check" in report
 
+
 def test_report_contains_god_nodes():
     G, communities, cohesion, labels, gods, surprises, detection, tokens = make_inputs()
-    report = generate(G, communities, cohesion, labels, gods, surprises, detection, tokens, "./project")
+    report = generate(
+        G, communities, cohesion, labels, gods, surprises, detection, tokens, "./project"
+    )
     assert "## God Nodes" in report
 
+
 def test_report_contains_surprising_connections():
     G, communities, cohesion, labels, gods, surprises, detection, tokens = make_inputs()
-    report = generate(G, communities, cohesion, labels, gods, surprises, detection, tokens, "./project")
+    report = generate(
+        G, communities, cohesion, labels, gods, surprises, detection, tokens, "./project"
+    )
     assert "## Surprising Connections" in report
 
+
 def test_report_contains_communities():
     G, communities, cohesion, labels, gods, surprises, detection, tokens = make_inputs()
-    report = generate(G, communities, cohesion, labels, gods, surprises, detection, tokens, "./project")
+    report = generate(
+        G, communities, cohesion, labels, gods, surprises, detection, tokens, "./project"
+    )
     assert "## Communities" in report
 
+
 def test_report_contains_ambiguous_section():
     G, communities, cohesion, labels, gods, surprises, detection, tokens = make_inputs()
-    report = generate(G, communities, cohesion, labels, gods, surprises, detection, tokens, "./project")
+    report = generate(
+        G, communities, cohesion, labels, gods, surprises, detection, tokens, "./project"
+    )
     assert "## Ambiguous Edges" in report
 
+
 def test_report_shows_token_cost():
     G, communities, cohesion, labels, gods, surprises, detection, tokens = make_inputs()
-    report = generate(G, communities, cohesion, labels, gods, surprises, detection, tokens, "./project")
+    report = generate(
+        G, communities, cohesion, labels, gods, surprises, detection, tokens, "./project"
+    )
     assert "Token cost" in report
     assert "1,200" in report
 
+
 def test_report_shows_raw_cohesion_scores():
     G, communities, cohesion, labels, gods, surprises, detection, tokens = make_inputs()
-    report = generate(G, communities, cohesion, labels, gods, surprises, detection, tokens, "./project", min_community_size=1)
+    report = generate(
+        G,
+        communities,
+        cohesion,
+        labels,
+        gods,
+        surprises,
+        detection,
+        tokens,
+        "./project",
+        min_community_size=1,
+    )
     assert "Cohesion:" in report
     assert "✓" not in report
     assert "⚠" not in report
diff --git a/tests/test_security.py b/tests/test_security.py
index c547ab842..32e160865 100644
--- a/tests/test_security.py
+++ b/tests/test_security.py
@@ -1,7 +1,7 @@
 """Tests for graphify/security.py - URL validation, safe fetch, path guards, label sanitisation."""
+
 from __future__ import annotations
 
-import json
 import urllib.error
 from pathlib import Path
 from typing import Any
@@ -17,9 +17,7 @@
     safe_fetch_text,
     validate_graph_path,
     validate_url,
-    _MAX_FETCH_BYTES,
     _MAX_GRAPH_FILE_BYTES,
-    _MAX_TEXT_BYTES,
     _METADATA_MAX_LIST_ITEMS,
     _METADATA_MAX_VALUE_LEN,
     _sanitize_metadata_string,
@@ -31,24 +29,30 @@
 # validate_url
 # ---------------------------------------------------------------------------
 
+
 def test_validate_url_accepts_http():
     assert validate_url("http://example.com/page") == "http://example.com/page"
 
+
 def test_validate_url_accepts_https():
     assert validate_url("https://arxiv.org/abs/1706.03762") == "https://arxiv.org/abs/1706.03762"
 
+
 def test_validate_url_rejects_file():
     with pytest.raises(ValueError, match="file"):
         validate_url("file:///etc/passwd")
 
+
 def test_validate_url_rejects_ftp():
     with pytest.raises(ValueError, match="ftp"):
         validate_url("ftp://files.example.com/data.zip")
 
+
 def test_validate_url_rejects_data():
     with pytest.raises(ValueError, match="data"):
         validate_url("data:text/html,<script>alert(1)</script>")
 
+
 def test_validate_url_rejects_empty_scheme():
     with pytest.raises(ValueError):
         validate_url("//no-scheme.example.com")
@@ -58,13 +62,14 @@ def test_validate_url_rejects_empty_scheme():
 # safe_fetch - scheme and redirect guards (mocked network)
 # ---------------------------------------------------------------------------
 
+
 def _make_mock_response(content: bytes, status: int = 200):
     mock = MagicMock()
     mock.__enter__ = lambda s: s
     mock.__exit__ = MagicMock(return_value=False)
     mock.status = status
     mock.code = status
-    chunks = [content[i:i+65536] for i in range(0, len(content), 65536)] + [b""]
+    chunks = [content[i : i + 65536] for i in range(0, len(content), 65536)] + [b""]
     mock.read.side_effect = chunks
     return mock
 
@@ -73,10 +78,12 @@ def test_safe_fetch_rejects_file_url():
     with pytest.raises(ValueError, match="file"):
         safe_fetch("file:///etc/passwd")
 
+
 def test_safe_fetch_rejects_ftp_url():
     with pytest.raises(ValueError, match="ftp"):
         safe_fetch("ftp://example.com/file.zip")
 
+
 def test_safe_fetch_returns_bytes(tmp_path):
     mock_resp = _make_mock_response(b"hello world")
     with patch("graphify.security._build_opener") as mock_opener_fn:
@@ -86,6 +93,7 @@ def test_safe_fetch_returns_bytes(tmp_path):
         result = safe_fetch("https://example.com/")
     assert result == b"hello world"
 
+
 def test_safe_fetch_raises_on_non_2xx():
     mock_resp = _make_mock_response(b"Not Found", status=404)
     with patch("graphify.security._build_opener") as mock_opener_fn:
@@ -95,6 +103,7 @@ def test_safe_fetch_raises_on_non_2xx():
         with pytest.raises(urllib.error.HTTPError):
             safe_fetch("https://example.com/missing")
 
+
 def test_safe_fetch_raises_on_size_exceeded():
     # Build a response larger than max_bytes
     big_chunk = b"x" * 65_537
@@ -118,6 +127,7 @@ def test_safe_fetch_raises_on_size_exceeded():
 # safe_fetch_text
 # ---------------------------------------------------------------------------
 
+
 def test_safe_fetch_text_decodes_utf8():
     content = "héllo wörld".encode("utf-8")
     mock_resp = _make_mock_response(content)
@@ -128,6 +138,7 @@ def test_safe_fetch_text_decodes_utf8():
         result = safe_fetch_text("https://example.com/")
     assert result == "héllo wörld"
 
+
 def test_safe_fetch_text_replaces_bad_bytes():
     bad = b"hello \xff world"
     mock_resp = _make_mock_response(bad)
@@ -145,6 +156,7 @@ def test_safe_fetch_text_replaces_bad_bytes():
 # validate_graph_path
 # ---------------------------------------------------------------------------
 
+
 def test_validate_graph_path_allows_inside_base(tmp_path):
     base = tmp_path / "graphify-out"
     base.mkdir()
@@ -153,6 +165,7 @@ def test_validate_graph_path_allows_inside_base(tmp_path):
     result = validate_graph_path(str(graph), base=base)
     assert result == graph.resolve()
 
+
 def test_validate_graph_path_blocks_traversal(tmp_path):
     base = tmp_path / "graphify-out"
     base.mkdir()
@@ -160,11 +173,13 @@ def test_validate_graph_path_blocks_traversal(tmp_path):
     with pytest.raises(ValueError, match="escapes"):
         validate_graph_path(str(evil), base=base)
 
+
 def test_validate_graph_path_requires_base_exists(tmp_path):
     base = tmp_path / "graphify-out"  # not created
     with pytest.raises(ValueError, match="does not exist"):
         validate_graph_path(str(base / "graph.json"), base=base)
 
+
 def test_validate_graph_path_raises_if_file_missing(tmp_path):
     base = tmp_path / "graphify-out"
     base.mkdir()
@@ -176,22 +191,26 @@ def test_validate_graph_path_raises_if_file_missing(tmp_path):
 # sanitize_label
 # ---------------------------------------------------------------------------
 
+
 def test_sanitize_label_passthrough_html_chars():
     # sanitize_label does NOT HTML-escape — callers that inject into HTML must
     # wrap with html.escape() themselves (e.g. the title in to_html())
     assert sanitize_label("<script>") == "<script>"
     assert sanitize_label("foo & bar") == "foo & bar"
 
+
 def test_sanitize_label_strips_control_chars():
     result = sanitize_label("hello\x00\x1fworld")
     assert "\x00" not in result
     assert "\x1f" not in result
     assert "helloworld" in result
 
+
 def test_sanitize_label_caps_at_256():
     long_label = "a" * 300
     assert len(sanitize_label(long_label)) <= 256
 
+
 def test_sanitize_label_safe_passthrough():
     assert sanitize_label("MyClass") == "MyClass"
     assert sanitize_label("extract_python") == "extract_python"
@@ -201,6 +220,7 @@ def test_sanitize_label_safe_passthrough():
 # check_graph_file_size_cap (#F4 — graph-load memory bomb protection)
 # ---------------------------------------------------------------------------
 
+
 def test_graph_size_cap_default_is_512_mib():
     assert _MAX_GRAPH_FILE_BYTES == 512 * 1024 * 1024
 
@@ -227,7 +247,7 @@ def test_graph_size_cap_error_message_includes_size_and_cap(monkeypatch, tmp_pat
         check_graph_file_size_cap(p)
     msg = str(excinfo.value)
     assert "16" in msg  # observed size
-    assert "8" in msg   # cap
+    assert "8" in msg  # cap
     assert "byte" in msg.lower()
 
 
@@ -266,6 +286,7 @@ def _boom(self):
 # sanitize_metadata (recursive, bounded, HTML-safe)
 # ---------------------------------------------------------------------------
 
+
 def test_sanitize_metadata_string_strips_control_chars():
     result = _sanitize_metadata_string("hello\x00\x1fworld")
     assert "\x00" not in result
@@ -281,7 +302,7 @@ def test_sanitize_metadata_string_escapes_html():
 
 
 def test_sanitize_metadata_string_escapes_quotes():
-    result = _sanitize_metadata_string('a"b\'c')
+    result = _sanitize_metadata_string("a\"b'c")
     # quote=True escapes both " and '
     assert "&quot;" in result
     assert "&#x27;" in result or "&apos;" in result
@@ -298,6 +319,7 @@ def test_sanitize_metadata_string_coerces_non_string():
     class _Custom:
         def __str__(self) -> str:
             return "custom-repr"
+
     assert _sanitize_metadata_string(_Custom()) == "custom-repr"
 
 
diff --git a/tests/test_semantic_similarity.py b/tests/test_semantic_similarity.py
index 55d9cce34..afde61082 100644
--- a/tests/test_semantic_similarity.py
+++ b/tests/test_semantic_similarity.py
@@ -1,8 +1,9 @@
 """Tests for semantically_similar_to edge support."""
+
 import networkx as nx
 import pytest
 from graphify.build import build_from_json
-from graphify.analyze import surprising_connections, _surprise_score
+from graphify.analyze import _surprise_score
 from graphify.report import generate
 
 
@@ -10,14 +11,25 @@
 # Helpers
 # ---------------------------------------------------------------------------
 
+
 def _make_extraction_with_semantic_edge():
     """Two nodes in separate files connected by a semantically_similar_to edge."""
     return {
         "nodes": [
-            {"id": "a_validate_input", "label": "validate_input", "file_type": "code",
-             "source_file": "auth/validators.py", "source_location": "L5"},
-            {"id": "b_check_input", "label": "check_input", "file_type": "code",
-             "source_file": "api/checks.py", "source_location": "L12"},
+            {
+                "id": "a_validate_input",
+                "label": "validate_input",
+                "file_type": "code",
+                "source_file": "auth/validators.py",
+                "source_location": "L5",
+            },
+            {
+                "id": "b_check_input",
+                "label": "check_input",
+                "file_type": "code",
+                "source_file": "api/checks.py",
+                "source_location": "L12",
+            },
         ],
         "edges": [
             {
@@ -51,13 +63,29 @@ def _make_two_edge_graph():
     ]:
         G.add_node(nid, label=label, source_file=src, file_type="code")
     # semantically_similar_to edge
-    G.add_edge("a", "b", relation="semantically_similar_to", confidence="INFERRED",
-               confidence_score=0.82, source_file="auth/validators.py", weight=0.82,
-               _src="a", _tgt="b")
+    G.add_edge(
+        "a",
+        "b",
+        relation="semantically_similar_to",
+        confidence="INFERRED",
+        confidence_score=0.82,
+        source_file="auth/validators.py",
+        weight=0.82,
+        _src="a",
+        _tgt="b",
+    )
     # plain references edge (same confidence tier)
-    G.add_edge("c", "d", relation="references", confidence="INFERRED",
-               confidence_score=0.7, source_file="config/loader.py", weight=0.7,
-               _src="c", _tgt="d")
+    G.add_edge(
+        "c",
+        "d",
+        relation="references",
+        confidence="INFERRED",
+        confidence_score=0.7,
+        source_file="config/loader.py",
+        weight=0.7,
+        _src="c",
+        _tgt="d",
+    )
     return G
 
 
@@ -65,6 +93,7 @@ def _make_two_edge_graph():
 # Test 1: semantically_similar_to passes through build_from_json without being dropped
 # ---------------------------------------------------------------------------
 
+
 def test_semantic_edge_survives_build_from_json():
     G = _make_graph_with_semantic_edge()
     assert G.number_of_edges() == 1
@@ -82,6 +111,7 @@ def test_semantic_edge_nodes_present():
 # Test 2: confidence_score is preserved for semantically_similar_to edges
 # ---------------------------------------------------------------------------
 
+
 def test_semantic_edge_confidence_score_preserved():
     G = _make_graph_with_semantic_edge()
     u, v, data = next(iter(G.edges(data=True)))
@@ -94,30 +124,26 @@ def test_semantic_edge_confidence_score_preserved():
 #         than references edges with the same community membership
 # ---------------------------------------------------------------------------
 
+
 def test_semantic_edge_scores_higher_than_references():
     G = _make_two_edge_graph()
-    communities = {0: ["a", "b"], 1: ["c", "d"]}
     node_community = {"a": 0, "b": 0, "c": 1, "d": 1}
 
     score_sem, reasons_sem = _surprise_score(
-        G, "a", "b", G.edges["a", "b"], node_community,
-        "auth/validators.py", "api/checks.py"
+        G, "a", "b", G.edges["a", "b"], node_community, "auth/validators.py", "api/checks.py"
     )
     score_ref, _ = _surprise_score(
-        G, "c", "d", G.edges["c", "d"], node_community,
-        "config/loader.py", "utils/reader.py"
+        G, "c", "d", G.edges["c", "d"], node_community, "config/loader.py", "utils/reader.py"
     )
     assert score_sem > score_ref
 
 
 def test_semantic_edge_reason_mentions_similarity():
     G = _make_two_edge_graph()
-    communities = {0: ["a", "b"], 1: ["c", "d"]}
     node_community = {"a": 0, "b": 0, "c": 1, "d": 1}
 
     _, reasons = _surprise_score(
-        G, "a", "b", G.edges["a", "b"], node_community,
-        "auth/validators.py", "api/checks.py"
+        G, "a", "b", G.edges["a", "b"], node_community, "auth/validators.py", "api/checks.py"
     )
     assert any("similar" in r for r in reasons)
 
@@ -126,6 +152,7 @@ def test_semantic_edge_reason_mentions_similarity():
 # Test 4: report renders [semantically similar] tag for these edges
 # ---------------------------------------------------------------------------
 
+
 def _make_report_with_semantic_surprise():
     G = _make_graph_with_semantic_edge()
     communities = {0: ["a_validate_input", "b_check_input"]}
@@ -145,7 +172,9 @@ def _make_report_with_semantic_surprise():
     ]
     detection = {"total_files": 2, "total_words": 500, "needs_graph": True, "warning": None}
     tokens = {"input": 100, "output": 50}
-    return generate(G, communities, cohesion, labels, gods, surprises, detection, tokens, "./project")
+    return generate(
+        G, communities, cohesion, labels, gods, surprises, detection, tokens, "./project"
+    )
 
 
 def test_report_renders_semantically_similar_tag():
@@ -171,8 +200,15 @@ def test_report_no_semantic_tag_for_other_relations():
         ("y", "Beta", "repo2/b.py"),
     ]:
         G.add_node(nid, label=label, source_file=src, file_type="code")
-    G.add_edge("x", "y", relation="references", confidence="EXTRACTED",
-               confidence_score=1.0, source_file="repo1/a.py", weight=1.0)
+    G.add_edge(
+        "x",
+        "y",
+        relation="references",
+        confidence="EXTRACTED",
+        confidence_score=1.0,
+        source_file="repo1/a.py",
+        weight=1.0,
+    )
 
     communities = {0: ["x", "y"]}
     cohesion = {0: 0.5}
@@ -190,5 +226,7 @@ def test_report_no_semantic_tag_for_other_relations():
     ]
     detection = {"total_files": 2, "total_words": 200, "needs_graph": True, "warning": None}
     tokens = {"input": 50, "output": 25}
-    report = generate(G, communities, cohesion, labels, gods, surprises, detection, tokens, "./project")
+    report = generate(
+        G, communities, cohesion, labels, gods, surprises, detection, tokens, "./project"
+    )
     assert "[semantically similar]" not in report
diff --git a/tests/test_serve.py b/tests/test_serve.py
index 5ab6271af..5aeca66a6 100644
--- a/tests/test_serve.py
+++ b/tests/test_serve.py
@@ -1,4 +1,5 @@
 """Tests for serve.py - MCP graph query helpers (no mcp package required)."""
+
 import json
 import pytest
 import networkx as nx
@@ -37,6 +38,7 @@ def _make_graph() -> nx.Graph:
 
 # --- _communities_from_graph ---
 
+
 def test_communities_from_graph_basic():
     G = _make_graph()
     communities = _communities_from_graph(G)
@@ -46,12 +48,14 @@ def test_communities_from_graph_basic():
     assert "n2" in communities[0]
     assert "n3" in communities[1]
 
+
 def test_communities_from_graph_no_community_attr():
     G = nx.Graph()
     G.add_node("a", label="foo")  # no community attr
     communities = _communities_from_graph(G)
     assert communities == {}
 
+
 def test_communities_from_graph_isolated():
     G = _make_graph()
     communities = _communities_from_graph(G)
@@ -61,6 +65,7 @@ def test_communities_from_graph_isolated():
 
 # --- _score_nodes ---
 
+
 def test_score_nodes_exact_label_match():
     G = _make_graph()
     scored = _score_nodes(G, ["extract"])
@@ -68,11 +73,13 @@ def test_score_nodes_exact_label_match():
     assert "n1" in nids
     assert scored[0][1] == "n1"  # highest score first
 
+
 def test_score_nodes_no_match():
     G = _make_graph()
     scored = _score_nodes(G, ["xyzzy"])
     assert scored == []
 
+
 def test_score_nodes_source_file_partial():
     G = _make_graph()
     # "cluster.py" contains "cluster" - should score 0.5 for source match
@@ -112,12 +119,28 @@ def cut(self, text):
 
     monkeypatch.setattr(serve_mod, "_jieba", FakeJieba())
     terms = _query_terms("前端 dependency 依赖 install 安装 to of 包管理器 项目约定 a前")
-    assert terms == ["前端", "dependency", "依赖", "install", "安装", "包", "管理器", "包管理器", "项目", "约定", "项目约定", "前", "a前"]
+    assert terms == [
+        "前端",
+        "dependency",
+        "依赖",
+        "install",
+        "安装",
+        "包",
+        "管理器",
+        "包管理器",
+        "项目",
+        "约定",
+        "项目约定",
+        "前",
+        "a前",
+    ]
 
 
 def test_query_graph_text_keeps_short_non_english_terms():
     G = nx.Graph()
-    G.add_node("frontend", label="前端", source_file="docs/前端.md", source_location="L1", community=0)
+    G.add_node(
+        "frontend", label="前端", source_file="docs/前端.md", source_location="L1", community=0
+    )
     text = _query_graph_text(G, "前端", mode="bfs", depth=1)
     assert "No matching nodes found." not in text
     assert "NODE 前端" in text
@@ -135,6 +158,7 @@ def test_resolve_context_filters_explicit_overrides_heuristic():
 
 # --- _bfs ---
 
+
 def test_bfs_depth_1():
     G = _make_graph()
     visited, edges = _bfs(G, ["n1"], depth=1)
@@ -142,16 +166,19 @@ def test_bfs_depth_1():
     assert "n2" in visited  # direct neighbor
     assert "n3" not in visited  # 2 hops away
 
+
 def test_bfs_depth_2():
     G = _make_graph()
     visited, edges = _bfs(G, ["n1"], depth=2)
     assert "n3" in visited  # n1 -> n2 -> n3
 
+
 def test_bfs_disconnected():
     G = _make_graph()
     visited, edges = _bfs(G, ["n5"], depth=3)
     assert visited == {"n5"}  # isolated node
 
+
 def test_bfs_returns_edges():
     G = _make_graph()
     visited, edges = _bfs(G, ["n1"], depth=1)
@@ -170,6 +197,7 @@ def test_filter_graph_by_context_limits_traversal():
 
 # --- _dfs ---
 
+
 def test_dfs_depth_1():
     G = _make_graph()
     visited, edges = _dfs(G, ["n1"], depth=1)
@@ -177,6 +205,7 @@ def test_dfs_depth_1():
     assert "n2" in visited
     assert "n3" not in visited
 
+
 def test_dfs_full_chain():
     G = _make_graph()
     visited, edges = _dfs(G, ["n1"], depth=5)
@@ -185,18 +214,21 @@ def test_dfs_full_chain():
 
 # --- _subgraph_to_text ---
 
+
 def test_subgraph_to_text_contains_labels():
     G = _make_graph()
     text = _subgraph_to_text(G, {"n1", "n2"}, [("n1", "n2")])
     assert "extract" in text
     assert "cluster" in text
 
+
 def test_subgraph_to_text_truncates():
     G = _make_graph()
     # Very small budget forces truncation
     text = _subgraph_to_text(G, {"n1", "n2", "n3", "n4"}, [("n1", "n2")], token_budget=1)
     assert "truncated" in text
 
+
 def test_subgraph_to_text_edge_included():
     G = _make_graph()
     text = _subgraph_to_text(G, {"n1", "n2"}, [("n1", "n2")])
@@ -212,7 +244,9 @@ def test_subgraph_to_text_includes_edge_context():
 
 def test_query_graph_text_explicit_context_filter_changes_traversal():
     G = _make_graph()
-    text = _query_graph_text(G, "extract", mode="bfs", depth=2, token_budget=2000, context_filters=["call"])
+    text = _query_graph_text(
+        G, "extract", mode="bfs", depth=2, token_budget=2000, context_filters=["call"]
+    )
     assert "Context: call (explicit)" in text
     assert "cluster" in text
     assert "build" not in text
@@ -228,6 +262,7 @@ def test_query_graph_text_heuristic_context_filter_changes_traversal():
 
 # --- _load_graph ---
 
+
 def test_load_graph_roundtrip(tmp_path):
     G = _make_graph()
     data = json_graph.node_link_data(G, edges="links")
@@ -237,6 +272,7 @@ def test_load_graph_roundtrip(tmp_path):
     assert G2.number_of_nodes() == G.number_of_nodes()
     assert G2.number_of_edges() == G.number_of_edges()
 
+
 def test_load_graph_missing_file(tmp_path):
     graphify_dir = tmp_path / "graphify-out"
     graphify_dir.mkdir()
@@ -272,6 +308,7 @@ def test_load_graph_accepts_under_cap(monkeypatch, tmp_path):
 
 # --- #874: MCP hot-reload ---
 
+
 def _write_graph(path, nodes: list[str]) -> None:
     """Write a minimal graph.json with the given node IDs."""
     G = nx.DiGraph()
@@ -284,7 +321,6 @@ def _write_graph(path, nodes: list[str]) -> None:
 def test_maybe_reload_detects_graph_change(tmp_path):
     """serve() picks up a new graph.json written after startup (#874)."""
     import time
-    from unittest.mock import patch
 
     out = tmp_path / "graphify-out"
     out.mkdir()
@@ -326,13 +362,14 @@ def test_load_graph_cache_key_changes_with_content(tmp_path):
 
 # --- IDF weighting tests (#897) ---
 
+
 def _make_noisy_graph() -> nx.Graph:
     """20 error-handler nodes + 1 rare identifier: FooBarService."""
     G = nx.Graph()
     for i in range(20):
         G.add_node(f"err{i}", label=f"error_handler_{i}", source_file=f"err{i}.py", community=0)
         if i > 0:
-            G.add_edge(f"err{i-1}", f"err{i}", relation="calls", confidence="EXTRACTED")
+            G.add_edge(f"err{i - 1}", f"err{i}", relation="calls", confidence="EXTRACTED")
     G.add_node("fbs", label="FooBarService", source_file="service.py", community=1)
     G.add_node("fbs_dep", label="ServiceClient", source_file="client.py", community=1)
     G.add_edge("fbs", "fbs_dep", relation="uses", confidence="EXTRACTED")
@@ -345,9 +382,7 @@ def test_idf_downweights_common_terms():
     G = _make_noisy_graph()
     scored = _score_nodes(G, ["foobarservice", "error"])
     assert scored, "should have results"
-    assert scored[0][1] == "fbs", (
-        f"FooBarService should rank first, got {scored[0][1]}"
-    )
+    assert scored[0][1] == "fbs", f"FooBarService should rank first, got {scored[0][1]}"
 
 
 def test_idf_cached_on_graph():
@@ -368,7 +403,6 @@ def test_idf_new_graph_starts_fresh():
 
 def test_idf_rare_term_gets_high_weight():
     """A term matching only 1 of N nodes should get IDF > 1."""
-    import math
     G = _make_graph()  # 5 nodes
     idf = _compute_idf(G, ["extract"])
     # extract matches only n1: IDF = log(1 + 5/2) ≈ 1.25
@@ -377,7 +411,6 @@ def test_idf_rare_term_gets_high_weight():
 
 def test_idf_common_term_gets_low_weight():
     """A term matching most nodes should get IDF < 1."""
-    import math
     G = nx.Graph()
     # 'handle' in every node label
     for i in range(20):
@@ -388,6 +421,7 @@ def test_idf_common_term_gets_low_weight():
 
 # --- _pick_seeds tests (#897) ---
 
+
 def test_pick_seeds_dominant_identifier_gives_one_seed():
     """FooBarService at 1000 vs error nodes at 1.0 → only 1 seed chosen."""
     scored = [(1000.0, "fbs"), (1.0, "err1"), (0.9, "err2")]
@@ -419,6 +453,7 @@ def test_pick_seeds_respects_max_k():
 
 # --- actionable truncation hint (#897) ---
 
+
 def test_subgraph_to_text_truncation_hint_is_actionable():
     """Truncation message must tell Claude what to do, not just say truncated."""
     G = _make_graph()
@@ -429,6 +464,7 @@ def test_subgraph_to_text_truncation_hint_is_actionable():
 
 # --- integration: identifier + noise query seeds from identifier (#897) ---
 
+
 def test_query_seeds_from_identifier_not_noise():
     """'FooBarService error handling' should expand from FooBarService,
     not from error-handler nodes, so ServiceClient appears in results."""
@@ -446,7 +482,13 @@ def test_query_graph_text_parameter_type_context_filter_changes_traversal():
     graph.add_node("process", label="process", source_file="sample.cs", source_location="L20")
     graph.add_node("payload", label="Payload", source_file="sample.cs", source_location="L5")
     graph.add_node("other", label="PayloadFactory", source_file="sample.cs", source_location="L40")
-    graph.add_edge("process", "payload", relation="references", context="parameter_type", confidence="EXTRACTED")
+    graph.add_edge(
+        "process",
+        "payload",
+        relation="references",
+        context="parameter_type",
+        confidence="EXTRACTED",
+    )
     graph.add_edge("process", "other", relation="calls", context="call", confidence="EXTRACTED")
 
     text = _query_graph_text(graph, "who accepts Payload", context_filters=["parameter_type"])
@@ -457,7 +499,6 @@ def test_query_graph_text_parameter_type_context_filter_changes_traversal():
 
 
 def test_query_graph_text_context_filter_aliases_resolve():
-    import networkx as nx
     from graphify.serve import _normalize_context_filters
 
     assert _normalize_context_filters(["param"]) == ["parameter_type"]
@@ -475,6 +516,7 @@ def test_query_graph_text_context_filter_aliases_resolve():
 
 # --- Chinese segmentation ---
 
+
 def test_query_terms_chinese_segments_with_cached_jieba(monkeypatch):
     """Chinese text should use the cached jieba module and keep the original term."""
     import graphify.serve as serve_mod
@@ -533,8 +575,12 @@ def test_score_nodes_chinese_substring_match():
 def test_query_text_chinese_finds_routing_nodes():
     """Full pipeline: '页面路由' should find nodes with '路由' in label."""
     G = nx.Graph()
-    G.add_node("parent", label="页面路由规范", source_file="doc.md", source_location="L1", community=0)
-    G.add_node("child", label="路由桥接核对表", source_file="doc.md", source_location="L10", community=0)
+    G.add_node(
+        "parent", label="页面路由规范", source_file="doc.md", source_location="L1", community=0
+    )
+    G.add_node(
+        "child", label="路由桥接核对表", source_file="doc.md", source_location="L10", community=0
+    )
     G.add_edge("parent", "child", relation="contains", confidence="EXTRACTED")
     text = _query_graph_text(G, "页面路由", mode="bfs", depth=2)
     assert "No matching nodes found." not in text
diff --git a/tests/test_transcribe.py b/tests/test_transcribe.py
index 8e35f2aaf..7bbc1931c 100644
--- a/tests/test_transcribe.py
+++ b/tests/test_transcribe.py
@@ -1,8 +1,8 @@
 """Tests for graphify.transcribe — video/audio transcription support."""
+
 from __future__ import annotations
 
 import os
-from pathlib import Path
 from unittest.mock import MagicMock, patch
 
 import pytest
@@ -19,6 +19,7 @@
 # VIDEO_EXTENSIONS
 # ---------------------------------------------------------------------------
 
+
 def test_video_extensions_set():
     assert ".mp4" in VIDEO_EXTENSIONS
     assert ".mp3" in VIDEO_EXTENSIONS
@@ -31,6 +32,7 @@ def test_video_extensions_set():
 # build_whisper_prompt
 # ---------------------------------------------------------------------------
 
+
 def test_build_whisper_prompt_no_nodes():
     """Empty god_nodes returns fallback prompt."""
     prompt = build_whisper_prompt([])
@@ -65,6 +67,7 @@ def test_build_whisper_prompt_nodes_without_labels():
 # transcribe
 # ---------------------------------------------------------------------------
 
+
 def test_transcribe_uses_cache(tmp_path):
     """If transcript already exists, transcribe() returns cached path without running Whisper."""
     video = tmp_path / "lecture.mp4"
@@ -105,7 +108,9 @@ def test_transcribe_missing_faster_whisper(tmp_path):
     video = tmp_path / "clip.mp4"
     video.write_bytes(b"fake")
 
-    with patch("graphify.transcribe._get_whisper", side_effect=ImportError("faster-whisper not installed")):
+    with patch(
+        "graphify.transcribe._get_whisper", side_effect=ImportError("faster-whisper not installed")
+    ):
         with pytest.raises(ImportError):
             transcribe(video, output_dir=tmp_path / "out")
 
@@ -114,6 +119,7 @@ def test_transcribe_missing_faster_whisper(tmp_path):
 # transcribe_all
 # ---------------------------------------------------------------------------
 
+
 def test_transcribe_all_empty():
     """Empty input returns empty list without error."""
     assert transcribe_all([]) == []
diff --git a/tests/test_validate.py b/tests/test_validate.py
index e5f9cd50f..421b4f988 100644
--- a/tests/test_validate.py
+++ b/tests/test_validate.py
@@ -34,7 +34,7 @@ def test_missing_edges_key():
 
 
 def test_not_a_dict():
-    errors = validate_extraction([])
+    errors = validate_extraction([])  # type: ignore[reportArgumentType]
     assert len(errors) == 1
 
 
@@ -156,11 +156,13 @@ def test_assert_valid_passes_silently():
 def test_validate_extraction_does_not_typeerror_on_non_list_nodes():
     """validate_extraction must report 'nodes must be a list' without raising TypeError."""
     from graphify.validate import validate_extraction
+
     errors = validate_extraction({"nodes": 123, "edges": []})
     assert any("'nodes' must be a list" in e for e in errors)
 
 
 def test_validate_extraction_does_not_typeerror_on_non_list_edges():
     from graphify.validate import validate_extraction
+
     errors = validate_extraction({"nodes": [], "edges": 42})
     assert any("'edges' must be a list" in e for e in errors)
diff --git a/tests/test_watch.py b/tests/test_watch.py
index 80dff3148..7522a8335 100644
--- a/tests/test_watch.py
+++ b/tests/test_watch.py
@@ -1,4 +1,5 @@
 """Tests for watch.py - file watcher helpers (no watchdog required)."""
+
 import json
 import os
 import subprocess
@@ -12,18 +13,21 @@
 
 # --- _notify_only ---
 
+
 def test_notify_only_creates_flag(tmp_path):
     _notify_only(tmp_path)
     flag = tmp_path / "graphify-out" / "needs_update"
     assert flag.exists()
     assert flag.read_text() == "1"
 
+
 def test_notify_only_creates_flag_dir(tmp_path):
     # graphify-out dir does not exist yet
     assert not (tmp_path / "graphify-out").exists()
     _notify_only(tmp_path)
     assert (tmp_path / "graphify-out").is_dir()
 
+
 def test_notify_only_idempotent(tmp_path):
     _notify_only(tmp_path)
     _notify_only(tmp_path)
@@ -33,21 +37,25 @@ def test_notify_only_idempotent(tmp_path):
 
 # --- _WATCHED_EXTENSIONS ---
 
+
 def test_watched_extensions_includes_code():
     assert ".py" in _WATCHED_EXTENSIONS
     assert ".ts" in _WATCHED_EXTENSIONS
     assert ".go" in _WATCHED_EXTENSIONS
     assert ".rs" in _WATCHED_EXTENSIONS
 
+
 def test_watched_extensions_includes_docs():
     assert ".md" in _WATCHED_EXTENSIONS
     assert ".txt" in _WATCHED_EXTENSIONS
     assert ".pdf" in _WATCHED_EXTENSIONS
 
+
 def test_watched_extensions_includes_images():
     assert ".png" in _WATCHED_EXTENSIONS
     assert ".jpg" in _WATCHED_EXTENSIONS
 
+
 def test_watched_extensions_excludes_noise():
     # .json is now indexed (bash/JSON extractors added in #866)
     assert ".json" in _WATCHED_EXTENSIONS
@@ -58,15 +66,18 @@ def test_watched_extensions_excludes_noise():
 
 # --- watch() import error without watchdog ---
 
+
 def test_check_update_no_flag_returns_true(tmp_path):
     """check_update returns True and is silent when needs_update flag is absent."""
     from graphify.watch import check_update
+
     assert check_update(tmp_path) is True
 
 
 def test_check_update_with_flag_returns_true_and_prints(tmp_path, capsys):
     """check_update returns True and prints notification when flag exists."""
     from graphify.watch import check_update
+
     flag = tmp_path / "graphify-out" / "needs_update"
     flag.parent.mkdir(parents=True, exist_ok=True)
     flag.write_text("1")
@@ -79,6 +90,7 @@ def test_check_update_with_flag_returns_true_and_prints(tmp_path, capsys):
 def test_check_update_does_not_clear_flag(tmp_path):
     """check_update never removes the needs_update flag (clearing is LLM's job)."""
     from graphify.watch import check_update
+
     flag = tmp_path / "graphify-out" / "needs_update"
     flag.parent.mkdir(parents=True, exist_ok=True)
     flag.write_text("1")
@@ -88,6 +100,7 @@ def test_check_update_does_not_clear_flag(tmp_path):
 
 def test_watch_raises_without_watchdog(tmp_path, monkeypatch):
     import builtins
+
     real_import = builtins.__import__
 
     def mock_import(name, *args, **kwargs):
@@ -98,6 +111,7 @@ def mock_import(name, *args, **kwargs):
     monkeypatch.setattr(builtins, "__import__", mock_import)
 
     from graphify.watch import watch
+
     with pytest.raises(ImportError, match="watchdog not installed"):
         watch(tmp_path)
 
@@ -150,12 +164,8 @@ def test_rebuild_code_evicts_nodes_from_deleted_files(tmp_path):
     corpus = tmp_path / "corpus"
     corpus.mkdir()
 
-    (corpus / "auth.py").write_text(
-        "def login(): pass\ndef logout(): pass\n", encoding="utf-8"
-    )
-    (corpus / "utils.py").write_text(
-        "def format_date(): pass\n", encoding="utf-8"
-    )
+    (corpus / "auth.py").write_text("def login(): pass\ndef logout(): pass\n", encoding="utf-8")
+    (corpus / "utils.py").write_text("def format_date(): pass\n", encoding="utf-8")
 
     assert _rebuild_code(corpus, acquire_lock=False) is True
     graph_path = corpus / "graphify-out" / "graph.json"
@@ -168,7 +178,9 @@ def test_rebuild_code_evicts_nodes_from_deleted_files(tmp_path):
     assert _rebuild_code(corpus, acquire_lock=False) is True
     data = json.loads(graph_path.read_text(encoding="utf-8"))
     node_labels_after = {n["label"] for n in data.get("nodes", [])}
-    assert "format_date()" not in node_labels_after, "stale function node from deleted file must be evicted"
+    assert "format_date()" not in node_labels_after, (
+        "stale function node from deleted file must be evicted"
+    )
     assert "login()" in node_labels_after, "nodes from surviving file must be kept"
 
 
@@ -192,7 +204,9 @@ def test_rebuild_code_is_idempotent_when_cluster_ids_flap(tmp_path, monkeypatch)
     from graphify.watch import _rebuild_code
 
     src = tmp_path / "app.py"
-    src.write_text("def alpha():\n    return 1\n\ndef beta():\n    return alpha()\n", encoding="utf-8")
+    src.write_text(
+        "def alpha():\n    return 1\n\ndef beta():\n    return alpha()\n", encoding="utf-8"
+    )
 
     calls = {"n": 0}
 
@@ -225,7 +239,9 @@ def test_rebuild_code_skips_cluster_when_topology_unchanged(tmp_path, monkeypatc
     from graphify.watch import _rebuild_code
 
     src = tmp_path / "app.py"
-    src.write_text("def alpha():\n    return 1\n\ndef beta():\n    return alpha()\n", encoding="utf-8")
+    src.write_text(
+        "def alpha():\n    return 1\n\ndef beta():\n    return alpha()\n", encoding="utf-8"
+    )
 
     calls = {"n": 0}
 
@@ -243,12 +259,33 @@ def cluster_once(G):
     assert calls["n"] == 1
 
 
+def test_rebuild_code_no_viz_removes_stale_html_and_skips_export(tmp_path, monkeypatch, capsys):
+    from graphify import export as export_mod
+    from graphify.watch import _rebuild_code
+
+    (tmp_path / "app.py").write_text("def alpha():\n    return 1\n", encoding="utf-8")
+    out = tmp_path / "graphify-out"
+    out.mkdir()
+    stale_html = out / "graph.html"
+    stale_html.write_text("<html/>", encoding="utf-8")
+
+    def fail_to_html(*_args, **_kwargs):
+        raise AssertionError("to_html should not be called when no_viz=True")
+
+    monkeypatch.setattr(export_mod, "to_html", fail_to_html)
+
+    assert _rebuild_code(tmp_path, no_viz=True)
+    assert not stale_html.exists()
+    assert "Skipped graph.html" not in capsys.readouterr().out
+
+
 # --- .graphifyignore honored in watch handler (gh-928) ---
 
 
 def _watchdog_available() -> bool:
     try:
         import watchdog  # noqa: F401
+
         return True
     except ImportError:
         return False
@@ -274,7 +311,9 @@ def test_watch_handler_honors_graphifyignore(tmp_path, monkeypatch):
 
     # Run watch() in a thread with a short debounce so we can verify the
     # post-debounce dispatch path actually runs on real events.
-    t = threading.Thread(target=watch_mod.watch, args=(tmp_path,), kwargs={"debounce": 0.2}, daemon=True)
+    t = threading.Thread(
+        target=watch_mod.watch, args=(tmp_path,), kwargs={"debounce": 0.2}, daemon=True
+    )
     t.start()
     time.sleep(0.5)  # let observer.start() settle
 
@@ -318,7 +357,9 @@ def counting_loader(root):
     monkeypatch.setattr(watch_mod, "_rebuild_code", lambda p, **kw: True)
     monkeypatch.setattr(watch_mod, "_notify_only", lambda p: None)
 
-    t = threading.Thread(target=watch_mod.watch, args=(tmp_path,), kwargs={"debounce": 0.2}, daemon=True)
+    t = threading.Thread(
+        target=watch_mod.watch, args=(tmp_path,), kwargs={"debounce": 0.2}, daemon=True
+    )
     t.start()
     time.sleep(0.5)
 
@@ -331,6 +372,7 @@ def counting_loader(root):
 
 # --- _check_shrink: silent-corruption guard with explicit-deletion bypass ---
 
+
 def _shrink_payload(n: int) -> dict:
     """Build a minimal graph-data dict with *n* placeholder nodes."""
     return {"nodes": [{"id": f"n{i}"} for i in range(n)], "links": []}
@@ -426,6 +468,7 @@ def test_check_shrink_keeps_tmp_when_deletions_declared(tmp_path):
 
 # --- _rebuild_code integration: post-commit delete scenario ---
 
+
 @pytest.mark.skipif(sys.platform == "win32", reason="git CLI behaviour varies on Windows runners")
 def test_rebuild_code_prunes_deleted_file_nodes(tmp_path):
     """End-to-end probe of the post-commit-delete bug fix.
diff --git a/tests/test_wiki.py b/tests/test_wiki.py
index 063d47ed6..09279a814 100644
--- a/tests/test_wiki.py
+++ b/tests/test_wiki.py
@@ -1,8 +1,8 @@
 """Tests for graphify.wiki — Wikipedia-style article generation."""
+
 import pytest
-from pathlib import Path
 import networkx as nx
-from graphify.wiki import to_wiki, _index_md, _community_article, _god_node_article
+from graphify.wiki import to_wiki
 
 
 def _make_graph():
@@ -25,14 +25,28 @@ def _make_graph():
 
 def test_to_wiki_writes_index(tmp_path):
     G = _make_graph()
-    n = to_wiki(G, COMMUNITIES, tmp_path, community_labels=LABELS, cohesion=COHESION, god_nodes_data=GOD_NODES)
+    to_wiki(
+        G,
+        COMMUNITIES,
+        tmp_path,
+        community_labels=LABELS,
+        cohesion=COHESION,
+        god_nodes_data=GOD_NODES,
+    )
     assert (tmp_path / "index.md").exists()
 
 
 def test_to_wiki_returns_article_count(tmp_path):
     G = _make_graph()
     # 2 communities + 1 god node = 3
-    n = to_wiki(G, COMMUNITIES, tmp_path, community_labels=LABELS, cohesion=COHESION, god_nodes_data=GOD_NODES)
+    n = to_wiki(
+        G,
+        COMMUNITIES,
+        tmp_path,
+        community_labels=LABELS,
+        cohesion=COHESION,
+        god_nodes_data=GOD_NODES,
+    )
     assert n == 3
 
 
@@ -169,6 +183,7 @@ def test_god_node_article_community_without_node_attr(tmp_path):
 
 # Regression tests for #936 - stale community node IDs crash to_wiki after dedup/re-extract
 
+
 def test_to_wiki_drops_stale_community_nodes(tmp_path):
     """Stale node IDs in communities dict are silently dropped without crash (#936)."""
     G = _make_graph()
diff --git a/uv.lock b/uv.lock
index 1a78eb925..a93755d00 100644
--- a/uv.lock
+++ b/uv.lock
@@ -1159,6 +1159,7 @@ all = [
     { name = "openpyxl" },
     { name = "pypdf" },
     { name = "python-docx" },
+    { name = "starlette" },
     { name = "tiktoken" },
     { name = "tree-sitter-sql" },
     { name = "watchdog" },
@@ -1186,6 +1187,7 @@ leiden = [
 ]
 mcp = [
     { name = "mcp" },
+    { name = "starlette" },
 ]
 neo4j = [
     { name = "neo4j" },
@@ -1270,6 +1272,8 @@ requires-dist = [
     { name = "python-docx", marker = "extra == 'all'" },
     { name = "python-docx", marker = "extra == 'office'" },
     { name = "rapidfuzz" },
+    { name = "starlette", marker = "extra == 'all'", specifier = ">=1.0.1" },
+    { name = "starlette", marker = "extra == 'mcp'", specifier = ">=1.0.1" },
     { name = "tiktoken", marker = "extra == 'all'" },
     { name = "tiktoken", marker = "extra == 'gemini'" },
     { name = "tiktoken", marker = "extra == 'kimi'" },
@@ -4163,15 +4167,15 @@ wheels = [
 
 [[package]]
 name = "starlette"
-version = "1.0.0"
+version = "1.2.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "anyio" },
     { name = "typing-extensions", marker = "python_full_version < '3.13'" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/81/69/17425771797c36cded50b7fe44e850315d039f28b15901ab44839e70b593/starlette-1.0.0.tar.gz", hash = "sha256:6a4beaf1f81bb472fd19ea9b918b50dc3a77a6f2e190a12954b25e6ed5eea149", size = 2655289, upload-time = "2026-03-22T18:29:46.779Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/c5/bf/616a066c2760f6c2b1ae3437cc28149734d069fbb46511712beae118a68c/starlette-1.2.0.tar.gz", hash = "sha256:3c5a6b23fff42492914e93890bb80cbfea72dbf37de268eec06185d62a4ca553", size = 2668923, upload-time = "2026-05-28T11:42:50.568Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/0b/c9/584bc9651441b4ba60cc4d557d8a547b5aff901af35bda3a4ee30c819b82/starlette-1.0.0-py3-none-any.whl", hash = "sha256:d3ec55e0bb321692d275455ddfd3df75fff145d009685eb40dc91fc66b03d38b", size = 72651, upload-time = "2026-03-22T18:29:45.111Z" },
+    { url = "https://files.pythonhosted.org/packages/9f/85/492183764d5d01d4514be3730fdb8e228a80605783099551c51627578b5d/starlette-1.2.0-py3-none-any.whl", hash = "sha256:36e0c76ac59157e75dc4b3bdeafba97fb04eaf1878045f15dbef666a6f092ed7", size = 73213, upload-time = "2026-05-28T11:42:48.801Z" },
 ]
 
 [[package]]

From 1d4c5d608af668d38486e0d67052091f8bdd0cf7 Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Wed, 27 May 2026 15:33:46 -0500
Subject: [PATCH 03/21] feat(multigraph): PR 4A dedup/remap contract for keyed
 parallel edges

Make entity dedup/remap safe under MultiDiGraph keyed parallel edges:
- deduplicate_entities() tracks self-loop drops (per relation and per
  source) and exact-duplicate collapses (per relation) via a new
  diagnostics dict parameter
- DEC-013: preserve real pre-existing self-loops; drop only
  remap-induced artifacts
- remove_all_parallel_edges() helper avoids the NetworkX two-tuple
  removal trap (remove_edges_from removes only one edge per 2-tuple)
- Diagnostics wired through build() only for multigraph=True builds
  so simple-graph output stays byte-identical
- 18 new tests in test_dedup_remap.py; 105 targeted pass, 1507 full
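
The remap contract described above can be illustrated with a minimal,
standard-library-only sketch. It is not the graphify implementation:
remap_edges() and the stats keys are hypothetical names chosen for
illustration, and the edge dicts are assumed to already use
"source"/"target" keys.

```python
import json
from collections import defaultdict


def remap_edges(edges, remap):
    """Hypothetical sketch of the dedup remap contract (not graphify's code)."""
    kept = []
    seen = set()
    stats = defaultdict(int)
    for edge in edges:
        e = dict(edge)
        src, tgt = e["source"], e["target"]
        e["source"] = remap.get(src, src)
        e["target"] = remap.get(tgt, tgt)
        # A self-loop is a remap artifact only if the endpoints differed
        # before the merge; a pre-existing self-loop (src == tgt) is kept.
        if e["source"] == e["target"] and src != tgt:
            stats["self_loop_drops"] += 1
            continue
        # Collapse only edges whose full attribute payload is identical
        # after remapping; parallel edges that differ anywhere survive.
        fingerprint = json.dumps(e, sort_keys=True, default=str)
        if fingerprint in seen:
            stats["exact_duplicate_collapses"] += 1
            continue
        seen.add(fingerprint)
        kept.append(e)
    return kept, dict(stats)


edges = [
    {"source": "a", "target": "b", "relation": "calls"},     # becomes a->c
    {"source": "a", "target": "c", "relation": "imports"},   # parallel, kept
    {"source": "a", "target": "c", "relation": "calls"},     # exact duplicate after remap
    {"source": "b", "target": "c", "relation": "aliases"},   # remap-induced self-loop
    {"source": "c", "target": "c", "relation": "recurses"},  # real self-loop, kept
]
kept, stats = remap_edges(edges, {"b": "c"})
```

With node b merged into c, the sketch keeps three edges: the two
parallel a->c edges with different relations and the pre-existing
c->c self-loop. It counts one remap-induced self-loop drop and one
exact-duplicate collapse.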

---
 graphify/build.py         |  57 ++--
 graphify/dedup.py         | 122 ++++++---
 graphify/edge_identity.py |  33 +++
 tests/test_dedup_remap.py | 542 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 706 insertions(+), 48 deletions(-)
 create mode 100644 tests/test_dedup_remap.py

diff --git a/graphify/build.py b/graphify/build.py
index 386560005..f7bad0d35 100644
--- a/graphify/build.py
+++ b/graphify/build.py
@@ -38,13 +38,30 @@
 # emit. Keeps semantic intent close (markdown→document, tool→code) and falls
 # back to "concept" for any other invalid value (see #840).
 _LANG_FAMILY: dict[str, str] = {
-    ".py": "py", ".pyi": "py",
-    ".js": "js", ".mjs": "js", ".cjs": "js", ".jsx": "js",
-    ".ts": "js", ".tsx": "js",
-    ".go": "go", ".rs": "rs",
-    ".java": "jvm", ".kt": "jvm", ".scala": "jvm", ".groovy": "jvm",
-    ".c": "c", ".h": "c", ".cc": "cpp", ".cpp": "cpp", ".hpp": "cpp",
-    ".rb": "rb", ".php": "php", ".cs": "cs", ".swift": "swift", ".lua": "lua",
+    ".py": "py",
+    ".pyi": "py",
+    ".js": "js",
+    ".mjs": "js",
+    ".cjs": "js",
+    ".jsx": "js",
+    ".ts": "js",
+    ".tsx": "js",
+    ".go": "go",
+    ".rs": "rs",
+    ".java": "jvm",
+    ".kt": "jvm",
+    ".scala": "jvm",
+    ".groovy": "jvm",
+    ".c": "c",
+    ".h": "c",
+    ".cc": "cpp",
+    ".cpp": "cpp",
+    ".hpp": "cpp",
+    ".rb": "rb",
+    ".php": "php",
+    ".cs": "cs",
+    ".swift": "swift",
+    ".lua": "lua",
 }
 
 
@@ -107,7 +124,9 @@ def _stable_identity_component(value: object) -> str | None:
         # os.fspath can return bytes for bytes-flavored PathLike; coerce to str
         # so downstream json.dumps / hashing always sees text.
         fs_value = os.fspath(value)
-        return fs_value.decode("utf-8", errors="replace") if isinstance(fs_value, bytes) else fs_value
+        return (
+            fs_value.decode("utf-8", errors="replace") if isinstance(fs_value, bytes) else fs_value
+        )
     if isinstance(value, (set, frozenset)):
         return json.dumps(sorted(str(item) for item in value), ensure_ascii=False)
     try:
@@ -284,6 +303,7 @@ def build_from_json(
     multigraph_groups: dict[tuple[Hashable, Hashable, str], list[dict]] = {}
     multigraph_explicit_keys: set[tuple[Hashable, Hashable, str]] = set()
     multigraph_diagnostics = {"exact_duplicate_edges": 0, "key_collision_edges": 0}
+
     # Iterate edges in a deterministic order. The graph is undirected and stores
     # direction in _src/_tgt; when two edges collapse onto the same node pair the
     # last write wins, so an unstable iteration order flips _src/_tgt run-to-run
@@ -329,9 +349,7 @@ def _edge_sort_key(edge: object) -> tuple[str, str, str, str]:
             continue  # skip edges to external/stdlib nodes - expected, not an error
         # Exclude legacy from/to alongside source/target so they don't survive
         # as ordinary edge attrs after legacy-shape remap above.
-        base_attrs = {
-            k: v for k, v in edge.items() if k not in ("source", "target", "from", "to")
-        }
+        base_attrs = {k: v for k, v in edge.items() if k not in ("source", "target", "from", "to")}
         raw_key, attrs = strip_schema_key(base_attrs)
         if "source_file" in attrs:
             attrs["source_file"] = _norm_source_file(
@@ -515,14 +533,21 @@ def build(
         combined["hyperedges"].extend(ext.get("hyperedges", []))
         combined["input_tokens"] += ext.get("input_tokens", 0)
         combined["output_tokens"] += ext.get("output_tokens", 0)
+    dedup_diagnostics: dict = {}
     if dedup and combined["nodes"]:
         combined["nodes"], combined["edges"] = deduplicate_entities(
             combined["nodes"],
             combined["edges"],
             communities={},
             dedup_llm_backend=dedup_llm_backend,
+            diagnostics=dedup_diagnostics,
         )
-    return build_from_json(combined, directed=directed, root=root, multigraph=multigraph)
+    G = build_from_json(combined, directed=directed, root=root, multigraph=multigraph)
+    if multigraph and dedup_diagnostics:
+        existing = G.graph.get("graphify_multigraph_diagnostics", {})
+        existing.update(dedup_diagnostics)
+        G.graph["graphify_multigraph_diagnostics"] = existing
+    return G
 
 
 def _norm_label(label: str) -> str:
@@ -615,8 +640,7 @@ def build_merge(
         data = json.loads(graph_path.read_text(encoding="utf-8"))
         if not isinstance(data, dict):
             raise TypeError(
-                f"saved graph.json at {graph_path} must be a JSON object, "
-                f"got {type(data).__name__}"
+                f"saved graph.json at {graph_path} must be a JSON object, got {type(data).__name__}"
             )
         # Refuse to silently collapse a saved multigraph. build() runs in
         # simple mode here, which would drop parallel edges; stateful
@@ -684,10 +708,7 @@ def build_merge(
             norm = _norm_source_file(p, _root_str)
             if norm:
                 prune_set.add(norm)
-        to_remove = [
-            n for n, d in G.nodes(data=True)
-            if d.get("source_file") in prune_set
-        ]
+        to_remove = [n for n, d in G.nodes(data=True) if d.get("source_file") in prune_set]
         G.remove_nodes_from(to_remove)
         n_files = len(prune_sources)
         n_nodes = len(to_remove)
diff --git a/graphify/dedup.py b/graphify/dedup.py
index d5b82766c..34ea20423 100644
--- a/graphify/dedup.py
+++ b/graphify/dedup.py
@@ -3,11 +3,14 @@
 Pipeline: exact normalization → entropy gate → MinHash/LSH blocking →
 Jaro-Winkler verification → same-community boost → union-find merge.
 """
+
 from __future__ import annotations
+import json
 import math
 import re
 import unicodedata
 from collections import defaultdict
+from typing import Any
 
 from datasketch import MinHash, MinHashLSH
 from rapidfuzz.distance import JaroWinkler
@@ -15,6 +18,7 @@
 
 # ── helpers ───────────────────────────────────────────────────────────────────
 
+
 def _norm(label: str) -> str:
     """Lowercase + collapse non-alphanumeric runs to space (Unicode-aware)."""
     label = unicodedata.normalize("NFKC", label)
@@ -80,6 +84,7 @@ def _short_label_blocked(a: str, b: str, jw_score: float) -> bool:
     if max(len(a), len(b)) >= 12:
         return False
     from rapidfuzz.distance import DamerauLevenshtein
+
     # Allow only same-length single-char substitutions (true typos like "Extractor"/"Extractar").
     # Block length-differing pairs regardless of score.
     if jw_score >= 97.0 and len(a) == len(b) and DamerauLevenshtein.distance(a, b) <= 1:
@@ -89,6 +94,7 @@ def _short_label_blocked(a: str, b: str, jw_score: float) -> bool:
 
 # ── union-find ────────────────────────────────────────────────────────────────
 
+
 class _UF:
     def __init__(self) -> None:
         self._parent: dict[str, str] = {}
@@ -118,20 +124,22 @@ def components(self) -> dict[str, list[str]]:
 
 _ENTROPY_THRESHOLD = 2.5
 _LSH_THRESHOLD = 0.7
-_MERGE_THRESHOLD = 92.0     # rapidfuzz normalized_similarity * 100
-_COMMUNITY_BOOST = 5.0      # score bonus when both nodes share community
+_MERGE_THRESHOLD = 92.0  # rapidfuzz normalized_similarity * 100
+_COMMUNITY_BOOST = 5.0  # score bonus when both nodes share community
 _NUM_PERM = 128
 _CHUNK_SUFFIX = re.compile(r"_c\d+$")
 
 
 # ── main entry point ──────────────────────────────────────────────────────────
 
+
 def deduplicate_entities(
     nodes: list[dict],
     edges: list[dict],
     *,
     communities: dict[str, int],
     dedup_llm_backend: str | None = None,
+    diagnostics: dict | None = None,
 ) -> tuple[list[dict], list[dict]]:
     """Deduplicate near-identical entities in a knowledge graph.
 
@@ -147,7 +155,7 @@ def deduplicate_entities(
     # Guard: cross-project dedup is not supported — nodes from different repos
     # share label names by coincidence and must never be merged by string similarity.
     # If you need to dedup a global graph, run deduplicate_entities per-repo first.
-    repos_seen = {n.get("repo") for n in nodes if n.get("repo")}
+    repos_seen = {str(repo) for n in nodes if (repo := n.get("repo"))}
     if len(repos_seen) > 1:
         raise ValueError(
             f"deduplicate_entities: nodes span multiple repos {sorted(repos_seen)!r}. "
@@ -160,7 +168,7 @@ def deduplicate_entities(
     # Pre-deduplicate: keep first occurrence of each id
     seen_ids: dict[str, dict] = {}
     for node in nodes:
-        nid = node.get("id", "")
+        nid = str(node.get("id") or "")
         if nid and nid not in seen_ids:
             seen_ids[nid] = node
     unique_nodes = list(seen_ids.values())
@@ -190,7 +198,7 @@ def deduplicate_entities(
             if len(file_group) > 1:
                 winner = _pick_winner(file_group)
                 for node in file_group:
-                    uf.union(winner["id"], node["id"])
+                    uf.union(str(winner["id"]), str(node["id"]))
                 exact_merges += len(file_group) - 1
 
     # ── pass 2: MinHash/LSH + Jaro-Winkler (high-entropy nodes only) ─────────
@@ -211,18 +219,20 @@ def deduplicate_entities(
         for node in candidates:
             norm_label = _norm(node.get("label", node.get("id", "")))
             m = _make_minhash(norm_label)
-            minhashes[node["id"]] = m
+            node_id = str(node["id"])
+            minhashes[node_id] = m
             try:
-                lsh.insert(node["id"], m)
+                lsh.insert(node_id, m)
             except ValueError:
                 pass  # duplicate key in LSH — already inserted
 
         for node in candidates:
-            node_id = node["id"]
+            node_id = str(node["id"])
             norm_label = _norm(node.get("label", node.get("id", "")))
-            neighbors = lsh.query(minhashes[node_id])
+            neighbors: list[Any] = lsh.query(minhashes[node_id])
 
             for neighbor_id in neighbors:
+                neighbor_id = str(neighbor_id)
                 if neighbor_id == node_id:
                     continue
                 if uf.find(node_id) == uf.find(neighbor_id):
@@ -242,8 +252,12 @@ def deduplicate_entities(
 
                 c1 = communities.get(node_id)
                 c2 = communities.get(neighbor_id)
-                if (c1 is not None and c2 is not None and c1 == c2
-                        and min(len(norm_label), len(neighbor_norm)) >= 12):
+                if (
+                    c1 is not None
+                    and c2 is not None
+                    and c1 == c2
+                    and min(len(norm_label), len(neighbor_norm)) >= 12
+                ):
                     score += _COMMUNITY_BOOST
 
                 if score >= _MERGE_THRESHOLD:
@@ -256,11 +270,12 @@ def deduplicate_entities(
                         sf_b = neighbor.get("source_file") or ""
                         if sf_a != sf_b:
                             continue
-                    all_group = norm_to_nodes.get(norm_label, [node]) + \
-                                norm_to_nodes.get(neighbor_norm, [neighbor])
+                    all_group = norm_to_nodes.get(norm_label, [node]) + norm_to_nodes.get(
+                        neighbor_norm, [neighbor]
+                    )
                     winner = _pick_winner(all_group)
-                    uf.union(winner["id"], node_id)
-                    uf.union(winner["id"], neighbor_id)
+                    uf.union(str(winner["id"]), node_id)
+                    uf.union(str(winner["id"]), neighbor_id)
                     fuzzy_merges += 1
 
     # ── pass 3: LLM tiebreaker for ambiguous pairs (opt-in) ──────────────────
@@ -283,6 +298,12 @@ def deduplicate_entities(
 
     # ── apply remap ───────────────────────────────────────────────────────────
     if not remap:
+        if diagnostics is not None:
+            diagnostics["remap_self_loop_drops"] = 0
+            diagnostics["remap_self_loop_drops_by_relation"] = {}
+            diagnostics["remap_self_loop_drops_by_source"] = {}
+            diagnostics["remap_exact_duplicate_collapses"] = 0
+            diagnostics["remap_exact_duplicate_collapses_by_relation"] = {}
         return unique_nodes, edges
 
     total = len(remap)
@@ -295,25 +316,57 @@ def deduplicate_entities(
     print(msg + ".", flush=True)
 
     deduped_nodes = [n for n in unique_nodes if n["id"] not in remap]
-    deduped_edges = []
+    deduped_edges: list[dict] = []
+    seen_fingerprints: set[str] = set()
+    self_loop_drops = 0
+    self_loop_by_relation: dict[str, int] = defaultdict(int)
+    self_loop_by_source: dict[str, int] = defaultdict(int)
+    exact_dup_collapses = 0
+    exact_dup_by_relation: dict[str, int] = defaultdict(int)
+
     for edge in edges:
         e = dict(edge)
-        # Tolerate "from"/"to" keys from LLM backends that don't follow the
-        # schema exactly — build_from_json normalises later but dedup runs
-        # first so bracket access would KeyError here (#803).
-        # Use explicit key presence check (not `or`) so empty-string src/tgt
-        # aren't silently replaced by the fallback key.
         src = e["source"] if "source" in e else e.get("from")
         tgt = e["target"] if "target" in e else e.get("to")
         if src is None or tgt is None:
             continue
         e["source"] = remap.get(src, src)
         e["target"] = remap.get(tgt, tgt)
-        # Remove legacy keys so they don't leak into edge attrs in graph.json.
         e.pop("from", None)
         e.pop("to", None)
-        if e["source"] != e["target"]:
-            deduped_edges.append(e)
+
+        relation = e.get("relation", "")
+        source_file = e.get("source_file", "")
+
+        if e["source"] == e["target"] and src != tgt:
+            self_loop_drops += 1
+            self_loop_by_relation[relation] += 1
+            self_loop_by_source[source_file] += 1
+            continue
+
+        fingerprint = json.dumps(e, sort_keys=True, ensure_ascii=False, default=str)
+        if fingerprint in seen_fingerprints:
+            exact_dup_collapses += 1
+            exact_dup_by_relation[relation] += 1
+            continue
+        seen_fingerprints.add(fingerprint)
+
+        deduped_edges.append(e)
+
+    if diagnostics is not None:
+        diagnostics["remap_self_loop_drops"] = self_loop_drops
+        diagnostics["remap_self_loop_drops_by_relation"] = dict(self_loop_by_relation)
+        diagnostics["remap_self_loop_drops_by_source"] = dict(self_loop_by_source)
+        diagnostics["remap_exact_duplicate_collapses"] = exact_dup_collapses
+        diagnostics["remap_exact_duplicate_collapses_by_relation"] = dict(exact_dup_by_relation)
+
+    if self_loop_drops or exact_dup_collapses:
+        parts = []
+        if self_loop_drops:
+            parts.append(f"dropped {self_loop_drops} self-loop edge(s)")
+        if exact_dup_collapses:
+            parts.append(f"collapsed {exact_dup_collapses} exact-duplicate edge(s)")
+        print(f"[graphify] Remap: {'; '.join(parts)}.", flush=True)
 
     return deduped_nodes, deduped_edges
 
@@ -343,12 +396,18 @@ def _llm_tiebreak(
     """Batch-resolve ambiguous pairs (score in [low, high)) via LLM."""
     try:
         from graphify.llm import BACKENDS, _format_backend_env_keys, _get_backend_api_key
+
         if backend not in BACKENDS:
-            print(f"[graphify] --dedup-llm: unknown backend {backend!r}, skipping LLM tiebreaker.", flush=True)
+            print(
+                f"[graphify] --dedup-llm: unknown backend {backend!r}, skipping LLM tiebreaker.",
+                flush=True,
+            )
             return
         if not _get_backend_api_key(backend):
             env_keys = _format_backend_env_keys(backend)
-            print(f"[graphify] --dedup-llm: {env_keys} not set, skipping LLM tiebreaker.", flush=True)
+            print(
+                f"[graphify] --dedup-llm: {env_keys} not set, skipping LLM tiebreaker.", flush=True
+            )
             return
     except ImportError:
         return
@@ -368,8 +427,12 @@ def _llm_tiebreak(
                 continue
             c1 = communities.get(node["id"])
             c2 = communities.get(neighbor["id"])
-            if (c1 is not None and c2 is not None and c1 == c2
-                    and min(len(norm_i), len(norm_j)) >= 12):
+            if (
+                c1 is not None
+                and c2 is not None
+                and c1 == c2
+                and min(len(norm_i), len(norm_j)) >= 12
+            ):
                 score += _COMMUNITY_BOOST
             if low <= score < high:
                 ambiguous.append((node, neighbor, score))
@@ -392,8 +455,7 @@ def _llm_tiebreak(
     for batch_start in range(0, len(ambiguous), batch_size):
         batch = ambiguous[batch_start : batch_start + batch_size]
         pairs_text = "\n".join(
-            f"{i+1}. \"{a['label']}\" vs \"{b['label']}\""
-            for i, (a, b, _) in enumerate(batch)
+            f'{i + 1}. "{a["label"]}" vs "{b["label"]}"' for i, (a, b, _) in enumerate(batch)
         )
         prompt = (
             "For each pair below, answer only 'yes' or 'no': are they the same real-world concept?\n\n"
diff --git a/graphify/edge_identity.py b/graphify/edge_identity.py
index f1802bca4..cfdece534 100644
--- a/graphify/edge_identity.py
+++ b/graphify/edge_identity.py
@@ -10,6 +10,10 @@
 
 import hashlib
 import json as _json
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    import networkx as nx
 
 SCHEMA_KEY_FIELD = "key"
 
@@ -56,3 +60,32 @@ def strip_schema_key(attrs: dict) -> tuple[object | None, dict]:
     key_val = attrs.get(SCHEMA_KEY_FIELD)
     cleaned = {k: v for k, v in attrs.items() if k != SCHEMA_KEY_FIELD}
     return key_val, cleaned
+
+
+def remove_all_parallel_edges(
+    G: "nx.Graph",
+    u: object,
+    v: object,
+) -> int:
+    """Remove ALL edges between u and v, regardless of key count.
+
+    On MultiDiGraph, ``G.remove_edge(u, v)`` removes only one edge (the last added).
+    This helper explicitly iterates keys to remove all parallel edges.
+
+    Returns the number of edges removed.  Does not raise if no edges exist
+    between u and v (returns 0).
+    """
+    import networkx as nx
+
+    if isinstance(G, (nx.MultiGraph, nx.MultiDiGraph)):
+        if not G.has_node(u) or not G.has_node(v):
+            return 0
+        keys = list(G[u][v].keys()) if G.has_edge(u, v) else []
+        for key in keys:
+            G.remove_edge(u, v, key=key)
+        return len(keys)
+    else:
+        if G.has_edge(u, v):
+            G.remove_edge(u, v)
+            return 1
+        return 0
diff --git a/tests/test_dedup_remap.py b/tests/test_dedup_remap.py
new file mode 100644
index 000000000..467f6c0a8
--- /dev/null
+++ b/tests/test_dedup_remap.py
@@ -0,0 +1,542 @@
+"""Tests for PR 4A: dedup remap contract — parallel edge preservation,
+self-loop counting, exact duplicate collapse, build integration, and the
+remove_all_parallel_edges helper.
+
+Groups A–C and E will fail until the production dedup/build changes land
+(diagnostics parameter, remap counters).  Group D tests the helper
+implemented in edge_identity.py and should pass immediately.
+"""
+
+from __future__ import annotations
+
+import networkx as nx
+
+from graphify.dedup import deduplicate_entities
+from graphify.build import build
+from graphify.edge_identity import remove_all_parallel_edges
+
+
+# ── helpers (mirrors test_dedup.py patterns) ─────────────────────────────────
+
+
+def _make_nodes(*labels, source_file="test.md"):
+    return [
+        {"id": label.lower().replace(" ", "_"), "label": label, "source_file": source_file}
+        for label in labels
+    ]
+
+
+def _make_edge(src, tgt, relation="relates_to", source_file="test.py", **extra):
+    edge = {"source": src, "target": tgt, "relation": relation, "source_file": source_file}
+    edge.update(extra)
+    return edge
+
+
+# ═══════════════════════════════════════════════════════════════════════════════
+# Group A: Post-remap parallel edge preservation
+# ═══════════════════════════════════════════════════════════════════════════════
+
+
+def test_remap_preserves_parallel_edges_different_relation():
+    """Two edges A->C (calls) and A->C (imports) survive when B is merged into C.
+
+    Setup: nodes A, B, C where B and C are exact duplicates (same normalized label).
+    Edges: A->B (calls), A->C (imports).  After dedup, B merges into C (winner).
+    Expected: two edges A->C with different relations survive.
+    """
+    # B and C share the normalized label "dataloader" so they are exact duplicates.
+    nodes = [
+        {"id": "a", "label": "Caller", "source_file": "a.py"},
+        {"id": "b", "label": "DataLoader", "source_file": "a.py"},
+        {"id": "c", "label": "dataloader", "source_file": "a.py"},
+    ]
+    edges = [
+        _make_edge("a", "b", relation="calls", source_file="a.py"),
+        _make_edge("a", "c", relation="imports", source_file="a.py"),
+    ]
+    diagnostics: dict = {}
+    result_nodes, result_edges = deduplicate_entities(
+        nodes,
+        edges,
+        communities={},
+        diagnostics=diagnostics,
+    )
+    # B merged into winner; both edges now point to the winner
+    assert len(result_nodes) == 2
+    # Two distinct relations -> both edges survive
+    relations = {e["relation"] for e in result_edges}
+    assert "calls" in relations
+    assert "imports" in relations
+    assert len(result_edges) == 2
+
+
+def test_remap_preserves_parallel_edges_incoming_and_outgoing():
+    """Edges B->X and Y->B survive as C->X and Y->C when B merges into C."""
+    nodes = [
+        {"id": "x", "label": "NodeX", "source_file": "a.py"},
+        {"id": "y", "label": "NodeY", "source_file": "a.py"},
+        {"id": "b", "label": "DataLoader", "source_file": "a.py"},
+        {"id": "c", "label": "dataloader", "source_file": "a.py"},
+    ]
+    edges = [
+        _make_edge("b", "x", relation="calls", source_file="a.py"),
+        _make_edge("y", "b", relation="imports", source_file="a.py"),
+    ]
+    diagnostics: dict = {}
+    result_nodes, result_edges = deduplicate_entities(
+        nodes,
+        edges,
+        communities={},
+        diagnostics=diagnostics,
+    )
+    assert len(result_nodes) == 3  # x, y, winner(b/c)
+    # Both edges survive: one outgoing, one incoming
+    assert len(result_edges) == 2
+    # The loser ID should be remapped to the winner
+    winner_id = next(n["id"] for n in result_nodes if n["label"] in ("DataLoader", "dataloader"))
+    sources = {e["source"] for e in result_edges}
+    targets = {e["target"] for e in result_edges}
+    assert winner_id in sources or winner_id in targets
+
+
+def test_remap_preserves_edges_with_different_source_location():
+    """Two edges A->B with same relation but different source_location survive remap."""
+    nodes = [
+        {"id": "a", "label": "Caller", "source_file": "a.py"},
+        {"id": "b", "label": "DataLoader", "source_file": "a.py"},
+        {"id": "c", "label": "dataloader", "source_file": "a.py"},
+    ]
+    edges = [
+        _make_edge("a", "b", relation="calls", source_file="a.py", source_location="L10"),
+        _make_edge("a", "c", relation="calls", source_file="a.py", source_location="L20"),
+    ]
+    diagnostics: dict = {}
+    result_nodes, result_edges = deduplicate_entities(
+        nodes,
+        edges,
+        communities={},
+        diagnostics=diagnostics,
+    )
+    assert len(result_nodes) == 2
+    # Same relation but different source_location -> both survive (not exact duplicates)
+    assert len(result_edges) == 2
+    locations = {e.get("source_location") for e in result_edges}
+    assert locations == {"L10", "L20"}
+
+
+def test_remap_preserves_key_field_through_dict_copy():
+    """If edge dicts carry a pre-existing 'key' field, remap preserves it verbatim."""
+    nodes = [
+        {"id": "a", "label": "Caller", "source_file": "a.py"},
+        {"id": "b", "label": "DataLoader", "source_file": "a.py"},
+        {"id": "c", "label": "dataloader", "source_file": "a.py"},
+    ]
+    edges = [
+        _make_edge("a", "b", relation="calls", source_file="a.py", key="user-key-1"),
+    ]
+    diagnostics: dict = {}
+    result_nodes, result_edges = deduplicate_entities(
+        nodes,
+        edges,
+        communities={},
+        diagnostics=diagnostics,
+    )
+    assert len(result_edges) == 1
+    assert result_edges[0].get("key") == "user-key-1"
+
+
+# ═══════════════════════════════════════════════════════════════════════════════
+# Group B: Self-loop counting + exact duplicate collapse
+# ═══════════════════════════════════════════════════════════════════════════════
+
+
+def test_remap_counts_self_loop_drops():
+    """Self-loop drops counted in diagnostics dict, broken down by relation and source_file.
+
+    Setup: nodes A, B that are exact duplicates.  Edge A->B (calls, from a.py).
+    After remap: B merges into A, edge becomes A->A = self-loop, dropped.
+    Assert diagnostics: remap_self_loop_drops=1, by_relation={'calls':1}, by_source={'a.py':1}
+    """
+    nodes = [
+        {"id": "a", "label": "DataLoader", "source_file": "a.py"},
+        {"id": "b", "label": "dataloader", "source_file": "a.py"},
+    ]
+    edges = [
+        _make_edge("a", "b", relation="calls", source_file="a.py"),
+    ]
+    diagnostics: dict = {}
+    result_nodes, result_edges = deduplicate_entities(
+        nodes,
+        edges,
+        communities={},
+        diagnostics=diagnostics,
+    )
+    assert len(result_nodes) == 1
+    assert result_edges == []  # self-loop dropped
+    assert diagnostics.get("remap_self_loop_drops") == 1
+    assert diagnostics.get("remap_self_loop_drops_by_relation", {}).get("calls") == 1
+    assert diagnostics.get("remap_self_loop_drops_by_source", {}).get("a.py") == 1
+
+
+def test_remap_preserves_preexisting_self_loop_on_remapped_node():
+    """A real self-loop survives when its node is remapped to the canonical winner."""
+    nodes = [
+        {"id": "winner", "label": "DataLoader", "source_file": "a.py"},
+        {"id": "loser_long", "label": "dataloader", "source_file": "a.py"},
+    ]
+    edges = [
+        _make_edge("loser_long", "loser_long", relation="calls", source_file="a.py"),
+    ]
+    diagnostics: dict = {}
+
+    result_nodes, result_edges = deduplicate_entities(
+        nodes,
+        edges,
+        communities={},
+        diagnostics=diagnostics,
+    )
+
+    assert len(result_nodes) == 1
+    assert result_edges == [
+        {"source": "winner", "target": "winner", "relation": "calls", "source_file": "a.py"}
+    ]
+    assert diagnostics.get("remap_self_loop_drops") == 0
+
+
+def test_remap_collapses_exact_duplicates_after_remap():
+    """Two edges that become identical after remap collapse to one.
+
+    Setup: nodes A, B, C where B merges into C.
+    Two edges: A->B (calls, from x.py, line 10) and A->C (calls, from x.py, line 10).
+    After remap both become A->C with identical attrs -> collapse to one.
+    Assert diagnostics: remap_exact_duplicate_collapses=1, by_relation={'calls':1}
+    """
+    nodes = [
+        {"id": "a", "label": "Caller", "source_file": "x.py"},
+        {"id": "b", "label": "DataLoader", "source_file": "x.py"},
+        {"id": "c", "label": "dataloader", "source_file": "x.py"},
+    ]
+    edges = [
+        _make_edge("a", "b", relation="calls", source_file="x.py", source_location="L10"),
+        _make_edge("a", "c", relation="calls", source_file="x.py", source_location="L10"),
+    ]
+    diagnostics: dict = {}
+    result_nodes, result_edges = deduplicate_entities(
+        nodes,
+        edges,
+        communities={},
+        diagnostics=diagnostics,
+    )
+    assert len(result_nodes) == 2
+    # After remap, both edges are identical (A->winner, calls, x.py, L10) -> collapse to 1
+    assert len(result_edges) == 1
+    assert diagnostics.get("remap_exact_duplicate_collapses") == 1
+    assert diagnostics.get("remap_exact_duplicate_collapses_by_relation", {}).get("calls") == 1
+
+
+def test_remap_does_not_collapse_non_exact_duplicates():
+    """Two edges with same source/target after remap but different attrs both survive.
+
+    Setup: nodes A, B, C where B merges into C.
+    Edges: A->B (calls, line 10), A->C (calls, line 20).
+    After remap: A->C (calls, line 10) and A->C (calls, line 20) — both survive.
+    """
+    nodes = [
+        {"id": "a", "label": "Caller", "source_file": "x.py"},
+        {"id": "b", "label": "DataLoader", "source_file": "x.py"},
+        {"id": "c", "label": "dataloader", "source_file": "x.py"},
+    ]
+    edges = [
+        _make_edge("a", "b", relation="calls", source_file="x.py", source_location="L10"),
+        _make_edge("a", "c", relation="calls", source_file="x.py", source_location="L20"),
+    ]
+    diagnostics: dict = {}
+    result_nodes, result_edges = deduplicate_entities(
+        nodes,
+        edges,
+        communities={},
+        diagnostics=diagnostics,
+    )
+    assert len(result_nodes) == 2
+    assert len(result_edges) == 2
+    locations = {e.get("source_location") for e in result_edges}
+    assert locations == {"L10", "L20"}
+
+
+def test_remap_returns_diagnostics_when_dict_provided():
+    """diagnostics dict is populated with all counter keys even when counts are zero."""
+    nodes = [
+        {"id": "a", "label": "Caller", "source_file": "a.py"},
+        {"id": "b", "label": "Target", "source_file": "a.py"},
+    ]
+    edges = [_make_edge("a", "b", relation="calls")]
+    diagnostics: dict = {}
+    deduplicate_entities(nodes, edges, communities={}, diagnostics=diagnostics)
+    # When no merges happen, diagnostics should still have the counter keys at 0
+    assert "remap_self_loop_drops" in diagnostics
+    assert "remap_exact_duplicate_collapses" in diagnostics
+    assert diagnostics["remap_self_loop_drops"] == 0
+    assert diagnostics["remap_exact_duplicate_collapses"] == 0
+
+
+# ═══════════════════════════════════════════════════════════════════════════════
+# Group C: Build integration
+# ═══════════════════════════════════════════════════════════════════════════════
+
+
+def test_build_with_dedup_and_multigraph_preserves_parallel_edges():
+    """build(extractions, dedup=True, multigraph=True) preserves non-duplicate parallel edges.
+
+    Create 2 extraction chunks with overlapping nodes but different edges.
+    Assert the built MultiDiGraph has the expected parallel edges.
+    """
+    ext1 = {
+        "nodes": [
+            {"id": "caller", "label": "Caller", "file_type": "code", "source_file": "a.py"},
+            {"id": "dataloader", "label": "DataLoader", "file_type": "code", "source_file": "b.py"},
+        ],
+        "edges": [
+            {
+                "source": "caller",
+                "target": "dataloader",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "a.py",
+                "source_location": "L10",
+            },
+        ],
+    }
+    ext2 = {
+        "nodes": [
+            {"id": "caller", "label": "Caller", "file_type": "code", "source_file": "a.py"},
+            {"id": "dataloader", "label": "DataLoader", "file_type": "code", "source_file": "b.py"},
+        ],
+        "edges": [
+            {
+                "source": "caller",
+                "target": "dataloader",
+                "relation": "imports",
+                "confidence": "EXTRACTED",
+                "source_file": "a.py",
+                "source_location": "L2",
+            },
+        ],
+    }
+    G = build([ext1, ext2], dedup=True, multigraph=True, directed=True)
+    assert isinstance(G, nx.MultiDiGraph)
+    # Two edges with different relations should both survive
+    assert G.number_of_edges("caller", "dataloader") == 2
+    relations = {data["relation"] for data in G["caller"]["dataloader"].values()}
+    assert relations == {"calls", "imports"}
+
+
+def test_build_with_dedup_and_multigraph_reports_diagnostics():
+    """G.graph['graphify_multigraph_diagnostics'] contains remap_ prefixed counters after build."""
+    ext1 = {
+        "nodes": [
+            {"id": "a", "label": "Caller", "file_type": "code", "source_file": "a.py"},
+            {"id": "b", "label": "DataLoader", "file_type": "code", "source_file": "a.py"},
+            {"id": "c", "label": "dataloader", "file_type": "code", "source_file": "a.py"},
+        ],
+        "edges": [
+            {
+                "source": "a",
+                "target": "b",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "a.py",
+                "source_location": "L10",
+            },
+        ],
+    }
+    G = build([ext1], dedup=True, multigraph=True, directed=True)
+    diag = G.graph.get("graphify_multigraph_diagnostics", {})
+    # Should contain remap_ prefixed counters from the dedup pass
+    remap_keys = [k for k in diag if k.startswith("remap_")]
+    assert len(remap_keys) > 0, f"Expected remap_ keys in diagnostics, got: {diag}"
+
+
+# ═══════════════════════════════════════════════════════════════════════════════
+# Group D: Safe remove-all-parallel helper
+# ═══════════════════════════════════════════════════════════════════════════════
+
+
+def test_remove_all_parallel_edges_removes_all_keys():
+    """On MultiDiGraph with 3 edges between u,v (different keys), removes all 3."""
+    G = nx.MultiDiGraph()
+    G.add_edge("a", "b", key="k1", relation="calls")
+    G.add_edge("a", "b", key="k2", relation="imports")
+    G.add_edge("a", "b", key="k3", relation="references")
+    assert G.number_of_edges("a", "b") == 3
+
+    removed = remove_all_parallel_edges(G, "a", "b")
+
+    assert removed == 3
+    assert G.number_of_edges("a", "b") == 0
+
+
+def test_remove_all_parallel_edges_no_edges_noop():
+    """No edges between u,v -> returns 0, no raise."""
+    G = nx.MultiDiGraph()
+    G.add_node("a")
+    G.add_node("b")
+
+    removed = remove_all_parallel_edges(G, "a", "b")
+
+    assert removed == 0
+
+
+def test_remove_all_parallel_edges_simple_digraph():
+    """On simple DiGraph, removes the single edge, returns 1."""
+    G = nx.DiGraph()
+    G.add_edge("a", "b", relation="calls")
+
+    removed = remove_all_parallel_edges(G, "a", "b")
+
+    assert removed == 1
+    assert not G.has_edge("a", "b")
+
+
+def test_remove_all_parallel_edges_does_not_use_two_tuple_semantics():
+    """Verify the helper works correctly even if NetworkX's remove_edges_from
+    would only remove one.
+
+    Create MultiDiGraph with 3 keyed edges between (a,b). Call helper.
+    Assert all 3 removed.
+    """
+    G = nx.MultiDiGraph()
+    G.add_edge("a", "b", key="k1", relation="calls")
+    G.add_edge("a", "b", key="k2", relation="imports")
+    G.add_edge("a", "b", key="k3", relation="references")
+
+    # NetworkX's remove_edges_from with a 2-tuple removes only one parallel edge
+    # (the most recently added): G.remove_edges_from([("a", "b")]) would leave 2 edges.
+    # Our helper must remove all 3.
+    removed = remove_all_parallel_edges(G, "a", "b")
+
+    assert removed == 3
+    assert G.number_of_edges() == 0
+    assert G.has_node("a")  # nodes preserved
+    assert G.has_node("b")
+
+
+def test_remove_all_parallel_edges_missing_node():
+    """If either node doesn't exist in the graph, returns 0 without raising."""
+    G = nx.MultiDiGraph()
+    G.add_node("a")
+
+    assert remove_all_parallel_edges(G, "a", "nonexistent") == 0
+    assert remove_all_parallel_edges(G, "nonexistent", "a") == 0
+
+
+def test_remove_all_parallel_edges_simple_graph_no_edge():
+    """On simple Graph with no edge between u,v, returns 0."""
+    G = nx.Graph()
+    G.add_node("a")
+    G.add_node("b")
+
+    assert remove_all_parallel_edges(G, "a", "b") == 0
+
+
+def test_remove_all_parallel_edges_multigraph_undirected():
+    """On undirected MultiGraph, removes all parallel edges."""
+    G = nx.MultiGraph()
+    G.add_edge("a", "b", key="k1", relation="calls")
+    G.add_edge("a", "b", key="k2", relation="imports")
+
+    removed = remove_all_parallel_edges(G, "a", "b")
+
+    assert removed == 2
+    assert G.number_of_edges() == 0
+
+
+# ═══════════════════════════════════════════════════════════════════════════════
+# Group E: Simple-graph regression
+# ═══════════════════════════════════════════════════════════════════════════════
+
+
+def test_simple_graph_dedup_output_unchanged():
+    """Default simple-graph build+dedup on a fixed fixture produces identical output.
+
+    This is the go/no-go regression: if this test fails, PR 4A broke the default path.
+    """
+    extraction = {
+        "nodes": [
+            {
+                "id": "graphextractor",
+                "label": "GraphExtractor",
+                "file_type": "code",
+                "source_file": "a.py",
+            },
+            {
+                "id": "graph_extractor",
+                "label": "graph_extractor",
+                "file_type": "code",
+                "source_file": "a.py",
+            },
+            {"id": "dataloader", "label": "DataLoader", "file_type": "code", "source_file": "b.py"},
+            {
+                "id": "networkanalyzer",
+                "label": "NetworkAnalyzer",
+                "file_type": "code",
+                "source_file": "c.py",
+            },
+        ],
+        "edges": [
+            {
+                "source": "graphextractor",
+                "target": "dataloader",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "a.py",
+                "source_location": "L5",
+            },
+            {
+                "source": "graph_extractor",
+                "target": "dataloader",
+                "relation": "imports",
+                "confidence": "EXTRACTED",
+                "source_file": "a.py",
+                "source_location": "L1",
+            },
+            {
+                "source": "dataloader",
+                "target": "networkanalyzer",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "b.py",
+                "source_location": "L10",
+            },
+        ],
+    }
+
+    # Default simple-graph build with dedup
+    G = build([extraction], dedup=True, directed=True)
+    assert "graphify_multigraph_diagnostics" not in G.graph
+
+    # "GraphExtractor" and "graph_extractor" are near-duplicates, so dedup merges them.
+    # _pick_winner prefers the shorter ID with no chunk suffix, so "graphextractor"
+    # (14 chars) is expected to beat "graph_extractor" (15 chars).
+    # The assertions below accept either ID so the test does not pin the tiebreak.
+    winner_candidates = {"graphextractor", "graph_extractor"}
+    surviving_nodes = set(G.nodes())
+
+    # After dedup: 3 nodes survive (one of the two graph-extractor variants + dataloader + networkanalyzer)
+    assert G.number_of_nodes() == 3, (
+        f"Expected 3 nodes after dedup, got {G.number_of_nodes()}: {sorted(surviving_nodes)}"
+    )
+
+    # The winner is the one that survived
+    winner = winner_candidates & surviving_nodes
+    assert len(winner) == 1, f"Expected exactly one winner from {winner_candidates}, got {winner}"
+
+    assert "dataloader" in surviving_nodes
+    assert "networkanalyzer" in surviving_nodes
+
+    # Edges: after remap, the calls and imports edges both run winner -> dataloader.
+    # A simple DiGraph keeps one edge per (source, target) pair, so together with
+    # dataloader -> networkanalyzer at least 2 edges remain. No self-loop arises here.
+    assert G.number_of_edges() >= 2, f"Expected at least 2 edges, got {G.number_of_edges()}"
+
+    # The dataloader->networkanalyzer edge must survive unchanged
+    assert G.has_edge("dataloader", "networkanalyzer")

From 32b7eb4a00ec6e7ccf98682a30a50abb1fc540e8 Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Wed, 27 May 2026 16:32:19 -0500
Subject: [PATCH 04/21] feat(multigraph): PR 4B algorithm/report safety via
 explicit projections

Route clustering, cohesion, god-node, and report logic through projections
so parallel edges do not inflate metrics or crash algorithms:

- analyze.py: edge betweenness uses project_for_community() on multigraphs
  to avoid 3-tuple key crash; god_nodes/surprise scoring uses
  distinct_neighbor_degree() instead of G.degree() to prevent inflation
- cluster.py: Louvain/Leiden receives projected simple graph instead of
  raw MultiDiGraph; cohesion_score() projects before counting edges so
  result stays in [0.0, 1.0]
- report.py: edge counts distinguish total records from unique pairs on
  multigraphs ("8 edges (3 unique pairs)")

12 new tests: betweenness crash, degree inflation, cohesion bound,
community detection, edge count semantics. Full suite passes (1515 tests).
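
Illustration only, not the graphify implementation: a minimal stdlib sketch of
why raw degree and raw edge counts mislead on multigraphs. The (source, target,
key) tuples and the helpers raw_degree, distinct_neighbors, and edge_count_str
are hypothetical names for this message; treating neighbors as direction-agnostic
and ignoring self-loops is an assumption about distinct_neighbor_degree().

```python
# Hypothetical edge records (source, target, key); not graphify's data model.
edges = [
    ("a", "b", "k0"), ("a", "b", "k1"), ("a", "b", "k2"),
    ("a", "b", "k3"), ("a", "b", "k4"),
    ("b", "c", "k0"), ("b", "c", "k1"),
    ("c", "a", "k0"),
]


def raw_degree(edges, node):
    """Count every edge endpoint on node, so parallel edges inflate the result."""
    return sum((u == node) + (v == node) for u, v, _ in edges)


def distinct_neighbors(edges, node):
    """Count distinct neighbors, ignoring direction, multiplicity, and self-loops."""
    neighbors = set()
    for u, v, _ in edges:
        if u == node and v != node:
            neighbors.add(v)
        elif v == node and u != node:
            neighbors.add(u)
    return len(neighbors)


def edge_count_str(edges):
    """Report total records, plus unique (source, target) pairs when they differ."""
    total = len(edges)
    unique_pairs = len({(u, v) for u, v, _ in edges})
    if unique_pairs < total:
        return f"{total} edges ({unique_pairs} unique pairs)"
    return f"{total} edges"


print(raw_degree(edges, "a"))          # inflated by the five parallel a->b records
print(distinct_neighbors(edges, "a"))  # only b and c are actual neighbors
print(edge_count_str(edges))
```

Ranking hubs by the second number instead of the first keeps a node with one
heavily repeated relationship from outranking a node with many real neighbors.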

gost
---
 graphify/analyze.py   |  20 +++---
 graphify/cluster.py   |  11 +++-
 graphify/report.py    |  12 +++-
 tests/test_analyze.py | 139 +++++++++++++++++++++++++++++++++++++++++-
 tests/test_cluster.py |  84 +++++++++++++++++++++++++
 tests/test_report.py  |  88 ++++++++++++++++++++++++++
 6 files changed, 342 insertions(+), 12 deletions(-)

diff --git a/graphify/analyze.py b/graphify/analyze.py
index 0812d528d..db556ac07 100644
--- a/graphify/analyze.py
+++ b/graphify/analyze.py
@@ -6,6 +6,7 @@
 
 from graphify.build import edge_data
 from graphify.detect import CODE_EXTENSIONS, IMAGE_EXTENSIONS, PAPER_EXTENSIONS
+from graphify.projections import distinct_neighbor_degree, project_for_community
 
 # Language families — extensions sharing a runtime can legitimately call each other
 _LANG_FAMILY: dict[str, str] = {
@@ -63,7 +64,7 @@ def _is_file_node(G: nx.Graph, node_id: str) -> bool:
         return True
     # Module-level function stub: labeled 'function_name()' - only has a contains edge
     # These are real functions but structurally isolated by definition; not a gap worth flagging
-    if label.endswith("()") and G.degree(node_id) <= 1:
+    if label.endswith("()") and distinct_neighbor_degree(G, node_id) <= 1:
         return True
     return False
 
@@ -108,7 +109,7 @@ def god_nodes(G: nx.Graph, top_n: int = 10) -> list[dict]:
     File-level hub nodes are excluded: they accumulate import/contains edges
     mechanically and don't represent meaningful architectural abstractions.
     """
-    degree = dict(G.degree())
+    degree = {n: distinct_neighbor_degree(G, n) for n in G.nodes()}
     sorted_nodes = sorted(degree.items(), key=lambda x: x[1], reverse=True)
     result = []
     for node_id, deg in sorted_nodes:
@@ -258,8 +259,8 @@ def _surprise_score(
         reasons.append("semantically similar concepts with no structural link")
 
     # 5. Peripheral→hub: a low-degree node connecting to a high-degree one
-    deg_u = degrees[u] if degrees is not None else G.degree(u)
-    deg_v = degrees[v] if degrees is not None else G.degree(v)
+    deg_u = degrees[u] if degrees is not None else distinct_neighbor_degree(G, u)
+    deg_v = degrees[v] if degrees is not None else distinct_neighbor_degree(G, v)
     if min(deg_u, deg_v) <= 2 and max(deg_u, deg_v) >= 5:
         score += 1
         peripheral = G.nodes[u].get("label", u) if deg_u <= 2 else G.nodes[v].get("label", v)
@@ -284,7 +285,7 @@ def _cross_file_surprises(G: nx.Graph, communities: dict[int, list[str]], top_n:
     Each result includes a 'why' field explaining what makes it non-obvious.
     """
     node_community = _node_community_map(communities)
-    degrees = dict(G.degree())
+    degrees = {n: distinct_neighbor_degree(G, n) for n in G.nodes()}
     candidates = []
 
     for u, v, data in G.edges(data=True):
@@ -352,7 +353,10 @@ def _cross_community_surprises(
             return []
         if G.number_of_nodes() > 5000:
             return []
-        betweenness = nx.edge_betweenness_centrality(G)
+        # Project to simple graph so betweenness returns 2-tuple keys
+        # and parallel edges don't inflate centrality scores.
+        simple = project_for_community(G) if isinstance(G, (nx.MultiGraph, nx.MultiDiGraph)) else G
+        betweenness = nx.edge_betweenness_centrality(simple)
         top_edges = sorted(betweenness.items(), key=lambda x: x[1], reverse=True)[:top_n]
         result = []
         for (u, v), score in top_edges:
@@ -496,7 +500,7 @@ def suggest_questions(
                 )
 
     # 3. God nodes with many INFERRED edges → verification questions
-    degree = dict(G.degree())
+    degree = {n: distinct_neighbor_degree(G, n) for n in G.nodes()}
     top_nodes = sorted(
         [(n, d) for n, d in degree.items() if not _is_file_node(G, n)],
         key=lambda x: x[1],
@@ -533,7 +537,7 @@ def suggest_questions(
     isolated = [
         n
         for n in G.nodes()
-        if G.degree(n) <= 1 and not _is_file_node(G, n) and not _is_concept_node(G, n)
+        if distinct_neighbor_degree(G, n) <= 1 and not _is_file_node(G, n) and not _is_concept_node(G, n)
     ]
     if isolated:
         labels = [G.nodes[n].get("label", n) for n in isolated[:3]]
diff --git a/graphify/cluster.py b/graphify/cluster.py
index 04e9c2729..97edb4b04 100644
--- a/graphify/cluster.py
+++ b/graphify/cluster.py
@@ -8,6 +8,8 @@
 import sys
 import networkx as nx
 
+from graphify.projections import project_for_community
+
 
 def _suppress_output():
     """Context manager to suppress stdout/stderr during library calls.
@@ -108,7 +110,11 @@ def cluster(
     """
     if G.number_of_nodes() == 0:
         return {}
-    if G.is_directed():
+    # Project multigraphs to simple undirected graph so parallel edges
+    # don't inflate Louvain/Leiden community detection.
+    if isinstance(G, (nx.MultiGraph, nx.MultiDiGraph)):
+        G = project_for_community(G)
+    elif G.is_directed():
         G = G.to_undirected()
     if G.number_of_edges() == 0:
         return {i: [n] for i, n in enumerate(sorted(G.nodes))}
@@ -212,6 +218,9 @@ def cohesion_score(G: nx.Graph, community_nodes: list[str]) -> float:
     if n <= 1:
         return 1.0
     subgraph = G.subgraph(community_nodes)
+    # Project multigraphs to simple graph so parallel edges don't inflate cohesion
+    if isinstance(subgraph, (nx.MultiGraph, nx.MultiDiGraph)):
+        subgraph = project_for_community(subgraph)
     actual = subgraph.number_of_edges()
     possible = n * (n - 1) / 2
     return actual / possible if possible > 0 else 0.0
diff --git a/graphify/report.py b/graphify/report.py
index 7175cbc06..80c214417 100644
--- a/graphify/report.py
+++ b/graphify/report.py
@@ -5,6 +5,16 @@
 import networkx as nx
 
 
+def _edge_count_str(G: nx.Graph) -> str:
+    """Format edge count, distinguishing total records from unique pairs on multigraphs."""
+    total = G.number_of_edges()
+    if isinstance(G, (nx.MultiGraph, nx.MultiDiGraph)):
+        unique_pairs = len(set((u, v) for u, v, *_ in G.edges(keys=False)))
+        if unique_pairs < total:
+            return f"{total} edges ({unique_pairs} unique pairs)"
+    return f"{total} edges"
+
+
 def _safe_community_name(label: str) -> str:
     """Mirrors export.safe_name so community hub filenames and report wikilinks always agree."""
     cleaned = re.sub(
@@ -74,7 +84,7 @@ def generate(
     lines += [
         "",
         "## Summary",
-        f"- {G.number_of_nodes()} nodes · {G.number_of_edges()} edges · {len(communities)} communities"
+        f"- {G.number_of_nodes()} nodes · {_edge_count_str(G)} · {len(communities)} communities"
         + (
             f" ({shown_count} shown, {thin_count_summary} thin omitted)"
             if thin_count_summary
diff --git a/tests/test_analyze.py b/tests/test_analyze.py
index 64d5b768b..1cd36339f 100644
--- a/tests/test_analyze.py
+++ b/tests/test_analyze.py
@@ -812,8 +812,6 @@ def test_god_nodes_filter_is_case_insensitive():
     labels = [r["label"] for r in result]
     for variant in ("Start", "START", "Name", "ID"):
         assert variant not in labels, f"`{variant}` should be filtered as JSON-key noise"
-
-
 # ── find_import_cycles tests ──────────────────────────────────────────────────
 
 
@@ -932,3 +930,140 @@ def test_find_import_cycles_no_cycles():
     G.add_node(y_id, **y)
     G.add_edge(x_id, y_id, relation="imports_from", source_file="x.ts", confidence="EXTRACTED")
     assert find_import_cycles(G) == []
+
+
+# --- MultiDiGraph safety tests (PR 4B) ---
+
+
+def _make_multigraph_with_parallel_edges():
+    """Helper: build a MultiDiGraph with parallel edges for testing."""
+    G = nx.MultiDiGraph()
+    # Node A has 1 neighbor (B) but 5 parallel edges to it
+    G.add_node("a", label="Alpha", source_file="src/alpha.py", file_type="code")
+    G.add_node("b", label="Beta", source_file="src/beta.py", file_type="code")
+    for i in range(5):
+        G.add_edge(
+            "a",
+            "b",
+            key=f"edge_{i}",
+            relation="calls",
+            confidence="EXTRACTED",
+            source_file="src/alpha.py",
+            weight=1.0,
+        )
+    return G
+
+
+def test_god_nodes_multigraph_not_inflated():
+    """Node with 1 neighbor but 5 parallel edges must NOT be flagged as high-degree."""
+    G = _make_multigraph_with_parallel_edges()
+    # Add a genuine low-degree node so the graph isn't trivial
+    G.add_node("c", label="Gamma", source_file="src/gamma.py", file_type="code")
+    G.add_edge("b", "c", relation="calls", confidence="EXTRACTED", source_file="src/beta.py")
+
+    result = god_nodes(G, top_n=10)
+    for entry in result:
+        if entry["id"] == "a":
+            # Alpha has only 1 distinct neighbor (Beta), not 5
+            assert entry["degree"] == 1, (
+                f"god_nodes reported degree {entry['degree']} for node 'a' "
+                f"but it has only 1 distinct neighbor"
+            )
+            break
+
+
+def test_god_nodes_multigraph_real_hub_detected():
+    """A node genuinely connected to 20 different neighbors must be detected as a god node."""
+    G = nx.MultiDiGraph()
+    G.add_node("hub", label="HubNode", source_file="src/hub.py", file_type="code")
+    for i in range(20):
+        nid = f"n{i}"
+        G.add_node(nid, label=f"Node{i}", source_file=f"src/n{i}.py", file_type="code")
+        G.add_edge(
+            "hub", nid, relation="calls", confidence="EXTRACTED",
+            source_file="src/hub.py", weight=1.0,
+        )
+
+    result = god_nodes(G, top_n=5)
+    result_ids = [r["id"] for r in result]
+    assert "hub" in result_ids, (
+        f"god_nodes should detect 'hub' with 20 distinct neighbors, got: {result}"
+    )
+    hub_entry = next(r for r in result if r["id"] == "hub")
+    assert hub_entry["degree"] == 20
+
+
+def test_edge_betweenness_multigraph_does_not_crash():
+    """surprising_connections on a MultiDiGraph with parallel edges must not crash."""
+    G = nx.MultiDiGraph()
+    # Two communities with intra-community edges + one bridge
+    for i in range(5):
+        G.add_node(
+            f"a{i}", label=f"A{i}", source_file="single.py",
+            file_type="code", source_location=f"L{i}",
+        )
+    for i in range(5):
+        G.add_node(
+            f"b{i}", label=f"B{i}", source_file="single.py",
+            file_type="code", source_location=f"L{i + 10}",
+        )
+    # Dense intra-community edges (some parallel)
+    for i in range(4):
+        G.add_edge(f"a{i}", f"a{i+1}", relation="calls", confidence="EXTRACTED",
+                   source_file="single.py", weight=1.0)
+        G.add_edge(f"a{i}", f"a{i+1}", relation="uses", confidence="EXTRACTED",
+                   source_file="single.py", weight=1.0)
+    for i in range(4):
+        G.add_edge(f"b{i}", f"b{i+1}", relation="calls", confidence="EXTRACTED",
+                   source_file="single.py", weight=1.0)
+    # Bridge edge
+    G.add_edge("a4", "b0", relation="references", confidence="INFERRED",
+               source_file="single.py", weight=0.5)
+
+    # Should not crash -- this is the core regression test
+    result = surprising_connections(G, communities=None)
+    assert isinstance(result, list)
+    # Every result must carry source and target (built from 2-tuple edge keys, not 3-tuples)
+    for entry in result:
+        assert "source" in entry
+        assert "target" in entry
+
+
+def test_surprising_connections_multigraph_results_valid():
+    """Full integration: MultiDiGraph with communities and parallel edges."""
+    G = nx.MultiDiGraph()
+    # Community 1: nodes in file1.py
+    for i in range(5):
+        G.add_node(
+            f"c1_{i}", label=f"C1_{i}", source_file="repo/file1.py",
+            file_type="code", source_location=f"L{i}",
+        )
+    # Community 2: nodes in file2.py
+    for i in range(5):
+        G.add_node(
+            f"c2_{i}", label=f"C2_{i}", source_file="repo/file2.py",
+            file_type="code", source_location=f"L{i}",
+        )
+    # Intra-community edges with parallel edges
+    for i in range(4):
+        G.add_edge(f"c1_{i}", f"c1_{i+1}", relation="calls", confidence="EXTRACTED",
+                   source_file="repo/file1.py", weight=1.0)
+        G.add_edge(f"c1_{i}", f"c1_{i+1}", relation="uses", confidence="INFERRED",
+                   source_file="repo/file1.py", weight=0.5)
+    for i in range(4):
+        G.add_edge(f"c2_{i}", f"c2_{i+1}", relation="calls", confidence="EXTRACTED",
+                   source_file="repo/file2.py", weight=1.0)
+    # Cross-community bridge with parallel edges
+    G.add_edge("c1_4", "c2_0", relation="references", confidence="AMBIGUOUS",
+               source_file="repo/file1.py", weight=0.3)
+    G.add_edge("c1_4", "c2_0", relation="calls", confidence="INFERRED",
+               source_file="repo/file1.py", weight=0.5)
+
+    communities = cluster(G)
+    result = surprising_connections(G, communities)
+    assert isinstance(result, list)
+    # Should detect cross-community or cross-file edges without crashing
+    for entry in result:
+        assert "source" in entry
+        assert "target" in entry
+        assert "confidence" in entry
diff --git a/tests/test_cluster.py b/tests/test_cluster.py
index 514e4aea0..e63d11854 100644
--- a/tests/test_cluster.py
+++ b/tests/test_cluster.py
@@ -105,3 +105,87 @@ def test_remap_communities_to_previous_assigns_deterministic_new_ids():
     assert list(remapped.keys()) == [0, 1]
     assert remapped[0] == ["x", "y", "z"]
     assert remapped[1] == ["m"]
+
+
+# --- MultiDiGraph safety tests (PR 4B) ---
+
+
+def _make_multigraph_triangle():
+    """MultiDiGraph path a->b->c (no a-c edge): 5 parallel edges a->b, 3 parallel edges b->c."""
+    G = nx.MultiDiGraph()
+    G.add_nodes_from(["a", "b", "c"])
+    for i in range(5):
+        G.add_edge("a", "b", key=f"ab-{i}", relation=f"rel-{i}")
+    for i in range(3):
+        G.add_edge("b", "c", key=f"bc-{i}", relation=f"rel-{i}")
+    return G
+
+
+def test_cohesion_multigraph_stays_bounded():
+    """Cohesion must be <= 1.0 even when parallel edges outnumber unique pairs."""
+    G = _make_multigraph_triangle()
+    # 3 nodes, 8 total edge records, but only 2 unique pairs -> must not exceed 1.0
+    score = cohesion_score(G, ["a", "b", "c"])
+    assert score <= 1.0, f"cohesion {score} exceeds 1.0 on multigraph"
+    assert score >= 0.0
+
+
+def test_cohesion_multigraph_equals_simple_graph_cohesion():
+    """Cohesion on a multigraph should equal cohesion on the equivalent simple graph."""
+    # Build a MultiDiGraph: a-b, b-c, a-c each with 3 parallel edges
+    MG = nx.MultiDiGraph()
+    MG.add_nodes_from(["a", "b", "c"])
+    for pair in [("a", "b"), ("b", "c"), ("a", "c")]:
+        for i in range(3):
+            MG.add_edge(pair[0], pair[1], key=f"{pair[0]}{pair[1]}-{i}")
+
+    # Build equivalent simple graph: a-b, b-c, a-c (1 edge each)
+    SG = nx.Graph()
+    SG.add_nodes_from(["a", "b", "c"])
+    SG.add_edge("a", "b")
+    SG.add_edge("b", "c")
+    SG.add_edge("a", "c")
+
+    multi_score = cohesion_score(MG, ["a", "b", "c"])
+    simple_score = cohesion_score(SG, ["a", "b", "c"])
+    assert multi_score == simple_score, (
+        f"multi={multi_score} != simple={simple_score}"
+    )
+
+
+def test_cluster_multigraph_produces_valid_communities():
+    """cluster() on a MultiDiGraph with clear community structure should detect communities."""
+    G = nx.MultiDiGraph()
+    # Two triangles connected by a weak bridge, with parallel edges and
+    # confidence data so projected weights are non-zero (avoids graspologic
+    # zero-weight panic in some versions).
+    for pair in [("a", "b"), ("b", "c"), ("a", "c")]:
+        for k in range(3):
+            G.add_edge(pair[0], pair[1], key=f"{pair[0]}{pair[1]}-{k}",
+                       confidence="EXTRACTED")
+    for pair in [("d", "e"), ("e", "f"), ("d", "f")]:
+        for k in range(3):
+            G.add_edge(pair[0], pair[1], key=f"{pair[0]}{pair[1]}-{k}",
+                       confidence="EXTRACTED")
+    G.add_edge("c", "d", key="bridge", confidence="AMBIGUOUS")
+
+    communities = cluster(G)
+    assert isinstance(communities, dict)
+    assert len(communities) > 0
+    all_nodes = {n for nodes in communities.values() for n in nodes}
+    assert all_nodes == set(G.nodes), "Not all nodes assigned to communities"
+
+
+def test_cluster_multigraph_does_not_crash():
+    """Smoke test: cluster() on a MultiDiGraph with parallel edges must not raise."""
+    G = nx.MultiDiGraph()
+    nodes = ["a", "b", "c", "d", "e"]
+    G.add_nodes_from(nodes)
+    for i in range(len(nodes)):
+        for j in range(i + 1, min(i + 3, len(nodes))):
+            for k in range(4):
+                G.add_edge(nodes[i], nodes[j], key=f"{nodes[i]}-{nodes[j]}-{k}",
+                           confidence="EXTRACTED")
+    # Must not raise
+    communities = cluster(G)
+    assert isinstance(communities, dict)
diff --git a/tests/test_report.py b/tests/test_report.py
index d9b9253d6..8fccf296c 100644
--- a/tests/test_report.py
+++ b/tests/test_report.py
@@ -1,5 +1,6 @@
 import json
 from pathlib import Path
+import networkx as nx
 from graphify.build import build_from_json
 from graphify.cluster import cluster, score_all
 from graphify.analyze import god_nodes, surprising_connections
@@ -95,3 +96,90 @@ def test_report_shows_raw_cohesion_scores():
     assert "Cohesion:" in report
     assert "✓" not in report
     assert "⚠" not in report
+
+
+# --- Helpers for edge-count tests ---
+
+def _minimal_report(G):
+    """Generate a report from a graph with minimal scaffolding."""
+    communities = {0: list(G.nodes())}
+    cohesion = {0: 0.5}
+    labels = {0: "Test Community"}
+    god_list = [{"id": n, "label": n, "degree": G.degree(n)} for n in list(G.nodes())[:3]]
+    surprise_list = []
+    detection = {"total_files": 1, "total_words": 100, "needs_graph": True, "warning": None}
+    tokens = {"input": 100, "output": 50}
+    return generate(
+        G, communities, cohesion, labels, god_list, surprise_list, detection, tokens, "./test",
+        min_community_size=1,
+    )
+
+
+# --- PR 4B: Edge count reporting tests ---
+
+def test_report_multigraph_edge_count_distinguishes_pairs():
+    """MultiDiGraph with parallel edges: report must show both total and unique pair count."""
+    G = nx.MultiDiGraph()
+    G.add_nodes_from(["A", "B", "C", "D"], label="x", type="entity")
+    # 3 unique pairs, 8 total edges
+    G.add_edge("A", "B", relation="calls", confidence="EXTRACTED")
+    G.add_edge("A", "B", relation="imports", confidence="EXTRACTED")
+    G.add_edge("A", "B", relation="uses", confidence="EXTRACTED")
+    G.add_edge("B", "C", relation="calls", confidence="EXTRACTED")
+    G.add_edge("B", "C", relation="imports", confidence="EXTRACTED")
+    G.add_edge("B", "C", relation="uses", confidence="EXTRACTED")
+    G.add_edge("C", "D", relation="calls", confidence="EXTRACTED")
+    G.add_edge("C", "D", relation="imports", confidence="EXTRACTED")
+    assert G.number_of_edges() == 8
+    report = _minimal_report(G)
+    assert "8 edges (3 unique pairs)" in report
+
+
+def test_report_simple_graph_edge_count_unchanged():
+    """Simple DiGraph: report must show just 'X edges' without unique-pairs qualifier."""
+    G = nx.DiGraph()
+    G.add_nodes_from(["A", "B", "C"], label="x", type="entity")
+    G.add_edge("A", "B", relation="calls", confidence="EXTRACTED")
+    G.add_edge("B", "C", relation="calls", confidence="EXTRACTED")
+    G.add_edge("A", "C", relation="calls", confidence="EXTRACTED")
+    report = _minimal_report(G)
+    assert "3 edges" in report
+    assert "unique pairs" not in report
+
+
+def test_report_multigraph_no_parallel_just_shows_total():
+    """MultiDiGraph with no actual parallel edges: show just 'X edges', no redundant qualifier."""
+    G = nx.MultiDiGraph()
+    G.add_nodes_from(["A", "B", "C"], label="x", type="entity")
+    G.add_edge("A", "B", relation="calls", confidence="EXTRACTED")
+    G.add_edge("B", "C", relation="calls", confidence="EXTRACTED")
+    G.add_edge("A", "C", relation="calls", confidence="EXTRACTED")
+    assert G.number_of_edges() == 3
+    report = _minimal_report(G)
+    assert "3 edges" in report
+    assert "unique pairs" not in report
+
+
+def test_report_god_node_degree_not_inflated():
+    """God-node degree should reflect unique neighbors, not parallel edge count.
+
+    analyze.god_nodes() already uses distinct_neighbor_degree(), so the degree
+    value in the report should equal the neighbor count, not the edge count.
+    """
+    G = nx.MultiDiGraph()
+    # Nodes need source_file with an extension to avoid being filtered as concept nodes
+    attrs = {"label": "hub", "type": "entity", "source_file": "test.py"}
+    G.add_node("hub", **attrs)
+    for name in ["A", "B", "C"]:
+        G.add_node(name, label=name, type="entity", source_file="test.py")
+    # hub -> A: 4 parallel edges, hub -> B: 3, hub -> C: 3 = 10 total, 3 unique neighbors
+    for _ in range(4):
+        G.add_edge("hub", "A", relation="calls", confidence="EXTRACTED")
+    for _ in range(3):
+        G.add_edge("hub", "B", relation="calls", confidence="EXTRACTED")
+    for _ in range(3):
+        G.add_edge("hub", "C", relation="calls", confidence="EXTRACTED")
+    assert G.number_of_edges() == 10
+    gods = god_nodes(G)
+    hub_entry = next(g for g in gods if g["label"] == "hub")
+    assert hub_entry["degree"] == 3, f"Expected 3 unique neighbors, got {hub_entry['degree']}"
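
Note on the counts the report tests above rely on: the expected strings separate total edge records from unique node pairs, and the god-node test expects degree to mean distinct neighbors. The sketch below reproduces that arithmetic on plain (source, target) tuples. It is a minimal illustration under stated assumptions, not graphify's implementation: edge_count_label and distinct_neighbor_count are hypothetical names, and treating a unique pair as a directed (source, target) pair is an assumption.

```python
def edge_count_label(edges):
    """Format an edge count from a list of (source, target) records.

    Hypothetical helper: it mirrors the wording the report tests expect,
    and assumes a "unique pair" is a directed (source, target) pair.
    """
    total = len(edges)
    unique_pairs = len(set(edges))
    if unique_pairs < total:
        # Parallel edges exist, so the total alone would be misleading.
        return f"{total} edges ({unique_pairs} unique pairs)"
    return f"{total} edges"


def distinct_neighbor_count(edges, node):
    """Count distinct neighbors of `node`, ignoring direction and multiplicity."""
    neighbors = set()
    for source, target in edges:
        if source == node and target != node:
            neighbors.add(target)
        elif target == node and source != node:
            neighbors.add(source)
    return len(neighbors)


# Same shapes as the fixtures in the tests above.
parallel = [("A", "B")] * 3 + [("B", "C")] * 3 + [("C", "D")] * 2
simple = [("A", "B"), ("B", "C"), ("A", "C")]
hub = [("hub", "A")] * 4 + [("hub", "B")] * 3 + [("hub", "C")] * 3

print(edge_count_label(parallel))
print(edge_count_label(simple))
print(len(hub), distinct_neighbor_count(hub, "hub"))
```

With these inputs the parallel fixture has 8 records over 3 pairs, the simple fixture has 3 records with no qualifier, and the hub has 10 records but 3 distinct neighbors, which is the inflation the god-node test guards against.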

From cab196454574c6bcdd2f1deae6a2e92ff530f1de Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Fri, 29 May 2026 00:46:39 -0500
Subject: [PATCH 05/21] style: format rebased multigraph stack

---
 graphify/analyze.py      |   4 +-
 graphify/extract.py      | 672 +++++++++++++++++++++++++++------------
 graphify/llm.py          |  14 +-
 tests/test_analyze.py    | 126 ++++++--
 tests/test_build.py      |  23 +-
 tests/test_cluster.py    |  15 +-
 tests/test_languages.py  |  52 ++-
 tests/test_llm_parser.py |  37 +--
 tests/test_multilang.py  |  26 +-
 tests/test_rationale.py  |  29 +-
 tests/test_report.py     |  12 +-
 11 files changed, 705 insertions(+), 305 deletions(-)

diff --git a/graphify/analyze.py b/graphify/analyze.py
index db556ac07..c94406027 100644
--- a/graphify/analyze.py
+++ b/graphify/analyze.py
@@ -537,7 +537,9 @@ def suggest_questions(
     isolated = [
         n
         for n in G.nodes()
-        if distinct_neighbor_degree(G, n) <= 1 and not _is_file_node(G, n) and not _is_concept_node(G, n)
+        if distinct_neighbor_degree(G, n) <= 1
+        and not _is_file_node(G, n)
+        and not _is_concept_node(G, n)
     ]
     if isolated:
         labels = [G.nodes[n].get("label", n) for n in isolated[:3]]
diff --git a/graphify/extract.py b/graphify/extract.py
index 3a7222b66..5bd556c87 100644
--- a/graphify/extract.py
+++ b/graphify/extract.py
@@ -820,11 +820,32 @@ def _java_method_annotation_names(method_node, source: bytes) -> list[str]:
     return names
 
 
-_GO_PREDECLARED_TYPES = frozenset({
-    "bool", "byte", "complex64", "complex128", "error", "float32", "float64",
-    "int", "int8", "int16", "int32", "int64", "rune", "string",
-    "uint", "uint8", "uint16", "uint32", "uint64", "uintptr", "any", "comparable",
-})
+_GO_PREDECLARED_TYPES = frozenset(
+    {
+        "bool",
+        "byte",
+        "complex64",
+        "complex128",
+        "error",
+        "float32",
+        "float64",
+        "int",
+        "int8",
+        "int16",
+        "int32",
+        "int64",
+        "rune",
+        "string",
+        "uint",
+        "uint8",
+        "uint16",
+        "uint32",
+        "uint64",
+        "uintptr",
+        "any",
+        "comparable",
+    }
+)
 
 
 def _go_collect_type_refs(node, source: bytes, generic: bool, out: list[tuple[str, str]]) -> None:
@@ -854,8 +875,14 @@ def _go_collect_type_refs(node, source: bytes, generic: bool, out: list[tuple[st
                     if arg.is_named:
                         _go_collect_type_refs(arg, source, True, out)
         return
-    if t in ("pointer_type", "slice_type", "array_type", "map_type",
-             "channel_type", "parenthesized_type"):
+    if t in (
+        "pointer_type",
+        "slice_type",
+        "array_type",
+        "map_type",
+        "channel_type",
+        "parenthesized_type",
+    ):
         for c in node.children:
             if c.is_named:
                 _go_collect_type_refs(c, source, generic, out)
@@ -957,8 +984,14 @@ def _php_method_return_type_node(method_node):
             saw_params = True
             continue
         if saw_params and c.is_named and c.type not in ("compound_statement",):
-            if c.type in ("named_type", "primitive_type", "nullable_type",
-                          "union_type", "intersection_type", "optional_type"):
+            if c.type in (
+                "named_type",
+                "primitive_type",
+                "nullable_type",
+                "union_type",
+                "intersection_type",
+                "optional_type",
+            ):
                 return c
     return None
 
@@ -982,7 +1015,9 @@ def _kotlin_user_type_name(user_type_node, source: bytes) -> str | None:
     return None
 
 
-def _kotlin_collect_type_refs(node, source: bytes, generic: bool, out: list[tuple[str, str]]) -> None:
+def _kotlin_collect_type_refs(
+    node, source: bytes, generic: bool, out: list[tuple[str, str]]
+) -> None:
     """Walk a Kotlin type expression; append (name, role) tuples."""
     if node is None:
         return
@@ -1097,8 +1132,9 @@ def _swift_pre_scan(root_node, source: bytes) -> tuple[set[str], set[str]]:
     return protocols, classes
 
 
-def _swift_classify_base(name: str, kind: str | None, is_first: bool,
-                          protocols: set[str], classes: set[str]) -> str:
+def _swift_classify_base(
+    name: str, kind: str | None, is_first: bool, protocols: set[str], classes: set[str]
+) -> str:
     """Classify a Swift inheritance_specifier entry as `inherits` or `implements`."""
     if name in protocols:
         return "implements"
@@ -1122,7 +1158,9 @@ def _swift_user_type_name(user_type_node, source: bytes) -> str | None:
     return None
 
 
-def _swift_collect_type_refs(node, source: bytes, generic: bool, out: list[tuple[str, str]]) -> None:
+def _swift_collect_type_refs(
+    node, source: bytes, generic: bool, out: list[tuple[str, str]]
+) -> None:
     """Walk a Swift type expression; append (name, role) tuples (role 'type' or 'generic_arg')."""
     if node is None:
         return
@@ -1150,8 +1188,13 @@ def _swift_collect_type_refs(node, source: bytes, generic: bool, out: list[tuple
         if text:
             out.append((text, "generic_arg" if generic else "type"))
         return
-    if t in ("optional_type", "implicitly_unwrapped_optional_type", "array_type",
-             "dictionary_type", "tuple_type"):
+    if t in (
+        "optional_type",
+        "implicitly_unwrapped_optional_type",
+        "array_type",
+        "dictionary_type",
+        "tuple_type",
+    ):
         for c in node.children:
             if c.is_named:
                 _swift_collect_type_refs(c, source, generic, out)
@@ -1172,9 +1215,14 @@ def _swift_property_type_node(property_node):
 
 # ── C / C++ type-ref helpers ─────────────────────────────────────────────────
 
-_C_PRIMITIVE_TYPE_NODES = frozenset({
-    "primitive_type", "sized_type_specifier", "auto", "placeholder_type_specifier",
-})
+_C_PRIMITIVE_TYPE_NODES = frozenset(
+    {
+        "primitive_type",
+        "sized_type_specifier",
+        "auto",
+        "placeholder_type_specifier",
+    }
+)
 
 
 def _c_collect_type_refs(node, source: bytes, generic: bool, out: list[tuple[str, str]]) -> None:
@@ -1188,9 +1236,16 @@ def _c_collect_type_refs(node, source: bytes, generic: bool, out: list[tuple[str
         if text:
             out.append((text, "generic_arg" if generic else "type"))
         return
-    if t in ("pointer_declarator", "reference_declarator", "array_declarator",
-             "type_qualifier", "type_descriptor", "abstract_pointer_declarator",
-             "abstract_reference_declarator", "abstract_array_declarator"):
+    if t in (
+        "pointer_declarator",
+        "reference_declarator",
+        "array_declarator",
+        "type_qualifier",
+        "type_descriptor",
+        "abstract_pointer_declarator",
+        "abstract_reference_declarator",
+        "abstract_array_declarator",
+    ):
         for c in node.children:
             if c.is_named:
                 _c_collect_type_refs(c, source, generic, out)
@@ -1225,9 +1280,16 @@ def _cpp_collect_type_refs(node, source: bytes, generic: bool, out: list[tuple[s
                 if c.is_named:
                     _cpp_collect_type_refs(c, source, True, out)
         return
-    if t in ("type_descriptor", "pointer_declarator", "reference_declarator",
-             "array_declarator", "type_qualifier", "abstract_pointer_declarator",
-             "abstract_reference_declarator", "abstract_array_declarator"):
+    if t in (
+        "type_descriptor",
+        "pointer_declarator",
+        "reference_declarator",
+        "array_declarator",
+        "type_qualifier",
+        "abstract_pointer_declarator",
+        "abstract_reference_declarator",
+        "abstract_array_declarator",
+    ):
         for c in node.children:
             if c.is_named:
                 _cpp_collect_type_refs(c, source, generic, out)
@@ -1235,7 +1297,10 @@ def _cpp_collect_type_refs(node, source: bytes, generic: bool, out: list[tuple[s
 
 # ── Scala type-ref helpers ───────────────────────────────────────────────────
 
-def _scala_collect_type_refs(node, source: bytes, generic: bool, out: list[tuple[str, str]]) -> None:
+
+def _scala_collect_type_refs(
+    node, source: bytes, generic: bool, out: list[tuple[str, str]]
+) -> None:
     """Walk a Scala type expression; append (name, role) tuples.
     Handles type_identifier, generic_type (List[T]), and common type wrappers."""
     if node is None:
@@ -1263,8 +1328,14 @@ def _scala_collect_type_refs(node, source: bytes, generic: bool, out: list[tuple
                     if arg.is_named:
                         _scala_collect_type_refs(arg, source, True, out)
         return
-    if t in ("compound_type", "infix_type", "function_type", "tuple_type",
-             "annotated_type", "projected_type"):
+    if t in (
+        "compound_type",
+        "infix_type",
+        "function_type",
+        "tuple_type",
+        "annotated_type",
+        "projected_type",
+    ):
         for c in node.children:
             if c.is_named:
                 _scala_collect_type_refs(c, source, generic, out)
@@ -2562,7 +2633,9 @@ def walk(node, parent_class_nid: str | None = None) -> None:
 
             # Swift-specific: conformance / inheritance
             if config.ts_module == "tree_sitter_swift":
-                swift_kind = _swift_declaration_keyword(node) if t == "class_declaration" else "protocol"
+                swift_kind = (
+                    _swift_declaration_keyword(node) if t == "class_declaration" else "protocol"
+                )
                 seen_swift_base = False
                 for child in node.children:
                     if child.type != "inheritance_specifier":
@@ -2583,20 +2656,25 @@ def walk(node, parent_class_nid: str | None = None) -> None:
                     if base_nid not in seen_ids:
                         base_nid = _make_id(base_name)
                         if base_nid not in seen_ids:
-                            nodes.append({
-                                "id": base_nid,
-                                "label": base_name,
-                                "file_type": "code",
-                                "source_file": "",
-                                "source_location": "",
-                            })
+                            nodes.append(
+                                {
+                                    "id": base_nid,
+                                    "label": base_name,
+                                    "file_type": "code",
+                                    "source_file": "",
+                                    "source_location": "",
+                                }
+                            )
                             seen_ids.add(base_nid)
                     if t == "protocol_declaration":
                         relation = "inherits"
                     else:
                         relation = _swift_classify_base(
-                            base_name, swift_kind, not seen_swift_base,
-                            swift_protocol_names, swift_class_names,
+                            base_name,
+                            swift_kind,
+                            not seen_swift_base,
+                            swift_protocol_names,
+                            swift_class_names,
                         )
                     seen_swift_base = True
                     add_edge(class_nid, base_nid, relation, line)
@@ -2611,11 +2689,13 @@ def walk(node, parent_class_nid: str | None = None) -> None:
                                 _swift_collect_type_refs(arg, source, True, refs)
                                 for ref_name, _role in refs:
                                     target = ensure_named_node(ref_name, line)
-                                    add_edge(class_nid, target, "references", line,
-                                             context="generic_arg")
+                                    add_edge(
+                                        class_nid, target, "references", line, context="generic_arg"
+                                    )
 
             # PHP-specific: extends → inherits, implements → implements, use → mixes_in
             if config.ts_module == "tree_sitter_php":
+
                 def _php_emit_base(base_name: str, rel: str, at_line: int) -> None:
                     if not base_name:
                         return
@@ -2623,13 +2703,15 @@ def _php_emit_base(base_name: str, rel: str, at_line: int) -> None:
                     if base_nid not in seen_ids:
                         base_nid = _make_id(base_name)
                         if base_nid not in seen_ids:
-                            nodes.append({
-                                "id": base_nid,
-                                "label": base_name,
-                                "file_type": "code",
-                                "source_file": "",
-                                "source_location": "",
-                            })
+                            nodes.append(
+                                {
+                                    "id": base_nid,
+                                    "label": base_name,
+                                    "file_type": "code",
+                                    "source_file": "",
+                                    "source_location": "",
+                                }
+                            )
                             seen_ids.add(base_nid)
                     add_edge(class_nid, base_nid, rel, at_line)
 
@@ -2637,13 +2719,19 @@ def _php_emit_base(base_name: str, rel: str, at_line: int) -> None:
                     if child.type == "base_clause":
                         for sub in child.children:
                             if sub.type in ("name", "qualified_name"):
-                                _php_emit_base(_php_name_text(sub, source) or "",
-                                                "inherits", child.start_point[0] + 1)
+                                _php_emit_base(
+                                    _php_name_text(sub, source) or "",
+                                    "inherits",
+                                    child.start_point[0] + 1,
+                                )
                     elif child.type == "class_interface_clause":
                         for sub in child.children:
                             if sub.type in ("name", "qualified_name"):
-                                _php_emit_base(_php_name_text(sub, source) or "",
-                                                "implements", child.start_point[0] + 1)
+                                _php_emit_base(
+                                    _php_name_text(sub, source) or "",
+                                    "implements",
+                                    child.start_point[0] + 1,
+                                )
                 body = node.child_by_field_name("body")
                 if body is None:
                     for c in node.children:
@@ -2656,8 +2744,11 @@ def _php_emit_base(base_name: str, rel: str, at_line: int) -> None:
                             continue
                         for sub in member.children:
                             if sub.type in ("name", "qualified_name"):
-                                _php_emit_base(_php_name_text(sub, source) or "",
-                                                "mixes_in", member.start_point[0] + 1)
+                                _php_emit_base(
+                                    _php_name_text(sub, source) or "",
+                                    "mixes_in",
+                                    member.start_point[0] + 1,
+                                )
 
             # Kotlin-specific: delegation_specifiers → inherits (constructor_invocation) / implements (user_type)
             if config.ts_module == "tree_sitter_kotlin":
@@ -2689,13 +2780,15 @@ def _php_emit_base(base_name: str, rel: str, at_line: int) -> None:
                         if base_nid not in seen_ids:
                             base_nid = _make_id(base)
                             if base_nid not in seen_ids:
-                                nodes.append({
-                                    "id": base_nid,
-                                    "label": base,
-                                    "file_type": "code",
-                                    "source_file": "",
-                                    "source_location": "",
-                                })
+                                nodes.append(
+                                    {
+                                        "id": base_nid,
+                                        "label": base,
+                                        "file_type": "code",
+                                        "source_file": "",
+                                        "source_location": "",
+                                    }
+                                )
                                 seen_ids.add(base_nid)
                         add_edge(class_nid, base_nid, relation, line)
                         for arg_child in user_type_node.children:
@@ -2710,8 +2803,13 @@ def _php_emit_base(base_name: str, rel: str, at_line: int) -> None:
                                         _kotlin_collect_type_refs(inner, source, True, refs)
                                         for ref_name, _role in refs:
                                             target = ensure_named_node(ref_name, line)
-                                            add_edge(class_nid, target, "references", line,
-                                                     context="generic_arg")
+                                            add_edge(
+                                                class_nid,
+                                                target,
+                                                "references",
+                                                line,
+                                                context="generic_arg",
+                                            )
 
             # C#-specific: inheritance / interface implementation via base_list
             if config.ts_module == "tree_sitter_c_sharp":
@@ -2863,8 +2961,7 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
                             ctx = "generic_arg" if role == "generic_arg" else "field"
                             target_nid = ensure_named_node(ref_name, cp_line)
                             if target_nid != class_nid:
-                                add_edge(class_nid, target_nid, "references",
-                                         cp_line, context=ctx)
+                                add_edge(class_nid, target_nid, "references", cp_line, context=ctx)
 
             # C++-specific: inheritance via base_class_clause (class and struct).
             # tree-sitter-cpp shape:
@@ -2921,9 +3018,7 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
             return
 
         # Event listener property arrays: $listen = [Event::class => [Listener::class]]
-        if (t == "property_declaration"
-                and parent_class_nid
-                and config.event_listener_properties):
+        if t == "property_declaration" and parent_class_nid and config.event_listener_properties:
             handled_event_listener = False
             for element in node.children:
                 if element.type != "property_element":
@@ -3000,12 +3095,20 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
                 )
             return
 
-        if (config.ts_module == "tree_sitter_php"
-                and t == "property_declaration"
-                and parent_class_nid):
+        if (
+            config.ts_module == "tree_sitter_php"
+            and t == "property_declaration"
+            and parent_class_nid
+        ):
             for c in node.children:
-                if c.type not in ("named_type", "primitive_type", "nullable_type",
-                                   "union_type", "intersection_type", "optional_type"):
+                if c.type not in (
+                    "named_type",
+                    "primitive_type",
+                    "nullable_type",
+                    "union_type",
+                    "intersection_type",
+                    "optional_type",
+                ):
                     continue
                 line = node.start_point[0] + 1
                 refs: list[tuple[str, str]] = []
@@ -3018,9 +3121,11 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
                 break
             return
 
-        if (config.ts_module == "tree_sitter_kotlin"
-                and t == "property_declaration"
-                and parent_class_nid):
+        if (
+            config.ts_module == "tree_sitter_kotlin"
+            and t == "property_declaration"
+            and parent_class_nid
+        ):
             type_node = _kotlin_property_type_node(node)
             if type_node is not None:
                 line = node.start_point[0] + 1
@@ -3033,9 +3138,11 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
                         add_edge(parent_class_nid, target_nid, "references", line, context=ctx)
             return
 
-        if (config.ts_module == "tree_sitter_swift"
-                and t == "property_declaration"
-                and parent_class_nid):
+        if (
+            config.ts_module == "tree_sitter_swift"
+            and t == "property_declaration"
+            and parent_class_nid
+        ):
             type_anno = _swift_property_type_node(node)
             if type_anno is not None:
                 line = node.start_point[0] + 1
@@ -3048,9 +3155,7 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
                         add_edge(parent_class_nid, target_nid, "references", line, context=ctx)
             return
 
-        if (config.ts_module == "tree_sitter_scala"
-                and t == "val_definition"
-                and parent_class_nid):
+        if config.ts_module == "tree_sitter_scala" and t == "val_definition" and parent_class_nid:
             type_node = node.child_by_field_name("type")
             if type_node is not None:
                 line = node.start_point[0] + 1
@@ -3060,20 +3165,19 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
                     ctx = "generic_arg" if role == "generic_arg" else "field"
                     target_nid = ensure_named_node(ref_name, line)
                     if target_nid != parent_class_nid:
-                        add_edge(parent_class_nid, target_nid, "references",
-                                 line, context=ctx)
+                        add_edge(parent_class_nid, target_nid, "references", line, context=ctx)
             # fall through so any call expressions in the initializer get walked
 
-        if (config.ts_module == "tree_sitter_cpp"
-                and t == "field_declaration"
-                and parent_class_nid):
+        if config.ts_module == "tree_sitter_cpp" and t == "field_declaration" and parent_class_nid:
             # Skip method prototypes (field_declaration with a function_declarator
             # is a member-function declaration, not a data member).
             decls = list(node.children_by_field_name("declarator"))
             is_method = any(
                 d.type == "function_declarator"
-                or (d.type in ("pointer_declarator", "reference_declarator")
-                    and any(c.type == "function_declarator" for c in d.children))
+                or (
+                    d.type in ("pointer_declarator", "reference_declarator")
+                    and any(c.type == "function_declarator" for c in d.children)
+                )
                 for d in decls
             )
             if not is_method:
@@ -3086,8 +3190,7 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
                         ctx = "generic_arg" if role == "generic_arg" else "field"
                         target_nid = ensure_named_node(ref_name, line)
                         if target_nid != parent_class_nid:
-                            add_edge(parent_class_nid, target_nid, "references",
-                                     line, context=ctx)
+                            add_edge(parent_class_nid, target_nid, "references", line, context=ctx)
             # Emit a node for each data member. Use children_by_field_name so we
             # only visit declarator children, not the type node (which would give
             # us the type name, not the field name). Handles int x, y; via
@@ -3226,8 +3329,14 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
                             continue
                         type_node = None
                         for sub in p.children:
-                            if sub.type in ("named_type", "primitive_type", "nullable_type",
-                                             "union_type", "intersection_type", "optional_type"):
+                            if sub.type in (
+                                "named_type",
+                                "primitive_type",
+                                "nullable_type",
+                                "union_type",
+                                "intersection_type",
+                                "optional_type",
+                            ):
                                 type_node = sub
                                 break
                         refs: list[tuple[str, str]] = []
@@ -3302,8 +3411,11 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
                             add_edge(func_nid, target_nid, "references", line, context=ctx)
 
             if config.ts_module in ("tree_sitter_c", "tree_sitter_cpp"):
-                collect = (_cpp_collect_type_refs if config.ts_module == "tree_sitter_cpp"
-                           else _c_collect_type_refs)
+                collect = (
+                    _cpp_collect_type_refs
+                    if config.ts_module == "tree_sitter_cpp"
+                    else _c_collect_type_refs
+                )
                 return_node = node.child_by_field_name("type")
                 if return_node is not None:
                     refs: list[tuple[str, str]] = []
@@ -3316,7 +3428,9 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
                 # function_declarator may be wrapped in pointer/reference declarators
                 decl = node.child_by_field_name("declarator")
                 while decl is not None and decl.type in (
-                        "pointer_declarator", "reference_declarator"):
+                    "pointer_declarator",
+                    "reference_declarator",
+                ):
                     decl = decl.child_by_field_name("declarator")
                 if decl is not None and decl.type == "function_declarator":
                     params_node = decl.child_by_field_name("parameters")
@@ -3333,8 +3447,7 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
                                 ctx = "generic_arg" if role == "generic_arg" else "parameter_type"
                                 target_nid = ensure_named_node(ref_name, line)
                                 if target_nid != func_nid:
-                                    add_edge(func_nid, target_nid, "references",
-                                             line, context=ctx)
+                                    add_edge(func_nid, target_nid, "references", line, context=ctx)
 
             if config.ts_module == "tree_sitter_scala":
                 params_node = None
@@ -3355,8 +3468,7 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
                             ctx = "generic_arg" if role == "generic_arg" else "parameter_type"
                             target_nid = ensure_named_node(ref_name, line)
                             if target_nid != func_nid:
-                                add_edge(func_nid, target_nid, "references",
-                                         line, context=ctx)
+                                add_edge(func_nid, target_nid, "references", line, context=ctx)
                 return_node = node.child_by_field_name("return_type")
                 if return_node is not None:
                     refs = []
@@ -3365,8 +3477,7 @@ def _emit_java_parent(base_name: str, rel: str, at_line: int) -> None:
                         ctx = "generic_arg" if role == "generic_arg" else "return_type"
                         target_nid = ensure_named_node(ref_name, line)
                         if target_nid != func_nid:
-                            add_edge(func_nid, target_nid, "references",
-                                     line, context=ctx)
+                            add_edge(func_nid, target_nid, "references", line, context=ctx)
 
             body = _find_body(node, config)
             if body:
@@ -5299,8 +5410,13 @@ def walk(node, scope_nid: str) -> None:
             add_node(struct_nid, struct_name, line)
             add_edge(scope_nid, struct_nid, "defines", line)
             if super_name:
-                add_edge(struct_nid, ensure_named_node(super_name, line),
-                         "inherits", line, confidence="EXTRACTED")
+                add_edge(
+                    struct_nid,
+                    ensure_named_node(super_name, line),
+                    "inherits",
+                    line,
+                    confidence="EXTRACTED",
+                )
             # Field types: each `name::Type` lowers to a typed_expression child of struct_definition
             for child in node.children:
                 if child.type == "typed_expression":
@@ -5309,8 +5425,11 @@ def walk(node, scope_nid: str) -> None:
                         field_line = child.start_point[0] + 1
                         type_name = _read_text(type_ids[-1], source)
                         type_nid = ensure_named_node(type_name, field_line)
-                        edges.append(_semantic_reference_edge(
-                            struct_nid, type_nid, "field", str_path, field_line))
+                        edges.append(
+                            _semantic_reference_edge(
+                                struct_nid, type_nid, "field", str_path, field_line
+                            )
+                        )
             return
 
         # Abstract type
@@ -5886,9 +6005,7 @@ def walk(node) -> None:
                         for field in fdl.children:
                             if field.type != "field_declaration":
                                 continue
-                            has_name = any(
-                                fc.type == "field_identifier" for fc in field.children
-                            )
+                            has_name = any(fc.type == "field_identifier" for fc in field.children)
                             type_node = field.child_by_field_name("type")
                             if type_node is None:
                                 for fc in field.children:
@@ -5902,12 +6019,16 @@ def walk(node) -> None:
                                 if tgt == type_nid:
                                     continue
                                 if not has_name and role == "type":
-                                    add_edge(type_nid, tgt, "embeds",
-                                             field.start_point[0] + 1)
+                                    add_edge(type_nid, tgt, "embeds", field.start_point[0] + 1)
                                 else:
                                     ctx = "generic_arg" if role == "generic_arg" else "field"
-                                    add_edge(type_nid, tgt, "references",
-                                             field.start_point[0] + 1, context=ctx)
+                                    add_edge(
+                                        type_nid,
+                                        tgt,
+                                        "references",
+                                        field.start_point[0] + 1,
+                                        context=ctx,
+                                    )
                 elif type_body.type == "interface_type":
                     for elem in type_body.children:
                         if elem.type != "type_elem":
@@ -5921,11 +6042,15 @@ def walk(node) -> None:
                             if tgt == type_nid:
                                 continue
                             if role == "type":
-                                add_edge(type_nid, tgt, "embeds",
-                                         elem.start_point[0] + 1)
+                                add_edge(type_nid, tgt, "embeds", elem.start_point[0] + 1)
                             else:
-                                add_edge(type_nid, tgt, "references",
-                                         elem.start_point[0] + 1, context="generic_arg")
+                                add_edge(
+                                    type_nid,
+                                    tgt,
+                                    "references",
+                                    elem.start_point[0] + 1,
+                                    context="generic_arg",
+                                )
             return
 
         if t == "import_declaration":
@@ -6244,8 +6369,9 @@ def walk(node, parent_impl_nid: str | None = None) -> None:
                                 if rel == "inherits":
                                     add_edge(item_nid, tgt, "inherits", line)
                                 else:
-                                    add_edge(item_nid, tgt, "references", line,
-                                             context="generic_arg")
+                                    add_edge(
+                                        item_nid, tgt, "references", line, context="generic_arg"
+                                    )
                 if t == "struct_item":
                     for c in node.children:
                         if c.type != "field_declaration_list":
@@ -6256,9 +6382,13 @@ def walk(node, parent_impl_nid: str | None = None) -> None:
                             type_node = field.child_by_field_name("type")
                             if type_node is None:
                                 for fc in field.children:
-                                    if fc.type in ("type_identifier", "generic_type",
-                                                    "scoped_type_identifier",
-                                                    "reference_type", "primitive_type"):
+                                    if fc.type in (
+                                        "type_identifier",
+                                        "generic_type",
+                                        "scoped_type_identifier",
+                                        "reference_type",
+                                        "primitive_type",
+                                    ):
                                         type_node = fc
                                         break
                             refs = []
@@ -6267,8 +6397,13 @@ def walk(node, parent_impl_nid: str | None = None) -> None:
                                 ctx = "generic_arg" if role == "generic_arg" else "field"
                                 tgt = ensure_named_node(ref_name, field.start_point[0] + 1)
                                 if tgt != item_nid:
-                                    add_edge(item_nid, tgt, "references",
-                                             field.start_point[0] + 1, context=ctx)
+                                    add_edge(
+                                        item_nid,
+                                        tgt,
+                                        "references",
+                                        field.start_point[0] + 1,
+                                        context=ctx,
+                                    )
             return
 
         if t == "impl_item":
@@ -6289,8 +6424,13 @@ def walk(node, parent_impl_nid: str | None = None) -> None:
                     if idx == 0:
                         add_edge(impl_nid, tgt, "implements", node.start_point[0] + 1)
                     else:
-                        add_edge(impl_nid, tgt, "references", node.start_point[0] + 1,
-                                 context="generic_arg")
+                        add_edge(
+                            impl_nid,
+                            tgt,
+                            "references",
+                            node.start_point[0] + 1,
+                            context="generic_arg",
+                        )
             body = node.child_by_field_name("body")
             if body:
                 for child in body.children:
@@ -6758,8 +6898,7 @@ def walk(node, parent_class_nid: str | None = None) -> None:
                 line = node.start_point[0] + 1
                 target_nid = ensure_named_node(type_name, line)
                 if target_nid != parent_class_nid:
-                    add_edge(parent_class_nid, target_nid, "references",
-                             line, context="field")
+                    add_edge(parent_class_nid, target_nid, "references", line, context="field")
             return
 
         if t == "class_method_definition":
@@ -6777,30 +6916,37 @@ def walk(node, parent_class_nid: str | None = None) -> None:
                     add_edge(file_nid, method_nid, "contains", line)
                 # Return type: type_literal sibling of simple_name
                 return_type_literal = next(
-                    (c for c in node.children if c.type == "type_literal"), None)
+                    (c for c in node.children if c.type == "type_literal"), None
+                )
                 return_type_name = _ps_type_name(return_type_literal)
                 if return_type_name:
                     target_nid = ensure_named_node(return_type_name, line)
                     if target_nid != method_nid:
-                        add_edge(method_nid, target_nid, "references",
-                                 line, context="return_type")
+                        add_edge(method_nid, target_nid, "references", line, context="return_type")
                 # Parameter types: class_method_parameter_list
                 param_list = next(
-                    (c for c in node.children if c.type == "class_method_parameter_list"), None)
+                    (c for c in node.children if c.type == "class_method_parameter_list"), None
+                )
                 if param_list is not None:
                     for p in param_list.children:
                         if p.type != "class_method_parameter":
                             continue
                         ptype_literal = next(
-                            (c for c in p.children if c.type == "type_literal"), None)
+                            (c for c in p.children if c.type == "type_literal"), None
+                        )
                         ptype_name = _ps_type_name(ptype_literal)
                         if not ptype_name:
                             continue
                         p_line = p.start_point[0] + 1
                         target_nid = ensure_named_node(ptype_name, p_line)
                         if target_nid != method_nid:
-                            add_edge(method_nid, target_nid, "references",
-                                     p_line, context="parameter_type")
+                            add_edge(
+                                method_nid,
+                                target_nid,
+                                "references",
+                                p_line,
+                                context="parameter_type",
+                            )
                 body = _find_script_block_body(node)
                 if body:
                     function_bodies.append((method_nid, body))
@@ -8441,8 +8587,11 @@ def walk(node, parent_nid: str | None = None) -> None:
                             for s in sub.children:
                                 if s.type == "type_identifier":
                                     type_nid = ensure_named_node(_read(s), prop_line)
-                                    edges.append(_semantic_reference_edge(
-                                        cls_nid, type_nid, "field", str_path, prop_line))
+                                    edges.append(
+                                        _semantic_reference_edge(
+                                            cls_nid, type_nid, "field", str_path, prop_line
+                                        )
+                                    )
                                     break
                 elif child.type == "method_declaration":
                     walk(child, cls_nid)
@@ -10862,6 +11011,7 @@ def walk_object(
 # DM identity is path-based (`/datum/object/proc/New()`), not block-based, so
 # the generic class-body walker doesn't fit well.
 
+
 def extract_dm(path: Path) -> dict:
     """Extract types, procs, includes, and calls from a .dm/.dme file."""
     try:
@@ -10888,17 +11038,36 @@ def extract_dm(path: Path) -> dict:
     def add_node(nid: str, label: str, line: int) -> None:
         if nid and nid not in seen_ids:
             seen_ids.add(nid)
-            nodes.append({"id": nid, "label": label, "file_type": "code",
-                          "source_file": str_path, "source_location": f"L{line}"})
+            nodes.append(
+                {
+                    "id": nid,
+                    "label": label,
+                    "file_type": "code",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
 
-    def add_edge(src: str, tgt: str, relation: str, line: int,
-                 confidence: str = "EXTRACTED", weight: float = 1.0,
-                 context: str | None = None) -> None:
+    def add_edge(
+        src: str,
+        tgt: str,
+        relation: str,
+        line: int,
+        confidence: str = "EXTRACTED",
+        weight: float = 1.0,
+        context: str | None = None,
+    ) -> None:
         if not src or not tgt or src == tgt:
             return
-        edge: dict = {"source": src, "target": tgt, "relation": relation,
-                "confidence": confidence, "source_file": str_path,
-                "source_location": f"L{line}", "weight": weight}
+        edge: dict = {
+            "source": src,
+            "target": tgt,
+            "relation": relation,
+            "confidence": confidence,
+            "source_file": str_path,
+            "source_location": f"L{line}",
+            "weight": weight,
+        }
         if context:
             edge["context"] = context
         edges.append(edge)
@@ -10931,8 +11100,9 @@ def _read_include_path(file_node) -> str:
             return "".join(parts)
         return _read_text(file_node, source).strip("'\"")
 
-    def walk(node, parent_type_path: "str | None" = None,
-             parent_type_nid: "str | None" = None) -> None:
+    def walk(
+        node, parent_type_path: "str | None" = None, parent_type_nid: "str | None" = None
+    ) -> None:
         t = node.type
         line = node.start_point[0] + 1
 
@@ -11044,39 +11214,53 @@ def _emit_call(caller_nid: str, callee: str, line: int, is_member: bool) -> None
             if pair in seen_call_pairs:
                 return
             seen_call_pairs.add(pair)
-            edges.append({
-                "source": caller_nid, "target": tgt_nid, "relation": "calls",
-                "context": "call", "confidence": "EXTRACTED",
-                "source_file": str_path, "source_location": f"L{line}", "weight": 1.0,
-            })
+            edges.append(
+                {
+                    "source": caller_nid,
+                    "target": tgt_nid,
+                    "relation": "calls",
+                    "context": "call",
+                    "confidence": "EXTRACTED",
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                    "weight": 1.0,
+                }
+            )
         else:
-            raw_calls.append({
-                "caller_nid": caller_nid, "callee": callee,
-                "is_member_call": is_member, "source_file": str_path,
-                "source_location": f"L{line}",
-            })
+            raw_calls.append(
+                {
+                    "caller_nid": caller_nid,
+                    "callee": callee,
+                    "is_member_call": is_member,
+                    "source_file": str_path,
+                    "source_location": f"L{line}",
+                }
+            )
 
     def walk_calls(body_node, caller_nid: str) -> None:
         if body_node is None:
             return
         t = body_node.type
-        if t in ("proc_definition", "proc_override", "type_proc_definition",
-                 "type_proc_override", "type_definition"):
+        if t in (
+            "proc_definition",
+            "proc_override",
+            "type_proc_definition",
+            "type_proc_override",
+            "type_definition",
+        ):
             return
         if t == "call_expression":
             name_node = body_node.child_by_field_name("name")
             if name_node is not None:
                 callee = _read_text(name_node, source)
                 if callee and callee != "..":
-                    _emit_call(caller_nid, callee, body_node.start_point[0] + 1,
-                               is_member=False)
+                    _emit_call(caller_nid, callee, body_node.start_point[0] + 1, is_member=False)
         elif t == "field_proc_expression":
             proc_field = body_node.child_by_field_name("proc")
             if proc_field is not None:
                 callee = _read_text(proc_field, source)
                 if callee:
-                    _emit_call(caller_nid, callee, body_node.start_point[0] + 1,
-                               is_member=True)
+                    _emit_call(caller_nid, callee, body_node.start_point[0] + 1, is_member=True)
         elif t == "new_expression":
             tp_node = _find_child(body_node, "type_path")
             if tp_node is not None:
@@ -11087,13 +11271,18 @@ def walk_calls(body_node, caller_nid: str) -> None:
                     pair = (caller_nid, tgt_nid)
                     if pair not in seen_call_pairs:
                         seen_call_pairs.add(pair)
-                        edges.append({
-                            "source": caller_nid, "target": tgt_nid,
-                            "relation": "instantiates", "context": "call",
-                            "confidence": "EXTRACTED", "source_file": str_path,
-                            "source_location": f"L{body_node.start_point[0] + 1}",
-                            "weight": 1.0,
-                        })
+                        edges.append(
+                            {
+                                "source": caller_nid,
+                                "target": tgt_nid,
+                                "relation": "instantiates",
+                                "context": "call",
+                                "confidence": "EXTRACTED",
+                                "source_file": str_path,
+                                "source_location": f"L{body_node.start_point[0] + 1}",
+                                "weight": 1.0,
+                            }
+                        )
         for child in body_node.children:
             walk_calls(child, caller_nid)
 
@@ -11108,17 +11297,19 @@ def walk_calls(body_node, caller_nid: str) -> None:
 # metadata. We want the icon state names (icon_state = "X" in DM code
 # references them).
 
+
 def _read_dmi_description(data: bytes) -> str:
     """Pull the BYOND metadata text out of a .dmi PNG, or empty string on failure."""
     import struct
     import zlib as _zlib
+
     if not data.startswith(b"\x89PNG\r\n\x1a\n"):
         return ""
     i = 8
     while i + 8 <= len(data):
-        length = struct.unpack(">I", data[i:i + 4])[0]
-        chunk_type = data[i + 4:i + 8]
-        payload = data[i + 8:i + 8 + length]
+        length = struct.unpack(">I", data[i : i + 4])[0]
+        chunk_type = data[i + 4 : i + 8]
+        payload = data[i + 8 : i + 8 + length]
         if chunk_type in (b"tEXt", b"zTXt"):
             try:
                 null = payload.index(b"\x00")
@@ -11127,8 +11318,12 @@ def _read_dmi_description(data: bytes) -> str:
             keyword = payload[:null]
             if keyword == b"Description":
                 if chunk_type == b"zTXt":
-                    return _zlib.decompressobj().decompress(payload[null + 2:], max_length=1024 * 1024).decode("utf-8", errors="replace")
-                return payload[null + 1:].decode("utf-8", errors="replace")
+                    return (
+                        _zlib.decompressobj()
+                        .decompress(payload[null + 2 :], max_length=1024 * 1024)
+                        .decode("utf-8", errors="replace")
+                    )
+                return payload[null + 1 :].decode("utf-8", errors="replace")
         i += 8 + length + 4
     return ""
 
@@ -11143,8 +11338,15 @@ def extract_dmi(path: Path) -> dict:
     str_path = str(path)
     stem = _file_stem(path)
     file_nid = _make_id(str(path))
-    nodes: list[dict] = [{"id": file_nid, "label": path.name, "file_type": "code",
-                           "source_file": str_path, "source_location": "L1"}]
+    nodes: list[dict] = [
+        {
+            "id": file_nid,
+            "label": path.name,
+            "file_type": "code",
+            "source_file": str_path,
+            "source_location": "L1",
+        }
+    ]
     edges: list[dict] = []
     seen: set[str] = {file_nid}
 
@@ -11169,11 +11371,26 @@ def extract_dmi(path: Path) -> dict:
         if nid in seen:
             continue
         seen.add(nid)
-        nodes.append({"id": nid, "label": f'"{state_name}"', "file_type": "code",
-                      "source_file": str_path, "source_location": f"L{line_no}"})
-        edges.append({"source": file_nid, "target": nid, "relation": "contains",
-                      "confidence": "EXTRACTED", "source_file": str_path,
-                      "source_location": f"L{line_no}", "weight": 1.0})
+        nodes.append(
+            {
+                "id": nid,
+                "label": f'"{state_name}"',
+                "file_type": "code",
+                "source_file": str_path,
+                "source_location": f"L{line_no}",
+            }
+        )
+        edges.append(
+            {
+                "source": file_nid,
+                "target": nid,
+                "relation": "contains",
+                "confidence": "EXTRACTED",
+                "source_file": str_path,
+                "source_location": f"L{line_no}",
+                "weight": 1.0,
+            }
+        )
 
     return {"nodes": nodes, "edges": edges}
 
@@ -11242,12 +11459,19 @@ def extract_dmm(path: Path) -> dict:
 
     str_path = str(path)
     file_nid = _make_id(str(path))
-    nodes: list[dict] = [{"id": file_nid, "label": path.name, "file_type": "code",
-                           "source_file": str_path, "source_location": "L1"}]
+    nodes: list[dict] = [
+        {
+            "id": file_nid,
+            "label": path.name,
+            "file_type": "code",
+            "source_file": str_path,
+            "source_location": "L1",
+        }
+    ]
     edges: list[dict] = []
 
     grid_match = _DMM_GRID_RE.search(text)
-    dict_text = text[:grid_match.start()] if grid_match else text
+    dict_text = text[: grid_match.start()] if grid_match else text
 
     seen_targets: set[str] = set()
     buf: list[str] = []
@@ -11281,7 +11505,7 @@ def extract_dmm(path: Path) -> dict:
             rp = chunk.rfind(")")
             if lp == -1 or rp == -1 or rp <= lp:
                 continue
-            inner = chunk[lp + 1:rp]
+            inner = chunk[lp + 1 : rp]
             for entry in _split_dmm_tile(inner):
                 tpath = _dmm_type_path(entry)
                 if not tpath.startswith("/"):
@@ -11290,10 +11514,18 @@ def extract_dmm(path: Path) -> dict:
                 if tgt in seen_targets:
                     continue
                 seen_targets.add(tgt)
-                edges.append({"source": file_nid, "target": tgt, "relation": "uses",
-                              "context": "map", "confidence": "EXTRACTED",
-                              "source_file": str_path,
-                              "source_location": f"L{open_line}", "weight": 1.0})
+                edges.append(
+                    {
+                        "source": file_nid,
+                        "target": tgt,
+                        "relation": "uses",
+                        "context": "map",
+                        "confidence": "EXTRACTED",
+                        "source_file": str_path,
+                        "source_location": f"L{open_line}",
+                        "weight": 1.0,
+                    }
+                )
 
     return {"nodes": nodes, "edges": edges}
 
@@ -11302,7 +11534,7 @@ def extract_dmm(path: Path) -> dict:
 
 _DMF_WINDOW_RE = re.compile(r'^\s*window\s+"([^"]+)"\s*$')
 _DMF_ELEM_RE = re.compile(r'^\s*elem\s+"([^"]+)"\s*$')
-_DMF_TYPE_RE = re.compile(r'^\s*type\s*=\s*(\S+)\s*$')
+_DMF_TYPE_RE = re.compile(r"^\s*type\s*=\s*(\S+)\s*$")
 
 
 def extract_dmf(path: Path) -> dict:
@@ -11315,8 +11547,15 @@ def extract_dmf(path: Path) -> dict:
     str_path = str(path)
     stem = _file_stem(path)
     file_nid = _make_id(str(path))
-    nodes: list[dict] = [{"id": file_nid, "label": path.name, "file_type": "code",
-                           "source_file": str_path, "source_location": "L1"}]
+    nodes: list[dict] = [
+        {
+            "id": file_nid,
+            "label": path.name,
+            "file_type": "code",
+            "source_file": str_path,
+            "source_location": "L1",
+        }
+    ]
     edges: list[dict] = []
     seen: set[str] = {file_nid}
 
@@ -11331,11 +11570,26 @@ def extract_dmf(path: Path) -> dict:
             nid = _make_id(stem, "window", name)
             if nid not in seen:
                 seen.add(nid)
-                nodes.append({"id": nid, "label": f'window "{name}"', "file_type": "code",
-                              "source_file": str_path, "source_location": f"L{line_idx}"})
-                edges.append({"source": file_nid, "target": nid, "relation": "contains",
-                              "confidence": "EXTRACTED", "source_file": str_path,
-                              "source_location": f"L{line_idx}", "weight": 1.0})
+                nodes.append(
+                    {
+                        "id": nid,
+                        "label": f'window "{name}"',
+                        "file_type": "code",
+                        "source_file": str_path,
+                        "source_location": f"L{line_idx}",
+                    }
+                )
+                edges.append(
+                    {
+                        "source": file_nid,
+                        "target": nid,
+                        "relation": "contains",
+                        "confidence": "EXTRACTED",
+                        "source_file": str_path,
+                        "source_location": f"L{line_idx}",
+                        "weight": 1.0,
+                    }
+                )
             current_window_nid = nid
             current_elem_nid = None
             current_elem_name = None
@@ -11346,12 +11600,26 @@ def extract_dmf(path: Path) -> dict:
             nid = _make_id(stem, "elem", current_window_nid, name)
             if nid not in seen:
                 seen.add(nid)
-                nodes.append({"id": nid, "label": f'elem "{name}"', "file_type": "code",
-                              "source_file": str_path, "source_location": f"L{line_idx}"})
-                edges.append({"source": current_window_nid, "target": nid,
-                              "relation": "contains", "confidence": "EXTRACTED",
-                              "source_file": str_path, "source_location": f"L{line_idx}",
-                              "weight": 1.0})
+                nodes.append(
+                    {
+                        "id": nid,
+                        "label": f'elem "{name}"',
+                        "file_type": "code",
+                        "source_file": str_path,
+                        "source_location": f"L{line_idx}",
+                    }
+                )
+                edges.append(
+                    {
+                        "source": current_window_nid,
+                        "target": nid,
+                        "relation": "contains",
+                        "confidence": "EXTRACTED",
+                        "source_file": str_path,
+                        "source_location": f"L{line_idx}",
+                        "weight": 1.0,
+                    }
+                )
             current_elem_nid = nid
             current_elem_name = name
             continue
diff --git a/graphify/llm.py b/graphify/llm.py
index 5e0bb3e24..343d3e0cd 100644
--- a/graphify/llm.py
+++ b/graphify/llm.py
@@ -274,8 +274,7 @@ def _parse_llm_json(raw: str) -> dict:
                     except json.JSONDecodeError:
                         break
     print(
-        f"[graphify] LLM returned invalid JSON, skipping chunk "
-        f"(first 200 chars: {raw[:200]!r})",
+        f"[graphify] LLM returned invalid JSON, skipping chunk (first 200 chars: {raw[:200]!r})",
         file=sys.stderr,
     )
     return {"nodes": [], "edges": [], "hyperedges": []}
@@ -475,7 +474,9 @@ def _call_openai_compat(
     return result
 
 
-def _call_claude(api_key: str, model: str, user_message: str, max_tokens: int = 8192, *, deep_mode: bool = False) -> dict:
+def _call_claude(
+    api_key: str, model: str, user_message: str, max_tokens: int = 8192, *, deep_mode: bool = False
+) -> dict:
     """Call Anthropic Claude directly (not via OpenAI compat layer)."""
     try:
         anthropic = importlib.import_module("anthropic")
@@ -555,7 +556,8 @@ def _call_claude_cli(user_message: str, max_tokens: int = 8192, *, deep_mode: bo
         claude_cmd, "-p",
         "--output-format", "json",
         "--no-session-persistence",
-        "--system-prompt", _extraction_system(deep=deep_mode),
+        "--system-prompt",
+        _extraction_system(deep=deep_mode),
     ]
     # claude-cli defaults to Opus, which is overkill for the structured-JSON
     # extraction graphify performs. GRAPHIFY_CLAUDE_CLI_MODEL=haiku (or
@@ -608,7 +610,9 @@ def _call_claude_cli(user_message: str, max_tokens: int = 8192, *, deep_mode: bo
     return result
 
 
-def _call_bedrock(model: str, user_message: str, max_tokens: int = 8192, *, deep_mode: bool = False) -> dict:
+def _call_bedrock(
+    model: str, user_message: str, max_tokens: int = 8192, *, deep_mode: bool = False
+) -> dict:
     """Call AWS Bedrock via boto3 Converse API using the standard AWS credential chain."""
     try:
         import boto3
diff --git a/tests/test_analyze.py b/tests/test_analyze.py
index 1cd36339f..e93cd573a 100644
--- a/tests/test_analyze.py
+++ b/tests/test_analyze.py
@@ -980,8 +980,12 @@ def test_god_nodes_multigraph_real_hub_detected():
         nid = f"n{i}"
         G.add_node(nid, label=f"Node{i}", source_file=f"src/n{i}.py", file_type="code")
         G.add_edge(
-            "hub", nid, relation="calls", confidence="EXTRACTED",
-            source_file="src/hub.py", weight=1.0,
+            "hub",
+            nid,
+            relation="calls",
+            confidence="EXTRACTED",
+            source_file="src/hub.py",
+            weight=1.0,
         )
 
     result = god_nodes(G, top_n=5)
@@ -999,26 +1003,56 @@ def test_edge_betweenness_multigraph_does_not_crash():
     # Two communities with intra-community edges + one bridge
     for i in range(5):
         G.add_node(
-            f"a{i}", label=f"A{i}", source_file="single.py",
-            file_type="code", source_location=f"L{i}",
+            f"a{i}",
+            label=f"A{i}",
+            source_file="single.py",
+            file_type="code",
+            source_location=f"L{i}",
         )
     for i in range(5):
         G.add_node(
-            f"b{i}", label=f"B{i}", source_file="single.py",
-            file_type="code", source_location=f"L{i + 10}",
+            f"b{i}",
+            label=f"B{i}",
+            source_file="single.py",
+            file_type="code",
+            source_location=f"L{i + 10}",
         )
     # Dense intra-community edges (some parallel)
     for i in range(4):
-        G.add_edge(f"a{i}", f"a{i+1}", relation="calls", confidence="EXTRACTED",
-                   source_file="single.py", weight=1.0)
-        G.add_edge(f"a{i}", f"a{i+1}", relation="uses", confidence="EXTRACTED",
-                   source_file="single.py", weight=1.0)
+        G.add_edge(
+            f"a{i}",
+            f"a{i + 1}",
+            relation="calls",
+            confidence="EXTRACTED",
+            source_file="single.py",
+            weight=1.0,
+        )
+        G.add_edge(
+            f"a{i}",
+            f"a{i + 1}",
+            relation="uses",
+            confidence="EXTRACTED",
+            source_file="single.py",
+            weight=1.0,
+        )
     for i in range(4):
-        G.add_edge(f"b{i}", f"b{i+1}", relation="calls", confidence="EXTRACTED",
-                   source_file="single.py", weight=1.0)
+        G.add_edge(
+            f"b{i}",
+            f"b{i + 1}",
+            relation="calls",
+            confidence="EXTRACTED",
+            source_file="single.py",
+            weight=1.0,
+        )
     # Bridge edge
-    G.add_edge("a4", "b0", relation="references", confidence="INFERRED",
-               source_file="single.py", weight=0.5)
+    G.add_edge(
+        "a4",
+        "b0",
+        relation="references",
+        confidence="INFERRED",
+        source_file="single.py",
+        weight=0.5,
+    )
 
     # Should not crash -- this is the core regression test
     result = surprising_connections(G, communities=None)
@@ -1035,29 +1069,65 @@ def test_surprising_connections_multigraph_results_valid():
     # Community 1: nodes in file1.py
     for i in range(5):
         G.add_node(
-            f"c1_{i}", label=f"C1_{i}", source_file="repo/file1.py",
-            file_type="code", source_location=f"L{i}",
+            f"c1_{i}",
+            label=f"C1_{i}",
+            source_file="repo/file1.py",
+            file_type="code",
+            source_location=f"L{i}",
         )
     # Community 2: nodes in file2.py
     for i in range(5):
         G.add_node(
-            f"c2_{i}", label=f"C2_{i}", source_file="repo/file2.py",
-            file_type="code", source_location=f"L{i}",
+            f"c2_{i}",
+            label=f"C2_{i}",
+            source_file="repo/file2.py",
+            file_type="code",
+            source_location=f"L{i}",
         )
     # Intra-community edges with parallel edges
     for i in range(4):
-        G.add_edge(f"c1_{i}", f"c1_{i+1}", relation="calls", confidence="EXTRACTED",
-                   source_file="repo/file1.py", weight=1.0)
-        G.add_edge(f"c1_{i}", f"c1_{i+1}", relation="uses", confidence="INFERRED",
-                   source_file="repo/file1.py", weight=0.5)
+        G.add_edge(
+            f"c1_{i}",
+            f"c1_{i + 1}",
+            relation="calls",
+            confidence="EXTRACTED",
+            source_file="repo/file1.py",
+            weight=1.0,
+        )
+        G.add_edge(
+            f"c1_{i}",
+            f"c1_{i + 1}",
+            relation="uses",
+            confidence="INFERRED",
+            source_file="repo/file1.py",
+            weight=0.5,
+        )
     for i in range(4):
-        G.add_edge(f"c2_{i}", f"c2_{i+1}", relation="calls", confidence="EXTRACTED",
-                   source_file="repo/file2.py", weight=1.0)
+        G.add_edge(
+            f"c2_{i}",
+            f"c2_{i + 1}",
+            relation="calls",
+            confidence="EXTRACTED",
+            source_file="repo/file2.py",
+            weight=1.0,
+        )
     # Cross-community bridge with parallel edges
-    G.add_edge("c1_4", "c2_0", relation="references", confidence="AMBIGUOUS",
-               source_file="repo/file1.py", weight=0.3)
-    G.add_edge("c1_4", "c2_0", relation="calls", confidence="INFERRED",
-               source_file="repo/file1.py", weight=0.5)
+    G.add_edge(
+        "c1_4",
+        "c2_0",
+        relation="references",
+        confidence="AMBIGUOUS",
+        source_file="repo/file1.py",
+        weight=0.3,
+    )
+    G.add_edge(
+        "c1_4",
+        "c2_0",
+        relation="calls",
+        confidence="INFERRED",
+        source_file="repo/file1.py",
+        weight=0.5,
+    )
 
     communities = cluster(G)
     result = surprising_connections(G, communities)
diff --git a/tests/test_build.py b/tests/test_build.py
index 3bce5ddb0..52bde5dec 100644
--- a/tests/test_build.py
+++ b/tests/test_build.py
@@ -304,10 +304,20 @@ def test_build_from_json_preserves_first_direction_on_bidirectional_pair(tmp_pat
             {"id": "z_emitter", "label": "z", "file_type": "code", "source_file": "z.ts"},
         ],
         "edges": [
-            {"source": "a_handler", "target": "z_emitter", "relation": "calls",
-             "confidence": "EXTRACTED", "source_file": "a.ts"},
-            {"source": "z_emitter", "target": "a_handler", "relation": "calls",
-             "confidence": "EXTRACTED", "source_file": "z.ts"},
+            {
+                "source": "a_handler",
+                "target": "z_emitter",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "a.ts",
+            },
+            {
+                "source": "z_emitter",
+                "target": "a_handler",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "z.ts",
+            },
         ],
         "input_tokens": 0,
         "output_tokens": 0,
@@ -324,8 +334,9 @@ def test_build_from_json_preserves_first_direction_on_bidirectional_pair(tmp_pat
     graph_path = tmp_path / "graph.json"
     assert to_json(G, {}, str(graph_path), force=True)
     saved = json.loads(graph_path.read_text())
-    saved_calls = [e for e in saved.get("links", saved.get("edges", []))
-                   if e.get("relation") == "calls"]
+    saved_calls = [
+        e for e in saved.get("links", saved.get("edges", [])) if e.get("relation") == "calls"
+    ]
     assert len(saved_calls) == 1
     assert saved_calls[0]["source"] == "a_handler", (
         f"calls edge source flipped on bidirectional collision: "
diff --git a/tests/test_cluster.py b/tests/test_cluster.py
index e63d11854..305eb9345 100644
--- a/tests/test_cluster.py
+++ b/tests/test_cluster.py
@@ -148,9 +148,7 @@ def test_cohesion_multigraph_equals_simple_graph_cohesion():
 
     multi_score = cohesion_score(MG, ["a", "b", "c"])
     simple_score = cohesion_score(SG, ["a", "b", "c"])
-    assert multi_score == simple_score, (
-        f"multi={multi_score} != simple={simple_score}"
-    )
+    assert multi_score == simple_score, f"multi={multi_score} != simple={simple_score}"
 
 
 def test_cluster_multigraph_produces_valid_communities():
@@ -161,12 +159,10 @@ def test_cluster_multigraph_produces_valid_communities():
     # zero-weight panic in some versions).
     for pair in [("a", "b"), ("b", "c"), ("a", "c")]:
         for k in range(3):
-            G.add_edge(pair[0], pair[1], key=f"{pair[0]}{pair[1]}-{k}",
-                       confidence="EXTRACTED")
+            G.add_edge(pair[0], pair[1], key=f"{pair[0]}{pair[1]}-{k}", confidence="EXTRACTED")
     for pair in [("d", "e"), ("e", "f"), ("d", "f")]:
         for k in range(3):
-            G.add_edge(pair[0], pair[1], key=f"{pair[0]}{pair[1]}-{k}",
-                       confidence="EXTRACTED")
+            G.add_edge(pair[0], pair[1], key=f"{pair[0]}{pair[1]}-{k}", confidence="EXTRACTED")
     G.add_edge("c", "d", key="bridge", confidence="AMBIGUOUS")
 
     communities = cluster(G)
@@ -184,8 +180,9 @@ def test_cluster_multigraph_does_not_crash():
     for i in range(len(nodes)):
         for j in range(i + 1, min(i + 3, len(nodes))):
             for k in range(4):
-                G.add_edge(nodes[i], nodes[j], key=f"{nodes[i]}-{nodes[j]}-{k}",
-                           confidence="EXTRACTED")
+                G.add_edge(
+                    nodes[i], nodes[j], key=f"{nodes[i]}-{nodes[j]}-{k}", confidence="EXTRACTED"
+                )
     # Must not raise
     communities = cluster(G)
     assert isinstance(communities, dict)
diff --git a/tests/test_languages.py b/tests/test_languages.py
index 1c38a9845..4828c029f 100644
--- a/tests/test_languages.py
+++ b/tests/test_languages.py
@@ -733,6 +733,7 @@ def test_swift_extension_does_not_duplicate_type_node():
     config_nodes = [n for n in r["nodes"] if n["label"] == "Config"]
     assert len(config_nodes) == 1, f"Config should appear once, got {len(config_nodes)}"
 
+
 def test_swift_protocol_conformance_emits_implements():
     r = extract_swift(FIXTURES / "sample.swift")
     assert ("DataProcessor", "Processor") in _edge_labels(r, "implements")
@@ -1110,6 +1111,7 @@ def test_fortran_capital_F_parses_preprocessed():
 
 # ── PowerShell ───────────────────────────────────────────────────────────────
 
+
 def test_powershell_no_error():
     r = extract_powershell(FIXTURES / "sample.ps1")
     assert "error" not in r
@@ -1442,38 +1444,45 @@ def test_groovy_spock_no_dangling_edges():
 
 # ── DM (BYOND DreamMaker) ────────────────────────────────────────────────────
 
+
 def test_dm_no_error():
     r = extract_dm(FIXTURES / "sample.dm")
     assert "error" not in r
 
+
 def test_dm_finds_global_proc():
     r = extract_dm(FIXTURES / "sample.dm")
     labels = _labels(r)
     assert any(label == "log_event()" for label in labels)
     assert any(label == "RunTest()" for label in labels)
 
+
 def test_dm_finds_type_definition():
     r = extract_dm(FIXTURES / "sample.dm")
     labels = _labels(r)
     assert "/datum/weapon" in labels
     assert "/datum/weapon/sword" in labels
 
+
 def test_dm_qualifies_proc_with_type_path():
     r = extract_dm(FIXTURES / "sample.dm")
     labels = _labels(r)
     assert "/datum/weapon/attack()" in labels
     assert "/datum/weapon/sword/attack()" in labels
 
+
 def test_dm_finds_path_form_proc_definition():
     r = extract_dm(FIXTURES / "sample.dm")
     assert "/datum/weapon/sword/sharpen()" in _labels(r)
 
+
 def test_dm_emits_include_edge():
     r = extract_dm(FIXTURES / "sample.dm")
     import_edges = _edges_with_relation(r, "imports", "imports_from")
     assert import_edges
     assert all(e.get("context") == "import" for e in import_edges)
 
+
 def test_dm_unresolved_include_flagged_external():
     r = extract_dm(FIXTURES / "sample.dm")
     import_edges = _edges_with_relation(r, "imports", "imports_from")
@@ -1481,39 +1490,47 @@ def test_dm_unresolved_include_flagged_external():
     assert helpers
     assert all(e.get("external") is True for e in helpers)
 
+
 def test_dm_resolves_in_file_calls():
     r = extract_dm(FIXTURES / "sample.dm")
     calls = _calls(r)
     assert any(callee == "log_event()" for _, callee in calls)
     assert ("/datum/weapon/sword/attack()", "/datum/weapon/sword/sharpen()") in calls
 
+
 def test_dm_ambiguous_member_call_left_unresolved():
     r = extract_dm(FIXTURES / "sample.dm")
     calls = _calls(r)
-    runtest_to_attack = [c for s, c in calls
-                         if s == "RunTest()" and "attack" in c]
+    runtest_to_attack = [c for s, c in calls if s == "RunTest()" and "attack" in c]
     assert not runtest_to_attack
     assert any(rc["callee"] == "attack" for rc in r.get("raw_calls", []))
 
+
 def test_dm_emits_new_as_instantiates():
     r = extract_dm(FIXTURES / "sample.dm")
     node_by_id = {n["id"]: n["label"] for n in r["nodes"]}
-    inst = [(node_by_id.get(e["source"]), node_by_id.get(e["target"]))
-            for e in r["edges"] if e["relation"] == "instantiates"]
+    inst = [
+        (node_by_id.get(e["source"]), node_by_id.get(e["target"]))
+        for e in r["edges"]
+        if e["relation"] == "instantiates"
+    ]
     assert ("RunTest()", "/datum/weapon/sword") in inst
 
+
 def test_dm_call_edges_have_call_context():
     r = extract_dm(FIXTURES / "sample.dm")
     call_edges = _edges_with_relation(r, "calls", "instantiates")
     assert call_edges
     assert all(e.get("context") == "call" for e in call_edges)
 
+
 def test_dm_no_dangling_edges():
     r = extract_dm(FIXTURES / "sample.dm")
     node_ids = {n["id"] for n in r["nodes"]}
     for e in r["edges"]:
         assert e["source"] in node_ids
 
+
 def test_dm_super_call_not_emitted():
     r = extract_dm(FIXTURES / "sample.dm")
     calls = _calls(r)
@@ -1523,29 +1540,37 @@ def test_dm_super_call_not_emitted():
 
 # ── DMI (BYOND icon sheets) ──────────────────────────────────────────────────
 
+
 def test_dmi_no_error():
     r = extract_dmi(FIXTURES / "sample.dmi")
     assert "error" not in r
 
+
 def test_dmi_emits_state_nodes():
     r = extract_dmi(FIXTURES / "sample.dmi")
     labels = _labels(r)
     assert any(label == '"mob"' for label in labels)
 
+
 def test_dmi_state_contained_by_file():
     r = extract_dmi(FIXTURES / "sample.dmi")
     node_by_id = {n["id"]: n["label"] for n in r["nodes"]}
-    contains = [(node_by_id.get(e["source"]), node_by_id.get(e["target"]))
-                for e in r["edges"] if e["relation"] == "contains"]
+    contains = [
+        (node_by_id.get(e["source"]), node_by_id.get(e["target"]))
+        for e in r["edges"]
+        if e["relation"] == "contains"
+    ]
     assert ("sample.dmi", '"mob"') in contains
 
 
 # ── DMM (BYOND map files) ────────────────────────────────────────────────────
 
+
 def test_dmm_no_error():
     r = extract_dmm(FIXTURES / "sample.dmm")
     assert "error" not in r
 
+
 def test_dmm_extracts_type_paths_as_uses_edges():
     r = extract_dmm(FIXTURES / "sample.dmm")
     targets = {e["target"] for e in r["edges"] if e["relation"] == "uses"}
@@ -1553,17 +1578,20 @@ def test_dmm_extracts_type_paths_as_uses_edges():
     assert "obj_structure_table" in targets
     assert "obj_item_weapon_sword" in targets
 
+
 def test_dmm_strips_var_overrides():
     r = extract_dmm(FIXTURES / "sample.dmm")
     targets = {e["target"] for e in r["edges"] if e["relation"] == "uses"}
     assert not any("{" in t for t in targets)
     assert "obj_item_weapon_sword" in targets
 
+
 def test_dmm_handles_multiline_tile_definition():
     r = extract_dmm(FIXTURES / "sample.dmm")
     targets = {e["target"] for e in r["edges"] if e["relation"] == "uses"}
     assert "area_station_maintenance" in targets
 
+
 def test_dmm_skips_grid_section():
     r = extract_dmm(FIXTURES / "sample.dmm")
     targets = {e["target"] for e in r["edges"] if e["relation"] == "uses"}
@@ -1572,28 +1600,36 @@ def test_dmm_skips_grid_section():
 
 # ── DMF (BYOND interface forms) ──────────────────────────────────────────────
 
+
 def test_dmf_no_error():
     r = extract_dmf(FIXTURES / "sample.dmf")
     assert "error" not in r
 
+
 def test_dmf_extracts_windows():
     r = extract_dmf(FIXTURES / "sample.dmf")
     labels = _labels(r)
     assert 'window "mapwindow"' in labels
     assert 'window "infowindow"' in labels
 
+
 def test_dmf_elem_labels_carry_control_type():
     r = extract_dmf(FIXTURES / "sample.dmf")
     labels = _labels(r)
     assert 'elem "map" [MAP]' in labels
 
+
 def test_dmf_elem_under_window():
     r = extract_dmf(FIXTURES / "sample.dmf")
     node_by_id = {n["id"]: n["label"] for n in r["nodes"]}
-    contains = [(node_by_id.get(e["source"]), node_by_id.get(e["target"]))
-                for e in r["edges"] if e["relation"] == "contains"]
+    contains = [
+        (node_by_id.get(e["source"]), node_by_id.get(e["target"]))
+        for e in r["edges"]
+        if e["relation"] == "contains"
+    ]
     assert ('window "mapwindow"', 'elem "map" [MAP]') in contains
 
+
 def test_dmf_no_dangling_edges():
     r = extract_dmf(FIXTURES / "sample.dmf")
     node_ids = {n["id"] for n in r["nodes"]}
diff --git a/tests/test_llm_parser.py b/tests/test_llm_parser.py
index ad8bd07dd..6956a1c84 100644
--- a/tests/test_llm_parser.py
+++ b/tests/test_llm_parser.py
@@ -6,6 +6,7 @@
 - The switch from --append-system-prompt to --system-prompt
 - The GRAPHIFY_CLAUDE_CLI_MODEL env-var passthrough
 """
+
 from __future__ import annotations
 
 import json
@@ -23,10 +24,7 @@ def test_preamble_then_fence_is_parsed():
     so any preamble caused json.loads to fail and the chunk to be
     dropped as a hollow response. The robust parser handles fences
     anywhere in the text."""
-    raw = (
-        "Here are the extracted entities:\n\n"
-        '```json\n{"nodes": [{"id": "a"}], "edges": []}\n```'
-    )
+    raw = 'Here are the extracted entities:\n\n```json\n{"nodes": [{"id": "a"}], "edges": []}\n```'
     result = llm._parse_llm_json(raw)
     assert result["nodes"] == [{"id": "a"}]
     assert result["edges"] == []
@@ -35,10 +33,7 @@ def test_preamble_then_fence_is_parsed():
 def test_prose_wrapped_json_without_fence_is_parsed():
     """Some models return prose around bare JSON with no markdown fence.
     The balanced-brace fallback extracts the first complete object."""
-    raw = (
-        'The extracted graph is {"nodes": [{"id": "b"}], "edges": []}. '
-        "Hope this helps!"
-    )
+    raw = 'The extracted graph is {"nodes": [{"id": "b"}], "edges": []}. Hope this helps!'
     result = llm._parse_llm_json(raw)
     assert result["nodes"] == [{"id": "b"}]
 
@@ -85,16 +80,22 @@ def test_empty_response_returns_empty_fragment():
 
 
 def _make_envelope(result_obj: dict) -> str:
-    return json.dumps({
-        "type": "result",
-        "subtype": "success",
-        "is_error": False,
-        "result": json.dumps(result_obj),
-        "usage": {"input_tokens": 1, "output_tokens": 1,
-                  "cache_creation_input_tokens": 0, "cache_read_input_tokens": 0},
-        "modelUsage": {"claude-opus-4-7": {}},
-        "stop_reason": "end_turn",
-    })
+    return json.dumps(
+        {
+            "type": "result",
+            "subtype": "success",
+            "is_error": False,
+            "result": json.dumps(result_obj),
+            "usage": {
+                "input_tokens": 1,
+                "output_tokens": 1,
+                "cache_creation_input_tokens": 0,
+                "cache_read_input_tokens": 0,
+            },
+            "modelUsage": {"claude-opus-4-7": {}},
+            "stop_reason": "end_turn",
+        }
+    )
 
 
 @patch("shutil.which", return_value="/usr/local/bin/claude")
diff --git a/tests/test_multilang.py b/tests/test_multilang.py
index 9496a9664..7bdeb8752 100644
--- a/tests/test_multilang.py
+++ b/tests/test_multilang.py
@@ -197,13 +197,15 @@ def test_go_method_declaration_emits_refs_only_when_name_present():
     def _find_branch(root: ast.AST, type_literal: str) -> ast.If | None:
         """Return the `if t == '<type_literal>':` branch inside the walk function."""
         for child in ast.walk(root):
-            if (isinstance(child, ast.If)
-                    and isinstance(child.test, ast.Compare)
-                    and isinstance(child.test.left, ast.Name)
-                    and child.test.left.id == "t"
-                    and len(child.test.comparators) == 1
-                    and isinstance(child.test.comparators[0], ast.Constant)
-                    and child.test.comparators[0].value == type_literal):
+            if (
+                isinstance(child, ast.If)
+                and isinstance(child.test, ast.Compare)
+                and isinstance(child.test.left, ast.Name)
+                and child.test.left.id == "t"
+                and len(child.test.comparators) == 1
+                and isinstance(child.test.comparators[0], ast.Constant)
+                and child.test.comparators[0].value == type_literal
+            ):
                 return child
         return None
 
@@ -258,10 +260,12 @@ def _is_guarded(use: ast.AST) -> bool:
             for stmt, siblings in _stmt_chain(use):
                 parent = parents.get(id(stmt))
                 # Case 1: lexically nested under `if name_node:` body
-                if (isinstance(parent, ast.If)
-                        and isinstance(parent.test, ast.Name)
-                        and parent.test.id == "name_node"
-                        and stmt in parent.body):
+                if (
+                    isinstance(parent, ast.If)
+                    and isinstance(parent.test, ast.Name)
+                    and parent.test.id == "name_node"
+                    and stmt in parent.body
+                ):
                     return True
                 # Case 2: a preceding sibling is `if not name_node: return`
                 idx = siblings.index(stmt)
diff --git a/tests/test_rationale.py b/tests/test_rationale.py
index 8ab29d157..8ca6e463d 100644
--- a/tests/test_rationale.py
+++ b/tests/test_rationale.py
@@ -218,7 +218,9 @@ def test_decorated_method_node_id_is_class_qualified(tmp_path):
     docstring's edge target. The mismatch caused ``build_from_json`` to drop
     the rationale_for edge as dangling, orphaning the docstring node.
     """
-    path = _write_py(tmp_path, '''
+    path = _write_py(
+        tmp_path,
+        '''
         class Bar:
             @property
             def baz(self) -> int:
@@ -238,13 +240,13 @@ def factory(cls) -> "Bar":
             def normal(self) -> int:
                 """A normal instance method documented for comparison."""
                 return 3
-    ''')
+    ''',
+    )
     result = extract_python(path)
     nodes_by_id = {n["id"]: n for n in result["nodes"]}
 
     # The plain method's id is the baseline: stem + class + name.
-    normal_ids = [nid for nid, n in nodes_by_id.items()
-                  if n.get("label") == ".normal()"]
+    normal_ids = [nid for nid, n in nodes_by_id.items() if n.get("label") == ".normal()"]
     assert len(normal_ids) == 1, "expected exactly one ``.normal()`` method node"
     normal_id = normal_ids[0]
     assert normal_id.endswith("_bar_normal"), normal_id
@@ -252,16 +254,16 @@ def normal(self) -> int:
     # Each decorated method must share the same class-qualified id shape so the
     # rationale_for edge target matches the method node id.
     for decorated_name in ("baz", "helper", "factory"):
-        matches = [nid for nid, n in nodes_by_id.items()
-                   if n.get("label") == f".{decorated_name}()"]
+        matches = [
+            nid for nid, n in nodes_by_id.items() if n.get("label") == f".{decorated_name}()"
+        ]
         assert len(matches) == 1, (
             f"expected exactly one ``.{decorated_name}()`` method node, got {matches}"
         )
         method_id = matches[0]
         assert method_id.endswith(f"_bar_{decorated_name}"), method_id
         # Unqualified id (the buggy form) must NOT also be present.
-        unqualified_buggy_id = method_id.replace(f"_bar_{decorated_name}",
-                                                  f"_{decorated_name}")
+        unqualified_buggy_id = method_id.replace(f"_bar_{decorated_name}", f"_{decorated_name}")
         assert unqualified_buggy_id not in nodes_by_id, (
             f"buggy unqualified id {unqualified_buggy_id} should not exist alongside "
             f"the class-qualified id"
@@ -281,16 +283,11 @@ def normal(self) -> int:
     g = build_from_json(result)
     for decorated_name in ("baz", "helper", "factory", "normal"):
         method_id = next(
-            nid for nid, n in nodes_by_id.items()
-            if n.get("label") == f".{decorated_name}()"
+            nid for nid, n in nodes_by_id.items() if n.get("label") == f".{decorated_name}()"
         )
         # Find rationale node attached to this method.
-        attached_rationale = [
-            e["source"] for e in rationale_edges if e["target"] == method_id
-        ]
-        assert attached_rationale, (
-            f"no rationale_for edge found for ``.{decorated_name}()`` method"
-        )
+        attached_rationale = [e["source"] for e in rationale_edges if e["target"] == method_id]
+        assert attached_rationale, f"no rationale_for edge found for ``.{decorated_name}()`` method"
         for r_id in attached_rationale:
             assert r_id in g.nodes, f"rationale node {r_id} missing from graph"
             assert g.degree(r_id) > 0, (
diff --git a/tests/test_report.py b/tests/test_report.py
index 8fccf296c..0f21fd40d 100644
--- a/tests/test_report.py
+++ b/tests/test_report.py
@@ -100,6 +100,7 @@ def test_report_shows_raw_cohesion_scores():
 
 # --- Helpers for edge-count tests ---
 
+
 def _minimal_report(G):
     """Generate a report from a graph with minimal scaffolding."""
     communities = {0: list(G.nodes())}
@@ -110,13 +111,22 @@ def _minimal_report(G):
     detection = {"total_files": 1, "total_words": 100, "needs_graph": True, "warning": None}
     tokens = {"input": 100, "output": 50}
     return generate(
-        G, communities, cohesion, labels, god_list, surprise_list, detection, tokens, "./test",
+        G,
+        communities,
+        cohesion,
+        labels,
+        god_list,
+        surprise_list,
+        detection,
+        tokens,
+        "./test",
         min_community_size=1,
     )
 
 
 # --- PR 4B: Edge count reporting tests ---
 
+
 def test_report_multigraph_edge_count_distinguishes_pairs():
     """MultiDiGraph with parallel edges: report must show both total and unique pair count."""
     G = nx.MultiDiGraph()

From 0c5b402ef22509e32ee554967174ccade4860e3f Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Fri, 29 May 2026 02:31:12 -0500
Subject: [PATCH 06/21] feat(multigraph): PR 5 query/path/explain/MCP surfaces
 show bundled relationships

User-facing read surfaces no longer silently show only the first parallel
edge on a MultiDiGraph. Adds a stable, capped relationship envelope routed
through explicit projections (preserving the analytics/report projection
boundary from PR 4B):

- projections.py: relationship_envelope() (structured bundle for MCP) and
  format_relationship_envelope() (capped text "a, b, c (+K more, N total)")
  with DEFAULT_RELATIONSHIP_CAP=3. directed_only flag isolates a single
  direction for arrow rendering.
- serve.py: _subgraph_to_text, _neighbors_text, _shortest_path_text bundle
  all parallel relations per hop/neighbor (directed_only=True) instead of
  relying on a first-edge-only lookup such as edge_data(). The historical
  single-relation format is preserved byte-for-byte across all three; the
  envelope bundle is used only for multi-relation pairs.
- __main__.py: path and explain handlers render directional bundles via
  directed_only=True so reverse-edge relations never bleed onto an arrow.
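
A minimal sketch of the directional isolation described above. It uses a
plain list of (source, target, relation) tuples as a stand-in for the
MultiDiGraph; EDGES and relations_between are hypothetical names for
illustration, not the projections.py API:

```python
# Hypothetical stand-in for a MultiDiGraph: one tuple per stored edge.
EDGES = [
    ("a", "b", "calls"),
    ("a", "b", "imports"),
    ("b", "a", "reads"),
]


def relations_between(edges, u, v, *, directed_only=False):
    """Sorted unique relations for u->v, plus v->u unless directed_only."""
    found = {rel for src, tgt, rel in edges if (src, tgt) == (u, v)}
    if not directed_only and u != v:
        # Symmetric view: also fold in the reverse direction.
        found |= {rel for src, tgt, rel in edges if (src, tgt) == (v, u)}
    return sorted(found)


symmetric = relations_between(EDGES, "a", "b")
forward = relations_between(EDGES, "a", "b", directed_only=True)
reverse = relations_between(EDGES, "b", "a", directed_only=True)
```

With directed_only=True the a-->b arrow carries only the a->b relations, and
the reverse relation appears only on the b-->a arrow.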

edge_data() in build.py kept unchanged for non-display callers. Simple-graph
output stays byte-stable (pinned by format-regression tests). 32 new tests
incl. directional-isolation, capped-summary, and exact-format-pin regressions.
Full suite 1624 passed.
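
A rough sketch of the capped text form, assuming the rules stated in the
format_relationship_envelope docstring (unique sorted relations, K counts
hidden unique relations, N counts edge records). capped_summary is a
hypothetical helper that takes one relation name per edge record, and it
leaves out the single-relation confidence suffix:

```python
def capped_summary(edge_relations, cap=3):
    """Render "a, b, c (+K more, N total)" from one relation name per edge."""
    total = len(edge_relations)  # N: edge records, duplicates included
    unique = sorted({rel for rel in edge_relations if rel})
    if total == 0:
        return ""
    if len(unique) == 1:
        return unique[0]
    effective_cap = max(cap, 0)  # negative caps clamp to zero
    if len(unique) <= effective_cap:
        return ", ".join(unique)
    shown = unique[:effective_cap]
    suffix = f"(+{len(unique) - len(shown)} more, {total} total)"
    return f"{', '.join(shown)} {suffix}" if shown else suffix


four = capped_summary(["calls", "imports", "contains", "reads"])
five = capped_summary(["gamma", "alpha", "epsilon", "beta", "delta"])
within = capped_summary(["calls", "calls", "imports"])
zero_cap = capped_summary(["calls", "imports"], cap=0)
```

The first two calls mirror the relation lists used in the new explain CLI
tests.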

---
 graphify/__main__.py           |  65 ++++--
 graphify/projections.py        | 131 +++++++++++-
 graphify/serve.py              | 268 ++++++++++++++++---------
 tests/test_explain_cli.py      |  90 +++++++++
 tests/test_path_cli.py         |  83 ++++++++
 tests/test_projections.py      | 222 +++++++++++++++++++++
 tests/test_serve_multigraph.py | 355 +++++++++++++++++++++++++++++++++
 7 files changed, 1106 insertions(+), 108 deletions(-)
 create mode 100644 tests/test_serve_multigraph.py

diff --git a/graphify/__main__.py b/graphify/__main__.py
index 2d916b911..94f852881 100644
--- a/graphify/__main__.py
+++ b/graphify/__main__.py
@@ -2270,25 +2270,42 @@ def main() -> None:
         hops = len(path_nodes) - 1
         segments = []
         from graphify.build import edge_data
+        from graphify.projections import (
+            format_relationship_envelope,
+            relationship_envelope,
+        )
 
         for i in range(len(path_nodes) - 1):
             u, v = path_nodes[i], path_nodes[i + 1]
             # Check which direction the stored edge points.
             if G.has_edge(u, v):
-                edata = edge_data(G, u, v)
                 forward = True
+                src, tgt = u, v
             else:
-                edata = edge_data(G, v, u)
                 forward = False
-            rel = edata.get("relation", "")
-            conf = edata.get("confidence", "")
-            conf_str = f" [{conf}]" if conf else ""
+                src, tgt = v, u
+            # Bundle every parallel relationship on this hop so a MultiDiGraph
+            # never silently shows only the first edge (#PR5 go/no-go gate).
+            # directed_only=True isolates this hop's stored direction (src->tgt)
+            # so a reverse edge (tgt->src) never bleeds into the arrow's bundle.
+            env = relationship_envelope(G, src, tgt, directed_only=True)
+            if len(env["relations"]) > 1:
+                # Multiple parallel relations: render the capped bundle; the
+                # formatted text omits per-relation confidence for stability.
+                rel_str = format_relationship_envelope(G, src, tgt, directed_only=True)
+            else:
+                # Single relation (always true for simple DiGraph/Graph): keep
+                # the historical "rel [CONFIDENCE]" form byte-for-byte stable.
+                edata = edge_data(G, src, tgt)
+                rel = edata.get("relation", "")
+                conf = edata.get("confidence", "")
+                rel_str = f"{rel} [{conf}]" if conf else rel
             if i == 0:
                 segments.append(G.nodes[u].get("label", u))
             if forward:
-                segments.append(f"--{rel}{conf_str}--> {G.nodes[v].get('label', v)}")
+                segments.append(f"--{rel_str}--> {G.nodes[v].get('label', v)}")
             else:
-                segments.append(f"<--{rel}{conf_str}-- {G.nodes[v].get('label', v)}")
+                segments.append(f"<--{rel_str}-- {G.nodes[v].get('label', v)}")
         print(f"Shortest path ({hops} hops):\n  " + " ".join(segments))
 
     elif cmd == "explain":
@@ -2331,20 +2348,40 @@ def main() -> None:
         print(f"  Community: {d.get('community', '')}")
         print(f"  Degree:    {G.degree(nid)}")
         from graphify.build import edge_data
+        from graphify.projections import (
+            format_relationship_envelope,
+            relationship_envelope,
+        )
 
-        connections: list[tuple[str, str, dict]] = []  # (direction, neighbor_id, edge_data)
+        # (direction, neighbor_id, edge_src, edge_tgt) — src/tgt preserve the
+        # stored edge direction so the relationship envelope reads the correct
+        # parallel-edge bundle for this neighbor.
+        connections: list[tuple[str, str, str, str]] = []
         for nb in G.successors(nid):
-            connections.append(("out", nb, edge_data(G, nid, nb)))
+            connections.append(("out", nb, nid, nb))
         for nb in G.predecessors(nid):
-            connections.append(("in", nb, edge_data(G, nb, nid)))
+            connections.append(("in", nb, nb, nid))
         if connections:
             print(f"\nConnections ({len(connections)}):")
             connections.sort(key=lambda c: G.degree(c[1]), reverse=True)
-            for direction, nb, edata in connections[:20]:
-                rel = edata.get("relation", "")
-                conf = edata.get("confidence", "")
+            for direction, nb, e_src, e_tgt in connections[:20]:
                 arrow = "-->" if direction == "out" else "<--"
-                print(f"  {arrow} {G.nodes[nb].get('label', nb)} [{rel}] [{conf}]")
+                # Bundle every parallel relationship to this neighbor so a
+                # MultiDiGraph never shows only the first edge (#PR5 gate).
+                # directed_only=True isolates this connection's stored direction
+                # (e_src->e_tgt) so an "out" arrow never merges the reverse "in"
+                # relations and vice versa.
+                env = relationship_envelope(G, e_src, e_tgt, directed_only=True)
+                if len(env["relations"]) > 1:
+                    rel_block = f"[{format_relationship_envelope(G, e_src, e_tgt, directed_only=True)}]"
+                else:
+                    # Single relation (always true for simple DiGraph/Graph):
+                    # keep the historical "[rel] [conf]" form byte-stable.
+                    edata = edge_data(G, e_src, e_tgt)
+                    rel = edata.get("relation", "")
+                    conf = edata.get("confidence", "")
+                    rel_block = f"[{rel}] [{conf}]"
+                print(f"  {arrow} {G.nodes[nb].get('label', nb)} {rel_block}")
             if len(connections) > 20:
                 print(f"  ... and {len(connections) - 20} more")
 
diff --git a/graphify/projections.py b/graphify/projections.py
index a591538bd..e811d098e 100644
--- a/graphify/projections.py
+++ b/graphify/projections.py
@@ -9,6 +9,11 @@
 
 WeightMode = Literal["confidence", "count", "sum"]
 
+# Stable default for the capped relationship display envelope. Surfaces (CLI text,
+# MCP structured arrays) may override per call, but this is the canonical default
+# so "A relates to B through N relationships" renders consistently everywhere.
+DEFAULT_RELATIONSHIP_CAP = 3
+
 _CONFIDENCE_SCORE = {
     "EXTRACTED": 1.0,
     "INFERRED": 0.5,
@@ -153,8 +158,18 @@ def project_for_context(G: nx.Graph, *, contexts: Iterable[str] | str | None = N
     return H
 
 
-def edge_records_between(G: nx.Graph, u: Hashable, v: Hashable) -> list[dict[str, Any]]:
-    """Return shallow copies of all edge records connecting two nodes."""
+def edge_records_between(
+    G: nx.Graph, u: Hashable, v: Hashable, *, directed_only: bool = False
+) -> list[dict[str, Any]]:
+    """Return shallow copies of all edge records connecting two nodes.
+
+    By default (``directed_only=False``) a directed graph contributes records in
+    BOTH directions (``u->v`` and ``v->u``), which is correct for symmetric
+    "how are A and B related" queries. Set ``directed_only=True`` to collect only
+    the ``u->v`` direction, which is what directional arrow surfaces need (path
+    hops ``A-->B``, explain "out"/"in" connections). On undirected graphs the
+    flag is a no-op, as there is no separate reverse direction to collect.
+    """
     records: list[dict[str, Any]] = []
 
     def collect(src: Hashable, tgt: Hashable) -> None:
@@ -169,7 +184,7 @@ def collect(src: Hashable, tgt: Hashable) -> None:
             records.append(dict(raw))
 
     collect(u, v)
-    if G.is_directed() and u != v:
+    if not directed_only and G.is_directed() and u != v:
         collect(v, u)
     return sorted(records, key=_edge_sort_key)
 
@@ -190,6 +205,116 @@ def edge_summary_between(G: nx.Graph, u: Hashable, v: Hashable) -> dict[str, Any
     }
 
 
+def relationship_envelope(
+    G: nx.Graph,
+    u: Hashable,
+    v: Hashable,
+    *,
+    cap: int = DEFAULT_RELATIONSHIP_CAP,
+    directed_only: bool = False,
+) -> dict[str, Any]:
+    """Bundle all parallel relationships between two nodes into a capped envelope.
+
+    Returns a structured dict suitable for MCP serialization (arrays, not a
+    representative-only dict) and as the basis for text rendering::
+
+        {
+            "count": int,            # total edge records (u->v, plus v->u on directed graphs unless directed_only)
+            "shown": list[dict],     # up to ``cap`` full edge-record dicts, in edge_records_between order
+            "truncated": int,        # max(0, count - len(shown))
+            "relations": list[str],  # ALL unique relations across every record, sorted
+            "confidences": list[str],# ALL unique confidences across every record, sorted
+        }
+
+    ``relations``/``confidences`` summarize the FULL set even when ``shown`` is
+    capped, so callers can say "calls, imports, +2 more (5 total)" accurately.
+
+    A ``cap`` below 1 shows zero records (``shown == []``) while still reporting
+    the full ``count``/``relations``/``confidences``; negative caps are clamped to
+    zero rather than slicing from the tail.
+
+    ``directed_only`` is threaded through to :func:`edge_records_between`: with the
+    default ``False`` a directed graph bundles both directions (symmetric view),
+    while ``True`` restricts the envelope to the ``u->v`` direction for directional
+    arrow surfaces. It is a no-op on undirected graphs.
+    """
+    records = edge_records_between(G, u, v, directed_only=directed_only)
+    effective_cap = cap if cap > 0 else 0
+    shown = records[:effective_cap]
+    return {
+        "count": len(records),
+        "shown": shown,
+        "truncated": max(0, len(records) - len(shown)),
+        "relations": sorted(
+            {str(record.get("relation")) for record in records if record.get("relation")}
+        ),
+        "confidences": sorted(
+            {str(record.get("confidence")) for record in records if record.get("confidence")}
+        ),
+    }
+
+
+def format_relationship_envelope(
+    G: nx.Graph,
+    u: Hashable,
+    v: Hashable,
+    *,
+    cap: int = DEFAULT_RELATIONSHIP_CAP,
+    directed_only: bool = False,
+) -> str:
+    """Render a stable one-line summary of all relationships between two nodes.
+
+    Examples::
+
+        single relation:        "calls"
+        single with confidence: "calls (EXTRACTED)"  # confidence shown only if present
+        multiple within cap:    "calls, contains, imports"
+        capped:                 "calls, contains, imports (+2 more, 5 total)"
+        none:                   ""  # empty string when no edge exists
+
+    The displayed list is the unique sorted relations from the envelope, so a
+    relation appearing on several parallel edges is listed once. When the number
+    of UNIQUE relations exceeds ``cap``, the first ``cap`` relations are shown
+    followed by ``(+K more, N total)`` where ``N`` is the total relationship count
+    (edge records, not unique relations) and ``K`` is ``unique_relations - cap``.
+
+    Confidence is only appended in the single-relation case where exactly one
+    confidence value exists. For multi-relation lines per-relation confidence is
+    omitted to keep the line stable and deterministic; the full confidence set
+    remains available via :func:`relationship_envelope` for structured consumers.
+    A ``cap`` below 1 is clamped to zero (no leading relations are shown).
+
+    ``directed_only`` is threaded through to :func:`relationship_envelope`: with
+    the default ``False`` directed graphs render the symmetric both-direction
+    summary, while ``True`` restricts the line to the ``u->v`` direction for
+    directional arrow surfaces. It is a no-op on undirected graphs.
+    """
+    envelope = relationship_envelope(G, u, v, cap=cap, directed_only=directed_only)
+    count = envelope["count"]
+    if count == 0:
+        return ""
+
+    relations: list[str] = envelope["relations"]
+    confidences: list[str] = envelope["confidences"]
+
+    if len(relations) == 1:
+        relation = relations[0]
+        if len(confidences) == 1:
+            return f"{relation} ({confidences[0]})"
+        return relation
+
+    effective_cap = cap if cap > 0 else 0
+    if len(relations) <= effective_cap:
+        return ", ".join(relations)
+
+    shown_relations = relations[:effective_cap]
+    more = len(relations) - len(shown_relations)
+    suffix = f"(+{more} more, {count} total)"
+    if not shown_relations:
+        return suffix
+    return f"{', '.join(shown_relations)} {suffix}"
+
+
 def distinct_neighbor_degree(G: nx.Graph, node: Hashable) -> int:
     """Count unique adjacent nodes without inflating parallel edges."""
     if node not in G:
diff --git a/graphify/serve.py b/graphify/serve.py
index 9fa455bac..0b90cea35 100644
--- a/graphify/serve.py
+++ b/graphify/serve.py
@@ -9,7 +9,10 @@
 import networkx as nx
 from networkx.readwrite import json_graph
 from graphify.security import sanitize_label, check_graph_file_size_cap
-from graphify.build import edge_data
+from graphify.projections import (
+    format_relationship_envelope,
+    relationship_envelope,
+)
 
 try:
     import jieba as _jieba  # type: ignore[import-untyped]
@@ -379,18 +382,36 @@ def _subgraph_to_text(
         lines.append(line)
     for u, v in edges:
         if u in nodes and v in nodes:
-            raw = G[u][v]
-            d = (
-                next(iter(raw.values()), {})
-                if isinstance(G, (nx.MultiGraph, nx.MultiDiGraph))
-                else raw
-            )
-            context = d.get("context")
-            context_suffix = f" context={sanitize_label(str(context))}" if context else ""
+            # _bfs/_dfs collect edges via G.neighbors(n), which yields SUCCESSORS
+            # on directed graphs, so (u, v) is already a real forward edge. Bundle
+            # with directed_only=True so the directional EDGE u-->v arrow reports
+            # only u->v relations and never bleeds in the reverse v->u edge.
+            # Compute the envelope ONCE and derive both the count gate and the
+            # text from it (no redundant second traversal).
+            env = relationship_envelope(G, u, v, directed_only=True)
+            if env["count"] == 0:
+                continue
+            if len(env["relations"]) <= 1:
+                # Single relation: reconstruct the historical EDGE format
+                # byte-for-byte (confidence + optional context inside one square
+                # bracket group) so downstream EDGE-line parsers and the
+                # path/explain surfaces stay consistent.
+                d = env["shown"][0] if env["shown"] else {}
+                context = d.get("context")
+                context_suffix = (
+                    f" context={sanitize_label(str(context))}" if context else ""
+                )
+                relation = sanitize_label(str(d.get("relation", "")))
+                confidence = sanitize_label(str(d.get("confidence", "")))
+                relation_segment = f"{relation} [{confidence}{context_suffix}]"
+            else:
+                # Multiple relations: bundle via the capped envelope text.
+                relation_segment = sanitize_label(
+                    format_relationship_envelope(G, u, v, directed_only=True)
+                )
             line = (
                 f"EDGE {sanitize_label(G.nodes[u].get('label', u))} "
-                f"--{sanitize_label(str(d.get('relation', '')))} "
-                f"[{sanitize_label(str(d.get('confidence', '')))}{context_suffix}]--> "
+                f"--{relation_segment}--> "
                 f"{sanitize_label(G.nodes[v].get('label', v))}"
             )
             lines.append(line)
@@ -468,6 +489,140 @@ def _find_node(G: nx.Graph, label: str) -> list[str]:
     return exact + prefix + substring
 
 
+def _neighbors_text(G: nx.Graph, label: str, *, relation_filter: str = "") -> str:
+    """Render all direct neighbors of a node with their full relationship bundles.
+
+    Each neighbor appears once per direction (--> outgoing, <-- incoming) with
+    every parallel relationship for that direction bundled via the relationship
+    envelope, never collapsed to a first-edge-only representative. On a simple
+    DiGraph/Graph the bundle is a single relation, so output is unchanged in
+    substance from the pre-envelope rendering.
+    """
+    rel_filter = relation_filter.lower()
+    matches = _find_node(G, label.lower())
+    if not matches:
+        return f"No node matching '{label.lower()}' found."
+    nid = matches[0]
+    lines = [f"Neighbors of {sanitize_label(G.nodes[nid].get('label', nid))}:"]
+    directed_graph = cast(Any, G)
+
+    def _passes_filter(relations: list[str]) -> bool:
+        # Honour relation_filter against EVERY parallel relation, not just a
+        # first-edge representative, so a matching relation hidden behind a
+        # parallel edge is still surfaced.
+        if not rel_filter:
+            return True
+        return any(rel_filter in rel.lower() for rel in relations)
+
+    def _relation_bracket(src: str, tgt: str, env: dict[str, Any]) -> str:
+        # Single relation (always true for a simple DiGraph/Graph): reconstruct
+        # the historical two-bracket form `[rel] [conf]` byte-for-byte so the MCP
+        # output stays consistent with the path/explain surfaces. Multiple
+        # relations: emit the capped directional bundle.
+        if len(env["relations"]) <= 1:
+            d = env["shown"][0] if env["shown"] else {}
+            rel = sanitize_label(str(d.get("relation", "")))
+            confidence = sanitize_label(str(d.get("confidence", "")))
+            return f"[{rel}] [{confidence}]"
+        return f"[{sanitize_label(format_relationship_envelope(G, src, tgt, directed_only=True))}]"
+
+    # successors()/predecessors() yield each neighbour once even on a
+    # MultiDiGraph, so each appears a single time per direction with its full
+    # bundle of parallel relationships rather than one line per parallel edge.
+    # The envelope is computed ONCE per neighbour and reused for both the
+    # relation filter and the rendering.
+    for nb in directed_graph.successors(nid):
+        env = relationship_envelope(G, nid, nb, directed_only=True)
+        if not _passes_filter(env["relations"]):
+            continue
+        lines.append(
+            f"  --> {sanitize_label(G.nodes[nb].get('label', nb))} "
+            f"{_relation_bracket(nid, nb, env)}"
+        )
+    for nb in directed_graph.predecessors(nid):
+        env = relationship_envelope(G, nb, nid, directed_only=True)
+        if not _passes_filter(env["relations"]):
+            continue
+        lines.append(
+            f"  <-- {sanitize_label(G.nodes[nb].get('label', nb))} "
+            f"{_relation_bracket(nb, nid, env)}"
+        )
+    return "\n".join(lines)
+
+
+def _shortest_path_text(G: nx.Graph, source: str, target: str, *, max_hops: int = 8) -> str:
+    """Render the shortest path between two concepts with bundled per-hop relations.
+
+    Each hop shows ALL parallel relationships for the traversed pair (capped)
+    via the relationship envelope instead of a single first-edge representative,
+    so a hop carrying e.g. both ``calls`` and ``contains`` is fully visible.
+    """
+    src_scored = _score_nodes(G, [t.lower() for t in source.split()])
+    tgt_scored = _score_nodes(G, [t.lower() for t in target.split()])
+    if not src_scored:
+        return f"No node matching source '{source}' found."
+    if not tgt_scored:
+        return f"No node matching target '{target}' found."
+    src_nid, tgt_nid = src_scored[0][1], tgt_scored[0][1]
+    # Ambiguity guard: when both queries resolve to the same node, the
+    # shortest path is trivially zero hops, which is almost never what the
+    # caller wanted (see bug #828).
+    if src_nid == tgt_nid:
+        return (
+            f"'{source}' and '{target}' both resolved to "
+            f"the same node '{src_nid}'. Use a more specific label or the exact node ID."
+        )
+    warnings: list[str] = []
+    for name, scored in (("source", src_scored), ("target", tgt_scored)):
+        if len(scored) >= 2:
+            top, runner = scored[0][0], scored[1][0]
+            if top > 0 and (top - runner) / top < 0.10:
+                warnings.append(
+                    f"warning: {name} match was ambiguous "
+                    f"(top score {top:g}, runner-up {runner:g})"
+                )
+    try:
+        # Use undirected view for path-finding (works regardless of query src/tgt order)
+        path_nodes = nx.shortest_path(G.to_undirected(as_view=True), src_nid, tgt_nid)
+    except (nx.NetworkXNoPath, nx.NodeNotFound):
+        return (
+            f"No path found between '{G.nodes[src_nid].get('label', src_nid)}' "
+            f"and '{G.nodes[tgt_nid].get('label', tgt_nid)}'."
+        )
+    hops = len(path_nodes) - 1
+    if hops > max_hops:
+        return f"Path exceeds max_hops={max_hops} ({hops} hops found)."
+    segments = []
+    for i in range(len(path_nodes) - 1):
+        u, v = path_nodes[i], path_nodes[i + 1]
+        # Bundle the forward direction of the traversed pair. shortest_path walks
+        # an undirected view, so resolve which stored direction actually carries
+        # the edge, then compute its directional envelope ONCE.
+        if G.has_edge(u, v):
+            src, tgt, forward = u, v, True
+        else:
+            src, tgt, forward = v, u, False
+        env = relationship_envelope(G, src, tgt, directed_only=True)
+        # Single relation (always true for a simple DiGraph/Graph): reconstruct
+        # the historical `{rel} [conf]` hop label byte-for-byte (no confidence
+        # bracket when confidence is absent). Multiple relations: capped bundle.
+        if len(env["relations"]) <= 1:
+            d = env["shown"][0] if env["shown"] else {}
+            rel = d.get("relation", "")
+            conf = d.get("confidence", "")
+            summary = f"{rel} [{conf}]" if conf else f"{rel}"
+        else:
+            summary = format_relationship_envelope(G, src, tgt, directed_only=True)
+        if i == 0:
+            segments.append(G.nodes[u].get("label", u))
+        if forward:
+            segments.append(f"--{summary}--> {G.nodes[v].get('label', v)}")
+        else:
+            segments.append(f"<--{summary}-- {G.nodes[v].get('label', v)}")
+    prefix = ("\n".join(warnings) + "\n") if warnings else ""
+    return prefix + f"Shortest path ({hops} hops):\n  " + " ".join(segments)
+
+
 def _filter_blank_stdin() -> None:
     """Filter blank lines from stdin before MCP reads it.
 
@@ -765,33 +920,11 @@ def _tool_get_node(arguments: dict) -> str:
         )
 
     def _tool_get_neighbors(arguments: dict) -> str:
-        label = arguments["label"].lower()
-        rel_filter = arguments.get("relation_filter", "").lower()
-        matches = _find_node(G, label)
-        if not matches:
-            return f"No node matching '{label}' found."
-        nid = matches[0]
-        lines = [f"Neighbors of {sanitize_label(G.nodes[nid].get('label', nid))}:"]
-        directed_graph = cast(Any, G)
-        for nb in directed_graph.successors(nid):
-            d = edge_data(G, nid, nb)
-            rel = d.get("relation", "")
-            if rel_filter and rel_filter not in rel.lower():
-                continue
-            lines.append(
-                f"  --> {sanitize_label(G.nodes[nb].get('label', nb))} "
-                f"[{sanitize_label(str(rel))}] [{sanitize_label(str(d.get('confidence', '')))}]"
-            )
-        for nb in directed_graph.predecessors(nid):
-            d = edge_data(G, nb, nid)
-            rel = d.get("relation", "")
-            if rel_filter and rel_filter not in rel.lower():
-                continue
-            lines.append(
-                f"  <-- {sanitize_label(G.nodes[nb].get('label', nb))} "
-                f"[{sanitize_label(str(rel))}] [{sanitize_label(str(d.get('confidence', '')))}]"
-            )
-        return "\n".join(lines)
+        return _neighbors_text(
+            G,
+            arguments["label"],
+            relation_filter=arguments.get("relation_filter", ""),
+        )
 
     def _tool_get_community(arguments: dict) -> str:
         cid = int(arguments["community_id"])
@@ -829,59 +962,12 @@ def _tool_graph_stats(_: dict) -> str:
         )
 
     def _tool_shortest_path(arguments: dict) -> str:
-        src_scored = _score_nodes(G, [t.lower() for t in arguments["source"].split()])
-        tgt_scored = _score_nodes(G, [t.lower() for t in arguments["target"].split()])
-        if not src_scored:
-            return f"No node matching source '{arguments['source']}' found."
-        if not tgt_scored:
-            return f"No node matching target '{arguments['target']}' found."
-        src_nid, tgt_nid = src_scored[0][1], tgt_scored[0][1]
-        # Ambiguity guard: when both queries resolve to the same node, the
-        # shortest path is trivially zero hops, which is almost never what the
-        # caller wanted (see bug #828).
-        if src_nid == tgt_nid:
-            return (
-                f"'{arguments['source']}' and '{arguments['target']}' both resolved to "
-                f"the same node '{src_nid}'. Use a more specific label or the exact node ID."
-            )
-        warnings: list[str] = []
-        for name, scored in (("source", src_scored), ("target", tgt_scored)):
-            if len(scored) >= 2:
-                top, runner = scored[0][0], scored[1][0]
-                if top > 0 and (top - runner) / top < 0.10:
-                    warnings.append(
-                        f"warning: {name} match was ambiguous "
-                        f"(top score {top:g}, runner-up {runner:g})"
-                    )
-        max_hops = int(arguments.get("max_hops", 8))
-        try:
-            # Use undirected view for path-finding (works regardless of query src/tgt order)
-            path_nodes = nx.shortest_path(G.to_undirected(as_view=True), src_nid, tgt_nid)
-        except (nx.NetworkXNoPath, nx.NodeNotFound):
-            return f"No path found between '{G.nodes[src_nid].get('label', src_nid)}' and '{G.nodes[tgt_nid].get('label', tgt_nid)}'."
-        hops = len(path_nodes) - 1
-        if hops > max_hops:
-            return f"Path exceeds max_hops={max_hops} ({hops} hops found)."
-        segments = []
-        for i in range(len(path_nodes) - 1):
-            u, v = path_nodes[i], path_nodes[i + 1]
-            if G.has_edge(u, v):
-                edata = edge_data(G, u, v)
-                forward = True
-            else:
-                edata = edge_data(G, v, u)
-                forward = False
-            rel = edata.get("relation", "")
-            conf = edata.get("confidence", "")
-            conf_str = f" [{conf}]" if conf else ""
-            if i == 0:
-                segments.append(G.nodes[u].get("label", u))
-            if forward:
-                segments.append(f"--{rel}{conf_str}--> {G.nodes[v].get('label', v)}")
-            else:
-                segments.append(f"<--{rel}{conf_str}-- {G.nodes[v].get('label', v)}")
-        prefix = ("\n".join(warnings) + "\n") if warnings else ""
-        return prefix + f"Shortest path ({hops} hops):\n  " + " ".join(segments)
+        return _shortest_path_text(
+            G,
+            arguments["source"],
+            arguments["target"],
+            max_hops=int(arguments.get("max_hops", 8)),
+        )
 
     def _tool_list_prs(arguments: dict) -> str:
         from graphify.prs import fetch_prs, fetch_worktrees, format_prs_text, _detect_default_branch
diff --git a/tests/test_explain_cli.py b/tests/test_explain_cli.py
index f96b896cd..98c42fdbd 100644
--- a/tests/test_explain_cli.py
+++ b/tests/test_explain_cli.py
@@ -86,3 +86,93 @@ def test_caller_shows_callee_as_outbound(monkeypatch, tmp_path, capsys):
     out = _run(monkeypatch, p, "createPatchHandler", capsys)
     assert "--> validateSanitySession() [calls]" in out
     assert "<-- " not in out
+
+
+def _write_multigraph(tmp_path, relations):
+    """Node 'a' with `relations` parallel edges to neighbor 'b' (MultiDiGraph)."""
+    links = [
+        {"source": "a", "target": "b", "relation": rel, "key": idx}
+        for idx, rel in enumerate(relations)
+    ]
+    graph_data = {
+        "directed": True,
+        "multigraph": True,
+        "graph": {},
+        "nodes": [
+            {"id": "a", "label": "alpha()", "source_file": "a.py", "community": 0},
+            {"id": "b", "label": "beta()", "source_file": "b.py", "community": 0},
+        ],
+        "links": links,
+    }
+    p = tmp_path / "graph.json"
+    p.write_text(json.dumps(graph_data))
+    return p
+
+
+def test_explain_multigraph_neighbor_bundles_relations(monkeypatch, tmp_path, capsys):
+    """PR5 gate: a neighbor reached by 4 parallel edges shows the bundle, not one."""
+    p = _write_multigraph(tmp_path, ["calls", "imports", "contains", "reads"])
+    out = _run(monkeypatch, p, "alpha()", capsys)
+    # 4 unique relations exceed the default cap (3), so this neighbor renders
+    # as a capped bundle rather than a single first-edge relation.
+    assert "--> beta() [calls, contains, imports (+1 more, 4 total)]" in out
+    # First-edge-only regression guard: a lone "[calls] [...]" block must NOT appear.
+    assert "--> beta() [calls] [" not in out
+
+
+def test_explain_multigraph_capped_summary(monkeypatch, tmp_path, capsys):
+    """A neighbor pair with >3 unique relations renders the capped (+K more, N total) form."""
+    p = _write_multigraph(tmp_path, ["gamma", "alpha", "epsilon", "beta", "delta"])
+    out = _run(monkeypatch, p, "alpha()", capsys)
+    # sorted unique: alpha, beta, delta, epsilon, gamma -> first 3 + capped suffix.
+    assert "--> beta() [alpha, beta, delta (+2 more, 5 total)]" in out
+
+
+def test_explain_simple_graph_output_regression(monkeypatch, tmp_path, capsys):
+    """Simple DiGraph explain output is unchanged: '[rel] [conf]' per neighbor."""
+    p = _write_graph(tmp_path)
+    out = _run(monkeypatch, p, "validateSanitySession", capsys)
+    # Byte-stable bracketed form, matching test_callee_shows_callers_as_inbound.
+    assert "<-- createPatchHandler() [calls]" in out
+    assert "<-- createEditHandler() [calls]" in out
+    assert "--> stableStringify() [calls]" in out
+
+
+def _write_bidirectional_multigraph(tmp_path):
+    """A<->B with different relations each way: A->B 'calls', B->A 'imports'."""
+    graph_data = {
+        "directed": True,
+        "multigraph": True,
+        "graph": {},
+        "nodes": [
+            {"id": "a", "label": "alpha()", "source_file": "a.py", "community": 0},
+            {"id": "b", "label": "beta()", "source_file": "b.py", "community": 0},
+        ],
+        "links": [
+            {"source": "a", "target": "b", "relation": "calls", "confidence": "EXTRACTED", "key": 0},
+            {"source": "b", "target": "a", "relation": "imports", "confidence": "EXTRACTED", "key": 0},
+        ],
+    }
+    p = tmp_path / "graph.json"
+    p.write_text(json.dumps(graph_data))
+    return p
+
+
+def test_explain_directional_isolation(monkeypatch, tmp_path, capsys):
+    """Out and in connections to the same neighbor stay isolated by direction.
+
+    Regression for the directed_only fix: relationship_envelope merges both
+    directions by default, which would wrongly show 'calls, imports' on both
+    the out (-->) and in (<--) arrows. directed_only=True isolates each
+    connection's own stored direction.
+    """
+    p = _write_bidirectional_multigraph(tmp_path)
+    out = _run(monkeypatch, p, "alpha()", capsys)
+    # Outgoing A->B shows ONLY 'calls'; incoming B->A shows ONLY 'imports'.
+    assert "--> beta() [calls] [EXTRACTED]" in out
+    assert "<-- beta() [imports] [EXTRACTED]" in out
+    # Neither arrow may merge the opposite direction's relation.
+    assert "--> beta() [calls, imports" not in out
+    assert "<-- beta() [calls, imports" not in out
+    assert "--> beta() [imports" not in out
+    assert "<-- beta() [calls]" not in out
diff --git a/tests/test_path_cli.py b/tests/test_path_cli.py
index 57ae3ebdd..7fd505bee 100644
--- a/tests/test_path_cli.py
+++ b/tests/test_path_cli.py
@@ -60,3 +60,86 @@ def test_reverse_arrow(monkeypatch, tmp_path, capsys):
     assert "Shortest path (1 hops):" in out
     assert "validateSanitySession() <--calls [EXTRACTED]-- createPatchHandler()" in out
     assert "validateSanitySession() --calls [EXTRACTED]--> createPatchHandler()" not in out
+
+
+def _write_multigraph(tmp_path):
+    """A->B with 3 parallel relations, B->C with a single relation."""
+    graph_data = {
+        "directed": True,
+        "multigraph": True,
+        "graph": {},
+        "nodes": [
+            {"id": "a", "label": "alpha()", "source_file": "a.py", "community": 0},
+            {"id": "b", "label": "beta()", "source_file": "b.py", "community": 0},
+            {"id": "c", "label": "gamma()", "source_file": "c.py", "community": 0},
+        ],
+        "links": [
+            {"source": "a", "target": "b", "relation": "calls", "confidence": "EXTRACTED", "key": 0},
+            {"source": "a", "target": "b", "relation": "imports", "confidence": "EXTRACTED", "key": 1},
+            {"source": "a", "target": "b", "relation": "contains", "confidence": "EXTRACTED", "key": 2},
+            {"source": "b", "target": "c", "relation": "returns", "confidence": "INFERRED", "key": 0},
+        ],
+    }
+    p = tmp_path / "graph.json"
+    p.write_text(json.dumps(graph_data))
+    return p
+
+
+def test_path_multigraph_hop_shows_all_relations(monkeypatch, tmp_path, capsys):
+    """PR5 gate: a MultiDiGraph hop bundles all parallel relations, never first-only."""
+    p = _write_multigraph(tmp_path)
+    out = _run(monkeypatch, p, "alpha", "gamma", capsys)
+    assert "Shortest path (2 hops):" in out
+    # The A->B hop carries 3 parallel relations: all must appear (sorted, unique).
+    assert "--calls, contains, imports--> beta()" in out
+    # First-edge-only regression guard: the lone "calls" hop form must NOT appear.
+    assert "--calls [EXTRACTED]--> beta()" not in out
+    # The single-relation B->C hop stays byte-stable.
+    assert "--returns [INFERRED]--> gamma()" in out
+
+
+def test_path_simple_graph_output_regression(monkeypatch, tmp_path, capsys):
+    """Simple DiGraph path output is unchanged: single relation per hop."""
+    p = _write_graph(tmp_path)
+    out = _run(monkeypatch, p, "createPatchHandler", "validateSanitySession", capsys)
+    # Byte-stable single-relation form, matching test_forward_arrow exactly.
+    assert "createPatchHandler() --calls [EXTRACTED]--> validateSanitySession()" in out
+
+
+def _write_bidirectional_multigraph(tmp_path):
+    """A->B 'calls', B->A 'imports' (opposite relations), plus B->C so the
+    shortest A->C path renders the A->B hop in its stored forward direction."""
+    graph_data = {
+        "directed": True,
+        "multigraph": True,
+        "graph": {},
+        "nodes": [
+            {"id": "a", "label": "alpha()", "source_file": "a.py", "community": 0},
+            {"id": "b", "label": "beta()", "source_file": "b.py", "community": 0},
+            {"id": "c", "label": "gamma()", "source_file": "c.py", "community": 0},
+        ],
+        "links": [
+            {"source": "a", "target": "b", "relation": "calls", "confidence": "EXTRACTED", "key": 0},
+            {"source": "b", "target": "a", "relation": "imports", "confidence": "EXTRACTED", "key": 0},
+            {"source": "b", "target": "c", "relation": "returns", "confidence": "INFERRED", "key": 0},
+        ],
+    }
+    p = tmp_path / "graph.json"
+    p.write_text(json.dumps(graph_data))
+    return p
+
+
+def test_path_directional_isolation(monkeypatch, tmp_path, capsys):
+    """A->B hop renders only the forward 'calls' relation, never the reverse 'imports'.
+
+    Regression for the directed_only fix: relationship_envelope merges both
+    directions by default, which would wrongly bundle B->A 'imports' onto the
+    A-->B arrow. directed_only=True must isolate the stored hop direction.
+    """
+    p = _write_bidirectional_multigraph(tmp_path)
+    out = _run(monkeypatch, p, "alpha", "gamma", capsys)
+    assert "Shortest path (2 hops):" in out
+    # Forward hop shows ONLY 'calls' (byte-stable single-relation form).
+    assert "alpha() --calls [EXTRACTED]--> beta()" in out
+    # The reverse-direction 'imports' must NOT bleed into the forward arrow.
+    assert "imports" not in out
diff --git a/tests/test_projections.py b/tests/test_projections.py
index 140274eb5..e362ad701 100644
--- a/tests/test_projections.py
+++ b/tests/test_projections.py
@@ -5,14 +5,17 @@
 from typing import Any, cast
 
 from graphify.projections import (
+    DEFAULT_RELATIONSHIP_CAP,
     distinct_neighbor_degree,
     edge_records_between,
     edge_summary_between,
+    format_relationship_envelope,
     normalize_to_multidigraph,
     project_for_callflow,
     project_for_community,
     project_for_context,
     project_for_path,
+    relationship_envelope,
 )
 
 
@@ -200,3 +203,222 @@ def test_normalize_to_multidigraph_preserves_parallel_keys_and_simple_edges() ->
     assert isinstance(simple_normalized, nx.MultiDiGraph)
     assert simple_normalized.number_of_edges("x", "y") == 1
     assert next(iter(simple_normalized["x"]["y"].values()))["relation"] == "uses"
+
+
+# ---------------------------------------------------------------------------
+# relationship_envelope / format_relationship_envelope
+# ---------------------------------------------------------------------------
+
+
+def _multidigraph_with_parallel_relations(
+    relations: list[str], *, confidence: str | None = None
+) -> nx.MultiDiGraph:
+    """Build A->B with one parallel edge per supplied relation."""
+    graph = nx.MultiDiGraph()
+    graph.add_node("a", label="A")
+    graph.add_node("b", label="B")
+    for index, relation in enumerate(relations):
+        attrs: dict[str, Any] = {"relation": relation}
+        if confidence is not None:
+            attrs["confidence"] = confidence
+        graph.add_edge("a", "b", key=f"{relation}-{index}", **attrs)
+    return graph
+
+
+def test_relationship_envelope_single_edge() -> None:
+    graph = nx.DiGraph()
+    graph.add_edge("a", "b", relation="calls", confidence="EXTRACTED")
+
+    envelope = relationship_envelope(graph, "a", "b")
+
+    assert envelope["count"] == 1
+    assert len(envelope["shown"]) == 1
+    assert envelope["shown"][0]["relation"] == "calls"
+    assert envelope["truncated"] == 0
+    assert envelope["relations"] == ["calls"]
+    assert envelope["confidences"] == ["EXTRACTED"]
+
+
+def test_relationship_envelope_multidigraph_bundles_all() -> None:
+    graph = _multidigraph_with_parallel_relations(["calls", "imports", "contains"])
+
+    envelope = relationship_envelope(graph, "a", "b")
+
+    assert envelope["count"] == 3
+    assert envelope["relations"] == ["calls", "contains", "imports"]
+    assert len(envelope["shown"]) == 3  # default cap == 3 fits all
+    assert envelope["truncated"] == 0
+    # shown records mirror edge_records_between ordering
+    assert envelope["shown"] == edge_records_between(graph, "a", "b")
+
+
+def test_relationship_envelope_caps_shown() -> None:
+    graph = _multidigraph_with_parallel_relations(["r1", "r2", "r3", "r4", "r5"])
+
+    envelope = relationship_envelope(graph, "a", "b", cap=3)
+
+    assert envelope["count"] == 5
+    assert len(envelope["shown"]) == 3
+    assert envelope["truncated"] == 2
+    assert envelope["relations"] == ["r1", "r2", "r3", "r4", "r5"]
+    # shown is the leading slice of the full sorted record list
+    assert envelope["shown"] == edge_records_between(graph, "a", "b")[:3]
+
+
+def test_relationship_envelope_cap_zero_or_negative() -> None:
+    graph = _multidigraph_with_parallel_relations(
+        ["calls", "imports", "contains"], confidence="EXTRACTED"
+    )
+
+    zero = relationship_envelope(graph, "a", "b", cap=0)
+    assert zero["shown"] == []
+    assert zero["truncated"] == zero["count"] == 3
+    assert zero["relations"] == ["calls", "contains", "imports"]
+    assert zero["confidences"] == ["EXTRACTED"]
+
+    negative = relationship_envelope(graph, "a", "b", cap=-1)
+    assert negative["shown"] == []
+    assert negative["truncated"] == negative["count"] == 3
+    assert negative["relations"] == ["calls", "contains", "imports"]
+
+
+def test_relationship_envelope_directed_both_directions() -> None:
+    graph = nx.DiGraph()
+    graph.add_edge("a", "b", relation="calls", confidence="EXTRACTED")
+    graph.add_edge("b", "a", relation="returns", confidence="INFERRED")
+
+    envelope = relationship_envelope(graph, "a", "b")
+
+    assert envelope["count"] == 2
+    assert envelope["relations"] == ["calls", "returns"]
+    assert envelope["confidences"] == ["EXTRACTED", "INFERRED"]
+    assert envelope["shown"] == edge_records_between(graph, "a", "b")
+
+
+def test_relationship_envelope_no_edge() -> None:
+    graph = nx.DiGraph()
+    graph.add_node("a")
+    graph.add_node("b")
+
+    envelope = relationship_envelope(graph, "a", "b")
+
+    assert envelope["count"] == 0
+    assert envelope["shown"] == []
+    assert envelope["truncated"] == 0
+    assert envelope["relations"] == []
+    assert envelope["confidences"] == []
+
+
+def test_format_relationship_envelope_single() -> None:
+    without_confidence = nx.DiGraph()
+    without_confidence.add_edge("a", "b", relation="calls")
+    assert format_relationship_envelope(without_confidence, "a", "b") == "calls"
+
+    with_confidence = nx.DiGraph()
+    with_confidence.add_edge("a", "b", relation="calls", confidence="EXTRACTED")
+    assert format_relationship_envelope(with_confidence, "a", "b") == "calls (EXTRACTED)"
+
+
+def test_format_relationship_envelope_multiple_within_cap() -> None:
+    graph = _multidigraph_with_parallel_relations(
+        ["imports", "calls", "contains"], confidence="EXTRACTED"
+    )
+
+    # 3 unique relations within the default cap; confidence omitted for multi-relation lines
+    assert format_relationship_envelope(graph, "a", "b") == "calls, contains, imports"
+
+
+def test_format_relationship_envelope_capped() -> None:
+    graph = _multidigraph_with_parallel_relations(
+        ["gamma", "alpha", "epsilon", "beta", "delta"]
+    )
+
+    # sorted unique relations: alpha, beta, delta, epsilon, gamma -> first 3 shown
+    assert (
+        format_relationship_envelope(graph, "a", "b", cap=3)
+        == "alpha, beta, delta (+2 more, 5 total)"
+    )
+
+
+def test_format_relationship_envelope_empty() -> None:
+    graph = nx.DiGraph()
+    graph.add_node("a")
+    graph.add_node("b")
+
+    assert format_relationship_envelope(graph, "a", "b") == ""
+
+
+def test_relationship_envelope_simple_graph_regression() -> None:
+    graph = nx.DiGraph()
+    graph.add_edge("a", "b", relation="calls")
+    graph.add_edge("a", "c", relation="imports")
+
+    # Plain DiGraph: no parallel edges, so the envelope between a single pair
+    # reflects exactly the one edge and shown == all records (cap unreached).
+    assert DEFAULT_RELATIONSHIP_CAP == 3
+    envelope = relationship_envelope(graph, "a", "b")
+    assert envelope["count"] == graph.number_of_edges("a", "b") == 1
+    assert envelope["shown"] == edge_records_between(graph, "a", "b")
+    assert envelope["truncated"] == 0
+
+
+def _bidirectional_digraph() -> nx.DiGraph:
+    """Directed A->B (calls) plus the reverse B->A (imports)."""
+    graph = nx.DiGraph()
+    graph.add_edge("a", "b", relation="calls", confidence="EXTRACTED")
+    graph.add_edge("b", "a", relation="imports", confidence="INFERRED")
+    return graph
+
+
+def test_edge_records_between_directed_only_excludes_reverse() -> None:
+    graph = _bidirectional_digraph()
+
+    both = edge_records_between(graph, "a", "b")
+    assert len(both) == 2
+    assert {record["relation"] for record in both} == {"calls", "imports"}
+
+    forward = edge_records_between(graph, "a", "b", directed_only=True)
+    assert len(forward) == 1
+    assert forward[0]["relation"] == "calls"
+
+
+def test_relationship_envelope_directed_only() -> None:
+    graph = _bidirectional_digraph()
+
+    envelope = relationship_envelope(graph, "a", "b", directed_only=True)
+
+    assert envelope["count"] == 1
+    assert envelope["relations"] == ["calls"]
+    assert "imports" not in envelope["relations"]
+    assert [record["relation"] for record in envelope["shown"]] == ["calls"]
+
+
+def test_format_relationship_envelope_directed_only() -> None:
+    graph = _bidirectional_digraph()
+
+    # Single forward relation with confidence present -> "calls (EXTRACTED)".
+    rendered = format_relationship_envelope(graph, "a", "b", directed_only=True)
+    assert rendered == "calls (EXTRACTED)"
+    assert "imports" not in rendered
+
+    # Without confidence the single forward relation renders bare.
+    plain = nx.DiGraph()
+    plain.add_edge("a", "b", relation="calls")
+    plain.add_edge("b", "a", relation="imports")
+    assert format_relationship_envelope(plain, "a", "b", directed_only=True) == "calls"
+
+
+def test_directed_only_noop_on_undirected() -> None:
+    graph = nx.Graph()
+    graph.add_edge("a", "b", relation="calls", confidence="EXTRACTED")
+    graph.add_edge("a", "b", relation="imports")  # simple Graph: updates attrs on the existing edge, still one edge
+
+    assert edge_records_between(graph, "a", "b", directed_only=True) == edge_records_between(
+        graph, "a", "b"
+    )
+    assert relationship_envelope(graph, "a", "b", directed_only=True) == relationship_envelope(
+        graph, "a", "b"
+    )
+    assert format_relationship_envelope(
+        graph, "a", "b", directed_only=True
+    ) == format_relationship_envelope(graph, "a", "b")
diff --git a/tests/test_serve_multigraph.py b/tests/test_serve_multigraph.py
new file mode 100644
index 000000000..e05bf755d
--- /dev/null
+++ b/tests/test_serve_multigraph.py
@@ -0,0 +1,355 @@
+"""MultiDiGraph display tests for serve.py surfaces (PR 5 first-edge-only gate).
+
+These verify that serve.py's read/query/path surfaces never collapse parallel
+edges between two nodes to a single first-edge representative. Every surface
+must show all relevant relationships or an explicit capped summary, while plain
+DiGraph output stays unchanged in substance (single relation per pair, no
+``(+K more)`` marker).
+"""
+
+from __future__ import annotations
+
+import re
+
+import networkx as nx
+
+from graphify.serve import (
+    _bfs,
+    _neighbors_text,
+    _query_graph_text,
+    _shortest_path_text,
+    _subgraph_to_text,
+)
+
+# Pattern for the capped-summary marker, e.g. "(+3 more, 6 total)".
+_CAPPED_MARKER = re.compile(r"\(\+\d+ more, \d+ total\)")
+
+
+def _multidigraph_hop(
+    relations: list[str], *, confidence: str = "EXTRACTED"
+) -> nx.MultiDiGraph:
+    """Build A(Alpha) -> B(Beta) with one parallel edge per supplied relation."""
+    graph = nx.MultiDiGraph()
+    graph.add_node("a", label="Alpha", source_file="a.py", source_location="L1", community=0)
+    graph.add_node("b", label="Beta", source_file="b.py", source_location="L1", community=0)
+    for index, relation in enumerate(relations):
+        graph.add_edge(
+            "a", "b", key=f"{relation}-{index}", relation=relation, confidence=confidence
+        )
+    return graph
+
+
+# ---------------------------------------------------------------------------
+# 1. query/subgraph text shows all relations on a multi-relation hop
+# ---------------------------------------------------------------------------
+
+
+def test_query_text_multigraph_shows_all_relations():
+    """A hop A->B carrying calls/imports/contains shows ALL three, not just the
+    first parallel edge."""
+    graph = _multidigraph_hop(["calls", "imports", "contains"])
+    nodes, edges = _bfs(graph, ["a"], depth=1)
+
+    text = _subgraph_to_text(graph, nodes, edges)
+
+    # Exactly one EDGE line for the pair (bundled, not one line per parallel edge)
+    edge_lines = [line for line in text.splitlines() if line.startswith("EDGE ")]
+    assert len(edge_lines) == 1, edge_lines
+    edge_line = edge_lines[0]
+    assert "calls" in edge_line
+    assert "imports" in edge_line
+    assert "contains" in edge_line
+    # Three relations fit under the default cap of 3 -> no capped marker
+    assert not _CAPPED_MARKER.search(edge_line)
+
+
+def test_query_graph_text_multigraph_end_to_end_shows_all_relations():
+    """Full _query_graph_text pipeline (the path query_graph MCP tool uses) shows
+    every parallel relation for the matched hop."""
+    graph = _multidigraph_hop(["calls", "imports", "contains"])
+
+    text = _query_graph_text(graph, "Alpha", mode="bfs", depth=1)
+
+    assert "No matching nodes found." not in text
+    edge_lines = [line for line in text.splitlines() if line.startswith("EDGE ")]
+    assert len(edge_lines) == 1, edge_lines
+    assert "calls" in edge_lines[0]
+    assert "imports" in edge_lines[0]
+    assert "contains" in edge_lines[0]
+
+
+def test_subgraph_to_text_directional_isolation():
+    """The directional EDGE arrow must report only the forward (u->v) relations.
+
+    With A->B 'calls' and B->A 'imports', the A-->B line shows 'calls' and NOT
+    'imports', and the B-->A line shows 'imports' and NOT 'calls'. Without
+    directed_only=True the envelope would merge the reverse edge into both lines.
+    """
+    graph = nx.MultiDiGraph()
+    graph.add_node("a", label="Alpha")
+    graph.add_node("b", label="Beta")
+    graph.add_edge("a", "b", key="k1", relation="calls", confidence="EXTRACTED")
+    graph.add_edge("b", "a", key="k2", relation="imports", confidence="EXTRACTED")
+
+    text = _subgraph_to_text(graph, {"a", "b"}, [("a", "b"), ("b", "a")])
+
+    ab_line = next(line for line in text.splitlines() if line.startswith("EDGE Alpha "))
+    ba_line = next(line for line in text.splitlines() if line.startswith("EDGE Beta "))
+    assert "calls" in ab_line and "imports" not in ab_line, ab_line
+    assert "imports" in ba_line and "calls" not in ba_line, ba_line
+
+
+def test_subgraph_to_text_single_relation_format_pinned():
+    """Pin the EXACT single-relation EDGE line so the historical
+    ``--{rel} [{conf} context={ctx}]-->`` square-bracket format cannot silently
+    regress (it must match path/explain and any downstream EDGE-line parser)."""
+    graph = nx.DiGraph()
+    graph.add_node("a", label="alpha()")
+    graph.add_node("b", label="beta()")
+    graph.add_edge("a", "b", relation="calls", confidence="EXTRACTED", context="call")
+
+    text = _subgraph_to_text(graph, {"a", "b"}, [("a", "b")])
+
+    edge_line = next(line for line in text.splitlines() if line.startswith("EDGE "))
+    assert edge_line == "EDGE alpha() --calls [EXTRACTED context=call]--> beta()"
+
+    # And the no-context variant keeps the bare [conf] bracket group.
+    graph2 = nx.DiGraph()
+    graph2.add_node("a", label="alpha")
+    graph2.add_node("b", label="beta")
+    graph2.add_edge("a", "b", relation="uses", confidence="INFERRED")
+    line2 = next(
+        line
+        for line in _subgraph_to_text(graph2, {"a", "b"}, [("a", "b")]).splitlines()
+        if line.startswith("EDGE ")
+    )
+    assert line2 == "EDGE alpha --uses [INFERRED]--> beta"
+
+
+# ---------------------------------------------------------------------------
+# 2. get_neighbors bundles parallel edges to one neighbor
+# ---------------------------------------------------------------------------
+
+
+def test_get_neighbors_multigraph_bundles_parallel_edges():
+    """A node with 3 parallel edges to one neighbor lists that neighbor ONCE with
+    all relations bundled, never 3 lines and never first-edge-only."""
+    graph = _multidigraph_hop(["calls", "imports", "contains"])
+
+    text = _neighbors_text(graph, "Alpha")
+
+    neighbor_lines = [line for line in text.splitlines() if line.strip().startswith("-->")]
+    assert len(neighbor_lines) == 1, neighbor_lines
+    line = neighbor_lines[0]
+    assert "Beta" in line
+    assert "calls" in line
+    assert "imports" in line
+    assert "contains" in line
+
+
+def test_get_neighbors_multigraph_directional_isolation():
+    """Outgoing (-->) and incoming (<--) lines bundle only their own direction:
+    a->b 'calls' must not leak into the <-- incoming line and vice versa."""
+    graph = nx.MultiDiGraph()
+    graph.add_node("a", label="Alpha")
+    graph.add_node("b", label="Beta")
+    graph.add_edge("a", "b", key="k1", relation="calls", confidence="EXTRACTED")
+    graph.add_edge("a", "b", key="k2", relation="imports", confidence="EXTRACTED")
+    graph.add_edge("b", "a", key="k3", relation="returns", confidence="INFERRED")
+
+    text = _neighbors_text(graph, "Alpha")
+
+    out_line = next(line for line in text.splitlines() if line.strip().startswith("-->"))
+    in_line = next(line for line in text.splitlines() if line.strip().startswith("<--"))
+    # Outgoing bundle: calls + imports, NOT returns
+    assert "calls" in out_line and "imports" in out_line
+    assert "returns" not in out_line
+    # Incoming bundle: returns only, NOT the outgoing relations
+    assert "returns" in in_line
+    assert "calls" not in in_line and "imports" not in in_line
+
+
+def test_get_neighbors_single_relation_format_pinned():
+    """Pin the EXACT single-relation neighbor lines so the historical
+    ``[rel] [conf]`` two-bracket form cannot regress to the envelope ``(conf)``
+    parens form — MCP get_neighbors must stay consistent with path/explain."""
+    graph = nx.DiGraph()
+    graph.add_node("a", label="alpha")
+    graph.add_node("b", label="beta")
+    graph.add_node("c", label="gamma")
+    graph.add_edge("a", "b", relation="calls", confidence="EXTRACTED")
+    graph.add_edge("c", "a", relation="imports", confidence="INFERRED")
+
+    text = _neighbors_text(graph, "alpha")
+
+    out_line = next(line for line in text.splitlines() if line.strip().startswith("-->"))
+    in_line = next(line for line in text.splitlines() if line.strip().startswith("<--"))
+    assert out_line == "  --> beta [calls] [EXTRACTED]"
+    assert in_line == "  <-- gamma [imports] [INFERRED]"
+
+    # No-confidence variant keeps the empty second bracket group.
+    graph2 = nx.DiGraph()
+    graph2.add_node("a", label="a")
+    graph2.add_node("b", label="b")
+    graph2.add_edge("a", "b", relation="rel")
+    line2 = next(
+        line
+        for line in _neighbors_text(graph2, "a").splitlines()
+        if line.strip().startswith("-->")
+    )
+    assert line2 == "  --> b [rel] []"
+
+
+def test_get_neighbors_multigraph_relation_filter_checks_all_parallel():
+    """relation_filter matches a relation even when it is on a non-first parallel
+    edge (first-edge-only filtering would miss it)."""
+    graph = _multidigraph_hop(["calls", "imports", "contains"])
+
+    # "contains" is not the first sorted relation; filter must still surface it.
+    text = _neighbors_text(graph, "Alpha", relation_filter="contains")
+
+    neighbor_lines = [line for line in text.splitlines() if line.strip().startswith("-->")]
+    assert len(neighbor_lines) == 1, neighbor_lines
+    assert "Beta" in neighbor_lines[0]
+
+    # A relation present on no edge filters the neighbor out entirely.
+    empty = _neighbors_text(graph, "Alpha", relation_filter="nonexistent")
+    assert not [line for line in empty.splitlines() if line.strip().startswith("-->")]
+
+
+# ---------------------------------------------------------------------------
+# 3. shortest_path shows bundled hops
+# ---------------------------------------------------------------------------
+
+
+def test_shortest_path_multigraph_shows_bundled_hops():
+    """A path hop carrying multiple parallel relations shows the bundle per hop,
+    not a single first-edge representative."""
+    graph = nx.MultiDiGraph()
+    graph.add_node("a", label="Alpha", source_file="a.py", community=0)
+    graph.add_node("b", label="Beta", source_file="b.py", community=0)
+    graph.add_node("c", label="Gamma", source_file="c.py", community=0)
+    for index, relation in enumerate(["calls", "imports", "contains"]):
+        graph.add_edge("a", "b", key=f"{relation}-{index}", relation=relation, confidence="EXTRACTED")
+    graph.add_edge("b", "c", key="uses-0", relation="uses", confidence="EXTRACTED")
+
+    text = _shortest_path_text(graph, "Alpha", "Gamma")
+
+    assert "Shortest path" in text
+    # The A->B hop must show all three parallel relations.
+    assert "calls" in text
+    assert "imports" in text
+    assert "contains" in text
+    assert "uses" in text  # the B->C hop
+
+
+def test_shortest_path_single_relation_format_pinned():
+    """Pin the EXACT single-relation hop format ``--{rel} [{conf}]-->`` so it
+    cannot regress to the envelope ``(conf)`` parens form."""
+    graph = nx.DiGraph()
+    graph.add_node("a", label="alpha")
+    graph.add_node("b", label="beta")
+    graph.add_node("c", label="gamma")
+    graph.add_edge("a", "b", relation="calls", confidence="EXTRACTED")
+    graph.add_edge("b", "c", relation="imports", confidence="INFERRED")
+
+    text = _shortest_path_text(graph, "alpha", "gamma")
+
+    assert text == (
+        "Shortest path (2 hops):\n"
+        "  alpha --calls [EXTRACTED]--> beta --imports [INFERRED]--> gamma"
+    )
+
+    # No-confidence hop drops the confidence bracket entirely (historical form).
+    graph2 = nx.DiGraph()
+    graph2.add_node("a", label="a")
+    graph2.add_node("b", label="b")
+    graph2.add_edge("a", "b", relation="rel")
+    assert _shortest_path_text(graph2, "a", "b") == (
+        "Shortest path (1 hops):\n  a --rel--> b"
+    )
+
+
+# ---------------------------------------------------------------------------
+# 4. capped summary for a noisy pair (bounded output)
+# ---------------------------------------------------------------------------
+
+
+def test_query_capped_summary_for_noisy_pair():
+    """A pair with 6 parallel relations renders the capped '(+K more, N total)'
+    form, proving output is bounded rather than unbounded enumeration."""
+    graph = _multidigraph_hop(["alpha", "beta", "gamma", "delta", "epsilon", "zeta"])
+    nodes, edges = _bfs(graph, ["a"], depth=1)
+
+    text = _subgraph_to_text(graph, nodes, edges)
+
+    edge_line = next(line for line in text.splitlines() if line.startswith("EDGE "))
+    match = _CAPPED_MARKER.search(edge_line)
+    assert match, edge_line
+    # N total counts edge records (6), not unique relations.
+    assert "6 total" in match.group(0)
+    # First cap=3 sorted relations are shown, the rest summarised as "+K more".
+    assert "alpha" in edge_line and "beta" in edge_line and "delta" in edge_line
+    assert "+3 more" in edge_line
+
+    # get_neighbors applies the same bounded cap.
+    neighbors = _neighbors_text(graph, "Alpha")
+    nbr_line = next(line for line in neighbors.splitlines() if line.strip().startswith("-->"))
+    assert _CAPPED_MARKER.search(nbr_line), nbr_line
+
+
+# ---------------------------------------------------------------------------
+# 5. simple-graph regression gate
+# ---------------------------------------------------------------------------
+
+
+def _simple_digraph() -> nx.DiGraph:
+    graph = nx.DiGraph()
+    graph.add_node("a", label="Alpha", source_file="a.py", source_location="L1", community=0)
+    graph.add_node("b", label="Beta", source_file="b.py", source_location="L1", community=0)
+    graph.add_node("c", label="Gamma", source_file="c.py", source_location="L1", community=0)
+    graph.add_edge("a", "b", relation="calls", confidence="EXTRACTED", context="call")
+    graph.add_edge("b", "c", relation="imports", confidence="EXTRACTED")
+    return graph
+
+
+def test_serve_simple_graph_output_regression():
+    """A plain DiGraph produces single-relation-per-pair output across query,
+    neighbors, and path surfaces with NO capped '(+K more)' marker — the
+    simple-graph regression gate."""
+    graph = _simple_digraph()
+
+    # --- subgraph / query text ---
+    nodes, edges = _bfs(graph, ["a"], depth=2)
+    sub = _subgraph_to_text(graph, nodes, edges)
+    assert not _CAPPED_MARKER.search(sub)
+    ab_edge = next(line for line in sub.splitlines() if line.startswith("EDGE Alpha "))
+    assert "calls" in ab_edge
+    # Single relation per pair: no comma-joined relation list on the hop.
+    relation_segment = ab_edge.split("--", 1)[1].split("-->", 1)[0]
+    assert "," not in relation_segment
+    # Edge context is still emitted exactly as before.
+    assert "context=call" in ab_edge
+
+    # --- get_neighbors ---
+    neighbors = _neighbors_text(graph, "Alpha")
+    assert not _CAPPED_MARKER.search(neighbors)
+    out_line = next(line for line in neighbors.splitlines() if line.strip().startswith("-->"))
+    assert "Beta" in out_line and "calls" in out_line
+    assert "," not in out_line.split("[", 1)[1]  # single relation in the bracket
+
+    # --- shortest_path ---
+    path = _shortest_path_text(graph, "Alpha", "Gamma")
+    assert not _CAPPED_MARKER.search(path)
+    assert "Shortest path (2 hops):" in path
+    assert "calls" in path and "imports" in path
+
+
+def test_serve_simple_graph_query_cli_text_unchanged():
+    """The CLI-facing _query_graph_text path on a simple graph keeps its header
+    and per-hop format (single relation, no capping)."""
+    graph = _simple_digraph()
+    text = _query_graph_text(graph, "Alpha", mode="bfs", depth=2)
+    assert "Traversal: BFS depth=2" in text
+    assert not _CAPPED_MARKER.search(text)
+    assert "calls" in text

From 6f9bef24cb55dc37e0788898fbf03906cf8aac05 Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Fri, 29 May 2026 02:33:12 -0500
Subject: [PATCH 07/21] test: isolate backend-detection tests from ambient API
 keys

test_ollama and test_incremental assumed a clean environment but did not
clear every backend env var that detect_backend() checks (DEEPSEEK_API_KEY,
GEMINI/OPENAI/AWS), so they failed in any shell that exported one. This fix
makes both hermetic, with no waivers:

- test_ollama.py: autouse fixture strips every backend key (derived from
  BACKENDS + _backend_env_keys) plus AWS/OLLAMA before each test.
- test_incremental.py: _run passes a sanitized env to the extract subprocess
  so the "no LLM API key" path triggers regardless of ambient keys.

Verified green with DEEPSEEK_API_KEY set. Full suite 1624 passed.
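
A minimal sketch of the env-sanitizing idea, not the code in this patch. The
names in HYPOTHETICAL_BACKEND_KEYS are placeholders standing in for whatever
BACKENDS + _backend_env_keys yield, and the child process here only reports
which of those keys it can still see:

```python
import os
import subprocess
import sys

# Placeholder key names for illustration only; the real list is derived from
# graphify.llm.BACKENDS via _backend_env_keys().
HYPOTHETICAL_BACKEND_KEYS = ("DEEPSEEK_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY")


def clean_env(base, keys=HYPOTHETICAL_BACKEND_KEYS):
    """Return a copy of ``base`` with every listed key removed."""
    env = dict(base)
    for key in keys:
        env.pop(key, None)  # a missing key is not an error
    return env


def visible_keys_in_child(env):
    """Run a child interpreter with ``env`` and list the backend keys it sees."""
    code = (
        "import os, sys; "
        "print(','.join(k for k in sys.argv[1:] if k in os.environ))"
    )
    result = subprocess.run(
        [sys.executable, "-c", code, *HYPOTHETICAL_BACKEND_KEYS],
        capture_output=True,
        text=True,
        env=env,
        check=True,
    )
    out = result.stdout.strip()
    return out.split(",") if out else []
```

Passing env= explicitly is what makes the subprocess independent of the
caller's shell; without it the child inherits every exported key.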

gost
---
 tests/test_incremental.py | 15 +++++++++++++++
 tests/test_ollama.py      | 15 ++++++++++++++-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/tests/test_incremental.py b/tests/test_incremental.py
index 47006c695..13c47e048 100644
--- a/tests/test_incremental.py
+++ b/tests/test_incremental.py
@@ -2,20 +2,35 @@
 
 from __future__ import annotations
 import json
+import os
 import subprocess
 import sys
 from pathlib import Path
 
+from graphify.llm import BACKENDS, _backend_env_keys
+
 
 PYTHON = sys.executable
 
 
+def _clean_env() -> dict:
+    """Return os.environ with every backend API key stripped out."""
+    env = dict(os.environ)
+    for backend in BACKENDS:
+        for env_key in _backend_env_keys(backend):
+            env.pop(env_key, None)
+    for extra in ("AWS_PROFILE", "AWS_REGION", "AWS_DEFAULT_REGION", "OLLAMA_BASE_URL", "OLLAMA_API_KEY"):
+        env.pop(extra, None)
+    return env
+
+
 def _run(args: list[str], cwd: Path) -> subprocess.CompletedProcess:
     return subprocess.run(
         [PYTHON, "-m", "graphify"] + args,
         cwd=cwd,
         capture_output=True,
         text=True,
+        env=_clean_env(),
     )
 
 
diff --git a/tests/test_ollama.py b/tests/test_ollama.py
index 7336dd5fe..213ef5207 100644
--- a/tests/test_ollama.py
+++ b/tests/test_ollama.py
@@ -2,7 +2,20 @@
 
 from __future__ import annotations
 
-from graphify.llm import detect_backend, BACKENDS
+import pytest
+
+from graphify.llm import detect_backend, BACKENDS, _backend_env_keys
+
+
+@pytest.fixture(autouse=True)
+def _isolate_backend_env(monkeypatch):
+    """Strip every ambient backend API key so detect_backend() tests are hermetic."""
+    for backend in BACKENDS:
+        for env_key in _backend_env_keys(backend):
+            monkeypatch.delenv(env_key, raising=False)
+    for extra in ("AWS_PROFILE", "AWS_REGION", "AWS_DEFAULT_REGION", "OLLAMA_BASE_URL", "OLLAMA_API_KEY"):
+        monkeypatch.delenv(extra, raising=False)
+    yield
 
 
 def test_ollama_in_backends():

From a1811e64627a3d4f597084d0ad20d09ec17151e3 Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Fri, 29 May 2026 02:57:41 -0500
Subject: [PATCH 08/21] style: format PR 5 surfaces

---
 graphify/__main__.py           |  4 ++-
 graphify/serve.py              |  7 ++---
 tests/test_explain_cli.py      | 16 ++++++++--
 tests/test_incremental.py      |  8 ++++-
 tests/test_ollama.py           |  8 ++++-
 tests/test_path_cli.py         | 56 +++++++++++++++++++++++++++++-----
 tests/test_projections.py      |  4 +--
 tests/test_serve_multigraph.py | 19 +++++-------
 8 files changed, 90 insertions(+), 32 deletions(-)

diff --git a/graphify/__main__.py b/graphify/__main__.py
index 94f852881..b824b910d 100644
--- a/graphify/__main__.py
+++ b/graphify/__main__.py
@@ -2373,7 +2373,9 @@ def main() -> None:
                 # relations and vice versa.
                 env = relationship_envelope(G, e_src, e_tgt, directed_only=True)
                 if len(env["relations"]) > 1:
-                    rel_block = f"[{format_relationship_envelope(G, e_src, e_tgt, directed_only=True)}]"
+                    rel_block = (
+                        f"[{format_relationship_envelope(G, e_src, e_tgt, directed_only=True)}]"
+                    )
                 else:
                     # Single relation (always true for simple DiGraph/Graph):
                     # keep the historical "[rel] [conf]" form byte-stable.
diff --git a/graphify/serve.py b/graphify/serve.py
index 0b90cea35..6798c3959 100644
--- a/graphify/serve.py
+++ b/graphify/serve.py
@@ -398,9 +398,7 @@ def _subgraph_to_text(
                 # path/explain surfaces stay consistent.
                 d = env["shown"][0] if env["shown"] else {}
                 context = d.get("context")
-                context_suffix = (
-                    f" context={sanitize_label(str(context))}" if context else ""
-                )
+                context_suffix = f" context={sanitize_label(str(context))}" if context else ""
                 relation = sanitize_label(str(d.get("relation", "")))
                 confidence = sanitize_label(str(d.get("confidence", "")))
                 relation_segment = f"{relation} [{confidence}{context_suffix}]"
@@ -578,8 +576,7 @@ def _shortest_path_text(G: nx.Graph, source: str, target: str, *, max_hops: int
             top, runner = scored[0][0], scored[1][0]
             if top > 0 and (top - runner) / top < 0.10:
                 warnings.append(
-                    f"warning: {name} match was ambiguous "
-                    f"(top score {top:g}, runner-up {runner:g})"
+                    f"warning: {name} match was ambiguous (top score {top:g}, runner-up {runner:g})"
                 )
     try:
         # Use undirected view for path-finding (works regardless of query src/tgt order)
diff --git a/tests/test_explain_cli.py b/tests/test_explain_cli.py
index 98c42fdbd..e84390fb7 100644
--- a/tests/test_explain_cli.py
+++ b/tests/test_explain_cli.py
@@ -149,8 +149,20 @@ def _write_bidirectional_multigraph(tmp_path):
             {"id": "b", "label": "beta()", "source_file": "b.py", "community": 0},
         ],
         "links": [
-            {"source": "a", "target": "b", "relation": "calls", "confidence": "EXTRACTED", "key": 0},
-            {"source": "b", "target": "a", "relation": "imports", "confidence": "EXTRACTED", "key": 0},
+            {
+                "source": "a",
+                "target": "b",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "key": 0,
+            },
+            {
+                "source": "b",
+                "target": "a",
+                "relation": "imports",
+                "confidence": "EXTRACTED",
+                "key": 0,
+            },
         ],
     }
     p = tmp_path / "graph.json"
diff --git a/tests/test_incremental.py b/tests/test_incremental.py
index 13c47e048..6858c7520 100644
--- a/tests/test_incremental.py
+++ b/tests/test_incremental.py
@@ -19,7 +19,13 @@ def _clean_env() -> dict:
     for backend in BACKENDS:
         for env_key in _backend_env_keys(backend):
             env.pop(env_key, None)
-    for extra in ("AWS_PROFILE", "AWS_REGION", "AWS_DEFAULT_REGION", "OLLAMA_BASE_URL", "OLLAMA_API_KEY"):
+    for extra in (
+        "AWS_PROFILE",
+        "AWS_REGION",
+        "AWS_DEFAULT_REGION",
+        "OLLAMA_BASE_URL",
+        "OLLAMA_API_KEY",
+    ):
         env.pop(extra, None)
     return env
 
diff --git a/tests/test_ollama.py b/tests/test_ollama.py
index 213ef5207..e6e412002 100644
--- a/tests/test_ollama.py
+++ b/tests/test_ollama.py
@@ -13,7 +13,13 @@ def _isolate_backend_env(monkeypatch):
     for backend in BACKENDS:
         for env_key in _backend_env_keys(backend):
             monkeypatch.delenv(env_key, raising=False)
-    for extra in ("AWS_PROFILE", "AWS_REGION", "AWS_DEFAULT_REGION", "OLLAMA_BASE_URL", "OLLAMA_API_KEY"):
+    for extra in (
+        "AWS_PROFILE",
+        "AWS_REGION",
+        "AWS_DEFAULT_REGION",
+        "OLLAMA_BASE_URL",
+        "OLLAMA_API_KEY",
+    ):
         monkeypatch.delenv(extra, raising=False)
     yield
 
diff --git a/tests/test_path_cli.py b/tests/test_path_cli.py
index 7fd505bee..f8e7770e5 100644
--- a/tests/test_path_cli.py
+++ b/tests/test_path_cli.py
@@ -74,10 +74,34 @@ def _write_multigraph(tmp_path):
             {"id": "c", "label": "gamma()", "source_file": "c.py", "community": 0},
         ],
         "links": [
-            {"source": "a", "target": "b", "relation": "calls", "confidence": "EXTRACTED", "key": 0},
-            {"source": "a", "target": "b", "relation": "imports", "confidence": "EXTRACTED", "key": 1},
-            {"source": "a", "target": "b", "relation": "contains", "confidence": "EXTRACTED", "key": 2},
-            {"source": "b", "target": "c", "relation": "returns", "confidence": "INFERRED", "key": 0},
+            {
+                "source": "a",
+                "target": "b",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "key": 0,
+            },
+            {
+                "source": "a",
+                "target": "b",
+                "relation": "imports",
+                "confidence": "EXTRACTED",
+                "key": 1,
+            },
+            {
+                "source": "a",
+                "target": "b",
+                "relation": "contains",
+                "confidence": "EXTRACTED",
+                "key": 2,
+            },
+            {
+                "source": "b",
+                "target": "c",
+                "relation": "returns",
+                "confidence": "INFERRED",
+                "key": 0,
+            },
         ],
     }
     p = tmp_path / "graph.json"
@@ -119,9 +143,27 @@ def _write_bidirectional_multigraph(tmp_path):
             {"id": "c", "label": "gamma()", "source_file": "c.py", "community": 0},
         ],
         "links": [
-            {"source": "a", "target": "b", "relation": "calls", "confidence": "EXTRACTED", "key": 0},
-            {"source": "b", "target": "a", "relation": "imports", "confidence": "EXTRACTED", "key": 0},
-            {"source": "b", "target": "c", "relation": "returns", "confidence": "INFERRED", "key": 0},
+            {
+                "source": "a",
+                "target": "b",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "key": 0,
+            },
+            {
+                "source": "b",
+                "target": "a",
+                "relation": "imports",
+                "confidence": "EXTRACTED",
+                "key": 0,
+            },
+            {
+                "source": "b",
+                "target": "c",
+                "relation": "returns",
+                "confidence": "INFERRED",
+                "key": 0,
+            },
         ],
     }
     p = tmp_path / "graph.json"
diff --git a/tests/test_projections.py b/tests/test_projections.py
index e362ad701..305f5627b 100644
--- a/tests/test_projections.py
+++ b/tests/test_projections.py
@@ -329,9 +329,7 @@ def test_format_relationship_envelope_multiple_within_cap() -> None:
 
 
 def test_format_relationship_envelope_capped() -> None:
-    graph = _multidigraph_with_parallel_relations(
-        ["gamma", "alpha", "epsilon", "beta", "delta"]
-    )
+    graph = _multidigraph_with_parallel_relations(["gamma", "alpha", "epsilon", "beta", "delta"])
 
     # sorted unique relations: alpha, beta, delta, epsilon, gamma -> first 3 shown
     assert (
diff --git a/tests/test_serve_multigraph.py b/tests/test_serve_multigraph.py
index e05bf755d..d5509a6c3 100644
--- a/tests/test_serve_multigraph.py
+++ b/tests/test_serve_multigraph.py
@@ -25,9 +25,7 @@
 _CAPPED_MARKER = re.compile(r"\(\+\d+ more, \d+ total\)")
 
 
-def _multidigraph_hop(
-    relations: list[str], *, confidence: str = "EXTRACTED"
-) -> nx.MultiDiGraph:
+def _multidigraph_hop(relations: list[str], *, confidence: str = "EXTRACTED") -> nx.MultiDiGraph:
     """Build A(Alpha) -> B(Beta) with one parallel edge per supplied relation."""
     graph = nx.MultiDiGraph()
     graph.add_node("a", label="Alpha", source_file="a.py", source_location="L1", community=0)
@@ -193,9 +191,7 @@ def test_get_neighbors_single_relation_format_pinned():
     graph2.add_node("b", label="b")
     graph2.add_edge("a", "b", relation="rel")
     line2 = next(
-        line
-        for line in _neighbors_text(graph2, "a").splitlines()
-        if line.strip().startswith("-->")
+        line for line in _neighbors_text(graph2, "a").splitlines() if line.strip().startswith("-->")
     )
     assert line2 == "  --> b [rel] []"
 
@@ -230,7 +226,9 @@ def test_shortest_path_multigraph_shows_bundled_hops():
     graph.add_node("b", label="Beta", source_file="b.py", community=0)
     graph.add_node("c", label="Gamma", source_file="c.py", community=0)
     for index, relation in enumerate(["calls", "imports", "contains"]):
-        graph.add_edge("a", "b", key=f"{relation}-{index}", relation=relation, confidence="EXTRACTED")
+        graph.add_edge(
+            "a", "b", key=f"{relation}-{index}", relation=relation, confidence="EXTRACTED"
+        )
     graph.add_edge("b", "c", key="uses-0", relation="uses", confidence="EXTRACTED")
 
     text = _shortest_path_text(graph, "Alpha", "Gamma")
@@ -256,8 +254,7 @@ def test_shortest_path_single_relation_format_pinned():
     text = _shortest_path_text(graph, "alpha", "gamma")
 
     assert text == (
-        "Shortest path (2 hops):\n"
-        "  alpha --calls [EXTRACTED]--> beta --imports [INFERRED]--> gamma"
+        "Shortest path (2 hops):\n  alpha --calls [EXTRACTED]--> beta --imports [INFERRED]--> gamma"
     )
 
     # No-confidence hop drops the confidence bracket entirely (historical form).
@@ -265,9 +262,7 @@ def test_shortest_path_single_relation_format_pinned():
     graph2.add_node("a", label="a")
     graph2.add_node("b", label="b")
     graph2.add_edge("a", "b", relation="rel")
-    assert _shortest_path_text(graph2, "a", "b") == (
-        "Shortest path (1 hops):\n  a --rel--> b"
-    )
+    assert _shortest_path_text(graph2, "a", "b") == ("Shortest path (1 hops):\n  a --rel--> b")
 
 
 # ---------------------------------------------------------------------------

From 2e84cc3879f37cb177daa1ee19cefb5f721467ad Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Fri, 29 May 2026 03:07:21 -0500
Subject: [PATCH 09/21] fix: fall back when process pool is unavailable
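
The fallback pattern can be sketched in isolation as below. This is an
illustration under assumptions, not graphify's _extract_parallel:
map_with_fallback and its pool_factory parameter are hypothetical names, and
the sketch also falls back if fn itself raises OSError inside a worker.

```python
from concurrent.futures import ProcessPoolExecutor


def map_with_fallback(fn, items, pool_factory=ProcessPoolExecutor):
    """Map ``fn`` over ``items`` in a process pool, or sequentially if the
    pool cannot be used. Returns ``(results, used_pool)``."""
    items = list(items)
    try:
        with pool_factory() as pool:
            return list(pool.map(fn, items)), True
    except OSError as exc:
        # PermissionError subclasses OSError, so sandbox denials land here.
        print(
            f"warning: pool unavailable ({exc.__class__.__name__}: {exc}); "
            "falling back to sequential",
            flush=True,
        )
        return [fn(item) for item in items], False
```

Returning a flag alongside the results lets the caller log or test which
path actually ran.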

---
 graphify/extract.py   | 12 ++++++++++++
 tests/conftest.py     |  2 +-
 tests/test_extract.py | 22 ++++++++++++++++++++++
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/graphify/extract.py b/graphify/extract.py
index 5bd556c87..42b90c6e3 100644
--- a/graphify/extract.py
+++ b/graphify/extract.py
@@ -11838,6 +11838,18 @@ def _extract_parallel(
             flush=True,
         )
         return False
+    except OSError as exc:
+        # Some restricted runtimes block the semaphore/system-limit probes that
+        # ProcessPoolExecutor performs while starting. That is recoverable for
+        # extraction: fall back to in-process sequential extraction instead of
+        # failing the whole pipeline.
+        print(
+            f"  warning: parallel extraction unavailable ({exc.__class__.__name__}: {exc}); "
+            "falling back to sequential. Pass parallel=False to extract() to skip the pool "
+            "entirely.",
+            flush=True,
+        )
+        return False
     if total_files >= _PROGRESS_INTERVAL:
         print(
             f"  AST extraction: {total_files}/{total_files} files (100%) [{max_workers} workers]",
diff --git a/tests/conftest.py b/tests/conftest.py
index 835ff5e52..a32b423cf 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -14,7 +14,7 @@
 
 def pytest_collection_modifyitems(items: list[Any]) -> None:
     for item in items:
-        if item.path.name != "test_analyze.py":
+        if item.path.name not in {"test_analyze.py", "test_pipeline.py"}:
             continue
         for warning_filter in _ANALYZE_WARNING_FILTERS:
             item.add_marker(pytest.mark.filterwarnings(warning_filter))
diff --git a/tests/test_extract.py b/tests/test_extract.py
index 80a68602b..2712a6ba3 100644
--- a/tests/test_extract.py
+++ b/tests/test_extract.py
@@ -590,6 +590,28 @@ def submit(self, *a, **kw):
     assert "__main__" in out, "warning must hint at the Windows __main__ guard idiom"
 
 
+def test_extract_parallel_returns_false_when_pool_unavailable(tmp_path, monkeypatch, capsys):
+    """ProcessPoolExecutor setup OSErrors must fall back to sequential extraction."""
+    import concurrent.futures
+    from graphify import extract as extract_mod
+
+    def raise_permission_error(*args, **kwargs):
+        raise PermissionError("semaphore probe denied")
+
+    monkeypatch.setattr(concurrent.futures, "ProcessPoolExecutor", raise_permission_error)
+
+    uncached = [(0, FIXTURES / "sample.py")]
+    per_file: list = [None]
+
+    ok = extract_mod._extract_parallel(uncached, per_file, tmp_path, 2, 1)
+
+    assert ok is False
+    out = capsys.readouterr().out
+    assert "parallel extraction unavailable" in out
+    assert "PermissionError" in out
+    assert "falling back to sequential" in out
+
+
 def test_extract_parallel_worker_warning_handles_sparse_file_indexes(tmp_path, monkeypatch, capsys):
     """Worker-failure warnings must not index work_items by original file index."""
     import concurrent.futures

From ae8b4725fe82e067fa23cc3aedd716596d5c57f2 Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Fri, 29 May 2026 03:54:37 -0500
Subject: [PATCH 10/21] chore: keep private test tooling out of public metadata

---
 pyproject.toml |  1 -
 uv.lock        | 15 ---------------
 2 files changed, 16 deletions(-)

diff --git a/pyproject.toml b/pyproject.toml
index 7acee8e66..256cd3afe 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -74,7 +74,6 @@ graphify = "graphify.__main__:main"
 dev = [
     "bandit>=1.9.4",
     "build>=1.5.0",
-    "hypothesis>=6.152.7",
     "nuitka>=4.1",
     "patchelf>=0.17.2.4 ; sys_platform != 'win32'",
     "pip-audit>=2.10.0",
diff --git a/uv.lock b/uv.lock
index a93755d00..17d022cdf 100644
--- a/uv.lock
+++ b/uv.lock
@@ -1225,7 +1225,6 @@ watch = [
 dev = [
     { name = "bandit" },
     { name = "build" },
-    { name = "hypothesis" },
     { name = "nuitka" },
     { name = "patchelf", marker = "sys_platform != 'win32'" },
     { name = "pip-audit" },
@@ -1318,7 +1317,6 @@ provides-extras = ["mcp", "neo4j", "pdf", "watch", "svg", "leiden", "office", "g
 dev = [
     { name = "bandit", specifier = ">=1.9.4" },
     { name = "build", specifier = ">=1.5.0" },
-    { name = "hypothesis", specifier = ">=6.152.7" },
     { name = "nuitka", specifier = ">=4.1" },
     { name = "patchelf", marker = "sys_platform != 'win32'", specifier = ">=0.17.2.4" },
     { name = "pip-audit", specifier = ">=2.10.0" },
@@ -1473,19 +1471,6 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/6e/11/0b64cc9024329b76d7547c19a67604a61d21d3ba678a69d1b220c29d5112/huggingface_hub-1.15.0-py3-none-any.whl", hash = "sha256:a4a59af04cbc41a3fe3fec429b171ef994ef8c971eda10136746f408dd4e3744", size = 663602, upload-time = "2026-05-15T11:42:50.487Z" },
 ]
 
-[[package]]
-name = "hypothesis"
-version = "6.153.0"
-source = { registry = "https://pypi.org/simple" }
-dependencies = [
-    { name = "exceptiongroup", marker = "python_full_version < '3.11'" },
-    { name = "sortedcontainers" },
-]
-sdist = { url = "https://files.pythonhosted.org/packages/b1/92/918fb03318c7ff9a271d7cad8eceb359d1069f17e84f5191d52c2970f18f/hypothesis-6.153.0.tar.gz", hash = "sha256:11616e5158fc485d62bae19d9cc69333237faa8050ad44a45218254a1ef272bb", size = 474030, upload-time = "2026-05-26T05:19:05.468Z" }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/2e/20/96dc2387cf29a0ec75b427d62d3dde1f44c924719503babaac4c96806223/hypothesis-6.153.0-py3-none-any.whl", hash = "sha256:2aeda9bbb44ae0ee0bfa67ef744a25be05c1f804dca4eb6479c63518dc9f2900", size = 540326, upload-time = "2026-05-26T05:19:02.861Z" },
-]
-
 [[package]]
 name = "hyppo"
 version = "0.5.2"

From 014b673fcc60b61729d1b1e11dfa986d6251ffc5 Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Fri, 29 May 2026 20:13:51 -0500
Subject: [PATCH 11/21] fix: restore lint-clean style in test_languages.py
 after upstream rebase

The upstream/v8 rebase resolved test_languages.py conflicts by taking
upstream's behavioral assertions (#1077 markdown code-block drop), which
reintroduced the ambiguous `l` loop variable (E741) and a combined
`import tempfile, os` (E401) that our earlier no-waiver cleanup had fixed.
Re-apply lint-clean style on top of upstream's behavior: l -> label, split
imports. Ruff E4/E7/E9/F clean; test_languages 218 passed.

gost
---
 tests/test_languages.py | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/tests/test_languages.py b/tests/test_languages.py
index 4828c029f..da2d3c167 100644
--- a/tests/test_languages.py
+++ b/tests/test_languages.py
@@ -1265,7 +1265,7 @@ def test_js_module_level_arrow_produces_node_and_call_edges(tmp_path):
     labels = _labels(r)
     relations = _relations(r)
 
-    assert any("handler" in l for l in labels), f"module-level arrow 'handler' missing: {labels}"
+    assert any("handler" in label for label in labels), f"module-level arrow 'handler' missing: {labels}"
     assert "calls" in relations, f"expected 'calls' edge from handler->helper: {relations}"
 
 
@@ -1323,8 +1323,8 @@ def test_markdown_skips_fenced_code_blocks():
     """
     r = extract_markdown(FIXTURES / "deploy_guide.md")
     labels = _labels(r)
-    assert not any(l.startswith("code:") for l in labels), \
-        f"Expected no code:* nodes after #1077 fix, got: {[l for l in labels if l.startswith('code:')]}"
+    assert not any(label.startswith("code:") for label in labels), \
+        f"Expected no code:* nodes after #1077 fix, got: {[label for label in labels if label.startswith('code:')]}"
 
 def test_markdown_contains_edges():
     """Headings should be connected via 'contains' edges (file->h, h->h)."""
@@ -1342,7 +1342,8 @@ def test_markdown_fenced_heading_not_parsed():
     The fence-toggle skips over fenced contents so interior markdown syntax
     is not misread as document structure.
     """
-    import tempfile, os
+    import os
+    import tempfile
     src = (
         "# Real Heading\n"
         "\n"
@@ -1362,9 +1363,9 @@ def test_markdown_fenced_heading_not_parsed():
     finally:
         os.unlink(fpath)
 
-    assert any("Real Heading" in l for l in labels), f"'Real Heading' missing: {labels}"
-    assert any("Another Real Heading" in l for l in labels), f"'Another Real Heading' missing: {labels}"
-    assert not any("Not A Heading" in l for l in labels), \
+    assert any("Real Heading" in label for label in labels), f"'Real Heading' missing: {labels}"
+    assert any("Another Real Heading" in label for label in labels), f"'Another Real Heading' missing: {labels}"
+    assert not any("Not A Heading" in label for label in labels), \
         f"fenced '## Not A Heading' was incorrectly parsed as a node: {labels}"
 
 

From 52df5847319b471bd897437f882e75601ad17df1 Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Fri, 29 May 2026 20:28:51 -0500
Subject: [PATCH 12/21] feat(multigraph): PR 6 export/visualization surfaces
 preserve parallel edges
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

External outputs no longer silently collapse MultiDiGraph parallel edges.
Per the gate, each surface either preserves every parallel edge or documents
and tests an intentional summarization:

PRESERVE-ALL (structural):
- to_json / to_graphml: native multi-edge support (round-trip tests). Fixed
  to_graphml crash: strip non-scalar graph-level attrs before write_graphml.
- to_cypher: each parallel edge gets a distinct edge_key (positional key,
  int or str, unique per (u,v) pair) so Neo4j MERGE never dedups parallels —
  even when relation/source_file/source_location are identical.
- to_canvas: unique keyed edge IDs (e_u_v_idx) replace colliding endpoint-only
  IDs. Visual cap is additive: real edges fill the 200-cap first by weight,
  then overflow-summary edges append only for surviving pairs (never evict a
  real edge).
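
Why edge_key matters can be sketched without Neo4j. MERGE treats two
relationships with an identical pattern as one, so a set of emitted patterns
models the dedup. The edge tuples and merge_pattern below are illustrative
assumptions and skip escaping; the real exporter routes values through
_cypher_escape and derives missing keys with make_stable_key.

```python
def merge_pattern(u, v, relation, confidence, edge_key=None):
    """Build the relationship part of a MERGE statement (illustrative only)."""
    props = []
    if edge_key is not None:
        # str() keeps integer positional keys (0, 1, 2) instead of dropping them.
        props.append(f"edge_key: '{edge_key}'")
    props.append(f"confidence: '{confidence}'")
    body = ", ".join(props)
    return f"({u})-[:{relation.upper()} {{{body}}}]->({v})"


# Two parallel edges a -> b with identical attributes, told apart only by
# their positional key, as (u, v, key, data) tuples.
edges = [
    ("a", "b", 0, {"relation": "calls", "confidence": "EXTRACTED"}),
    ("a", "b", 1, {"relation": "calls", "confidence": "EXTRACTED"}),
]

without_key = {
    merge_pattern(u, v, d["relation"], d["confidence"]) for u, v, _k, d in edges
}
with_key = {
    merge_pattern(u, v, d["relation"], d["confidence"], edge_key=k)
    for u, v, k, d in edges
}
```

Here without_key holds a single pattern (the parallels collapse), while
with_key holds one pattern per edge.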

INTENTIONAL-SUMMARIZE (display, documented):
- to_obsidian / wiki: bundle all parallel relations per neighbor instead of
  showing only the first edge returned by edge_data().
- to_html / to_svg: cap parallel edges drawn per pair + summary label.
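
A hedged sketch of the capped summary label. It assumes a cap of 3 and that
"N total" counts unique relations; both are inferred from the test comments
in this series rather than from format_relationship_envelope itself, and
capped_relation_summary is a hypothetical stand-in for it.

```python
def capped_relation_summary(relations, cap=3):
    """Join sorted unique relations, summarizing anything past ``cap``."""
    unique = sorted(set(relations))
    shown = unique[:cap]
    text = ", ".join(shown)
    hidden = len(unique) - len(shown)
    if hidden > 0:
        # Matches the "(+K more, N total)" marker shape used by the tests.
        text += f" (+{hidden} more, {len(unique)} total)"
    return text
```

With a single relation the function returns it unchanged, which is the
property the simple-graph byte-stability tests rely on.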

report.py verified already-preserving (no change; regression test added).
Simple-graph output byte-stable (pinned). 19 new tests incl. integer-key
collision and >200-edge canvas-cap regressions. Full suite 1658 passed;
ruff + pyright clean.

gost
---
 graphify/export.py              | 287 ++++++++++++++---
 graphify/wiki.py                |  40 ++-
 tests/test_export_multigraph.py | 545 ++++++++++++++++++++++++++++++++
 tests/test_report.py            |  53 ++++
 tests/test_wiki.py              |  55 ++++
 5 files changed, 921 insertions(+), 59 deletions(-)
 create mode 100644 tests/test_export_multigraph.py

diff --git a/graphify/export.py b/graphify/export.py
index b8726092e..6aae689cd 100644
--- a/graphify/export.py
+++ b/graphify/export.py
@@ -16,7 +16,13 @@
 from networkx.readwrite import json_graph
 from graphify.security import sanitize_label
 from graphify.analyze import _node_community_map
-from graphify.build import edge_data
+from graphify.build import edge_data, edge_datas
+from graphify.edge_identity import make_stable_key
+from graphify.projections import (
+    DEFAULT_RELATIONSHIP_CAP,
+    format_relationship_envelope,
+    relationship_envelope,
+)
 
 
 # Artifacts worth preserving across rebuilds (non-regenerable without LLM or curation).
@@ -617,6 +623,36 @@ def _cypher_label(raw: str, fallback: str) -> str:
     return cleaned
 
 
+def _edge_distinguishing_key(data: dict, explicit_key: object | None = None) -> str:
+    """Return a stable per-edge key that distinguishes parallel edges.
+
+    MultiDiGraph keyed edges carry their key as the positional ``key`` of
+    ``G.edges(keys=True, data=True)`` rather than inside the attribute dict, so
+    callers that already hold the positional key pass it as ``explicit_key``.
+    NetworkX guarantees that positional key is UNIQUE within a ``(u, v)`` pair —
+    which is exactly the scope Neo4j MERGE deduplicates over — and it may be an
+    INTEGER (0, 1, 2…) when no explicit string key was set. We therefore accept
+    any non-None positional key and stringify it; narrowing to ``str`` would
+    silently drop integer keys and let two parallel edges with identical
+    (relation, source_file, source_location) collapse to the same edge_key.
+
+    When no positional key is available (simple graphs — one edge per pair, or a
+    stray ``key`` left in attrs), derive a deterministic ``edge:v1:<sha256>`` key
+    from the edge's semantic identity fields via :func:`make_stable_key`.
+    """
+    if explicit_key is not None:
+        # int or str positional key — unique per (u, v), which is the MERGE scope.
+        return str(explicit_key)
+    in_attrs = data.get("key")
+    if isinstance(in_attrs, str) and in_attrs:
+        return in_attrs
+    return make_stable_key(
+        data.get("relation"),
+        data.get("source_file"),
+        data.get("source_location"),
+    )
+
+
 def to_cypher(G: nx.Graph, output_path: str) -> None:
     lines = ["// Neo4j Cypher import - generated by /graphify", ""]
     for node_id, data in G.nodes(data=True):
@@ -628,17 +664,33 @@ def to_cypher(G: nx.Graph, output_path: str) -> None:
         )
         lines.append(f"MERGE (n:{ftype} {{id: '{node_id_esc}', label: '{label}'}});")
     lines.append("")
-    for u, v, data in G.edges(data=True):
+    # Preserve EVERY parallel edge (PR 6 go/no-go gate). Neo4j MERGE deduplicates
+    # on the relationship pattern, so two parallel edges between the same (a, b)
+    # with the same relation type would collapse to one unless we give each a
+    # distinguishing property inside the MERGE pattern. We emit a stable
+    # `edge_key` (the MultiDiGraph positional key when present, else a derived
+    # make_stable_key) so distinct keys -> distinct relationships. For simple
+    # graphs this adds one `edge_key` property to the existing single MERGE per
+    # edge — required for correctness, harmless for re-runs (MERGE is idempotent
+    # on the now-richer pattern). All values flow through `_cypher_escape`.
+    is_multi = isinstance(G, (nx.MultiGraph, nx.MultiDiGraph))
+    edge_iter = (
+        G.edges(keys=True, data=True)
+        if is_multi
+        else ((u, v, None, data) for u, v, data in G.edges(data=True))
+    )
+    for u, v, ekey, data in edge_iter:
         rel = _cypher_label(
             (data.get("relation", "RELATES_TO") or "RELATES_TO").upper(),
             "RELATES_TO",
         )
         conf = _cypher_escape(data.get("confidence", "EXTRACTED"))
+        edge_key = _cypher_escape(_edge_distinguishing_key(data, ekey))
         u_esc = _cypher_escape(u)
         v_esc = _cypher_escape(v)
         lines.append(
             f"MATCH (a {{id: '{u_esc}'}}), (b {{id: '{v_esc}'}}) "
-            f"MERGE (a)-[:{rel} {{confidence: '{conf}'}}]->(b);"
+            f"MERGE (a)-[:{rel} {{edge_key: '{edge_key}', confidence: '{conf}'}}]->(b);"
         )
     with open(output_path, "w", encoding="utf-8") as f:  # nosec
         f.write("\n".join(lines))
@@ -789,24 +841,56 @@ def to_html(
     # (stashed by build.py for exactly this reason): undirected NetworkX
     # canonicalizes endpoint order, which would otherwise flip the arrow
     # for `calls` and `rationale_for` in the rendered graph (#563).
+    #
+    # Visual-noise cap (PR 6): at most DEFAULT_RELATIONSHIP_CAP parallel edges
+    # are drawn per (u, v) pair; any overflow is collapsed into ONE summary edge
+    # labelled "(+K more, N total)" from the relationship envelope. This is an
+    # intentional, documented summarization — every parallel edge is still
+    # preserved losslessly by to_json / to_graphml. Simple graphs (one edge per
+    # pair) are unaffected: shown == the single edge, no summary edge added.
     vis_edges = []
-    for u, v, data in G.edges(data=True):
-        confidence = data.get("confidence", "EXTRACTED")
-        relation = data.get("relation", "")
-        true_src = data.get("_src", u)
-        true_tgt = data.get("_tgt", v)
-        vis_edges.append(
-            {
-                "from": true_src,
-                "to": true_tgt,
-                "label": relation,
-                "title": _html.escape(f"{relation} [{confidence}]"),
-                "dashes": confidence != "EXTRACTED",
-                "width": 2 if confidence == "EXTRACTED" else 1,
-                "color": {"opacity": 0.7 if confidence == "EXTRACTED" else 0.35},
-                "confidence": confidence,
-            }
-        )
+    cap = DEFAULT_RELATIONSHIP_CAP
+    seen_pairs: set[tuple[Any, Any]] = set()
+    for u, v in G.edges():
+        if (u, v) in seen_pairs:
+            continue  # edge_datas returns all parallels for the pair at once
+        seen_pairs.add((u, v))
+        records = edge_datas(G, u, v)
+        shown = records[:cap]
+        for data in shown:
+            confidence = data.get("confidence", "EXTRACTED")
+            relation = data.get("relation", "")
+            true_src = data.get("_src", u)
+            true_tgt = data.get("_tgt", v)
+            vis_edges.append(
+                {
+                    "from": true_src,
+                    "to": true_tgt,
+                    "label": relation,
+                    "title": _html.escape(f"{relation} [{confidence}]"),
+                    "dashes": confidence != "EXTRACTED",
+                    "width": 2 if confidence == "EXTRACTED" else 1,
+                    "color": {"opacity": 0.7 if confidence == "EXTRACTED" else 0.35},
+                    "confidence": confidence,
+                }
+            )
+        if len(records) > cap:
+            summary = format_relationship_envelope(G, u, v, cap=cap, directed_only=True)
+            rep = shown[0] if shown else (records[0] if records else {})
+            true_src = rep.get("_src", u)
+            true_tgt = rep.get("_tgt", v)
+            vis_edges.append(
+                {
+                    "from": true_src,
+                    "to": true_tgt,
+                    "label": summary,
+                    "title": _html.escape(summary),
+                    "dashes": True,
+                    "width": 1,
+                    "color": {"opacity": 0.35},
+                    "confidence": "SUMMARY",
+                }
+            )
 
     # Build community legend data
     legend_data = []
@@ -970,16 +1054,29 @@ def _dominant_confidence(node_id: str) -> str:
             lines.append(f"  - {tag}")
         lines += ["---", "", f"# {label}", ""]
 
-        # Outgoing edges as wikilinks
+        # Outgoing edges as wikilinks. Render the FULL bundled relation summary
+        # per neighbor (PR 6 gate + PR 5 read-surface consistency) instead of
+        # only the first parallel edge. Gate on unique-relation count exactly
+        # like PR 5: a single relation keeps the historical byte-stable
+        # `` `{relation}` [{confidence}] `` form (so simple-graph vaults are
+        # unchanged), while multiple relations render the capped envelope
+        # bundle (e.g. "calls, imports, contains" or "... (+K more, N total)").
         neighbors = list(G.neighbors(node_id))
         if neighbors:
             lines.append("## Connections")
             for neighbor in sorted(neighbors, key=lambda n: G.nodes[n].get("label", n)):
-                edata = edge_data(G, node_id, neighbor)
                 neighbor_label = node_filename[neighbor]
-                relation = edata.get("relation", "")
-                confidence = edata.get("confidence", "EXTRACTED")
-                lines.append(f"- [[{neighbor_label}]] - `{relation}` [{confidence}]")
+                envelope = relationship_envelope(G, node_id, neighbor, directed_only=True)
+                if len(envelope["relations"]) <= 1:
+                    edata = edge_data(G, node_id, neighbor)
+                    relation = edata.get("relation", "")
+                    confidence = edata.get("confidence", "EXTRACTED")
+                    lines.append(f"- [[{neighbor_label}]] - `{relation}` [{confidence}]")
+                else:
+                    summary = format_relationship_envelope(
+                        G, node_id, neighbor, directed_only=True
+                    )
+                    lines.append(f"- [[{neighbor_label}]] - {summary}")
             lines.append("")
 
         # Inline tags at bottom of note body (for Obsidian tag panel)
@@ -1273,27 +1370,78 @@ def safe_name(label: str) -> str:
                 }
             )
 
-    # Generate edges - only between nodes both in canvas, cap at 200 highest-weight
-    all_edges_weighted: list[tuple[float, str, str, str]] = []
-    for u, v, edata in G.edges(data=True):
-        if u in all_canvas_nodes and v in all_canvas_nodes:
+    # Generate edges - only between nodes both in canvas, cap at 200 highest-weight.
+    #
+    # Obsidian Canvas requires GLOBALLY UNIQUE edge ids; the previous endpoint-only
+    # `e_{u}_{v}` id silently collapsed parallel edges to one. We now emit a unique
+    # `e_{u}_{v}_{idx}` per drawn parallel edge. To bound visual noise (PR 6
+    # requirement) we draw at most DEFAULT_RELATIONSHIP_CAP parallel edges per
+    # (u, v) pair; when more exist we draw the capped set PLUS one summary edge
+    # labelled "(+K more, N total)" via the relationship envelope. This is an
+    # intentional, documented summarization — the full edge set still survives
+    # losslessly in to_json / to_graphml.
+    pair_records: dict[tuple[str, str], list[dict]] = {}
+    for u, v in G.edges():
+        if u not in all_canvas_nodes or v not in all_canvas_nodes:
+            continue
+        if (u, v) in pair_records:
+            continue  # edge_datas returns all parallels for the pair at once
+        pair_records[(u, v)] = edge_datas(G, u, v)
+
+    cap = DEFAULT_RELATIONSHIP_CAP
+    # Two-phase selection so synthetic summary edges are strictly ADDITIVE and
+    # never displace real edges from the 200-edge global cap:
+    #   1. Build the REAL drawn edges (at most `cap` parallels per pair), sort by
+    #      weight desc, and truncate to the top 200. This preserves the original
+    #      "200 highest-weight real edges" contract exactly.
+    #   2. AFTER truncation, append one overflow summary edge for each (u, v) pair
+    #      that (a) had > cap parallels AND (b) still has at least one real edge in
+    #      the surviving top-200 set. Summaries describe already-counted overflow,
+    #      so they must not consume a real-edge slot; otherwise a summary ranked
+    #      at `float("inf")` would sort first and evict a real edge that belongs
+    #      in the top 200 (the bug this replaces). Summaries are not weight-ranked
+    #      and are not subject to the 200-cap themselves.
+    real_weighted: list[tuple[float, str, str, int, str]] = []
+    overflow_pairs: dict[tuple[str, str], int] = {}
+    for (u, v), records in sorted(
+        pair_records.items(), key=lambda kv: (str(kv[0][0]), str(kv[0][1]))
+    ):
+        for idx, edata in enumerate(records[:cap]):
             weight = edata.get("weight", 1.0)
             relation = edata.get("relation", "")
             conf = edata.get("confidence", "EXTRACTED")
             label = f"{relation} [{conf}]" if relation else f"[{conf}]"
-            all_edges_weighted.append((weight, u, v, label))
+            real_weighted.append((weight, u, v, idx, label))
+        if len(records) > cap:
+            overflow_pairs[(u, v)] = len(records)
 
-    all_edges_weighted.sort(key=lambda x: -x[0])
-    for weight, u, v, label in all_edges_weighted[:200]:
+    real_weighted.sort(key=lambda x: (-x[0], x[1], x[2], x[3]))
+    surviving_real = real_weighted[:200]
+    for weight, u, v, idx, label in surviving_real:
         canvas_edges.append(
             {
-                "id": f"e_{u}_{v}",
+                "id": f"e_{u}_{v}_{idx}",
                 "fromNode": f"n_{u}",
                 "toNode": f"n_{v}",
                 "label": label,
             }
         )
 
+    # Append summary edges only for overflow pairs that survived the 200-cap.
+    surviving_pairs = {(u, v) for _w, u, v, _idx, _lbl in surviving_real}
+    for (u, v) in sorted(overflow_pairs, key=lambda p: (str(p[0]), str(p[1]))):
+        if (u, v) not in surviving_pairs:
+            continue  # pair fully displaced by the 200-cap — no summary needed
+        summary_label = format_relationship_envelope(G, u, v, cap=cap, directed_only=True)
+        canvas_edges.append(
+            {
+                "id": f"e_{u}_{v}_summary",
+                "fromNode": f"n_{u}",
+                "toNode": f"n_{v}",
+                "label": summary_label,
+            }
+        )
+
     canvas_data = {"nodes": canvas_nodes, "edges": canvas_edges}
     Path(output_path).write_text(json.dumps(canvas_data, indent=2), encoding="utf-8")  # nosec
 
@@ -1380,6 +1528,18 @@ def to_graphml(
     node_community = _node_community_map(communities)
     for node_id in H.nodes():
         H.nodes[node_id]["community"] = node_community.get(node_id, -1)
+    # GraphML only serializes scalar (str/int/float/bool) data values. The
+    # multigraph build path stashes a `graphify_multigraph_diagnostics` dict on
+    # G.graph, which would raise "GraphML does not support type <class 'dict'>"
+    # and abort the write (losing ALL edges, parallel ones included). Drop any
+    # non-scalar graph-level attrs so multigraph exports succeed losslessly;
+    # simple graphs carry no such attrs and are unaffected (byte-stable).
+    for attr_name in [
+        name
+        for name, value in H.graph.items()
+        if not isinstance(value, (str, int, float, bool))
+    ]:
+        del H.graph[attr_name]
     nx.write_graphml(H, output_path)
 
 
@@ -1422,22 +1582,51 @@ def to_svg(
     ]
     node_sizes = [300 + 1200 * (degree.get(n, 1) / max_deg) for n in G.nodes()]
 
-    # Draw edges - dashed for non-EXTRACTED
-    for u, v, data in G.edges(data=True):
-        conf = data.get("confidence", "EXTRACTED")
-        style = "solid" if conf == "EXTRACTED" else "dashed"
-        alpha = 0.6 if conf == "EXTRACTED" else 0.3
-        x0, y0 = pos[u]
-        x1, y1 = pos[v]
-        ax.plot(
-            [x0, x1],
-            [y0, y1],
-            color="#aaaaaa",
-            linewidth=0.8,
-            linestyle=style,
-            alpha=alpha,
-            zorder=1,
-        )
+    # Draw edges - dashed for non-EXTRACTED.
+    #
+    # Visual-noise cap (PR 6): parallel edges between the same pair overlap
+    # exactly on the spring layout, so drawing all of them is pure clutter. We
+    # draw at most DEFAULT_RELATIONSHIP_CAP per (u, v) pair and, when more exist,
+    # add ONE summary text label "(+K more, N total)", taken from the relationship
+    # envelope, at the edge midpoint. Intentional, documented summarization: the
+    # full edge set still survives losslessly in to_json / to_graphml. Simple graphs
+    # (one edge per pair) draw exactly as before with no summary label.
+    cap = DEFAULT_RELATIONSHIP_CAP
+    seen_pairs: set[tuple[Any, Any]] = set()
+    for u, v in G.edges():
+        if (u, v) in seen_pairs:
+            continue  # edge_datas returns all parallels for the pair at once
+        seen_pairs.add((u, v))
+        records = edge_datas(G, u, v)
+        for data in records[:cap]:
+            conf = data.get("confidence", "EXTRACTED")
+            style = "solid" if conf == "EXTRACTED" else "dashed"
+            alpha = 0.6 if conf == "EXTRACTED" else 0.3
+            x0, y0 = pos[u]
+            x1, y1 = pos[v]
+            ax.plot(
+                [x0, x1],
+                [y0, y1],
+                color="#aaaaaa",
+                linewidth=0.8,
+                linestyle=style,
+                alpha=alpha,
+                zorder=1,
+            )
+        if len(records) > cap:
+            x0, y0 = pos[u]
+            x1, y1 = pos[v]
+            summary = format_relationship_envelope(G, u, v, cap=cap, directed_only=True)
+            ax.text(
+                (x0 + x1) / 2,
+                (y0 + y1) / 2,
+                summary,
+                color="#cccccc",
+                fontsize=6,
+                ha="center",
+                va="center",
+                zorder=2,
+            )
 
     nx.draw_networkx_nodes(G, pos, ax=ax, node_color=node_colors, node_size=node_sizes, alpha=0.9)
     nx.draw_networkx_labels(
diff --git a/graphify/wiki.py b/graphify/wiki.py
index 2b5158104..89272e001 100644
--- a/graphify/wiki.py
+++ b/graphify/wiki.py
@@ -5,7 +5,7 @@
 from pathlib import Path
 import networkx as nx
 
-from graphify.build import edge_data
+from graphify.build import edge_datas
 
 
 def _safe_filename(name: str) -> str:
@@ -53,12 +53,18 @@ def _community_article(
     top_nodes = sorted(nodes, key=lambda n: G.degree(n), reverse=True)[:25]
     cross = _cross_community_links(G, nodes, cid, labels, node_community or {})
 
-    # Edge confidence breakdown
+    # Edge confidence breakdown. On a MultiDiGraph a neighbor can be reached by
+    # several parallel edges (calls/imports/contains) with distinct confidences;
+    # count EVERY parallel edge so the audit-trail percentages reflect the full
+    # edge population, not just the first edge per neighbor. On a simple graph
+    # edge_datas() returns a list holding only that edge's dict, so the historical
+    # count-once-per-directed-neighbor behavior (each undirected edge counted from
+    # both endpoints) is preserved byte-for-byte.
     conf_counts: Counter = Counter()
     for nid in nodes:
         for neighbor in G.neighbors(nid):
-            ed = edge_data(G, nid, neighbor)
-            conf_counts[ed.get("confidence", "EXTRACTED")] += 1
+            for ed in edge_datas(G, nid, neighbor):
+                conf_counts[ed.get("confidence", "EXTRACTED")] += 1
     total_edges = sum(conf_counts.values()) or 1
 
     sources = sorted({G.nodes[n].get("source_file") or "" for n in nodes} - {""})
@@ -125,16 +131,30 @@ def _god_node_article(
     if community_name:
         lines += [f"**Community:** [[{community_name}]]", ""]
 
-    # Group neighbors by relation type
+    # Group neighbors by relation type. A neighbor reached by several parallel
+    # edges (calls/imports/contains) must appear under EVERY distinct relation,
+    # not just the first edge's relation; otherwise parallel relationships are
+    # silently dropped from the wiki. Enumerate all parallel records via
+    # edge_datas() and file the neighbor once per distinct relation, preserving
+    # the historical `` `{conf}` `` suffix for the single-edge case (exactly one
+    # confidence on that relation: show it; several distinct confidences on the
+    # same relation: omit the suffix, to keep the line deterministic).
     by_relation: dict[str, list[str]] = {}
     for neighbor in sorted(G.neighbors(nid), key=lambda n: G.degree(n), reverse=True):
         nd = G.nodes[neighbor]
-        ed = edge_data(G, nid, neighbor)
-        rel = ed.get("relation", "related")
         neighbor_label = nd.get("label", neighbor)
-        conf = ed.get("confidence", "")
-        conf_str = f" `{conf}`" if conf else ""
-        by_relation.setdefault(rel, []).append(f"[[{neighbor_label}]]{conf_str}")
+        # Map each distinct relation on this neighbor to the set of confidences
+        # carried by the parallel edges under that relation.
+        rel_confs: dict[str, set[str]] = {}
+        for ed in edge_datas(G, nid, neighbor):
+            rel = ed.get("relation", "related")
+            conf = ed.get("confidence", "")
+            rel_confs.setdefault(rel, set())
+            if conf:
+                rel_confs[rel].add(conf)
+        for rel, confs in rel_confs.items():
+            conf_str = f" `{next(iter(confs))}`" if len(confs) == 1 else ""
+            by_relation.setdefault(rel, []).append(f"[[{neighbor_label}]]{conf_str}")
 
     lines += ["## Connections by Relation", ""]
     for rel, targets in sorted(by_relation.items()):
diff --git a/tests/test_export_multigraph.py b/tests/test_export_multigraph.py
new file mode 100644
index 000000000..734d18eed
--- /dev/null
+++ b/tests/test_export_multigraph.py
@@ -0,0 +1,545 @@
+"""Export round-trip and parallel-edge fidelity tests for MultiDiGraph (PR 6).
+
+PR 6 go/no-go gate: "Every export either preserves every parallel edge OR
+documents and tests an intentional projection/summarization."
+
+These tests exercise the four fixed exporters (``to_cypher``, ``to_obsidian``,
+``to_canvas``, ``to_html``/``to_svg``) plus the natively-lossless ``to_json`` /
+``to_graphml`` round-trips, and pin the simple-graph regression strings so the
+single-relation path stays byte-stable against the pre-PR6 output.
+
+Fixture style mirrors ``tests/test_export.py`` (tempfile + ``build_from_json``).
+"""
+
+import json
+import re
+import tempfile
+from pathlib import Path
+
+import networkx as nx
+from networkx.readwrite import json_graph
+
+from graphify.build import build_from_json
+from graphify.edge_identity import make_stable_key
+from graphify.export import (
+    to_canvas,
+    to_cypher,
+    to_graphml,
+    to_html,
+    to_json,
+    to_obsidian,
+    to_svg,
+)
+from graphify.projections import DEFAULT_RELATIONSHIP_CAP
+
+# Relations on the A->B pair (3 parallel edges, distinct source_location).
+AB_RELATIONS = ["calls", "imports", "contains"]
+# Relations on the C->D pair (5 parallel edges, above DEFAULT_RELATIONSHIP_CAP).
+CD_RELATIONS = ["calls", "imports", "contains", "extends", "uses"]
+
+
+def make_multigraph() -> nx.MultiDiGraph:
+    """Build a MultiDiGraph with three pairs:
+
+    - ``A->B``: 3 parallel edges (calls/imports/contains), distinct locations.
+    - ``C->D``: 5 parallel edges (> cap), distinct locations.
+    - ``E->F``: a single-edge simple-graph control inside the multigraph.
+    """
+    nodes = [
+        {
+            "id": n,
+            "label": n.upper(),
+            "file_type": "code",
+            "source_file": f"{n}.py",
+            "source_location": "L1",
+        }
+        for n in ("a", "b", "c", "d", "e", "f")
+    ]
+    edges = (
+        [
+            {
+                "source": "a",
+                "target": "b",
+                "relation": rel,
+                "confidence": "EXTRACTED",
+                "source_file": "a.py",
+                "source_location": f"L{i}",
+            }
+            for i, rel in enumerate(AB_RELATIONS)
+        ]
+        + [
+            {
+                "source": "c",
+                "target": "d",
+                "relation": rel,
+                "confidence": "EXTRACTED",
+                "source_file": "c.py",
+                "source_location": f"L{i}",
+            }
+            for i, rel in enumerate(CD_RELATIONS)
+        ]
+        + [
+            {
+                "source": "e",
+                "target": "f",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "e.py",
+                "source_location": "L1",
+            }
+        ]
+    )
+    G = build_from_json({"nodes": nodes, "edges": edges}, multigraph=True)
+    assert isinstance(G, nx.MultiDiGraph)
+    # Sanity: 3 + 5 + 1 = 9 edges in total, none collapsed at build time.
+    assert G.number_of_edges() == 9
+    return G
+
+
+def make_simple_digraph() -> nx.DiGraph:
+    """Single-relation directed control graph for byte-stability regression."""
+    extraction = {
+        "nodes": [
+            {
+                "id": "A",
+                "label": "Alpha",
+                "file_type": "code",
+                "source_file": "a.py",
+                "source_location": "L1",
+            },
+            {
+                "id": "B",
+                "label": "Beta",
+                "file_type": "code",
+                "source_file": "b.py",
+                "source_location": "L2",
+            },
+        ],
+        "edges": [
+            {
+                "source": "A",
+                "target": "B",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "a.py",
+                "source_location": "L1",
+            }
+        ],
+    }
+    G = build_from_json(extraction, directed=True)
+    assert isinstance(G, nx.DiGraph)
+    return G
+
+
+COMMUNITIES = {0: ["a", "b", "c", "d", "e", "f"]}
+
+
+# ── Lossless round-trips (preserve every parallel edge) ──────────────────────
+
+
+def test_json_roundtrip_preserves_all_parallel_edges():
+    """to_json -> node_link_graph reconstructs every parallel edge."""
+    G = make_multigraph()
+    original = G.number_of_edges()
+    with tempfile.TemporaryDirectory() as tmp:
+        out = Path(tmp) / "graph.json"
+        to_json(G, COMMUNITIES, str(out), force=True)
+        data = json.loads(out.read_text())
+        # node_link_data stamps multigraph/directed flags so the loader
+        # reconstructs a MultiDiGraph automatically.
+        assert data.get("multigraph") is True
+        assert data.get("directed") is True
+        G2 = json_graph.node_link_graph(data, edges="links")
+    assert isinstance(G2, nx.MultiDiGraph)
+    assert G2.number_of_edges() == original == 9
+
+
+def test_graphml_roundtrip_preserves_parallel_edges():
+    """write_graphml -> read_graphml preserves the parallel edge count."""
+    G = make_multigraph()
+    original = G.number_of_edges()
+    with tempfile.TemporaryDirectory() as tmp:
+        out = Path(tmp) / "graph.graphml"
+        # Must not raise on the multigraph diagnostics graph-attr (dict value).
+        to_graphml(G, COMMUNITIES, str(out))
+        G2 = nx.read_graphml(out)
+    assert G2.is_multigraph()
+    assert G2.number_of_edges() == original == 9
+
+
+# ── Cypher: one distinct relationship per parallel edge ──────────────────────
+
+
+def test_cypher_emits_distinct_edge_per_parallel():
+    """Each parallel edge produces its own MERGE with a distinct edge_key."""
+    G = make_multigraph()
+    with tempfile.TemporaryDirectory() as tmp:
+        out = Path(tmp) / "cypher.txt"
+        to_cypher(G, str(out))
+        content = out.read_text()
+
+    merge_lines = [ln for ln in content.splitlines() if ln.startswith("MATCH")]
+    # One MERGE per parallel edge — no Neo4j-side collapse.
+    assert len(merge_lines) == G.number_of_edges() == 9
+
+    edge_keys = re.findall(r"edge_key: '([^']+)'", content)
+    assert len(edge_keys) == 9
+    # Every emitted relationship carries an edge_key distinct across the export.
+    assert len(set(edge_keys)) == 9
+
+    # The three A->B parallel edges all sit between the same endpoints but keep
+    # distinct keys, so MERGE treats them as three relationships, not one.
+    ab_lines = [
+        ln
+        for ln in merge_lines
+        if "{id: 'a'}" in ln and "{id: 'b'}" in ln
+    ]
+    assert len(ab_lines) == 3
+    ab_keys = set()
+    for ln in ab_lines:
+        m = re.search(r"edge_key: '([^']+)'", ln)
+        assert m is not None
+        ab_keys.add(m.group(1))
+    assert len(ab_keys) == 3
+
+
+# ── Canvas: globally unique edge ids + visual cap summary ────────────────────
+
+
+def test_canvas_edge_ids_unique():
+    """Every canvas edge id is unique (no parallel-edge id collisions)."""
+    G = make_multigraph()
+    with tempfile.TemporaryDirectory() as tmp:
+        out = Path(tmp) / "graph.canvas"
+        to_canvas(G, COMMUNITIES, str(out))
+        data = json.loads(out.read_text())
+
+    edge_ids = [e["id"] for e in data["edges"]]
+    assert edge_ids, "canvas should contain edges"
+    assert len(edge_ids) == len(set(edge_ids)), "canvas edge ids must be unique"
+
+    # Golden / deterministic ordering for the A->B trio (3 <= cap, all drawn).
+    ab_ids = sorted(
+        e["id"]
+        for e in data["edges"]
+        if e["fromNode"] == "n_a" and e["toNode"] == "n_b"
+    )
+    assert ab_ids == ["e_a_b_0", "e_a_b_1", "e_a_b_2"]
+
+
+def test_canvas_visual_cap_summary():
+    """A >cap pair draws at most cap+1 canvas edges with an overflow summary."""
+    G = make_multigraph()
+    cap = DEFAULT_RELATIONSHIP_CAP
+    with tempfile.TemporaryDirectory() as tmp:
+        out = Path(tmp) / "graph.canvas"
+        to_canvas(G, COMMUNITIES, str(out))
+        data = json.loads(out.read_text())
+
+    cd_edges = [
+        e for e in data["edges"] if e["fromNode"] == "n_c" and e["toNode"] == "n_d"
+    ]
+    # 5 parallel edges -> cap drawn + 1 summary edge.
+    assert len(cd_edges) == cap + 1
+    cd_ids = sorted(e["id"] for e in cd_edges)
+    assert cd_ids == ["e_c_d_0", "e_c_d_1", "e_c_d_2", "e_c_d_summary"]
+
+    summary = next(e for e in cd_edges if e["id"] == "e_c_d_summary")
+    # Envelope overflow text: "(+K more, N total)".
+    assert "more" in summary["label"]
+    assert "5 total" in summary["label"]
+
+
+# ── Obsidian: all relations per neighbor (capped when > cap) ─────────────────
+
+
+def test_obsidian_shows_all_relations():
+    """to_obsidian lists every relation to a neighbor, capped when above cap."""
+    G = make_multigraph()
+    with tempfile.TemporaryDirectory() as tmp:
+        out = Path(tmp)
+        to_obsidian(G, COMMUNITIES, str(out))
+        a_note = (out / "A.md").read_text()
+        c_note = (out / "C.md").read_text()
+
+    a_conn = [ln for ln in a_note.splitlines() if ln.startswith("- [[")]
+    assert len(a_conn) == 1
+    # All three A->B relations are listed, not just the first edge. Assert on the
+    # SET of relations present (and the wikilink prefix) rather than a pinned
+    # joined order, so a future envelope ordering change does not false-positive.
+    assert a_conn[0].startswith("- [[B]] - ")
+    for rel in AB_RELATIONS:
+        assert rel in a_conn[0]
+    # No overflow marker — 3 relations is within DEFAULT_RELATIONSHIP_CAP.
+    assert "more" not in a_conn[0]
+
+    # The 5-relation C->D bundle renders the capped envelope form.
+    c_conn = [ln for ln in c_note.splitlines() if ln.startswith("- [[")]
+    assert len(c_conn) == 1
+    assert "more" in c_conn[0]
+    assert "5 total" in c_conn[0]
+
+
+# ── HTML / SVG: visual cap + summary label ───────────────────────────────────
+
+
+def test_html_svg_visual_cap():
+    """HTML and SVG cap parallel edges and surface an overflow summary label."""
+    G = make_multigraph()
+    cap = DEFAULT_RELATIONSHIP_CAP
+    with tempfile.TemporaryDirectory() as tmp:
+        html_out = Path(tmp) / "graph.html"
+        to_html(G, COMMUNITIES, str(html_out))
+        html = html_out.read_text()
+
+        svg_out = Path(tmp) / "graph.svg"
+        to_svg(G, COMMUNITIES, str(svg_out), community_labels={0: "Group 0"})
+        svg = svg_out.read_text()
+
+    # Summary label for the 5-parallel C->D pair appears in both surfaces.
+    assert "5 total" in html
+    assert f"+{len(CD_RELATIONS) - cap} more" in html
+    assert "5 total" in svg
+
+    # The HTML edge dataset draws at most cap "real" C->D edges plus one summary.
+    # Parse RAW_EDGES out of the embedded script to count C->D draws precisely.
+    m = re.search(r"const RAW_EDGES = (\[.*?\]);", html, re.DOTALL)
+    assert m, "RAW_EDGES array must be embedded in the HTML"
+    raw_edges = json.loads(m.group(1))
+    cd_real = [
+        e
+        for e in raw_edges
+        if e.get("from") == "c"
+        and e.get("to") == "d"
+        and e.get("confidence") != "SUMMARY"
+    ]
+    cd_summary = [
+        e
+        for e in raw_edges
+        if e.get("from") == "c"
+        and e.get("to") == "d"
+        and e.get("confidence") == "SUMMARY"
+    ]
+    assert len(cd_real) == cap
+    assert len(cd_summary) == 1
+
+
+# ── Regression: canvas summary edges must not evict real edges (BLOCK 1) ──────
+
+
+def test_canvas_summary_does_not_displace_real_edges_over_cap():
+    """With > 200 real edges, the 200-cap keeps the highest-weight REAL edges and
+    summary edges are strictly additive (never evict a real edge).
+
+    Reproduces the priority-inversion bug: summary edges were pushed into the
+    weighted top-200 selection with ``float("inf")`` weight, sorting to the FRONT
+    and evicting a real edge from the top 200. A graph with 210 ascending-weight
+    single-edge pairs PLUS one low-weight 5-parallel overflow pair must:
+      - emit exactly 200 real edges (no summary stealing a real slot),
+      - retain the highest-weight real edge,
+      - drop the lowest-weight real edge (legitimately over the 200-cap).
+    """
+    G = nx.MultiDiGraph()
+    members: list[str] = []
+    for i in range(210):
+        a, b = f"a{i}", f"b{i}"
+        G.add_node(a, label=a)
+        G.add_node(b, label=b)
+        # Ascending weights 1..210 so ordering is unambiguous.
+        G.add_edge(
+            a,
+            b,
+            relation="calls",
+            confidence="EXTRACTED",
+            source_file="f.py",
+            source_location=f"L{i}",
+            weight=float(i + 1),
+        )
+        members += [a, b]
+    # Low-weight overflow pair (5 parallels) — its reals are below the cap line.
+    G.add_node("X", label="X")
+    G.add_node("Y", label="Y")
+    for j in range(5):
+        G.add_edge(
+            "X",
+            "Y",
+            relation=f"r{j}",
+            confidence="EXTRACTED",
+            source_file="x.py",
+            source_location=f"LX{j}",
+            weight=0.1,
+        )
+    members += ["X", "Y"]
+
+    with tempfile.TemporaryDirectory() as tmp:
+        out = Path(tmp) / "graph.canvas"
+        to_canvas(G, {0: members}, str(out))
+        edges = json.loads(out.read_text())["edges"]
+
+    real = [e for e in edges if not e["id"].endswith("_summary")]
+    # Real edges are capped at EXACTLY 200 — a summary never consumed a real slot.
+    assert len(real) == 200
+    # Highest-weight real edge survives; lowest-weight one is legitimately dropped.
+    assert any(e["fromNode"] == "n_a209" and e["toNode"] == "n_b209" for e in real)
+    assert not any(e["fromNode"] == "n_a0" and e["toNode"] == "n_b0" for e in real)
+    # All ids remain globally unique.
+    ids = [e["id"] for e in edges]
+    assert len(ids) == len(set(ids))
+
+
+def test_canvas_summary_additive_when_overflow_pair_survives():
+    """When a high-weight overflow pair survives the 200-cap, its summary edge is
+    ADDED on top of the 200 real edges (total > 200), not in place of one."""
+    G = nx.MultiDiGraph()
+    members: list[str] = []
+    for i in range(199):
+        a, b = f"a{i}", f"b{i}"
+        G.add_node(a, label=a)
+        G.add_node(b, label=b)
+        G.add_edge(
+            a,
+            b,
+            relation="calls",
+            confidence="EXTRACTED",
+            source_file="f.py",
+            source_location=f"L{i}",
+            weight=1.0,
+        )
+        members += [a, b]
+    G.add_node("X", label="X")
+    G.add_node("Y", label="Y")
+    for j in range(5):
+        G.add_edge(
+            "X",
+            "Y",
+            relation=f"r{j}",
+            confidence="EXTRACTED",
+            source_file="x.py",
+            source_location=f"LX{j}",
+            weight=100.0,  # high weight -> overflow pair's reals survive the cap
+        )
+    members += ["X", "Y"]
+
+    with tempfile.TemporaryDirectory() as tmp:
+        out = Path(tmp) / "graph.canvas"
+        to_canvas(G, {0: members}, str(out))
+        edges = json.loads(out.read_text())["edges"]
+
+    real = [e for e in edges if not e["id"].endswith("_summary")]
+    summary = [e for e in edges if e["id"].endswith("_summary")]
+    assert len(real) == 200  # real edges still capped at 200
+    assert len(summary) == 1  # summary additive (201 total)
+    xy_summary = [e for e in summary if e["fromNode"] == "n_X" and e["toNode"] == "n_Y"]
+    assert len(xy_summary) == 1
+    assert "5 total" in xy_summary[0]["label"]
+    ids = [e["id"] for e in edges]
+    assert len(ids) == len(set(ids))
+
+
+# ── Regression: integer positional keys distinguish parallels (BLOCK 2) ───────
+
+
+def test_cypher_distinguishes_parallels_with_identical_identity_fields():
+    """Parallel edges that share IDENTICAL relation/source_file/source_location
+    still get DISTINCT edge_keys, so Neo4j MERGE preserves all of them.
+
+    Reproduces the integer-key drop bug: a directly-constructed MultiDiGraph
+    yields INTEGER positional keys (0, 1, 2…). The old ``isinstance(key, str)``
+    guard discarded them and fell back to make_stable_key(relation, file,
+    location) — identical for every edge here — collapsing them to ONE edge_key
+    and letting MERGE dedup the parallels. The fix accepts any non-None
+    positional key (stringified), which NetworkX guarantees unique per (u, v).
+    """
+    G = nx.MultiDiGraph()
+    G.add_node("A", label="Alpha", file_type="code")
+    G.add_node("B", label="Beta", file_type="code")
+    # Three parallel edges, byte-identical semantic identity fields.
+    for _ in range(3):
+        G.add_edge(
+            "A",
+            "B",
+            relation="calls",
+            confidence="EXTRACTED",
+            source_file="a.py",
+            source_location="L1",
+        )
+    # Positional keys are integers (NetworkX default).
+    positional_keys = [k for _u, _v, k in G.edges(keys=True)]
+    assert positional_keys == [0, 1, 2]
+    assert all(isinstance(k, int) for k in positional_keys)
+
+    with tempfile.TemporaryDirectory() as tmp:
+        out = Path(tmp) / "cypher.txt"
+        to_cypher(G, str(out))
+        content = out.read_text()
+
+    merge_lines = [ln for ln in content.splitlines() if ln.startswith("MATCH")]
+    assert len(merge_lines) == G.number_of_edges() == 3
+    edge_keys = re.findall(r"edge_key: '([^']+)'", content)
+    assert len(edge_keys) == 3
+    # The crux: distinct edge_key per parallel edge despite identical identity
+    # fields — count distinct == parallel count, so MERGE keeps all three.
+    assert len(set(edge_keys)) == 3
+
+
+# ── Simple-graph regression: byte-stable single-relation output ──────────────
+
+
+def test_export_simple_graph_regression():
+    """Single-relation DiGraph output is pinned exactly (pre-PR6 stability).
+
+    The Cypher line gains a documented `edge_key` property (required so Neo4j
+    MERGE never collapses parallel edges); the canvas id gains a `_0` parallel
+    suffix. Obsidian's single-relation Connections line is byte-identical to the
+    pre-PR6 ``- [[label]] - `relation` [confidence]`` form.
+    """
+    G = make_simple_digraph()
+    comm = {0: ["A", "B"]}
+    expected_key = make_stable_key("calls", "a.py", "L1")
+
+    with tempfile.TemporaryDirectory() as tmp:
+        # Cypher — exact line including the new edge_key property.
+        cypher_out = Path(tmp) / "cypher.txt"
+        to_cypher(G, str(cypher_out))
+        cypher_lines = [
+            ln for ln in cypher_out.read_text().splitlines() if ln.startswith("MATCH")
+        ]
+        assert cypher_lines == [
+            "MATCH (a {id: 'A'}), (b {id: 'B'}) "
+            f"MERGE (a)-[:CALLS {{edge_key: '{expected_key}', confidence: 'EXTRACTED'}}]->(b);"
+        ]
+
+        # Canvas — single edge keeps deterministic `_0` parallel suffix.
+        canvas_out = Path(tmp) / "graph.canvas"
+        to_canvas(G, comm, str(canvas_out))
+        canvas_edges = json.loads(canvas_out.read_text())["edges"]
+        assert canvas_edges == [
+            {
+                "id": "e_A_B_0",
+                "fromNode": "n_A",
+                "toNode": "n_B",
+                "label": "calls [EXTRACTED]",
+            }
+        ]
+
+        # Obsidian — byte-identical to the historical single-relation form.
+        obs_out = Path(tmp) / "vault"
+        to_obsidian(G, comm, str(obs_out))
+        conn_lines = [
+            ln
+            for ln in (obs_out / "Alpha.md").read_text().splitlines()
+            if ln.startswith("- [[")
+        ]
+        assert conn_lines == ["- [[Beta]] - `calls` [EXTRACTED]"]
+
+        # HTML — single edge, no summary edge injected.
+        html_out = Path(tmp) / "graph.html"
+        to_html(G, comm, str(html_out))
+        html = html_out.read_text()
+        m = re.search(r"const RAW_EDGES = (\[.*?\]);", html, re.DOTALL)
+        assert m
+        raw_edges = json.loads(m.group(1))
+        assert len(raw_edges) == 1
+        assert raw_edges[0]["from"] == "A"
+        assert raw_edges[0]["to"] == "B"
+        assert raw_edges[0]["confidence"] == "EXTRACTED"
diff --git a/tests/test_report.py b/tests/test_report.py
index 0f21fd40d..00aaafc73 100644
--- a/tests/test_report.py
+++ b/tests/test_report.py
@@ -193,3 +193,56 @@ def test_report_god_node_degree_not_inflated():
     gods = god_nodes(G)
     hub_entry = next(g for g in gods if g["label"] == "hub")
     assert hub_entry["degree"] == 3, f"Expected 3 unique neighbors, got {hub_entry['degree']}"
+
+
+# --- PR 6: parallel-edge preservation in per-edge report surfaces ---
+
+
+def test_report_preserves_parallel_inferred_edges():
+    """MultiDiGraph with parallel edges between one pair: every per-edge report
+    surface must preserve ALL parallel edges, not collapse to one (PR 6).
+
+    report.py iterates ``G.edges(data=True)`` for confidence stats, the INFERRED
+    count/avg, and the ambiguous-edges list — on a MultiDiGraph that yields every
+    parallel edge. This confirms the no-collapse contract:
+      - INFERRED count reflects all 3 parallel inferred edges (not 1)
+      - INFERRED avg confidence averages all 3 distinct scores (0.5/0.7/0.9 -> 0.7)
+      - the ambiguous-edges section emits one line per parallel ambiguous edge
+    """
+    G = nx.MultiDiGraph()
+    G.add_node("A", label="alpha", type="entity", source_file="a.py")
+    G.add_node("B", label="beta", type="entity", source_file="b.py")
+    # 3 parallel INFERRED edges between the SAME pair, distinct scores.
+    G.add_edge("A", "B", relation="calls", confidence="INFERRED", confidence_score=0.5)
+    G.add_edge("A", "B", relation="imports", confidence="INFERRED", confidence_score=0.7)
+    G.add_edge("A", "B", relation="contains", confidence="INFERRED", confidence_score=0.9)
+    # 2 parallel AMBIGUOUS edges between the SAME pair, distinct relations.
+    G.add_edge("A", "B", relation="maybe_uses", confidence="AMBIGUOUS")
+    G.add_edge("A", "B", relation="maybe_refs", confidence="AMBIGUOUS")
+    assert G.number_of_edges() == 5
+    report = _minimal_report(G)
+    # All 3 parallel inferred edges are counted (collapse would show "1 edges").
+    assert "INFERRED: 3 edges" in report
+    # Average over all 3 parallel scores, proving each edge contributed.
+    assert "avg confidence: 0.7" in report
+    # The ambiguous-edges list preserves one line per parallel ambiguous edge.
+    assert "relation: maybe_uses" in report
+    assert "relation: maybe_refs" in report
+
+
+def test_report_multigraph_edge_count_unchanged_semantics():
+    """PR4B total-vs-unique-pairs edge-count line still renders correctly on a
+    multigraph with parallel edges (regression for PR 6)."""
+    G = nx.MultiDiGraph()
+    G.add_node("A", label="alpha", type="entity", source_file="a.py")
+    G.add_node("B", label="beta", type="entity", source_file="b.py")
+    G.add_node("C", label="gamma", type="entity", source_file="c.py")
+    # 2 unique pairs, 5 total edges (3 parallel A->B, 2 parallel B->C).
+    G.add_edge("A", "B", relation="calls", confidence="EXTRACTED")
+    G.add_edge("A", "B", relation="imports", confidence="EXTRACTED")
+    G.add_edge("A", "B", relation="contains", confidence="EXTRACTED")
+    G.add_edge("B", "C", relation="calls", confidence="EXTRACTED")
+    G.add_edge("B", "C", relation="imports", confidence="EXTRACTED")
+    assert G.number_of_edges() == 5
+    report = _minimal_report(G)
+    assert "5 edges (2 unique pairs)" in report
diff --git a/tests/test_wiki.py b/tests/test_wiki.py
index 09279a814..78c7ab128 100644
--- a/tests/test_wiki.py
+++ b/tests/test_wiki.py
@@ -225,3 +225,58 @@ def test_community_article_handles_null_source_file(tmp_path):
     # Must not raise TypeError
     to_wiki(G, communities, tmp_path, community_labels=labels)
     assert (tmp_path / "index.md").exists()
+
+
+# PR 6: parallel-edge preservation in the god-node "Connections by Relation" section.
+
+
+def test_wiki_neighbor_appears_under_all_parallel_relations(tmp_path):
+    """A neighbor reached by 3 parallel edges must appear under ALL three relation
+    groups in the god-node article, not just the first edge's relation (PR 6)."""
+    G = nx.MultiDiGraph()
+    G.add_node("A", label="alpha", file_type="code", source_file="a.py")
+    G.add_node("B", label="beta", file_type="code", source_file="b.py")
+    # Three parallel relationships A->B, each with a distinct confidence.
+    G.add_edge("A", "B", relation="calls", confidence="EXTRACTED")
+    G.add_edge("A", "B", relation="imports", confidence="INFERRED")
+    G.add_edge("A", "B", relation="contains", confidence="AMBIGUOUS")
+    communities = {0: ["A", "B"]}
+    labels = {0: "Core Logic"}
+    gods = [{"id": "A", "label": "alpha", "degree": 3}]
+    to_wiki(G, communities, tmp_path, community_labels=labels, god_nodes_data=gods)
+    article = (tmp_path / "alpha.md").read_text()
+    # beta must be filed under EVERY distinct relation, not just one.
+    assert "### calls" in article
+    assert "### imports" in article
+    assert "### contains" in article
+    # The neighbor link appears once per relation group (3 total).
+    assert article.count("[[beta]]") == 3
+    # Per-relation confidence suffix is preserved for each single-edge relation.
+    assert "[[beta]] `EXTRACTED`" in article
+    assert "[[beta]] `INFERRED`" in article
+    assert "[[beta]] `AMBIGUOUS`" in article
+
+
+def test_wiki_simple_graph_regression(tmp_path):
+    """Simple DiGraph: a single-relation neighbor appears under exactly one relation
+    with the historical backticked confidence suffix, byte-stable (PR 6)."""
+    G = nx.DiGraph()
+    G.add_node("A", label="alpha", file_type="code", source_file="a.py")
+    G.add_node("B", label="beta", file_type="code", source_file="b.py")
+    G.add_node("C", label="gamma", file_type="code", source_file="c.py")
+    G.add_edge("A", "B", relation="calls", confidence="EXTRACTED", weight=1.0)
+    G.add_edge("A", "C", relation="imports", confidence="INFERRED", weight=1.0)
+    communities = {0: ["A", "B", "C"]}
+    labels = {0: "Core Logic"}
+    gods = [{"id": "A", "label": "alpha", "degree": 2}]
+    to_wiki(G, communities, tmp_path, community_labels=labels, god_nodes_data=gods)
+    article = (tmp_path / "alpha.md").read_text()
+    # Each neighbor appears under exactly one relation with its backticked
+    # confidence suffix, matching the pre-PR6 single-edge output byte-for-byte.
+    assert "### calls" in article
+    assert "### imports" in article
+    assert "- [[beta]] `EXTRACTED`" in article
+    assert "- [[gamma]] `INFERRED`" in article
+    # No relation group lists a neighbor more than once on a simple graph.
+    assert article.count("[[beta]]") == 1
+    assert article.count("[[gamma]]") == 1

From 50c0cd3b3e89c08184fce8efb441c0e14e87ad9d Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Fri, 29 May 2026 20:53:15 -0500
Subject: [PATCH 13/21] fix(multigraph): keep canvas edge ids globally unique

Codex verification found that PR 6 canvas ids could still collide when node ids contain
underscores: a_b->c and a->b_c both render as e_a_b_c_0 under the
e_{source}_{target}_{suffix} scheme. Add a deterministic digest fallback for the colliding
id while preserving the readable id when it is unique.

Also isolate report-generation tests from optional clustering imports, apply ruff formatting across the rebased stack, and refresh the lockfile package version to match pyproject.
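
The collision and the fallback can be reproduced in isolation. The sketch below is a
standalone copy of the helper added to graphify/export.py in this patch, renamed
canvas_edge_id (without the leading underscore) purely for illustration; the driver code
at the bottom is hypothetical and not part of the change.

```python
import hashlib
import json


def canvas_edge_id(source, target, suffix, used_ids):
    """Return the readable id when unique, else a digest-suffixed id."""
    readable = f"e_{source}_{target}_{suffix}"
    if readable not in used_ids:
        used_ids.add(readable)
        return readable

    # Hash the structured tuple, not the flattened string, so that
    # ("a_b", "c") and ("a", "b_c") produce different digests.
    payload = json.dumps(
        [str(source), str(target), str(suffix)],
        ensure_ascii=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    candidate = f"{readable}_{digest}"
    counter = 1
    while candidate in used_ids:
        counter += 1
        candidate = f"{readable}_{digest}_{counter}"
    used_ids.add(candidate)
    return candidate


# Hypothetical driver: both edges flatten to the same legacy id.
used = set()
first = canvas_edge_id("a_b", "c", 0, used)
second = canvas_edge_id("a", "b_c", 0, used)
print(first)
print(second)
```

The first edge keeps the readable id; only the second, colliding edge gains the 12-character
digest suffix, so existing canvases without ambiguous node ids should be unaffected.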
---
 graphify/export.py                 |  48 ++++-
 graphify/extract.py                |  30 ++--
 graphify/llm.py                    |   6 +-
 graphify/watch.py                  |  25 +--
 tests/test_claude_cli_backend.py   |  27 +--
 tests/test_export_multigraph.py    |  49 +++---
 tests/test_install.py              |   2 +
 tests/test_languages.py            |  32 ++--
 tests/test_report.py               |   5 +-
 tests/test_watch.py                |   4 +-
 uv.lock                            |   2 +-
 worked/example/raw/api.py          |  13 +-
 worked/example/raw/parser.py       |   2 +
 worked/example/raw/processor.py    |  13 +-
 worked/example/raw/storage.py      |   1 +
 worked/example/raw/validator.py    |   1 +
 worked/httpx/raw/auth.py           |   3 +
 worked/httpx/raw/client.py         |   5 +-
 worked/httpx/raw/exceptions.py     |   2 +
 worked/httpx/raw/models.py         |   1 +
 worked/httpx/raw/transport.py      |   1 +
 worked/httpx/raw/utils.py          |   7 +-
 worked/mixed-corpus/raw/analyze.py | 269 +++++++++++++++++------------
 worked/mixed-corpus/raw/build.py   |   5 +-
 worked/mixed-corpus/raw/cluster.py |   7 +-
 25 files changed, 338 insertions(+), 222 deletions(-)

diff --git a/graphify/export.py b/graphify/export.py
index 6aae689cd..5ba9ef833 100644
--- a/graphify/export.py
+++ b/graphify/export.py
@@ -653,6 +653,39 @@ def _edge_distinguishing_key(data: dict, explicit_key: object | None = None) ->
     )
 
 
+def _canvas_edge_id(
+    source: object,
+    target: object,
+    suffix: object,
+    used_ids: set[str],
+) -> str:
+    """Return a deterministic, globally unique Canvas edge id.
+
+    The readable legacy shape, ``e_{source}_{target}_{suffix}``, can collide when
+    node ids themselves contain underscores (``a_b -> c`` vs ``a -> b_c``). Keep
+    that readable id when it is unique, but fall back to a short digest of the
+    structured tuple when a collision is detected.
+    """
+    readable = f"e_{source}_{target}_{suffix}"
+    if readable not in used_ids:
+        used_ids.add(readable)
+        return readable
+
+    payload = json.dumps(
+        [str(source), str(target), str(suffix)],
+        ensure_ascii=True,
+        separators=(",", ":"),
+    )
+    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
+    candidate = f"{readable}_{digest}"
+    counter = 1
+    while candidate in used_ids:
+        counter += 1
+        candidate = f"{readable}_{digest}_{counter}"
+    used_ids.add(candidate)
+    return candidate
+
+
 def to_cypher(G: nx.Graph, output_path: str) -> None:
     lines = ["// Neo4j Cypher import - generated by /graphify", ""]
     for node_id, data in G.nodes(data=True):
@@ -1073,9 +1106,7 @@ def _dominant_confidence(node_id: str) -> str:
                     confidence = edata.get("confidence", "EXTRACTED")
                     lines.append(f"- [[{neighbor_label}]] - `{relation}` [{confidence}]")
                 else:
-                    summary = format_relationship_envelope(
-                        G, node_id, neighbor, directed_only=True
-                    )
+                    summary = format_relationship_envelope(G, node_id, neighbor, directed_only=True)
                     lines.append(f"- [[{neighbor_label}]] - {summary}")
             lines.append("")
 
@@ -1417,10 +1448,11 @@ def safe_name(label: str) -> str:
 
     real_weighted.sort(key=lambda x: (-x[0], x[1], x[2], x[3]))
     surviving_real = real_weighted[:200]
+    used_edge_ids: set[str] = set()
     for weight, u, v, idx, label in surviving_real:
         canvas_edges.append(
             {
-                "id": f"e_{u}_{v}_{idx}",
+                "id": _canvas_edge_id(u, v, idx, used_edge_ids),
                 "fromNode": f"n_{u}",
                 "toNode": f"n_{v}",
                 "label": label,
@@ -1429,13 +1461,13 @@ def safe_name(label: str) -> str:
 
     # Append summary edges only for overflow pairs that survived the 200-cap.
     surviving_pairs = {(u, v) for _w, u, v, _idx, _lbl in surviving_real}
-    for (u, v) in sorted(overflow_pairs, key=lambda p: (str(p[0]), str(p[1]))):
+    for u, v in sorted(overflow_pairs, key=lambda p: (str(p[0]), str(p[1]))):
         if (u, v) not in surviving_pairs:
             continue  # pair fully displaced by the 200-cap — no summary needed
         summary_label = format_relationship_envelope(G, u, v, cap=cap, directed_only=True)
         canvas_edges.append(
             {
-                "id": f"e_{u}_{v}_summary",
+                "id": _canvas_edge_id(u, v, "summary", used_edge_ids),
                 "fromNode": f"n_{u}",
                 "toNode": f"n_{v}",
                 "label": summary_label,
@@ -1535,9 +1567,7 @@ def to_graphml(
     # non-scalar graph-level attrs so multigraph exports succeed losslessly;
     # simple graphs carry no such attrs and are unaffected (byte-stable).
     for attr_name in [
-        name
-        for name, value in H.graph.items()
-        if not isinstance(value, (str, int, float, bool))
+        name for name, value in H.graph.items() if not isinstance(value, (str, int, float, bool))
     ]:
         del H.graph[attr_name]
     nx.write_graphml(H, output_path)
diff --git a/graphify/extract.py b/graphify/extract.py
index 42b90c6e3..96413d6bf 100644
--- a/graphify/extract.py
+++ b/graphify/extract.py
@@ -1990,9 +1990,11 @@ def _js_extra_walk(
         parent = node.parent
         is_module_level = parent is not None and (
             parent.type == "program"
-            or (parent.type == "export_statement"
+            or (
+                parent.type == "export_statement"
                 and parent.parent is not None
-                and parent.parent.type == "program")
+                and parent.parent.type == "program"
+            )
         )
 
         # Arrow function declarations and module-level const literals (lexical_declaration only)
@@ -2366,17 +2368,19 @@ def _import_lua(node, source: bytes, file_nid: str, stem: str, edges: list, str_
         if raw_module:
             tgt_nid = _resolve_lua_import_target(raw_module, str_path)
             if tgt_nid:
-                edges.append({
-                    "source": file_nid,
-                    "target": tgt_nid,
-                    "relation": "imports",
-                    "context": "import",
-                    "confidence": "EXTRACTED",
-                    "confidence_score": 1.0,
-                    "source_file": str_path,
-                    "source_location": str(node.start_point[0] + 1),
-                    "weight": 1.0,
-                })
+                edges.append(
+                    {
+                        "source": file_nid,
+                        "target": tgt_nid,
+                        "relation": "imports",
+                        "context": "import",
+                        "confidence": "EXTRACTED",
+                        "confidence_score": 1.0,
+                        "source_file": str_path,
+                        "source_location": str(node.start_point[0] + 1),
+                        "weight": 1.0,
+                    }
+                )
 
 
 _LUA_CONFIG = LanguageConfig(
diff --git a/graphify/llm.py b/graphify/llm.py
index 343d3e0cd..743594401 100644
--- a/graphify/llm.py
+++ b/graphify/llm.py
@@ -553,8 +553,10 @@ def _call_claude_cli(user_message: str, max_tokens: int = 8192, *, deep_mode: bo
     # Replacing the default prompt eliminates the conflict at the source.
     # Side benefit: cache-creation tokens per call drop ~19% in practice.
     cli_args = [
-        claude_cmd, "-p",
-        "--output-format", "json",
+        claude_cmd,
+        "-p",
+        "--output-format",
+        "json",
         "--no-session-persistence",
         "--system-prompt",
         _extraction_system(deep=deep_mode),
diff --git a/graphify/watch.py b/graphify/watch.py
index 50647f967..1f32db828 100644
--- a/graphify/watch.py
+++ b/graphify/watch.py
@@ -414,8 +414,10 @@ def _rebuild_code(
             _queue_pending(out, list(changed_paths))
         with _rebuild_lock(out, blocking=block_on_lock) as got:
             if not got:
-                print("[graphify watch] Rebuild already in progress for "
-                      f"{watch_path.resolve()} - changes queued.")
+                print(
+                    "[graphify watch] Rebuild already in progress for "
+                    f"{watch_path.resolve()} - changes queued."
+                )
                 return False
             # Lock acquired. Drain anything queued by earlier contenders
             # (including, importantly, the paths we just queued ourselves)
@@ -445,14 +447,17 @@ def _rebuild_code(
                     late = _drain_pending(out)
                     if not late:
                         break
-                    ok = _rebuild_code(
-                        watch_path,
-                        changed_paths=late,
-                        follow_symlinks=follow_symlinks,
-                        force=force,
-                        no_cluster=no_cluster,
-                        acquire_lock=False,
-                    ) and ok
+                    ok = (
+                        _rebuild_code(
+                            watch_path,
+                            changed_paths=late,
+                            follow_symlinks=follow_symlinks,
+                            force=force,
+                            no_cluster=no_cluster,
+                            acquire_lock=False,
+                        )
+                        and ok
+                    )
             return ok
 
     watch_root = watch_path.resolve()
diff --git a/tests/test_claude_cli_backend.py b/tests/test_claude_cli_backend.py
index 1b2c1fa28..e5c38fab9 100644
--- a/tests/test_claude_cli_backend.py
+++ b/tests/test_claude_cli_backend.py
@@ -158,9 +158,11 @@ def fake_which(name):
             "claude.cmd": r"C:\Users\u\AppData\Roaming\npm\claude.cmd",
         }.get(name)
 
-    with patch("platform.system", return_value="Windows"), \
-         patch("shutil.which", side_effect=fake_which), \
-         patch("subprocess.run", return_value=completed) as run:
+    with (
+        patch("platform.system", return_value="Windows"),
+        patch("shutil.which", side_effect=fake_which),
+        patch("subprocess.run", return_value=completed) as run,
+    ):
         llm._call_claude_cli("dummy", max_tokens=8192)
 
     argv = run.call_args.args[0]
@@ -183,9 +185,11 @@ def fake_which(name):
             return "/usr/local/bin/claude"
         return None
 
-    with patch("platform.system", return_value="Windows"), \
-         patch("shutil.which", side_effect=fake_which), \
-         patch("subprocess.run", return_value=completed) as run:
+    with (
+        patch("platform.system", return_value="Windows"),
+        patch("shutil.which", side_effect=fake_which),
+        patch("subprocess.run", return_value=completed) as run,
+    ):
         llm._call_claude_cli("dummy", max_tokens=8192)
 
     argv = run.call_args.args[0]
@@ -195,8 +199,7 @@ def fake_which(name):
 def test_windows_raises_when_neither_cmd_nor_bare_claude_present():
     """If neither `claude.cmd` nor `claude` are on PATH on Windows,
     raise the standard not-found error."""
-    with patch("platform.system", return_value="Windows"), \
-         patch("shutil.which", return_value=None):
+    with patch("platform.system", return_value="Windows"), patch("shutil.which", return_value=None):
         with pytest.raises(RuntimeError, match="Claude Code CLI not found"):
             llm._call_claude_cli("dummy", max_tokens=8192)
 
@@ -207,9 +210,11 @@ def test_non_windows_uses_bare_claude(monkeypatch):
     completed = MagicMock(returncode=0, stdout=json.dumps(_ENVELOPE), stderr="")
     monkeypatch.setattr(llm, "_response_is_hollow", lambda raw, parsed: False)
 
-    with patch("platform.system", return_value="Linux"), \
-         patch("shutil.which", return_value="/usr/local/bin/claude"), \
-         patch("subprocess.run", return_value=completed) as run:
+    with (
+        patch("platform.system", return_value="Linux"),
+        patch("shutil.which", return_value="/usr/local/bin/claude"),
+        patch("subprocess.run", return_value=completed) as run,
+    ):
         llm._call_claude_cli("dummy", max_tokens=8192)
 
     argv = run.call_args.args[0]
diff --git a/tests/test_export_multigraph.py b/tests/test_export_multigraph.py
index 734d18eed..30b9e25cc 100644
--- a/tests/test_export_multigraph.py
+++ b/tests/test_export_multigraph.py
@@ -189,11 +189,7 @@ def test_cypher_emits_distinct_edge_per_parallel():
 
     # The three A->B parallel edges all sit between the same endpoints but keep
     # distinct keys, so MERGE treats them as three relationships, not one.
-    ab_lines = [
-        ln
-        for ln in merge_lines
-        if "{id: 'a'}" in ln and "{id: 'b'}" in ln
-    ]
+    ab_lines = [ln for ln in merge_lines if "{id: 'a'}" in ln and "{id: 'b'}" in ln]
     assert len(ab_lines) == 3
     ab_keys = set()
     for ln in ab_lines:
@@ -220,13 +216,30 @@ def test_canvas_edge_ids_unique():
 
     # Golden / deterministic ordering for the A->B trio (3 <= cap, all drawn).
     ab_ids = sorted(
-        e["id"]
-        for e in data["edges"]
-        if e["fromNode"] == "n_a" and e["toNode"] == "n_b"
+        e["id"] for e in data["edges"] if e["fromNode"] == "n_a" and e["toNode"] == "n_b"
     )
     assert ab_ids == ["e_a_b_0", "e_a_b_1", "e_a_b_2"]
 
 
+def test_canvas_edge_ids_unique_when_node_ids_contain_underscores():
+    """Tuple-concatenated ids must not collide for ambiguous underscore splits."""
+    G = nx.MultiDiGraph()
+    for node_id in ["a_b", "c", "a", "b_c"]:
+        G.add_node(node_id, label=node_id)
+    G.add_edge("a_b", "c", relation="r", confidence="EXTRACTED", weight=1.0)
+    G.add_edge("a", "b_c", relation="s", confidence="EXTRACTED", weight=1.0)
+
+    with tempfile.TemporaryDirectory() as tmp:
+        out = Path(tmp) / "graph.canvas"
+        to_canvas(G, {0: list(G.nodes)}, str(out))
+        data = json.loads(out.read_text())
+
+    edge_ids = [edge["id"] for edge in data["edges"]]
+    assert len(edge_ids) == 2
+    assert len(edge_ids) == len(set(edge_ids))
+    assert all(edge_id.startswith("e_a_b_c_0") for edge_id in edge_ids)
+
+
 def test_canvas_visual_cap_summary():
     """A >cap pair draws at most cap+1 canvas edges with an overflow summary."""
     G = make_multigraph()
@@ -236,9 +249,7 @@ def test_canvas_visual_cap_summary():
         to_canvas(G, COMMUNITIES, str(out))
         data = json.loads(out.read_text())
 
-    cd_edges = [
-        e for e in data["edges"] if e["fromNode"] == "n_c" and e["toNode"] == "n_d"
-    ]
+    cd_edges = [e for e in data["edges"] if e["fromNode"] == "n_c" and e["toNode"] == "n_d"]
     # 5 parallel edges -> cap drawn + 1 summary edge.
     assert len(cd_edges) == cap + 1
     cd_ids = sorted(e["id"] for e in cd_edges)
@@ -309,16 +320,12 @@ def test_html_svg_visual_cap():
     cd_real = [
         e
         for e in raw_edges
-        if e.get("from") == "c"
-        and e.get("to") == "d"
-        and e.get("confidence") != "SUMMARY"
+        if e.get("from") == "c" and e.get("to") == "d" and e.get("confidence") != "SUMMARY"
     ]
     cd_summary = [
         e
         for e in raw_edges
-        if e.get("from") == "c"
-        and e.get("to") == "d"
-        and e.get("confidence") == "SUMMARY"
+        if e.get("from") == "c" and e.get("to") == "d" and e.get("confidence") == "SUMMARY"
     ]
     assert len(cd_real) == cap
     assert len(cd_summary) == 1
@@ -501,9 +508,7 @@ def test_export_simple_graph_regression():
         # Cypher — exact line including the new edge_key property.
         cypher_out = Path(tmp) / "cypher.txt"
         to_cypher(G, str(cypher_out))
-        cypher_lines = [
-            ln for ln in cypher_out.read_text().splitlines() if ln.startswith("MATCH")
-        ]
+        cypher_lines = [ln for ln in cypher_out.read_text().splitlines() if ln.startswith("MATCH")]
         assert cypher_lines == [
             "MATCH (a {id: 'A'}), (b {id: 'B'}) "
             f"MERGE (a)-[:CALLS {{edge_key: '{expected_key}', confidence: 'EXTRACTED'}}]->(b);"
@@ -526,9 +531,7 @@ def test_export_simple_graph_regression():
         obs_out = Path(tmp) / "vault"
         to_obsidian(G, comm, str(obs_out))
         conn_lines = [
-            ln
-            for ln in (obs_out / "Alpha.md").read_text().splitlines()
-            if ln.startswith("- [[")
+            ln for ln in (obs_out / "Alpha.md").read_text().splitlines() if ln.startswith("- [[")
         ]
         assert conn_lines == ["- [[Beta]] - `calls` [EXTRACTED]"]
 
diff --git a/tests/test_install.py b/tests/test_install.py
index 23a4309e5..94616255a 100644
--- a/tests/test_install.py
+++ b/tests/test_install.py
@@ -362,6 +362,7 @@ def test_antigravity_uninstall_project_removes_project_skill_only(tmp_path, monk
 def test_antigravity_global_install_writes_gemini_config_skills(tmp_path, monkeypatch):
     """Global `graphify antigravity install` must write to ~/.gemini/config/skills/ (#1079)."""
     from graphify.__main__ import main
+
     home = tmp_path / "home"
     project = tmp_path / "project"
     project.mkdir()
@@ -381,6 +382,7 @@ def test_antigravity_global_install_writes_gemini_config_skills(tmp_path, monkey
 def test_antigravity_global_uninstall_removes_gemini_config_skill(tmp_path, monkeypatch):
     """Global `graphify antigravity uninstall` must remove from ~/.gemini/config/skills/ (#1079)."""
     from graphify.__main__ import main
+
     home = tmp_path / "home"
     project = tmp_path / "project"
     project.mkdir()
diff --git a/tests/test_languages.py b/tests/test_languages.py
index da2d3c167..72d6aab4d 100644
--- a/tests/test_languages.py
+++ b/tests/test_languages.py
@@ -1253,19 +1253,16 @@ def test_js_module_level_arrow_produces_node_and_call_edges(tmp_path):
 
     The scope guard must not accidentally suppress top-level arrow functions.
     """
-    src = (
-        "function helper() { return 1; }\n"
-        "const handler = () => {\n"
-        "  helper();\n"
-        "};\n"
-    )
+    src = "function helper() { return 1; }\nconst handler = () => {\n  helper();\n};\n"
     f = tmp_path / "arrows.js"
     f.write_text(src)
     r = extract_js(f)
     labels = _labels(r)
     relations = _relations(r)
 
-    assert any("handler" in label for label in labels), f"module-level arrow 'handler' missing: {labels}"
+    assert any("handler" in label for label in labels), (
+        f"module-level arrow 'handler' missing: {labels}"
+    )
     assert "calls" in relations, f"expected 'calls' edge from handler->helper: {relations}"
 
 
@@ -1323,8 +1320,10 @@ def test_markdown_skips_fenced_code_blocks():
     """
     r = extract_markdown(FIXTURES / "deploy_guide.md")
     labels = _labels(r)
-    assert not any(label.startswith("code:") for label in labels), \
+    assert not any(label.startswith("code:") for label in labels), (
         f"Expected no code:* nodes after #1077 fix, got: {[label for label in labels if label.startswith('code:')]}"
+    )
+
 
 def test_markdown_contains_edges():
     """Headings should be connected via 'contains' edges (file->h, h->h)."""
@@ -1344,15 +1343,9 @@ def test_markdown_fenced_heading_not_parsed():
     """
     import os
     import tempfile
+
     src = (
-        "# Real Heading\n"
-        "\n"
-        "```bash\n"
-        "## Not A Heading\n"
-        "echo hello\n"
-        "```\n"
-        "\n"
-        "## Another Real Heading\n"
+        "# Real Heading\n\n```bash\n## Not A Heading\necho hello\n```\n\n## Another Real Heading\n"
     )
     with tempfile.NamedTemporaryFile(suffix=".md", mode="w", delete=False) as fh:
         fh.write(src)
@@ -1364,9 +1357,12 @@ def test_markdown_fenced_heading_not_parsed():
         os.unlink(fpath)
 
     assert any("Real Heading" in label for label in labels), f"'Real Heading' missing: {labels}"
-    assert any("Another Real Heading" in label for label in labels), f"'Another Real Heading' missing: {labels}"
-    assert not any("Not A Heading" in label for label in labels), \
+    assert any("Another Real Heading" in label for label in labels), (
+        f"'Another Real Heading' missing: {labels}"
+    )
+    assert not any("Not A Heading" in label for label in labels), (
         f"fenced '## Not A Heading' was incorrectly parsed as a node: {labels}"
+    )
 
 
 def test_markdown_no_dangling_edges():
diff --git a/tests/test_report.py b/tests/test_report.py
index 00aaafc73..20265259c 100644
--- a/tests/test_report.py
+++ b/tests/test_report.py
@@ -2,7 +2,6 @@
 from pathlib import Path
 import networkx as nx
 from graphify.build import build_from_json
-from graphify.cluster import cluster, score_all
 from graphify.analyze import god_nodes, surprising_connections
 from graphify.report import generate
 
@@ -12,8 +11,8 @@
 def make_inputs():
     extraction = json.loads((FIXTURES / "extraction.json").read_text())
     G = build_from_json(extraction)
-    communities = cluster(G)
-    cohesion = score_all(G, communities)
+    communities = {0: list(G.nodes())}
+    cohesion = {0: 0.5}
     labels = {cid: f"Community {cid}" for cid in communities}
     gods = god_nodes(G)
     surprises = surprising_connections(G)
diff --git a/tests/test_watch.py b/tests/test_watch.py
index 7522a8335..7de02888f 100644
--- a/tests/test_watch.py
+++ b/tests/test_watch.py
@@ -545,7 +545,9 @@ def test_queue_and_drain_pending_round_trip(tmp_path):
     assert pending_file.exists()
     # Each path written on its own line.
     assert pending_file.read_text(encoding="utf-8").splitlines() == [
-        "a.py", "sub/b.py", "c.md",
+        "a.py",
+        "sub/b.py",
+        "c.md",
     ]
 
     drained = _drain_pending(out)
diff --git a/uv.lock b/uv.lock
index 17d022cdf..8e24a8df1 100644
--- a/uv.lock
+++ b/uv.lock
@@ -1109,7 +1109,7 @@ wheels = [
 
 [[package]]
 name = "graphifyy"
-version = "0.8.24"
+version = "0.8.25"
 source = { editable = "." }
 dependencies = [
     { name = "datasketch" },
diff --git a/worked/example/raw/api.py b/worked/example/raw/api.py
index 6720e1753..378523dca 100644
--- a/worked/example/raw/api.py
+++ b/worked/example/raw/api.py
@@ -2,6 +2,7 @@
 API module - exposes the document pipeline over HTTP.
 Thin layer over parser, validator, processor, and storage.
 """
+
 from parser import batch_parse, parse_file
 from validator import validate_document, ValidationError
 from processor import process_and_save, enrich_document
@@ -56,11 +57,13 @@ def handle_search(query: str) -> dict:
     for record_id, entry in index.items():
         keywords = set(entry.get("keywords", []))
         if terms & keywords:
-            matches.append({
-                "id": record_id,
-                "title": entry.get("title", ""),
-                "matched_keywords": list(terms & keywords),
-            })
+            matches.append(
+                {
+                    "id": record_id,
+                    "title": entry.get("title", ""),
+                    "matched_keywords": list(terms & keywords),
+                }
+            )
     return {"query": query, "results": matches}
 
 
diff --git a/worked/example/raw/parser.py b/worked/example/raw/parser.py
index 55f807373..74d4ee4c2 100644
--- a/worked/example/raw/parser.py
+++ b/worked/example/raw/parser.py
@@ -2,6 +2,7 @@
 Parser module - reads raw input documents and converts them into
 a structured format the rest of the pipeline can work with.
 """
+
 from validator import validate_document
 from storage import save_parsed
 
@@ -49,6 +50,7 @@ def parse_markdown(text: str) -> dict:
 def parse_json(text: str) -> dict:
     """Parse a JSON document into a structured dict."""
     import json
+
     data = json.loads(text)
     return {"data": data, "format": "json"}
 
diff --git a/worked/example/raw/processor.py b/worked/example/raw/processor.py
index d75bb9a7e..e0ff02bee 100644
--- a/worked/example/raw/processor.py
+++ b/worked/example/raw/processor.py
@@ -2,6 +2,7 @@
 Processor module - transforms validated documents into enriched records
 ready for storage and retrieval.
 """
+
 import re
 from storage import load_index, save_processed
 
@@ -31,11 +32,13 @@ def extract_keywords(text: str) -> list:
 
 def enrich_document(doc: dict) -> dict:
     """Add keyword index and cross-references to a validated document."""
-    text_blob = " ".join([
-        doc.get("title", ""),
-        " ".join(doc.get("sections", [])),
-        " ".join(doc.get("paragraphs", [])),
-    ])
+    text_blob = " ".join(
+        [
+            doc.get("title", ""),
+            " ".join(doc.get("sections", [])),
+            " ".join(doc.get("paragraphs", [])),
+        ]
+    )
     doc["keywords"] = extract_keywords(text_blob)
     doc["cross_refs"] = find_cross_references(doc)
     return doc
diff --git a/worked/example/raw/storage.py b/worked/example/raw/storage.py
index 46e8623dc..c8c069315 100644
--- a/worked/example/raw/storage.py
+++ b/worked/example/raw/storage.py
@@ -2,6 +2,7 @@
 Storage module - persists documents to disk and maintains the search index.
 All other modules read and write through this interface.
 """
+
 import json
 import uuid
 from pathlib import Path
diff --git a/worked/example/raw/validator.py b/worked/example/raw/validator.py
index 0d9550083..d4c01e52b 100644
--- a/worked/example/raw/validator.py
+++ b/worked/example/raw/validator.py
@@ -2,6 +2,7 @@
 Validator module - checks that parsed documents meet schema requirements
 before they are allowed into storage.
 """
+
 from processor import normalize_text
 
 
diff --git a/worked/httpx/raw/auth.py b/worked/httpx/raw/auth.py
index 290cadd3e..82312ec46 100644
--- a/worked/httpx/raw/auth.py
+++ b/worked/httpx/raw/auth.py
@@ -4,6 +4,7 @@
 DigestAuth is the most interesting: it participates in a full request/response cycle,
 reading the 401 response to build the challenge before re-sending.
 """
+
 import hashlib
 import time
 from models import Request, Response
@@ -26,6 +27,7 @@ def __init__(self, username: str, password: str):
 
     def auth_flow(self, request: Request):
         import base64
+
         credentials = f"{self.username}:{self.password}".encode()
         encoded = base64.b64encode(credentials).decode()
         request.headers["Authorization"] = f"Basic {encoded}"
@@ -102,6 +104,7 @@ class NetRCAuth(Auth):
 
     def auth_flow(self, request: Request):
         import netrc
+
         try:
             credentials = netrc.netrc().authenticators(request.url.host)
             if credentials:
diff --git a/worked/httpx/raw/client.py b/worked/httpx/raw/client.py
index d506dd613..03c144ba6 100644
--- a/worked/httpx/raw/client.py
+++ b/worked/httpx/raw/client.py
@@ -3,6 +3,7 @@
 BaseClient holds all shared logic. Client and AsyncClient extend it for sync/async.
 This is the integration hub of the library - it imports from every other module.
 """
+
 from models import Request, Response, URL, Headers, Cookies
 from auth import Auth, BasicAuth
 from transport import BaseTransport, HTTPTransport, AsyncHTTPTransport
@@ -60,7 +61,9 @@ def _build_request(self, method: str, url: str, **kwargs) -> Request:
         for k, v in self._headers.items():
             if k not in headers:
                 headers[k] = v
-        return Request(method, url, headers=headers, content=kwargs.get("content"), cookies=self._cookies)
+        return Request(
+            method, url, headers=headers, content=kwargs.get("content"), cookies=self._cookies
+        )
 
     def _merge_cookies(self, response: Response) -> None:
         for name, value in response.cookies.items():
diff --git a/worked/httpx/raw/exceptions.py b/worked/httpx/raw/exceptions.py
index ff5392fee..f09c24cc7 100644
--- a/worked/httpx/raw/exceptions.py
+++ b/worked/httpx/raw/exceptions.py
@@ -6,6 +6,7 @@
 
 class HTTPError(Exception):
     """Base class for all httpx exceptions."""
+
     def __init__(self, message, *, request=None):
         self.request = request
         super().__init__(message)
@@ -77,6 +78,7 @@ class TooManyRedirects(RequestError):
 
 class HTTPStatusError(HTTPError):
     """A 4xx or 5xx response was received."""
+
     def __init__(self, message, *, request, response):
         self.response = response
         super().__init__(message, request=request)
diff --git a/worked/httpx/raw/models.py b/worked/httpx/raw/models.py
index 80582b6fa..5c3ab9789 100644
--- a/worked/httpx/raw/models.py
+++ b/worked/httpx/raw/models.py
@@ -2,6 +2,7 @@
 Core data models: URL, Headers, Cookies, Request, Response.
 These are the central data types that everything else in the library references.
 """
+
 import json as _json
 from exceptions import HTTPStatusError
 
diff --git a/worked/httpx/raw/transport.py b/worked/httpx/raw/transport.py
index 5bd9b9166..16b7e68a4 100644
--- a/worked/httpx/raw/transport.py
+++ b/worked/httpx/raw/transport.py
@@ -3,6 +3,7 @@
 HTTPTransport wraps a connection pool. ProxyTransport sits in front of it.
 MockTransport is used in tests.
 """
+
 from models import Request, Response
 from exceptions import TransportError, ConnectError, TimeoutException
 
diff --git a/worked/httpx/raw/utils.py b/worked/httpx/raw/utils.py
index 84ca4a3b8..3f22513ed 100644
--- a/worked/httpx/raw/utils.py
+++ b/worked/httpx/raw/utils.py
@@ -2,6 +2,7 @@
 Utility functions shared across the library.
 Small helpers that don't belong in any one module.
 """
+
 import re
 from models import Cookies
 
@@ -54,10 +55,7 @@ def parse_content_type(content_type: str) -> tuple:
 
 def obfuscate_sensitive_headers(headers: dict) -> dict:
     """Return a copy of headers with sensitive values replaced by [obfuscated]."""
-    return {
-        k: "[obfuscated]" if k.lower() in SENSITIVE_HEADERS else v
-        for k, v in headers.items()
-    }
+    return {k: "[obfuscated]" if k.lower() in SENSITIVE_HEADERS else v for k, v in headers.items()}
 
 
 def unset_all_cookies(cookies: Cookies) -> None:
@@ -68,6 +66,7 @@ def unset_all_cookies(cookies: Cookies) -> None:
 def is_known_encoding(encoding: str) -> bool:
     """Check if a character encoding label is recognized by Python's codec system."""
     import codecs
+
     try:
         codecs.lookup(encoding)
         return True
diff --git a/worked/mixed-corpus/raw/analyze.py b/worked/mixed-corpus/raw/analyze.py
index cf5344960..d86ccd8bd 100644
--- a/worked/mixed-corpus/raw/analyze.py
+++ b/worked/mixed-corpus/raw/analyze.py
@@ -1,4 +1,5 @@
 """Graph analysis: god nodes (most connected), surprising connections (cross-community), suggested questions."""
+
 from __future__ import annotations
 import networkx as nx
 
@@ -44,11 +45,13 @@ def god_nodes(G: nx.Graph, top_n: int = 10) -> list[dict]:
     for node_id, deg in sorted_nodes:
         if _is_file_node(G, node_id) or _is_concept_node(G, node_id):
             continue
-        result.append({
-            "id": node_id,
-            "label": G.nodes[node_id].get("label", node_id),
-            "edges": deg,
-        })
+        result.append(
+            {
+                "id": node_id,
+                "label": G.nodes[node_id].get("label", node_id),
+                "edges": deg,
+            }
+        )
         if len(result) >= top_n:
             break
     return result
@@ -74,9 +77,7 @@ def surprising_connections(
     """
     # Identify unique source files (ignore empty/null source_file)
     source_files = {
-        data.get("source_file", "")
-        for _, data in G.nodes(data=True)
-        if data.get("source_file", "")
+        data.get("source_file", "") for _, data in G.nodes(data=True) if data.get("source_file", "")
     }
     is_multi_source = len(source_files) > 1
 
@@ -105,7 +106,23 @@ def _is_concept_node(G: nx.Graph, node_id: str) -> bool:
     return False
 
 
-_CODE_EXTENSIONS = {"py", "ts", "tsx", "js", "go", "rs", "java", "rb", "cpp", "c", "h", "cs", "kt", "scala", "php"}
+_CODE_EXTENSIONS = {
+    "py",
+    "ts",
+    "tsx",
+    "js",
+    "go",
+    "rs",
+    "java",
+    "rb",
+    "cpp",
+    "c",
+    "h",
+    "cs",
+    "kt",
+    "scala",
+    "php",
+}
 _DOC_EXTENSIONS = {"md", "txt", "rst"}
 _PAPER_EXTENSIONS = {"pdf"}
 _IMAGE_EXTENSIONS = {"png", "jpg", "jpeg", "webp", "gif", "svg"}
@@ -213,18 +230,20 @@ def _cross_file_surprises(G: nx.Graph, communities: dict[int, list[str]], top_n:
         score, reasons = _surprise_score(G, u, v, data, node_community, u_source, v_source)
         src_id = data.get("_src", u)
         tgt_id = data.get("_tgt", v)
-        candidates.append({
-            "_score": score,
-            "source": G.nodes[src_id].get("label", src_id),
-            "target": G.nodes[tgt_id].get("label", tgt_id),
-            "source_files": [
-                G.nodes[src_id].get("source_file", ""),
-                G.nodes[tgt_id].get("source_file", ""),
-            ],
-            "confidence": data.get("confidence", "EXTRACTED"),
-            "relation": relation,
-            "why": "; ".join(reasons) if reasons else "cross-file semantic connection",
-        })
+        candidates.append(
+            {
+                "_score": score,
+                "source": G.nodes[src_id].get("label", src_id),
+                "target": G.nodes[tgt_id].get("label", tgt_id),
+                "source_files": [
+                    G.nodes[src_id].get("source_file", ""),
+                    G.nodes[tgt_id].get("source_file", ""),
+                ],
+                "confidence": data.get("confidence", "EXTRACTED"),
+                "relation": relation,
+                "why": "; ".join(reasons) if reasons else "cross-file semantic connection",
+            }
+        )
 
     candidates.sort(key=lambda x: x["_score"], reverse=True)
     for c in candidates:
@@ -257,17 +276,19 @@ def _cross_community_surprises(
         result = []
         for (u, v), score in top_edges:
             data = G.edges[u, v]
-            result.append({
-                "source": G.nodes[u].get("label", u),
-                "target": G.nodes[v].get("label", v),
-                "source_files": [
-                    G.nodes[u].get("source_file", ""),
-                    G.nodes[v].get("source_file", ""),
-                ],
-                "confidence": data.get("confidence", "EXTRACTED"),
-                "relation": data.get("relation", ""),
-                "note": f"Bridges graph structure (betweenness={score:.3f})",
-            })
+            result.append(
+                {
+                    "source": G.nodes[u].get("label", u),
+                    "target": G.nodes[v].get("label", v),
+                    "source_files": [
+                        G.nodes[u].get("source_file", ""),
+                        G.nodes[v].get("source_file", ""),
+                    ],
+                    "confidence": data.get("confidence", "EXTRACTED"),
+                    "relation": data.get("relation", ""),
+                    "note": f"Bridges graph structure (betweenness={score:.3f})",
+                }
+            )
         return result
 
     # Build node → community map
@@ -289,18 +310,20 @@ def _cross_community_surprises(
         confidence = data.get("confidence", "EXTRACTED")
         src_id = data.get("_src", u)
         tgt_id = data.get("_tgt", v)
-        surprises.append({
-            "source": G.nodes[src_id].get("label", src_id),
-            "target": G.nodes[tgt_id].get("label", tgt_id),
-            "source_files": [
-                G.nodes[src_id].get("source_file", ""),
-                G.nodes[tgt_id].get("source_file", ""),
-            ],
-            "confidence": confidence,
-            "relation": relation,
-            "note": f"Bridges community {cid_u} → community {cid_v}",
-            "_pair": tuple(sorted([cid_u, cid_v])),
-        })
+        surprises.append(
+            {
+                "source": G.nodes[src_id].get("label", src_id),
+                "target": G.nodes[tgt_id].get("label", tgt_id),
+                "source_files": [
+                    G.nodes[src_id].get("source_file", ""),
+                    G.nodes[tgt_id].get("source_file", ""),
+                ],
+                "confidence": confidence,
+                "relation": relation,
+                "note": f"Bridges community {cid_u} → community {cid_v}",
+                "_pair": tuple(sorted([cid_u, cid_v])),
+            }
+        )
 
     # Sort: AMBIGUOUS first, then INFERRED, then EXTRACTED
     order = {"AMBIGUOUS": 0, "INFERRED": 1, "EXTRACTED": 2}
@@ -338,35 +361,46 @@ def suggest_questions(
             ul = G.nodes[u].get("label", u)
             vl = G.nodes[v].get("label", v)
             relation = data.get("relation", "related to")
-            questions.append({
-                "type": "ambiguous_edge",
-                "question": f"What is the exact relationship between `{ul}` and `{vl}`?",
-                "why": f"Edge tagged AMBIGUOUS (relation: {relation}) - confidence is low.",
-            })
+            questions.append(
+                {
+                    "type": "ambiguous_edge",
+                    "question": f"What is the exact relationship between `{ul}` and `{vl}`?",
+                    "why": f"Edge tagged AMBIGUOUS (relation: {relation}) - confidence is low.",
+                }
+            )
 
     # 2. Bridge nodes (high betweenness) → cross-cutting concern questions
     if G.number_of_edges() > 0:
         betweenness = nx.betweenness_centrality(G)
         # Top bridge nodes that are NOT file-level hubs
         bridges = sorted(
-            [(n, s) for n, s in betweenness.items()
-             if not _is_file_node(G, n) and not _is_concept_node(G, n) and s > 0],
+            [
+                (n, s)
+                for n, s in betweenness.items()
+                if not _is_file_node(G, n) and not _is_concept_node(G, n) and s > 0
+            ],
             key=lambda x: x[1],
             reverse=True,
         )[:3]
         for node_id, score in bridges:
             label = G.nodes[node_id].get("label", node_id)
             cid = node_community.get(node_id)
-            comm_label = community_labels.get(cid, f"Community {cid}") if cid is not None else "unknown"
+            comm_label = (
+                community_labels.get(cid, f"Community {cid}") if cid is not None else "unknown"
+            )
             neighbors = list(G.neighbors(node_id))
-            neighbor_comms = {node_community.get(n) for n in neighbors if node_community.get(n) != cid}
+            neighbor_comms = {
+                node_community.get(n) for n in neighbors if node_community.get(n) != cid
+            }
             if neighbor_comms:
                 other_labels = [community_labels.get(c, f"Community {c}") for c in neighbor_comms]
-                questions.append({
-                    "type": "bridge_node",
-                    "question": f"Why does `{label}` connect `{comm_label}` to {', '.join(f'`{l}`' for l in other_labels)}?",
-                    "why": f"High betweenness centrality ({score:.3f}) - this node is a cross-community bridge.",
-                })
+                questions.append(
+                    {
+                        "type": "bridge_node",
+                        "question": f"Why does `{label}` connect `{comm_label}` to {', '.join(f'`{l}`' for l in other_labels)}?",
+                        "why": f"High betweenness centrality ({score:.3f}) - this node is a cross-community bridge.",
+                    }
+                )
 
     # 3. God nodes with many INFERRED edges → verification questions
     degree = dict(G.degree())
@@ -377,7 +411,8 @@ def suggest_questions(
     )[:5]
     for node_id, _ in top_nodes:
         inferred = [
-            (u, v, d) for u, v, d in G.edges(node_id, data=True)
+            (u, v, d)
+            for u, v, d in G.edges(node_id, data=True)
             if d.get("confidence") == "INFERRED"
         ]
         if len(inferred) >= 2:
@@ -389,48 +424,58 @@ def suggest_questions(
                 tgt_id = d.get("_tgt", v)
                 other_id = tgt_id if src_id == node_id else src_id
                 others.append(G.nodes[other_id].get("label", other_id))
-            questions.append({
-                "type": "verify_inferred",
-                "question": f"Are the {len(inferred)} inferred relationships involving `{label}` (e.g. with `{others[0]}` and `{others[1]}`) actually correct?",
-                "why": f"`{label}` has {len(inferred)} INFERRED edges - model-reasoned connections that need verification.",
-            })
+            questions.append(
+                {
+                    "type": "verify_inferred",
+                    "question": f"Are the {len(inferred)} inferred relationships involving `{label}` (e.g. with `{others[0]}` and `{others[1]}`) actually correct?",
+                    "why": f"`{label}` has {len(inferred)} INFERRED edges - model-reasoned connections that need verification.",
+                }
+            )
 
     # 4. Isolated or weakly-connected nodes → exploration questions
     isolated = [
-        n for n in G.nodes()
+        n
+        for n in G.nodes()
         if G.degree(n) <= 1 and not _is_file_node(G, n) and not _is_concept_node(G, n)
     ]
     if isolated:
         labels = [G.nodes[n].get("label", n) for n in isolated[:3]]
-        questions.append({
-            "type": "isolated_nodes",
-            "question": f"What connects {', '.join(f'`{l}`' for l in labels)} to the rest of the system?",
-            "why": f"{len(isolated)} weakly-connected nodes found - possible documentation gaps or missing edges.",
-        })
+        questions.append(
+            {
+                "type": "isolated_nodes",
+                "question": f"What connects {', '.join(f'`{l}`' for l in labels)} to the rest of the system?",
+                "why": f"{len(isolated)} weakly-connected nodes found - possible documentation gaps or missing edges.",
+            }
+        )
 
     # 5. Low-cohesion communities → structural questions
     from .cluster import cohesion_score
+
     for cid, nodes in communities.items():
         score = cohesion_score(G, nodes)
         if score < 0.15 and len(nodes) >= 5:
             label = community_labels.get(cid, f"Community {cid}")
-            questions.append({
-                "type": "low_cohesion",
-                "question": f"Should `{label}` be split into smaller, more focused modules?",
-                "why": f"Cohesion score {score} - nodes in this community are weakly interconnected.",
-            })
+            questions.append(
+                {
+                    "type": "low_cohesion",
+                    "question": f"Should `{label}` be split into smaller, more focused modules?",
+                    "why": f"Cohesion score {score} - nodes in this community are weakly interconnected.",
+                }
+            )
 
     if not questions:
-        return [{
-            "type": "no_signal",
-            "question": None,
-            "why": (
-                "Not enough signal to generate questions. "
-                "This usually means the corpus has no AMBIGUOUS edges, no bridge nodes, "
-                "no INFERRED relationships, and all communities are tightly cohesive. "
-                "Add more files or run with --mode deep to extract richer edges."
-            ),
-        }]
+        return [
+            {
+                "type": "no_signal",
+                "question": None,
+                "why": (
+                    "Not enough signal to generate questions. "
+                    "This usually means the corpus has no AMBIGUOUS edges, no bridge nodes, "
+                    "no INFERRED relationships, and all communities are tightly cohesive. "
+                    "Add more files or run with --mode deep to extract richer edges."
+                ),
+            }
+        ]
 
     return questions[:top_n]
 
@@ -453,26 +498,16 @@ def graph_diff(G_old: nx.Graph, G_new: nx.Graph) -> dict:
     added_node_ids = new_nodes - old_nodes
     removed_node_ids = old_nodes - new_nodes
 
-    new_nodes_list = [
-        {"id": n, "label": G_new.nodes[n].get("label", n)}
-        for n in added_node_ids
-    ]
+    new_nodes_list = [{"id": n, "label": G_new.nodes[n].get("label", n)} for n in added_node_ids]
     removed_nodes_list = [
-        {"id": n, "label": G_old.nodes[n].get("label", n)}
-        for n in removed_node_ids
+        {"id": n, "label": G_old.nodes[n].get("label", n)} for n in removed_node_ids
     ]
 
     def edge_key(G: nx.Graph, u: str, v: str, data: dict) -> tuple:
         return (u, v, data.get("relation", ""))
 
-    old_edge_keys = {
-        edge_key(G_old, u, v, d)
-        for u, v, d in G_old.edges(data=True)
-    }
-    new_edge_keys = {
-        edge_key(G_new, u, v, d)
-        for u, v, d in G_new.edges(data=True)
-    }
+    old_edge_keys = {edge_key(G_old, u, v, d) for u, v, d in G_old.edges(data=True)}
+    new_edge_keys = {edge_key(G_new, u, v, d) for u, v, d in G_new.edges(data=True)}
 
     added_edge_keys = new_edge_keys - old_edge_keys
     removed_edge_keys = old_edge_keys - new_edge_keys
@@ -480,22 +515,26 @@ def edge_key(G: nx.Graph, u: str, v: str, data: dict) -> tuple:
     new_edges_list = []
     for u, v, d in G_new.edges(data=True):
         if edge_key(G_new, u, v, d) in added_edge_keys:
-            new_edges_list.append({
-                "source": u,
-                "target": v,
-                "relation": d.get("relation", ""),
-                "confidence": d.get("confidence", ""),
-            })
+            new_edges_list.append(
+                {
+                    "source": u,
+                    "target": v,
+                    "relation": d.get("relation", ""),
+                    "confidence": d.get("confidence", ""),
+                }
+            )
 
     removed_edges_list = []
     for u, v, d in G_old.edges(data=True):
         if edge_key(G_old, u, v, d) in removed_edge_keys:
-            removed_edges_list.append({
-                "source": u,
-                "target": v,
-                "relation": d.get("relation", ""),
-                "confidence": d.get("confidence", ""),
-            })
+            removed_edges_list.append(
+                {
+                    "source": u,
+                    "target": v,
+                    "relation": d.get("relation", ""),
+                    "confidence": d.get("confidence", ""),
+                }
+            )
 
     parts = []
     if new_nodes_list:
@@ -503,9 +542,13 @@ def edge_key(G: nx.Graph, u: str, v: str, data: dict) -> tuple:
     if new_edges_list:
         parts.append(f"{len(new_edges_list)} new edge{'s' if len(new_edges_list) != 1 else ''}")
     if removed_nodes_list:
-        parts.append(f"{len(removed_nodes_list)} node{'s' if len(removed_nodes_list) != 1 else ''} removed")
+        parts.append(
+            f"{len(removed_nodes_list)} node{'s' if len(removed_nodes_list) != 1 else ''} removed"
+        )
     if removed_edges_list:
-        parts.append(f"{len(removed_edges_list)} edge{'s' if len(removed_edges_list) != 1 else ''} removed")
+        parts.append(
+            f"{len(removed_edges_list)} edge{'s' if len(removed_edges_list) != 1 else ''} removed"
+        )
     summary = ", ".join(parts) if parts else "no changes"
 
     return {
diff --git a/worked/mixed-corpus/raw/build.py b/worked/mixed-corpus/raw/build.py
index 655820c04..725313ab2 100644
--- a/worked/mixed-corpus/raw/build.py
+++ b/worked/mixed-corpus/raw/build.py
@@ -10,7 +10,10 @@ def build_from_json(extraction: dict) -> nx.Graph:
     # Dangling edges (stdlib/external imports) are expected - only warn about real schema errors.
     real_errors = [e for e in errors if "does not match any node id" not in e]
     if real_errors:
-        print(f"[graphify] Extraction warning ({len(real_errors)} issues): {real_errors[0]}", file=sys.stderr)
+        print(
+            f"[graphify] Extraction warning ({len(real_errors)} issues): {real_errors[0]}",
+            file=sys.stderr,
+        )
     G = nx.Graph()
     for node in extraction.get("nodes", []):
         G.add_node(node["id"], **{k: v for k, v in node.items() if k != "id"})
diff --git a/worked/mixed-corpus/raw/cluster.py b/worked/mixed-corpus/raw/cluster.py
index b5c97b7c8..4bd68202e 100644
--- a/worked/mixed-corpus/raw/cluster.py
+++ b/worked/mixed-corpus/raw/cluster.py
@@ -1,4 +1,5 @@
 """Leiden community detection on NetworkX graphs. Splits oversized communities. Returns cohesion scores."""
+
 from __future__ import annotations
 import networkx as nx
 
@@ -20,8 +21,9 @@ def build_graph(nodes: list[dict], edges: list[dict]) -> nx.Graph:
         G.add_edge(e["source"], e["target"], **attrs)
     return G
 
-_MAX_COMMUNITY_FRACTION = 0.25   # communities larger than 25% of graph get split
-_MIN_SPLIT_SIZE = 10             # only split if community has at least this many nodes
+
+_MAX_COMMUNITY_FRACTION = 0.25  # communities larger than 25% of graph get split
+_MIN_SPLIT_SIZE = 10  # only split if community has at least this many nodes
 
 
 def cluster(G: nx.Graph) -> dict[int, list[str]]:
@@ -77,6 +79,7 @@ def _split_community(G: nx.Graph, nodes: list[str]) -> list[list[str]]:
         return [[n] for n in sorted(nodes)]
     try:
         from graspologic.partition import leiden
+
         sub_partition: dict[str, int] = leiden(subgraph)
         sub_communities: dict[int, list[str]] = {}
         for node, cid in sub_partition.items():

From 02e2a76ce4fa518e5ca46cb826d8ac9bc5235f78 Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Fri, 29 May 2026 22:13:33 -0500
Subject: [PATCH 14/21] feat(multigraph): PR 7 watch/cache/incremental update
 preserve keyed parallel edges

Stateful per-repo update is now MultiDiGraph-safe with no silent fallback to
simple-graph behavior (the PR 7 go/no-go gate):

- build.py build_merge: removed the multigraph NotImplementedError rejection;
  inherits the saved multigraph flag; prune is now key + source_file aware
  (removes only the parallel edge records whose source_file is evicted, keeping
  same-pair parallels from other files). The simple-graph path stays
  byte-stable.
- export.py to_json: persists graphify_profile (graph_type) in graph.json so
  the profile round-trips through save/load; the only output change is the
  added "graph" metadata.
- watch.py: _rebuild_code inherits the saved multigraph profile on both the
  clustered and --no-cluster paths (build_from_json(multigraph=...) + profile
  re-stamp) so a rewrite keeps multigraph=true and reloads as a MultiDiGraph
  (fixes a deferred collapse: edges survived but the dropped flag collapsed
  them on next load). Canonical comparison strips graph-level metadata and is
  key-aware (parallels not treated as duplicates). Changed/deleted-file
  eviction is source_file + key aware (evicts stale cross-file edges between
  surviving nodes).
- cache.py: confirmed raw extraction is profile-independent; added
  CACHE_SCHEMA_VERSION invalidation guard (version mismatch / legacy
  unversioned entries -> safe rebuild) without per-profile key fragmentation.
- __main__.py update: now PRESERVES a multidigraph (delegates to the
  profile-inheriting _rebuild_code) instead of refusing/collapsing.
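
The key + source_file aware prune from the build.py item above can be sketched
roughly as follows. This is a minimal illustration over plain dict records, not
the code in build_merge; the record layout, the key strings, and the helper
name prune_evicted are assumptions made for the example.

```python
from typing import Any

# Hypothetical edge record: one dict per parallel edge, identified by
# (source, target, key). This layout is assumed for illustration only.
EdgeRecord = dict[str, Any]


def prune_evicted(edges: list[EdgeRecord], evicted_files: set[str]) -> list[EdgeRecord]:
    """Drop only the edge records whose source_file was evicted.

    Parallel edges between the same node pair that come from other files
    are kept, so a node pair is never removed wholesale.
    """
    return [e for e in edges if e.get("source_file") not in evicted_files]


edges = [
    {"source": "a", "target": "b", "key": "calls:x.py:L3", "source_file": "x.py"},
    {"source": "a", "target": "b", "key": "calls:y.py:L9", "source_file": "y.py"},
    {"source": "b", "target": "c", "key": "imports:x.py:L1", "source_file": "x.py"},
]

# Evicting x.py removes its two records; the a -> b parallel from y.py survives.
kept = prune_evicted(edges, {"x.py"})
kept_ids = [(e["source"], e["target"], e["key"]) for e in kept]
```

Filtering per record rather than per node pair is what keeps a same-pair
parallel from an unchanged file alive when another file is re-extracted.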

26 new tests across realistic temp-repo scenarios (unchanged-file parallels
persist; changed-file evicts only its parallels; deleted-file removes all its
records; profile preserved through rewrite + reload; simple mode unchanged).
Full suite 1683 passed; ruff + pyright clean.
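
The cache invalidation guard described in the cache.py item can be pictured
with the sketch below. Only the name CACHE_SCHEMA_VERSION comes from this
patch; the version value, the "schema_version" and "extraction" field names,
and the helper read_cache_entry are hypothetical.

```python
from typing import Any, Optional

# Hypothetical version value and field name; the real cache layout may differ.
CACHE_SCHEMA_VERSION = 2
VERSION_FIELD = "schema_version"


def read_cache_entry(entry: Any) -> Optional[dict]:
    """Return the cached extraction, or None to signal a safe rebuild."""
    if not isinstance(entry, dict):
        return None
    # Legacy entries have no version field, so .get() yields None and they
    # are rejected the same way as entries with a mismatched version.
    if entry.get(VERSION_FIELD) != CACHE_SCHEMA_VERSION:
        return None
    extraction = entry.get("extraction")
    return extraction if isinstance(extraction, dict) else None


current = {VERSION_FIELD: CACHE_SCHEMA_VERSION, "extraction": {"nodes": [], "edges": []}}
stale = {VERSION_FIELD: 1, "extraction": {"nodes": [], "edges": []}}
legacy = {"nodes": [], "edges": []}  # unversioned entry written by an older build
```

Keeping the version inside the entry, rather than folding it (or the graph
profile) into the cache key, is one way to avoid per-profile key
fragmentation: a single entry serves every profile and is simply rebuilt when
the schema changes.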

---
 graphify/__main__.py      |  10 +
 graphify/build.py         |  80 ++++++--
 graphify/cache.py         |  70 ++++++-
 graphify/export.py        |  58 ++++++
 graphify/watch.py         | 126 ++++++++++++-
 tests/test_build.py       | 292 +++++++++++++++++++++++++++--
 tests/test_cache.py       | 111 +++++++++++
 tests/test_export.py      | 113 ++++++++++++
 tests/test_incremental.py | 169 +++++++++++++++++
 tests/test_watch.py       | 377 ++++++++++++++++++++++++++++++++++++++
 10 files changed, 1363 insertions(+), 43 deletions(-)

diff --git a/graphify/__main__.py b/graphify/__main__.py
index b824b910d..70786568f 100644
--- a/graphify/__main__.py
+++ b/graphify/__main__.py
@@ -2709,6 +2709,16 @@ def main() -> None:
         if not watch_path.exists():
             print(f"error: path not found: {watch_path}", file=sys.stderr)
             sys.exit(1)
+
+        # PR 7 go/no-go gate: "no silent fallback to simple graph behavior."
+        # No special handling is needed here: watch._rebuild_code now inherits
+        # the saved graph.json profile (it reads the on-disk `multigraph` flag
+        # and rebuilds via build_from_json(multigraph=...), re-stamping
+        # multigraph/directed + graphify_profile on write). A multidigraph
+        # graph.json therefore round-trips through `graphify update` as a
+        # MultiDiGraph with its keyed parallel edges intact — never silently
+        # collapsed to a simple graph — and simple/digraph graphs update exactly
+        # as before.
         from graphify.watch import _rebuild_code
 
         print(f"Re-extracting code files in {watch_path} (no LLM needed)...")
diff --git a/graphify/build.py b/graphify/build.py
index f7bad0d35..fc955ea34 100644
--- a/graphify/build.py
+++ b/graphify/build.py
@@ -611,7 +611,8 @@ def build_merge(
     dedup: bool = True,
     dedup_llm_backend: str | None = None,
     root: str | Path | None = None,
-) -> nx.Graph | nx.DiGraph:
+    multigraph: bool | None = None,
+) -> nx.Graph | nx.DiGraph | nx.MultiDiGraph:
     """Load existing graph.json, merge new chunks into it, and return the merged graph.
 
     Persistence is the caller's responsibility (e.g., via ``export.to_json``);
@@ -624,6 +625,14 @@ def build_merge(
     ``directed`` defaults to inheriting the saved graph's flag when an
     existing graph.json is present, so updating a directed simple graph with
     default args no longer silently downgrades it to undirected.
+
+    ``multigraph`` likewise defaults to inheriting the saved graph's flag. When
+    the saved graph.json has ``multigraph: true`` the merge produces a
+    MultiDiGraph that preserves keyed parallel edges end-to-end — existing edges
+    keep their stored ``key`` (so distinct parallel edges between the same pair
+    survive the re-feed), new chunks are merged without collapsing parallels, and
+    the result round-trips back out as multigraph. There is no silent fallback to
+    simple-graph behavior.
     """
     graph_path = Path(graph_path)
     if graph_path.exists():
@@ -642,22 +651,25 @@ def build_merge(
             raise TypeError(
                 f"saved graph.json at {graph_path} must be a JSON object, got {type(data).__name__}"
             )
-        # Refuse to silently collapse a saved multigraph. build() runs in
-        # simple mode here, which would drop parallel edges; stateful
-        # multigraph update paths are out of scope for the internal keyed
-        # build path (watch/cache/global-graph land in later slices).
+        # Honor the saved graph's `multigraph` flag so a stateful update of a
+        # multigraph graph.json preserves keyed parallel edges instead of
+        # collapsing to a simple graph. Existing edges keep their stored `key`
+        # when re-fed through build(multigraph=True), so distinct parallel edges
+        # between the same node pair survive the merge round-trip.
         saved_multigraph = data.get("multigraph", False)
-        if saved_multigraph is True:
-            raise NotImplementedError(
-                f"build_merge cannot update a multigraph graph.json. "
-                f"Found multigraph=true in {graph_path}. Rebuild from extraction "
-                f"or use a simple-graph build target."
-            )
-        if saved_multigraph is not False:
+        if not isinstance(saved_multigraph, bool):
             raise TypeError(
                 f"'multigraph' in {graph_path} must be a boolean, "
                 f"got {type(saved_multigraph).__name__} ({saved_multigraph!r})"
             )
+        if multigraph is None:
+            multigraph = saved_multigraph
+        elif multigraph != saved_multigraph:
+            print(
+                f"[graphify] WARNING: build_merge multigraph={multigraph} overrides "
+                f"saved graph.json multigraph={saved_multigraph}",
+                file=sys.stderr,
+            )
         # Honor the saved graph's `directed` flag unless the caller explicitly
         # overrides. Without this, an update with default args on a directed
         # graph silently downgrades it and loses edge direction on next export.
@@ -683,12 +695,19 @@ def build_merge(
     else:
         if directed is None:
             directed = False
+        if multigraph is None:
+            multigraph = False
         existing_nodes = []
         base = []
 
     all_chunks = base + list(new_chunks)
     G = build(
-        all_chunks, directed=directed, dedup=dedup, dedup_llm_backend=dedup_llm_backend, root=root
+        all_chunks,
+        directed=directed,
+        dedup=dedup,
+        dedup_llm_backend=dedup_llm_backend,
+        root=root,
+        multigraph=multigraph,
     )
 
     # Prune nodes and edges from deleted source files
@@ -718,17 +737,38 @@ def build_merge(
                 file=sys.stderr,
             )
 
-        edges_to_remove = [
-            (u, v) for u, v, d in G.edges(data=True) if d.get("source_file") in prune_set
-        ]
-        if edges_to_remove:
-            G.remove_edges_from(edges_to_remove)
+        # Prune edges belonging to changed/deleted source files. On a
+        # MultiDiGraph a single (u, v) pair can carry MULTIPLE parallel edges
+        # from DIFFERENT source files, so removal MUST be keyed: drop only the
+        # parallel edges whose source_file is in prune_set and leave parallel
+        # edges from other files between the same pair intact. The two-tuple
+        # remove_edges_from used by simple graphs would drop just one edge per
+        # tuple on a multigraph (the last-added key) and could evict the wrong file's edge.
+        # remove_all_parallel_edges is deliberately NOT used here — it is too
+        # broad and would delete other-file parallels between the same pair.
+        if isinstance(G, (nx.MultiGraph, nx.MultiDiGraph)):
+            keyed_to_remove = [
+                (u, v, k)
+                for u, v, k, d in G.edges(keys=True, data=True)
+                if d.get("source_file") in prune_set
+            ]
+            for u, v, k in keyed_to_remove:
+                G.remove_edge(u, v, key=k)
+            n_edges_removed = len(keyed_to_remove)
+        else:
+            edges_to_remove = [
+                (u, v) for u, v, d in G.edges(data=True) if d.get("source_file") in prune_set
+            ]
+            if edges_to_remove:
+                G.remove_edges_from(edges_to_remove)
+            n_edges_removed = len(edges_to_remove)
+        if n_edges_removed:
             print(
-                f"[graphify] Pruned {len(edges_to_remove)} edge(s) from deleted source file(s).",
+                f"[graphify] Pruned {n_edges_removed} edge(s) from deleted source file(s).",
                 file=sys.stderr,
             )
 
-        if not n_nodes and not edges_to_remove:
+        if not n_nodes and not n_edges_removed:
             print(
                 f"[graphify] {n_files} source file(s) deleted since last run — "
                 f"no matching nodes or edges in graph, already clean.",
diff --git a/graphify/cache.py b/graphify/cache.py
index 73fff35e0..18c1a6d72 100644
--- a/graphify/cache.py
+++ b/graphify/cache.py
@@ -13,6 +13,32 @@
 # absolute path ("/shared/graphify-out").
 _GRAPHIFY_OUT = os.environ.get("GRAPHIFY_OUT", "graphify-out")
 
+# Cache schema version — bump this whenever the PRODUCER (AST/semantic
+# extraction) output format or content changes in a way that makes existing
+# cache entries invalid. Entries are stamped with this version on write and
+# revalidated on read; any entry whose recorded version != the current value
+# (including legacy entries written before versioning, which have no version
+# field) is treated as a cache MISS and rebuilt.
+#
+# Why this matters for graph profiles (PR 7): raw extraction output is
+# PROFILE-INDEPENDENT. The simple-graph vs MultiDiGraph distinction is a
+# build-time assembly choice (`build_from_json(multigraph=...)`), not an
+# extraction-time choice — the same nodes/edges are extracted regardless of how
+# they are later assembled. So the raw cache is intentionally NOT keyed by graph
+# profile, and reusing it across profiles is correct and safe (it protects cache
+# hit rate). This version constant is the escape hatch: if a future producer
+# change ever makes cached output differ by profile (or otherwise incompatible),
+# bumping CACHE_SCHEMA_VERSION forces a clean rebuild for everyone, fulfilling
+# the design-doc clause "add profile/version invalidation where graph outputs
+# can differ".
+CACHE_SCHEMA_VERSION = 1
+
+# Reserved metadata key stamped into each cache entry's JSON. Chosen with
+# dunder bracketing so it cannot collide with extraction payload keys (which are
+# plain identifiers like "nodes", "edges", "hyperedges", "source_file"). It is
+# stripped back out on read so callers see only their original result dict.
+_SCHEMA_VERSION_KEY = "__cache_schema_version__"
+
 
 def _body_content(content: bytes) -> bytes:
     """Strip YAML frontmatter from Markdown content, returning only the body."""
@@ -145,6 +171,34 @@ def file_hash(path: Path, root: Path = Path(".")) -> str:
     return digest
 
 
+def _stamp_schema_version(result: dict) -> dict:
+    """Return a shallow copy of result with the current schema version stamped in.
+
+    Used on write so every cache entry records the schema version it was
+    produced under. A shallow copy avoids mutating the caller's dict.
+    """
+    stamped = dict(result)
+    stamped[_SCHEMA_VERSION_KEY] = CACHE_SCHEMA_VERSION
+    return stamped
+
+
+def _validate_schema_version(data: dict) -> dict | None:
+    """Validate a loaded cache entry's schema version, returning the payload.
+
+    Returns the result dict with the reserved version key stripped out (so
+    callers see only their original payload) if the recorded version matches the
+    current CACHE_SCHEMA_VERSION. Returns None — a cache MISS — when the version
+    is missing (legacy entries written before versioning) or mismatched (a stale
+    entry from an older/newer producer). Treating both as a miss triggers a safe
+    rebuild rather than silently reusing potentially-incompatible cached output.
+    """
+    if not isinstance(data, dict) or data.get(_SCHEMA_VERSION_KEY) != CACHE_SCHEMA_VERSION:
+        return None
+    payload = dict(data)
+    payload.pop(_SCHEMA_VERSION_KEY, None)
+    return payload
+
+
 def cache_dir(root: Path = Path("."), kind: str = "ast") -> Path:
     """Returns graphify-out/cache/{kind}/ - creates it if needed.
 
@@ -166,7 +220,10 @@ def load_cached(path: Path, root: Path = Path("."), kind: str = "ast") -> dict |
 
     For kind="ast", also checks the legacy flat cache/  directory so users
     upgrading from pre-0.5.3 don't lose their existing AST cache entries.
-    Returns None if no cache entry or file has changed.
+    Returns None if there is no cache entry, the file has changed, or the entry's
+    recorded schema version does not match the current CACHE_SCHEMA_VERSION. That
+    includes pre-versioning entries that lack the field: they are treated as a
+    miss and rebuilt, which is the backward-compatible choice.
     """
     try:
         h = file_hash(path, root)
@@ -175,7 +232,7 @@ def load_cached(path: Path, root: Path = Path("."), kind: str = "ast") -> dict |
     entry = cache_dir(root, kind) / f"{h}.json"
     if entry.exists():
         try:
-            return json.loads(entry.read_text(encoding="utf-8"))
+            return _validate_schema_version(json.loads(entry.read_text(encoding="utf-8")))
         except (json.JSONDecodeError, OSError):
             return None
     # Migration fallback: check legacy flat cache/ dir for AST entries
@@ -183,7 +240,7 @@ def load_cached(path: Path, root: Path = Path("."), kind: str = "ast") -> dict |
         legacy = Path(root).resolve() / _GRAPHIFY_OUT / "cache" / f"{h}.json"
         if legacy.exists():
             try:
-                return json.loads(legacy.read_text(encoding="utf-8"))
+                return _validate_schema_version(json.loads(legacy.read_text(encoding="utf-8")))
             except (json.JSONDecodeError, OSError):
                 return None
     return None
@@ -193,7 +250,10 @@ def save_cached(path: Path, result: dict, root: Path = Path("."), kind: str = "a
     """Save extraction result for this file.
 
     Stores as graphify-out/cache/{kind}/{hash}.json where hash = SHA256 of current file contents.
-    result should be a dict with 'nodes' and 'edges' lists.
+    result should be a dict with 'nodes' and 'edges' lists. The current
+    CACHE_SCHEMA_VERSION is stamped into the stored JSON (under a reserved key)
+    so load_cached can invalidate stale entries after a producer change; the
+    caller's `result` dict is not mutated.
 
     No-ops if `path` is not a regular file. Subagent-produced semantic fragments
     occasionally carry a directory path in `source_file`; skipping them prevents
@@ -207,7 +267,7 @@ def save_cached(path: Path, result: dict, root: Path = Path("."), kind: str = "a
     entry = target_dir / f"{h}.json"
     fd, tmp_path = tempfile.mkstemp(dir=target_dir, prefix=f"{h}.", suffix=".tmp")
     try:
-        os.write(fd, json.dumps(result).encode())
+        os.write(fd, json.dumps(_stamp_schema_version(result)).encode())
         os.close(fd)
         try:
             os.replace(tmp_path, entry)
diff --git a/graphify/export.py b/graphify/export.py
index 5ba9ef833..302becc9d 100644
--- a/graphify/export.py
+++ b/graphify/export.py
@@ -18,6 +18,7 @@
 from graphify.analyze import _node_community_map
 from graphify.build import edge_data, edge_datas
 from graphify.edge_identity import make_stable_key
+from graphify.graph_loader import GRAPHIFY_PROFILE_KEY
 from graphify.projections import (
     DEFAULT_RELATIONSHIP_CAP,
     format_relationship_envelope,
@@ -506,6 +507,48 @@ def _git_head() -> str | None:
         return None
 
 
+def _graph_type_for_instance(G: nx.Graph) -> str:
+    """Return the graphify ``graph_type`` token for a live NetworkX instance.
+
+    The instance is authoritative: we classify from ``is_multigraph()`` /
+    ``is_directed()`` rather than from any stored profile, mirroring the
+    ``multigraph``/``directed`` flag logic in :func:`graphify.graph_loader.load_graph`.
+    The vocabulary is kept byte-identical to the loader's
+    :func:`~graphify.graph_loader._set_graph_profile` (``"simple"`` /
+    ``"digraph"`` / ``"multidigraph"``) so a save/load round-trip is stable.
+
+    graphify only ever produces directed multigraphs (``MultiDiGraph``), and the
+    loader normalizes any ``multigraph: true`` payload to ``MultiDiGraph``, so an
+    undirected ``MultiGraph`` instance is still labelled ``"multidigraph"`` for
+    consistency with what a reload would reconstruct.
+    """
+    if G.is_multigraph():
+        return "multidigraph"
+    if G.is_directed():
+        return "digraph"
+    return "simple"
+
+
+def _ensure_graph_profile(G: nx.Graph) -> None:
+    """Stamp ``G.graph[GRAPHIFY_PROFILE_KEY]`` so the profile persists in graph.json.
+
+    A freshly *built* graph (from :func:`graphify.build.build_from_json`) has no
+    ``graphify_profile`` — that key is only set on *load*. Without it the saved
+    JSON would not carry the simple-vs-multidigraph profile that downstream PR 7
+    cache-invalidation / watch profile-mismatch detection relies on.
+
+    Existing profile fields (e.g. from a loaded graph) are preserved, but
+    ``graph_type`` is always overwritten to match the actual instance — the
+    instance is authoritative, so a stale serialized ``graph_type`` can never
+    mislabel the graph we are about to write. This mirrors the overwrite in
+    :func:`graphify.graph_loader._set_graph_profile`.
+    """
+    existing = G.graph.get(GRAPHIFY_PROFILE_KEY)
+    profile = dict(existing) if isinstance(existing, dict) else {}
+    profile["graph_type"] = _graph_type_for_instance(G)
+    G.graph[GRAPHIFY_PROFILE_KEY] = profile
+
+
 def to_json(
     G: nx.Graph,
     communities: dict[int, list[str]],
@@ -539,11 +582,26 @@ def to_json(
                 file=sys.stderr,
             )
 
+    # Persist the graph profile so a later load can detect a simple-vs-
+    # multidigraph mismatch (PR 7 cache invalidation / watch). The profile is
+    # derived from the live instance and written onto G.graph, which
+    # node_link_data surfaces under the top-level "graph" key.
+    _ensure_graph_profile(G)
+
     node_community = _node_community_map(communities)
     try:
         data = json_graph.node_link_data(G, edges="links")
     except TypeError:
         data = json_graph.node_link_data(G)
+    # Defensively guarantee the profile is present under data["graph"] even if a
+    # NetworkX build did not surface G.graph (it normally does). The NetworkX
+    # "multigraph"/"directed" boolean flags are emitted by node_link_data itself.
+    graph_meta = data.get("graph")
+    if not isinstance(graph_meta, dict):
+        graph_meta = {}
+        data["graph"] = graph_meta
+    if GRAPHIFY_PROFILE_KEY not in graph_meta:
+        graph_meta[GRAPHIFY_PROFILE_KEY] = dict(G.graph[GRAPHIFY_PROFILE_KEY])
     for node in data["nodes"]:
         node["community"] = node_community.get(node["id"])
         node["norm_label"] = _strip_diacritics(node.get("label", "")).lower()
diff --git a/graphify/watch.py b/graphify/watch.py
index 1f32db828..70cf23bc8 100644
--- a/graphify/watch.py
+++ b/graphify/watch.py
@@ -208,7 +208,13 @@ def _report_root_label(watch_path: Path) -> str:
 
 
 def _relativize_source_files(payload: dict, root: Path) -> None:
-    for bucket in ("nodes", "edges", "hyperedges"):
+    # Include "links" alongside "edges": modern NetworkX serialises the edge list
+    # under "links", and FIX 3 (source_file-aware edge eviction) compares each
+    # edge's source_file against an eviction set built from repo-relative paths.
+    # Without relativising edge source_files here, an absolute-pathed preserved
+    # edge from an older graph would never match the relative evict_sources and
+    # would survive as stale. Nodes were already relativised; this aligns edges.
+    for bucket in ("nodes", "edges", "links", "hyperedges"):
         for item in payload.get(bucket, []):
             source = item.get("source_file")
             if not source:
@@ -244,6 +250,14 @@ def _node_community_map(graph_data: dict) -> dict[str, int]:
 def _canonical_graph_for_compare(graph_data: dict) -> dict:
     canonical = dict(graph_data)
     canonical.pop("built_at_commit", None)
+    # Graph-level metadata under the top-level "graph" key (graphify_profile, a
+    # duplicate hyperedges copy, NetworkX bookkeeping) is NOT graph structure.
+    # export.to_json now persists graphify_profile there (PR 7 Phase A); without
+    # this strip the on-disk graph ({"graph": {"graphify_profile": ...}}) would
+    # never compare equal to a candidate that lacks it, defeating the
+    # no-change short-circuit. Hyperedge topology is preserved via the
+    # authoritative top-level "hyperedges" key sorted below.
+    canonical.pop("graph", None)
     for key in ("nodes", "links", "edges", "hyperedges"):
         if key in canonical and isinstance(canonical[key], list):
             canonical[key] = sorted(
@@ -256,6 +270,15 @@ def _canonical_graph_for_compare(graph_data: dict) -> dict:
 def _canonical_topology_for_compare(graph_data: dict) -> dict:
     canonical = dict(graph_data)
     canonical.pop("built_at_commit", None)
+    # Graph-level metadata under the top-level "graph" key (graphify_profile, a
+    # duplicate hyperedges copy, NetworkX bookkeeping) is NOT topology. Phase A's
+    # export.to_json persists graphify_profile there; the on-disk graph then has
+    # {"graph": {"graphify_profile": ...}} while a fresh candidate from
+    # _topology_from_graph has {"graph": {}}, which would otherwise be read as a
+    # spurious topology change and needlessly re-run cluster(). Strip it just like
+    # built_at_commit. Hyperedge topology is still compared via the authoritative
+    # top-level "hyperedges" key normalised below.
+    canonical.pop("graph", None)
 
     nodes = canonical.get("nodes")
     if isinstance(nodes, list):
@@ -292,7 +315,24 @@ def _canonical_topology_for_compare(graph_data: dict) -> dict:
             if true_src is not None and true_tgt is not None:
                 e["source"] = true_src
                 e["target"] = true_tgt
+            # VOLATILE fields — derived/recomputed every rebuild, so they must NOT
+            # drive a "topology changed" verdict. confidence_score is recomputed
+            # from confidence on every export (see export._CONFIDENCE_SCORE_DEFAULTS).
             e.pop("confidence_score", None)
+            # IDENTITY fields are everything that survives: source, target,
+            # relation, confidence, source_file, source_location, weight, and —
+            # critically for MultiDiGraphs — `key`. NetworkX guarantees `key` is
+            # unique within a (source, target) pair, so two parallel edges that
+            # share the same relation/source_file/source_location but differ only
+            # in `key` MUST stay distinct in the sorted comparison; otherwise an
+            # unchanged multigraph with parallel edges would read as "changed"
+            # (or a real parallel-edge add/remove would be silently missed). The
+            # json.dumps sort key below already includes `key` because we never
+            # strip it — this assignment makes that contract explicit and guards
+            # against a future edit accidentally dropping it from the canonical
+            # edge. Simple graphs have no `key`, so this is a no-op for them.
+            if "key" in edge:
+                e["key"] = edge["key"]
             norm_edges.append(e)
         canonical[key] = sorted(
             norm_edges,
@@ -320,6 +360,37 @@ def _topology_from_graph(G) -> dict:
     return data
 
 
+def _existing_is_multigraph(graph_data: dict) -> bool:
+    """Return True when an on-disk ``graph.json`` payload is a MultiDiGraph.
+
+    Mirrors :func:`graphify.graph_loader.load_graph`'s detection so an
+    incremental ``_rebuild_code`` rebuilds with the SAME graph class the file
+    was saved as. Without inheriting this flag, the rebuild would re-emit a
+    simple ``DiGraph`` whose serialized ``graph.json`` no longer declares
+    ``multigraph: true``, a deferred silent fallback: the preserved parallel-edge *records*
+    survive the write, but the next ``load_graph`` collapses them to one edge
+    per pair (the PR 7 go/no-go violation).
+
+    The top-level ``multigraph: true`` boolean is authoritative (it is what
+    ``load_graph`` keys on). A serialized ``graphify_profile.graph_type ==
+    "multidigraph"`` is accepted as a secondary signal for graphs written by a
+    profile-stamping writer that, for any reason, lacked the top-level flag —
+    belt-and-suspenders so a multigraph is never misread as simple.
+    """
+    if not isinstance(graph_data, dict):
+        return False
+    if graph_data.get("multigraph") is True:
+        return True
+    graph_meta = graph_data.get("graph")
+    if isinstance(graph_meta, dict):
+        from graphify.graph_loader import GRAPHIFY_PROFILE_KEY
+
+        profile = graph_meta.get(GRAPHIFY_PROFILE_KEY)
+        if isinstance(profile, dict) and profile.get("graph_type") == "multidigraph":
+            return True
+    return False
+
+
 def _check_shrink(
     force: bool,
     existing_data: dict,
@@ -571,10 +642,32 @@ def _rebuild_code(
                     and (not evict_sources or n.get("source_file") not in evict_sources)
                 ]
                 all_ids = new_ast_ids | {n["id"] for n in preserved_nodes}
+                # Preserve an existing edge only when BOTH endpoints survive AND
+                # the edge's OWN source_file was not changed/deleted. The
+                # endpoints-survive check alone is insufficient: an edge
+                # CONTRIBUTED by an evicted file (e.g. a `calls` edge recorded in
+                # the changed file) between two nodes that are DEFINED elsewhere
+                # (and therefore survive) would otherwise persist as a stale
+                # link the re-extraction will re-emit fresh — double-counting or,
+                # if the relationship genuinely went away, leaving a phantom edge.
+                #
+                # This is key-aware on a MultiDiGraph for free: each on-disk
+                # "links" record is exactly one keyed parallel edge, so the
+                # per-record source_file test evicts only the parallel edges
+                # belonging to the changed/deleted file between a given (u, v)
+                # pair and leaves parallel edges contributed by other files
+                # intact. (Mirrors the keyed prune in build.build_merge.)
+                #
+                # Simple-graph safety: a simple graph has one edge per pair, so a
+                # surviving edge whose source_file is NOT evicted is unaffected;
+                # the only behavioral change is the correct removal of a stale
+                # cross-file edge the old endpoints-only check wrongly kept.
                 preserved_edges = [
                     e
                     for e in existing.get("links", existing.get("edges", []))
-                    if e.get("source") in all_ids and e.get("target") in all_ids
+                    if e.get("source") in all_ids
+                    and e.get("target") in all_ids
+                    and (not evict_sources or e.get("source_file") not in evict_sources)
                 ]
                 result = {
                     "nodes": result["nodes"] + preserved_nodes,
@@ -599,6 +692,24 @@ def _rebuild_code(
                 **{k: v for k, v in result.items() if k != "edges"},
                 "links": result.get("edges", []),
             }
+            # The no-cluster path writes raw merged extraction JSON directly (it
+            # never goes through build_from_json/to_json), so it would otherwise
+            # emit a graph.json with no multigraph flag — the same deferred
+            # collapse as the clustered path. When the existing graph was a
+            # MultiDiGraph, stamp the multigraph/directed flags + a multidigraph
+            # graphify_profile so the rewritten file reloads as a MultiDiGraph
+            # (preserved parallel-edge records keep their `key`; new AST edges get
+            # a generated key on load). Simple graphs are left untouched.
+            if _existing_is_multigraph(existing_graph_data):
+                from graphify.graph_loader import GRAPHIFY_PROFILE_KEY
+
+                candidate_graph_data["multigraph"] = True
+                candidate_graph_data["directed"] = True
+                graph_meta = candidate_graph_data.get("graph")
+                if not isinstance(graph_meta, dict):
+                    graph_meta = {}
+                graph_meta[GRAPHIFY_PROFILE_KEY] = {"graph_type": "multidigraph"}
+                candidate_graph_data["graph"] = graph_meta
             candidate_graph_text = _json_text(candidate_graph_data)
             same_graph = False
             if existing_graph.exists():
@@ -661,7 +772,16 @@ def _rebuild_code(
             "total_words": detected.get("total_words", 0),
         }
 
-        G = build_from_json(result)
+        # Inherit the existing graph's multigraph class so an incremental rebuild
+        # of a MultiDiGraph graph.json stays a MultiDiGraph. build_from_json with
+        # multigraph=True keeps each preserved edge record's `key`, so parallel
+        # edges survive the rebuild AND the to_json below stamps multigraph=true +
+        # graphify_profile (Phase A) so the file reloads as a MultiDiGraph rather
+        # than silently collapsing to one edge per pair on the next load_graph.
+        # When the existing graph is simple (or absent) this is False — build a
+        # simple graph exactly as before, no behavior change.
+        saved_is_multigraph = _existing_is_multigraph(existing_graph_data)
+        G = build_from_json(result, multigraph=saved_is_multigraph)
         candidate_topology = _topology_from_graph(G)
         if existing_graph_data:
             try:
diff --git a/tests/test_build.py b/tests/test_build.py
index 52bde5dec..b43029afa 100644
--- a/tests/test_build.py
+++ b/tests/test_build.py
@@ -1086,27 +1086,289 @@ def test_build_skips_malformed_edges_without_crashing(capsys):
     assert "Edge 0 must be an object" in captured.err
 
 
-def test_build_merge_rejects_multigraph_graph_json(tmp_path):
-    """build_merge must refuse a multigraph input rather than silently collapse parallel edges."""
-    import json as _json
+def _write_multigraph_graph_json(graph_path: Path, extraction: dict) -> dict:
+    """Build a MultiDiGraph from *extraction*, persist it via to_json, return the JSON.
+
+    Produces a realistic on-disk multigraph graph.json (multigraph=true, keyed
+    parallel edges) exactly as graphify writes it, so build_merge tests exercise
+    the real load -> merge -> prune round-trip rather than a hand-rolled dict.
+    """
+    from graphify.export import to_json
+
+    G = build_from_json(extraction, multigraph=True)
+    assert type(G) is nx.MultiDiGraph
+    assert to_json(G, {}, str(graph_path), force=True)
+    data = json.loads(graph_path.read_text())
+    assert data["multigraph"] is True
+    return data
+
+
+def _three_parallel_edges_one_pair() -> dict:
+    """A→B carrying three parallel edges, each from a distinct source_file."""
+    return {
+        "nodes": [
+            {"id": "A", "label": "A", "file_type": "code", "source_file": "file1.py"},
+            {"id": "B", "label": "B", "file_type": "code", "source_file": "file2.py"},
+        ],
+        "edges": [
+            {
+                "source": "A",
+                "target": "B",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "file1.py",
+                "source_location": "L1",
+            },
+            {
+                "source": "A",
+                "target": "B",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "file2.py",
+                "source_location": "L2",
+            },
+            {
+                "source": "A",
+                "target": "B",
+                "relation": "imports",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "file3.py",
+                "source_location": "L3",
+            },
+        ],
+    }
 
+
+def test_build_merge_multigraph_unchanged_file_preserves_parallel_edges(tmp_path):
+    """PR 7 gate: merging a new chunk that does not touch A/B's files must
+    preserve every keyed parallel edge on the existing A→B pair (no silent
+    collapse to a single edge)."""
     graph_path = tmp_path / "graph.json"
-    graph_path.write_text(
-        _json.dumps(
+    _write_multigraph_graph_json(graph_path, _three_parallel_edges_one_pair())
+
+    # New chunk touches only unrelated files (other.py); A/B's files untouched.
+    new_chunk = {
+        "nodes": [
+            {"id": "C", "label": "C", "file_type": "code", "source_file": "other.py"},
+            {"id": "D", "label": "D", "file_type": "code", "source_file": "other.py"},
+        ],
+        "edges": [
             {
-                "directed": True,
-                "multigraph": True,
-                "nodes": [{"id": "a"}, {"id": "b"}],
-                "links": [
-                    {"source": "a", "target": "b", "key": "k1", "relation": "calls"},
-                    {"source": "a", "target": "b", "key": "k2", "relation": "imports"},
-                ],
+                "source": "C",
+                "target": "D",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "other.py",
+                "source_location": "L9",
             }
-        )
+        ],
+    }
+
+    G = build_merge([new_chunk], graph_path=graph_path, dedup=False)
+
+    assert type(G) is nx.MultiDiGraph
+    assert G.number_of_edges("A", "B") == 3, "all 3 parallel edges must survive the merge"
+    assert sorted(d.get("source_file") for d in G["A"]["B"].values()) == [
+        "file1.py",
+        "file2.py",
+        "file3.py",
+    ]
+    assert G.number_of_edges("C", "D") == 1, "new chunk edge must be added"
+
+
+def test_build_merge_multigraph_changed_file_evicts_only_its_parallel_edges(tmp_path):
+    """Critical source_file+key intersection: a single A→B pair carries parallel
+    edges from file1.py AND file2.py; build_merge with file1.py in prune_set must
+    remove ONLY file1.py's A→B edge and leave file2.py's A→B edge between the SAME
+    pair intact. This is the core guarantee that key-aware pruning never collapses
+    or over-deletes parallel edges that share an endpoint pair.
+
+    Endpoint nodes deliberately live in a neutral file (defs.py) so that pruning
+    file1.py prunes the EDGE record by its source_file without removing the
+    endpoint nodes — isolating the source_file+key edge-prune behavior. (In the
+    real incremental flow, deleted files populate prune_sources while changed
+    files are re-extracted as fresh chunks; prune runs after the merge.)"""
+    extraction = {
+        "nodes": [
+            {"id": "A", "label": "A", "file_type": "code", "source_file": "defs.py"},
+            {"id": "B", "label": "B", "file_type": "code", "source_file": "defs.py"},
+        ],
+        "edges": [
+            {
+                "source": "A",
+                "target": "B",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "file1.py",
+                "source_location": "L1",
+            },
+            {
+                "source": "A",
+                "target": "B",
+                "relation": "imports",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "file2.py",
+                "source_location": "L2",
+            },
+        ],
+    }
+    graph_path = tmp_path / "graph.json"
+    _write_multigraph_graph_json(graph_path, extraction)
+
+    G = build_merge([], graph_path=graph_path, prune_sources=["file1.py"], dedup=False)
+
+    assert type(G) is nx.MultiDiGraph
+    # Both endpoint nodes survive (they live in defs.py, not pruned).
+    assert G.has_node("A") and G.has_node("B")
+    remaining = sorted(
+        (d.get("source_file"), d.get("source_location")) for d in G["A"]["B"].values()
+    )
+    # file2.py's parallel edge between A→B survives; file1.py's is evicted.
+    assert remaining == [("file2.py", "L2")], (
+        f"only file1.py's A→B edge must be pruned, file2.py's must survive; got {remaining}"
     )
+    assert G.number_of_edges("A", "B") == 1
 
-    with pytest.raises(NotImplementedError, match="multigraph"):
-        build_merge([], graph_path=graph_path)
+
+def test_build_merge_multigraph_deleted_file_removes_all_its_edge_records(tmp_path):
+    """Deleting a file must remove ALL edge records (including parallel ones)
+    carrying that source_file across every pair, while edges from other files
+    survive — even when they share an endpoint pair."""
+    extraction = {
+        "nodes": [
+            {"id": "A", "label": "A", "file_type": "code", "source_file": "keep.py"},
+            {"id": "B", "label": "B", "file_type": "code", "source_file": "keep.py"},
+            {"id": "C", "label": "C", "file_type": "code", "source_file": "gone.py"},
+        ],
+        "edges": [
+            # Two parallel A→B edges from the deleted file plus one from a kept file.
+            {
+                "source": "A",
+                "target": "B",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "gone.py",
+                "source_location": "L1",
+            },
+            {
+                "source": "A",
+                "target": "B",
+                "relation": "imports",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "gone.py",
+                "source_location": "L2",
+            },
+            {
+                "source": "A",
+                "target": "B",
+                "relation": "references",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "keep.py",
+                "source_location": "L3",
+            },
+            # An edge on a different pair, also from the deleted file.
+            {
+                "source": "A",
+                "target": "C",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "gone.py",
+                "source_location": "L4",
+            },
+        ],
+    }
+    graph_path = tmp_path / "graph.json"
+    _write_multigraph_graph_json(graph_path, extraction)
+
+    G = build_merge([], graph_path=graph_path, prune_sources=["gone.py"], dedup=False)
+
+    assert type(G) is nx.MultiDiGraph
+    # All gone.py edge records removed across all pairs.
+    assert all(
+        d.get("source_file") != "gone.py" for _u, _v, d in G.edges(data=True)
+    ), "no edge record from the deleted file may survive"
+    # The keep.py A→B parallel edge survives even though gone.py shared that pair.
+    assert G.number_of_edges("A", "B") == 1
+    assert next(iter(G["A"]["B"].values())).get("source_file") == "keep.py"
+    # Node C had source_file gone.py → pruned, so the A→C pair is gone entirely.
+    assert not G.has_node("C")
+
+
+def test_build_merge_multigraph_output_stays_multigraph(tmp_path):
+    """After merge, the written graph.json must still be multigraph=true and
+    reload as a MultiDiGraph — no silent fallback to a simple graph."""
+    from graphify.export import to_json
+    from graphify.graph_loader import load_graph_file
+
+    graph_path = tmp_path / "graph.json"
+    _write_multigraph_graph_json(graph_path, _three_parallel_edges_one_pair())
+
+    G = build_merge([], graph_path=graph_path, dedup=False)
+    assert type(G) is nx.MultiDiGraph
+    assert G.is_multigraph() and G.is_directed()
+
+    # Write back and confirm the multigraph flag round-trips on reload.
+    out_path = tmp_path / "graph_out.json"
+    assert to_json(G, {}, str(out_path), force=True)
+    data = json.loads(out_path.read_text())
+    assert data["multigraph"] is True
+    assert data["directed"] is True
+    reloaded = load_graph_file(out_path)
+    assert type(reloaded) is nx.MultiDiGraph
+    assert reloaded.number_of_edges("A", "B") == 3
+
+
+def test_build_merge_simple_graph_unchanged_regression(tmp_path):
+    """Removing the multigraph rejection must not change simple-graph behavior:
+    a simple/digraph build_merge output is identical to pre-PR7 behavior."""
+    import networkx as nx_local
+
+    # Build and persist a plain undirected simple graph (multigraph absent/false).
+    chunk = {
+        "nodes": [
+            {"id": "a", "label": "A", "file_type": "code", "source_file": "a.py"},
+            {"id": "b", "label": "B", "file_type": "code", "source_file": "b.py"},
+        ],
+        "edges": [
+            {
+                "source": "a",
+                "target": "b",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "a.py",
+                "weight": 1.0,
+            }
+        ],
+    }
+    G0 = build([chunk], dedup=False)
+    graph_path = tmp_path / "graph.json"
+    graph_path.write_text(
+        json.dumps(nx_local.node_link_data(G0, edges="edges")), encoding="utf-8"
+    )
+
+    # No new chunks, default args → must inherit simple (non-multigraph) type and
+    # remain a plain undirected Graph here (saved directed flag is false).
+    G = build_merge([], graph_path=graph_path, dedup=False)
+    assert not G.is_multigraph(), "simple-graph build_merge must not upgrade to multigraph"
+    assert type(G) is nx.Graph
+    assert G.number_of_nodes() == 2
+    assert G.number_of_edges() == 1
+    assert G.has_edge("a", "b")
+
+    # Pruning a deleted file on a simple graph still removes the matching edge.
+    G2 = build_merge([], graph_path=graph_path, prune_sources=["a.py"], dedup=False)
+    assert not G2.is_multigraph()
+    assert G2.number_of_edges() == 0, "simple-graph prune path unchanged"
 
 
 def test_build_merge_inherits_directed_from_saved_graph_json(tmp_path):
diff --git a/tests/test_cache.py b/tests/test_cache.py
index 1adb1b5d5..1a6d5635a 100644
--- a/tests/test_cache.py
+++ b/tests/test_cache.py
@@ -1,7 +1,11 @@
 """Tests for graphify/cache.py."""
 
+import json
+
 import pytest
 from graphify.cache import (
+    CACHE_SCHEMA_VERSION,
+    _SCHEMA_VERSION_KEY,
     file_hash,
     load_cached,
     save_cached,
@@ -11,6 +15,12 @@
 )
 
 
+def _ast_entry_path(cache_root, src_file):
+    """Path to the on-disk AST cache entry JSON for a source file."""
+    h = file_hash(src_file, cache_root)
+    return cache_root / "graphify-out" / "cache" / "ast" / f"{h}.json"
+
+
 @pytest.fixture
 def tmp_file(tmp_path):
     f = tmp_path / "sample.txt"
@@ -133,3 +143,104 @@ def test_body_content_no_frontmatter():
     """_body_content returns content unchanged when no frontmatter present."""
     content = b"No frontmatter here."
     assert _body_content(content) == content
+
+
+# --- Cache schema versioning (PR 7: profile/version invalidation) ---
+
+
+def test_cache_schema_version_recorded(tmp_file, cache_root):
+    """A cache write stamps the current CACHE_SCHEMA_VERSION into the stored JSON."""
+    result = {"nodes": [{"id": "n1"}], "edges": []}
+    save_cached(tmp_file, result, root=cache_root)
+
+    entry = _ast_entry_path(cache_root, tmp_file)
+    assert entry.exists()
+    raw = json.loads(entry.read_text(encoding="utf-8"))
+    assert raw[_SCHEMA_VERSION_KEY] == CACHE_SCHEMA_VERSION
+    # The reserved version key must not leak into the payload callers consume.
+    loaded = load_cached(tmp_file, root=cache_root)
+    assert loaded is not None
+    assert loaded == result
+    assert _SCHEMA_VERSION_KEY not in loaded
+
+
+def test_cache_invalidates_on_schema_version_change(tmp_file, cache_root):
+    """An entry written under a mismatched/old version is a miss (rebuilt), not reused.
+
+    Covers both the explicit-but-stale version and the legacy pre-versioning
+    entry that has no version field at all — both must invalidate (return None)
+    so the producer rebuilds rather than silently trusting stale cached output.
+    """
+    result = {"nodes": [{"id": "stale"}], "edges": []}
+    save_cached(tmp_file, result, root=cache_root)
+    entry = _ast_entry_path(cache_root, tmp_file)
+
+    # 1. Explicit version that differs from the current one (any mismatch is a miss).
+    raw = json.loads(entry.read_text(encoding="utf-8"))
+    raw[_SCHEMA_VERSION_KEY] = CACHE_SCHEMA_VERSION + 1
+    entry.write_text(json.dumps(raw), encoding="utf-8")
+    assert load_cached(tmp_file, root=cache_root) is None
+
+    # 2. Legacy entry with no version field (backward compatibility).
+    legacy = {"nodes": [{"id": "stale"}], "edges": []}
+    entry.write_text(json.dumps(legacy), encoding="utf-8")
+    assert load_cached(tmp_file, root=cache_root) is None
+
+
+def test_cache_hit_when_version_matches(tmp_file, cache_root):
+    """A matching schema version produces a cache hit (no needless invalidation)."""
+    result = {"nodes": [{"id": "n1"}], "edges": [{"source": "a", "target": "b"}]}
+    save_cached(tmp_file, result, root=cache_root)
+    loaded = load_cached(tmp_file, root=cache_root)
+    assert loaded == result  # protects hit rate when nothing changed
+
+
+def test_cache_reused_across_graph_profiles(tmp_path, cache_root):
+    """Raw extraction cache is profile-independent and reused across build profiles.
+
+    Extraction produces nodes + edge records keyed only by file hash; the
+    simple-graph vs MultiDiGraph distinction is a build-time assembly choice
+    (build_from_json(multigraph=...)), not an extraction-time one. The same
+    cached extraction must serve both a simple build and a multigraph build —
+    proving we did NOT needlessly profile-key the raw cache.
+    """
+    src = tmp_path / "module.py"
+    src.write_text("def f():\n    pass\n")
+
+    extraction = {
+        "nodes": [{"id": "module.f", "type": "function"}],
+        "edges": [
+            {"source": "module.f", "target": "module.g", "type": "calls"},
+            {"source": "module.f", "target": "module.g", "type": "imports"},
+        ],
+    }
+    save_cached(src, extraction, root=cache_root)
+
+    # Simulate two separate build runs (simple, then multigraph). Neither passes
+    # any profile to the cache layer; both must read back the identical entry.
+    loaded_for_simple = load_cached(src, root=cache_root)
+    loaded_for_multigraph = load_cached(src, root=cache_root)
+    assert loaded_for_simple == extraction
+    assert loaded_for_multigraph == extraction
+    assert loaded_for_simple == loaded_for_multigraph
+
+    # Only one cache entry exists — the cache was not split per profile.
+    ast_dir = cache_root / "graphify-out" / "cache" / "ast"
+    assert len(list(ast_dir.glob("*.json"))) == 1
+
+
+def test_cache_existing_behavior_regression(tmp_file, cache_root):
+    """Existing round-trip and hashing behavior is unchanged by versioning."""
+    # Round-trip equality (the original test_cache_roundtrip contract).
+    result = {"nodes": [{"id": "n1", "label": "Node1"}], "edges": []}
+    save_cached(tmp_file, result, root=cache_root)
+    assert load_cached(tmp_file, root=cache_root) == result
+
+    # Content change still invalidates.
+    tmp_file.write_text("completely different content")
+    assert load_cached(tmp_file, root=cache_root) is None
+
+    # Hashes remain stable across calls and unaffected by the version stamp.
+    h1 = file_hash(tmp_file, cache_root)
+    h2 = file_hash(tmp_file, cache_root)
+    assert h1 == h2 and len(h1) == 64
diff --git a/tests/test_export.py b/tests/test_export.py
index 78a1a36c3..7fe740729 100644
--- a/tests/test_export.py
+++ b/tests/test_export.py
@@ -1,9 +1,14 @@
 import json
 import tempfile
 from pathlib import Path
+
+import networkx as nx
+from networkx.readwrite import json_graph
+
 from graphify.build import build_from_json
 from graphify.cluster import cluster
 from graphify.export import to_json, to_cypher, to_graphml, to_html, to_canvas
+from graphify.graph_loader import GRAPHIFY_PROFILE_KEY, load_graph
 
 FIXTURES = Path(__file__).parent / "fixtures"
 
@@ -291,3 +296,111 @@ def test_backup_env_disable(tmp_path, monkeypatch):
     (tmp_path / "graph.json").write_text('{"nodes":[],"links":[]}')
     (tmp_path / ".graphify_semantic_marker").write_text("{}")
     assert backup_if_protected(tmp_path) is None
+
+
+# ── PR 7: graph profile persistence in graph.json ────────────────────────────
+#
+# to_json must stamp G.graph[GRAPHIFY_PROFILE_KEY] with a graph_type derived
+# from the live NetworkX instance so a later load can detect a
+# simple-vs-multidigraph mismatch (cache invalidation / watch). The graph_type
+# vocabulary ("simple"/"digraph"/"multidigraph") is shared with graph_loader.
+
+
+def _build_extraction():
+    return json.loads((FIXTURES / "extraction.json").read_text())
+
+
+def test_to_json_writes_multidigraph_profile():
+    """A MultiDiGraph export records graph_type=multidigraph and the NetworkX
+    multigraph flag in the saved JSON."""
+    G = build_from_json(_build_extraction(), multigraph=True)
+    assert isinstance(G, nx.MultiDiGraph)
+    communities = {0: list(G.nodes)}
+    with tempfile.TemporaryDirectory() as tmp:
+        out = Path(tmp) / "graph.json"
+        to_json(G, communities, str(out), force=True)
+        data = json.loads(out.read_text())
+    assert data["graph"][GRAPHIFY_PROFILE_KEY]["graph_type"] == "multidigraph"
+    assert data["multigraph"] is True
+    assert data["directed"] is True
+
+
+def test_to_json_writes_digraph_profile():
+    """A directed simple graph records graph_type=digraph with directed=True,
+    multigraph=False."""
+    G = build_from_json(_build_extraction(), directed=True)
+    assert isinstance(G, nx.DiGraph) and not G.is_multigraph()
+    communities = {0: list(G.nodes)}
+    with tempfile.TemporaryDirectory() as tmp:
+        out = Path(tmp) / "graph.json"
+        to_json(G, communities, str(out), force=True)
+        data = json.loads(out.read_text())
+    assert data["graph"][GRAPHIFY_PROFILE_KEY]["graph_type"] == "digraph"
+    assert data["directed"] is True
+    assert data["multigraph"] is False
+
+
+def test_to_json_writes_simple_profile():
+    """An undirected nx.Graph records graph_type=simple."""
+    G = build_from_json(_build_extraction())
+    assert type(G) is nx.Graph
+    communities = cluster(G)
+    with tempfile.TemporaryDirectory() as tmp:
+        out = Path(tmp) / "graph.json"
+        to_json(G, communities, str(out))
+        data = json.loads(out.read_text())
+    assert data["graph"][GRAPHIFY_PROFILE_KEY]["graph_type"] == "simple"
+    assert data["directed"] is False
+    assert data["multigraph"] is False
+
+
+def test_to_json_profile_roundtrips_through_loader():
+    """to_json -> load_graph reconstructs the same graph_type for every type,
+    proving the profile survives a save/load cycle."""
+    cases = [
+        (build_from_json(_build_extraction()), "simple", nx.Graph),
+        (build_from_json(_build_extraction(), directed=True), "digraph", nx.DiGraph),
+        (build_from_json(_build_extraction(), multigraph=True), "multidigraph", nx.MultiDiGraph),
+    ]
+    for G, expected_type, expected_cls in cases:
+        communities = {0: list(G.nodes)}
+        with tempfile.TemporaryDirectory() as tmp:
+            out = Path(tmp) / "graph.json"
+            to_json(G, communities, str(out), force=True)
+            data = json.loads(out.read_text())
+            reloaded = load_graph(data, require_capabilities=False)
+        assert isinstance(reloaded, expected_cls)
+        assert reloaded.graph[GRAPHIFY_PROFILE_KEY]["graph_type"] == expected_type
+        # node_link_graph (the lower-level loader) also sees G.graph metadata.
+        nlg = json_graph.node_link_graph(data, edges="links")
+        assert nlg.graph[GRAPHIFY_PROFILE_KEY]["graph_type"] == expected_type
+
+
+def test_to_json_simple_graph_regression():
+    """Simple-graph output is unchanged except for the added graphify_profile.
+
+    The "graph" metadata object gains exactly one key (graphify_profile); it was
+    empty ({}) before. Stripping that key leaves the pre-PR7 empty object, and
+    every other structural key (nodes/links/directed/multigraph/hyperedges) is
+    unaffected.
+    """
+    G = build_from_json(_build_extraction())
+    communities = cluster(G)
+    with tempfile.TemporaryDirectory() as tmp:
+        out = Path(tmp) / "graph.json"
+        to_json(G, communities, str(out))
+        data = json.loads(out.read_text())
+
+    # The only added graph-metadata content is the profile.
+    assert data["graph"] == {GRAPHIFY_PROFILE_KEY: {"graph_type": "simple"}}
+    # Removing the profile yields the pre-change empty "graph" object — nothing
+    # else leaked into the graph-level metadata.
+    data["graph"].pop(GRAPHIFY_PROFILE_KEY)
+    assert data["graph"] == {}
+    # Core structural keys remain present and well-formed.
+    assert isinstance(data["nodes"], list) and data["nodes"]
+    assert isinstance(data["links"], list)
+    assert data["directed"] is False
+    assert data["multigraph"] is False
+    for node in data["nodes"]:
+        assert "id" in node and "community" in node
diff --git a/tests/test_incremental.py b/tests/test_incremental.py
index 6858c7520..a602cf3fd 100644
--- a/tests/test_incremental.py
+++ b/tests/test_incremental.py
@@ -80,3 +80,172 @@ def test_no_incremental_without_manifest(tmp_path):
     # which pytest derives from the test name and contains "incremental".
     assert "incremental update" not in r.stdout.lower()
     assert "incremental scan" not in r.stdout.lower()
+
+
+# ── PR 7: `graphify update` preserves the multidigraph profile (no silent fallback) ──
+#
+# watch._rebuild_code inherits the saved graph.json profile: it reads the on-disk
+# `multigraph` flag and rebuilds via build_from_json(multigraph=...), re-stamping
+# multigraph/directed + graphify_profile on write. So `graphify update` on a
+# multidigraph round-trips it as a MultiDiGraph with keyed parallel edges intact —
+# never silently collapsed to a simple graph. These tests prove that end-to-end by
+# actually running `update` as a subprocess and reloading the rewritten graph.json.
+
+
+def _make_code_corpus(tmp_path: Path) -> Path:
+    """A tiny real code corpus so `graphify update` has AST-extractable files.
+
+    Includes ``extra()`` so a rebuild ADDS AST nodes the seeded multidigraph
+    graph.json lacks (file node + login()/helper()/extra()). That guarantees a
+    real topology change, so `update` hits the graph.json REWRITE path rather
+    than the no-change early return — the rewrite is what must preserve the
+    multigraph profile, so the test would be vacuous without forcing it.
+    """
+    corpus = tmp_path / "corpus"
+    corpus.mkdir()
+    (corpus / "auth.py").write_text(
+        "def login():\n    return helper()\n\n\ndef helper():\n    return 1\n\n\n"
+        "def extra():\n    return login()\n",
+        encoding="utf-8",
+    )
+    return corpus
+
+
+def _write_multidigraph_graph_json(corpus: Path) -> Path:
+    """Seed corpus/graphify-out/graph.json as a multidigraph with parallel edges.
+
+    Built and serialized exactly the way Phase A persists it (build_from_json
+    multigraph=True -> export.to_json), so the saved file carries the top-level
+    ``multigraph: true`` flag and ``graphify_profile.graph_type == multidigraph``.
+    The two parallel ``login -> helper`` edges (ids absent from AST output) are
+    preserved by `_rebuild_code` across the rebuild, proving parallels survive.
+    """
+    import networkx as nx
+    from graphify.build import build_from_json
+    from graphify.export import to_json
+
+    nodes = [
+        {
+            "id": n,
+            "label": n,
+            "file_type": "code",
+            "source_file": "auth.py",
+            "source_location": "L1",
+        }
+        for n in ("login", "helper")
+    ]
+    # Two parallel edges between the same (login -> helper) pair.
+    edges = [
+        {
+            "source": "login",
+            "target": "helper",
+            "relation": rel,
+            "confidence": "EXTRACTED",
+            "source_file": "auth.py",
+            "source_location": f"L{i}",
+        }
+        for i, rel in enumerate(["calls", "imports"], start=1)
+    ]
+    G = build_from_json({"nodes": nodes, "edges": edges}, multigraph=True)
+    assert isinstance(G, nx.MultiDiGraph)
+    assert G.number_of_edges() == 2
+    out = corpus / "graphify-out"
+    out.mkdir(exist_ok=True)
+    graph_json = out / "graph.json"
+    to_json(G, {0: ["login", "helper"]}, str(graph_json), force=True)
+    # Persist the scan root so `graphify update` (no path arg) can recover it.
+    (out / ".graphify_root").write_text(str(corpus), encoding="utf-8")
+    return graph_json
+
+
+def _parallel_login_helper_edges(graph_data: dict) -> list[dict]:
+    """Return the parallel ``login -> helper`` edge records from a graph.json dict."""
+    links = graph_data.get("links", graph_data.get("edges", []))
+    return [e for e in links if e.get("source") == "login" and e.get("target") == "helper"]
+
+
+def test_update_preserves_multigraph_profile(tmp_path):
+    """`graphify update` on a multidigraph graph.json preserves the profile and
+    its parallel edges end-to-end: the rewritten file stays multigraph=true /
+    graph_type=multidigraph and reloads via load_graph as a MultiDiGraph with the
+    parallel edges intact."""
+    from graphify.graph_loader import load_graph
+
+    corpus = _make_code_corpus(tmp_path)
+    graph_json = _write_multidigraph_graph_json(corpus)
+
+    before = json.loads(graph_json.read_text(encoding="utf-8"))
+    assert before.get("multigraph") is True
+    assert len(_parallel_login_helper_edges(before)) == 2  # both parallel edges present
+
+    r = _run(["update", str(corpus)], tmp_path)
+    assert r.returncode == 0, f"update on multidigraph should succeed, got: {r.stderr}"
+    assert "multidigraph" not in r.stderr  # no refusal message
+
+    after = json.loads(graph_json.read_text(encoding="utf-8"))
+    # Profile preserved (no silent collapse to simple).
+    assert after.get("multigraph") is True, "multigraph flag must be preserved"
+    assert after.get("graph", {}).get("graphify_profile", {}).get("graph_type") == "multidigraph"
+    # Prove the REWRITE path ran (rebuild added AST nodes the seed lacked), not a
+    # no-change early return that would trivially leave the seed file untouched.
+    assert any(n.get("label") == "extra()" for n in after.get("nodes", [])), (
+        "rebuild should have added AST nodes — rewrite path must have executed"
+    )
+    # Parallel edges survive the rewrite.
+    par = _parallel_login_helper_edges(after)
+    assert len(par) == 2, "keyed parallel edges must be preserved across update"
+    assert sorted(e["relation"] for e in par) == ["calls", "imports"]
+    # Reloads as a MultiDiGraph with the parallels intact.
+    G2 = load_graph(after)
+    assert G2.is_multigraph(), "rewritten graph.json must reload as a MultiDiGraph"
+    assert G2.number_of_edges("login", "helper") == 2
+
+
+def test_update_simple_graph_unchanged_regression(tmp_path):
+    """A simple graph.json updated in simple mode behaves exactly as before:
+    `graphify update` succeeds and the graph stays a simple graph."""
+    corpus = _make_code_corpus(tmp_path)
+
+    # First run on a fresh corpus builds the simple graph via the normal path.
+    r1 = _run(["update", str(corpus)], tmp_path)
+    assert r1.returncode == 0, f"initial simple update failed: {r1.stderr}"
+    graph_json = corpus / "graphify-out" / "graph.json"
+    assert graph_json.exists()
+    data1 = json.loads(graph_json.read_text(encoding="utf-8"))
+    assert data1.get("multigraph") is False
+    assert data1.get("graph", {}).get("graphify_profile", {}).get("graph_type") == "simple"
+    assert any(n.get("label") == "login()" for n in data1.get("nodes", []))
+
+    # Re-running update on the now-simple graph must still succeed (no refusal,
+    # no profile change) — the pre-PR7 behavior is preserved.
+    r2 = _run(["update", str(corpus)], tmp_path)
+    assert r2.returncode == 0, f"re-run simple update failed: {r2.stderr}"
+    assert "multidigraph" not in r2.stderr
+    data2 = json.loads(graph_json.read_text(encoding="utf-8"))
+    assert data2.get("multigraph") is False
+    assert data2.get("graph", {}).get("graphify_profile", {}).get("graph_type") == "simple"
+
+
+def test_update_profile_mismatch_no_silent_fallback(tmp_path):
+    """Go/no-go gate: `graphify update` on a multidigraph must NOT silently fall
+    back to simple-graph behavior. The gate is satisfied by PRESERVATION — the
+    result is still a multidigraph with parallel edges, never a collapsed simple
+    graph (and never a spurious refusal now that the pipeline preserves)."""
+    corpus = _make_code_corpus(tmp_path)
+    graph_json = _write_multidigraph_graph_json(corpus)
+
+    r = _run(["update", str(corpus)], tmp_path)
+    after = json.loads(graph_json.read_text(encoding="utf-8"))
+
+    # The invariant: never a silent simple-graph result.
+    assert after.get("multigraph") is True, (
+        "no silent fallback: a multidigraph update must remain a multidigraph, "
+        f"got multigraph={after.get('multigraph')!r}"
+    )
+    assert after.get("graph", {}).get("graphify_profile", {}).get("graph_type") == "multidigraph"
+    # Parallel edges are not collapsed away.
+    assert len(_parallel_login_helper_edges(after)) == 2, (
+        "parallel edges must survive — collapsing to one edge is a silent fallback"
+    )
+    # Preservation, not refusal: the command succeeds normally.
+    assert r.returncode == 0, f"update should preserve (succeed), not refuse: {r.stderr}"
diff --git a/tests/test_watch.py b/tests/test_watch.py
index 7de02888f..a25179bf3 100644
--- a/tests/test_watch.py
+++ b/tests/test_watch.py
@@ -739,3 +739,380 @@ def test_merge_changed_paths_dedupes_in_order():
         [Path("a.py")],
     )
     assert [p.as_posix() for p in merged] == ["a.py", "b.py", "c.py"]
+
+
+# --- PR 7: MultiDiGraph keyed parallel-edge eviction + canonical comparison ----
+#
+# These exercise the incremental-update path of _rebuild_code against an on-disk
+# MultiDiGraph graph.json. _rebuild_code's eviction logic (preserved_edges)
+# operates on the raw on-disk "links" records BEFORE any graph build, and each
+# parallel edge is one record carrying its own `key` + `source_file`, so the
+# logic is naturally key-aware. The go/no-go gate: "Incremental update preserves
+# and evicts keyed parallel edges intentionally, with no silent fallback to
+# simple graph behavior."
+
+
+def _git_init(path: Path) -> None:
+    """Initialise a throwaway git repo so detect() treats `path` as a real corpus."""
+    subprocess.run(["git", "init", "-q", str(path)], check=True)
+    subprocess.run(
+        ["git", "-C", str(path), "config", "user.email", "test@example.com"], check=True
+    )
+    subprocess.run(["git", "-C", str(path), "config", "user.name", "Test"], check=True)
+
+
+def _build_multigraph_repo(tmp_path: Path) -> tuple[Path, str, str]:
+    """Create a repo with a baseline graph.json and two stable endpoints for multigraph tests.
+
+    Returns (graph_path, a_id, b_id). Nodes ``afn``/``bfn`` live in dedicated,
+    never-changed files (amod.py / bmod.py) so re-extraction of an edge-
+    contributing file does not re-emit or evict them. The edge-contributing
+    files (file1.py / file2.py / edgesrc.py) exist as tracked code so detect()
+    keeps them in the corpus; parallel A->B edges are injected directly into the
+    on-disk "links" so each carries its own `key` + `source_file`.
+    """
+    from graphify.watch import _rebuild_code
+
+    _git_init(tmp_path)
+    (tmp_path / "amod.py").write_text("def afn():\n    return 1\n", encoding="utf-8")
+    (tmp_path / "bmod.py").write_text("def bfn():\n    return 2\n", encoding="utf-8")
+    (tmp_path / "file1.py").write_text("x1 = 1\n", encoding="utf-8")
+    (tmp_path / "file2.py").write_text("x2 = 2\n", encoding="utf-8")
+    (tmp_path / "edgesrc.py").write_text("y = 1\n", encoding="utf-8")
+
+    cwd = os.getcwd()
+    try:
+        os.chdir(tmp_path)
+        assert _rebuild_code(tmp_path, no_cluster=True) is True
+    finally:
+        os.chdir(cwd)
+
+    graph_path = tmp_path / "graphify-out" / "graph.json"
+    data = json.loads(graph_path.read_text(encoding="utf-8"))
+    a_id = next(n["id"] for n in data["nodes"] if n.get("label", "").startswith("afn("))
+    b_id = next(n["id"] for n in data["nodes"] if n.get("label", "").startswith("bfn("))
+    return graph_path, a_id, b_id
+
+
+def _set_links(graph_path: Path, base_data: dict, a_id: str, b_id: str, edges: list) -> None:
+    """Append `edges` (A->B parallel records) and stamp multigraph flags on disk."""
+    links = base_data.get("links", base_data.get("edges", []))
+    links += edges
+    base_data["links"] = links
+    base_data["multigraph"] = True
+    base_data["directed"] = True
+    graph_path.write_text(json.dumps(base_data, indent=2), encoding="utf-8")
+
+
+def _ab_links(graph_path: Path, a_id: str, b_id: str, source_file: str | None = None) -> list:
+    """Return the surviving A->B link records on disk, optionally filtered by source_file."""
+    data = json.loads(graph_path.read_text(encoding="utf-8"))
+    links = data.get("links", data.get("edges", []))
+    out = [e for e in links if e.get("source") == a_id and e.get("target") == b_id]
+    if source_file is not None:
+        out = [e for e in out if e.get("source_file") == source_file]
+    return out
+
+
+@pytest.mark.skipif(sys.platform == "win32", reason="git CLI behaviour varies on Windows runners")
+def test_watch_multigraph_unchanged_file_parallel_edges_persist(tmp_path):
+    """A pair with 3 parallel edges from a file that is NOT changed must keep all
+    3 across an incremental rebuild triggered by an unrelated file."""
+    from graphify.watch import _rebuild_code
+
+    graph_path, a_id, b_id = _build_multigraph_repo(tmp_path)
+    data = json.loads(graph_path.read_text(encoding="utf-8"))
+    _set_links(
+        graph_path,
+        data,
+        a_id,
+        b_id,
+        [
+            {"source": a_id, "target": b_id, "relation": "calls", "confidence": "EXTRACTED",
+             "source_file": "edgesrc.py", "source_location": "L1", "key": "k1"},
+            {"source": a_id, "target": b_id, "relation": "imports", "confidence": "EXTRACTED",
+             "source_file": "edgesrc.py", "source_location": "L2", "key": "k2"},
+            {"source": a_id, "target": b_id, "relation": "references", "confidence": "EXTRACTED",
+             "source_file": "edgesrc.py", "source_location": "L3", "key": "k3"},
+        ],
+    )
+
+    cwd = os.getcwd()
+    try:
+        os.chdir(tmp_path)
+        # Change an UNRELATED file; edgesrc.py (the edge contributor) is untouched.
+        (tmp_path / "other_change.py").write_text("def newfn():\n    return 0\n", encoding="utf-8")
+        assert _rebuild_code(tmp_path, changed_paths=[Path("other_change.py")], no_cluster=True)
+    finally:
+        os.chdir(cwd)
+
+    survivors = _ab_links(graph_path, a_id, b_id, source_file="edgesrc.py")
+    assert len(survivors) == 3, "all 3 parallel edges from the unchanged file must persist"
+    assert {e["relation"] for e in survivors} == {"calls", "imports", "references"}
+
+
+@pytest.mark.skipif(sys.platform == "win32", reason="git CLI behaviour varies on Windows runners")
+def test_watch_multigraph_changed_file_evicts_its_parallel_edges(tmp_path):
+    """A pair A->B with parallel edges from file1 AND file2; changing file1 must
+    evict file1's parallel edges between A->B while file2's survive (keyed,
+    per-source_file eviction — no collapse to one-edge-per-pair behaviour)."""
+    from graphify.watch import _rebuild_code
+
+    graph_path, a_id, b_id = _build_multigraph_repo(tmp_path)
+    data = json.loads(graph_path.read_text(encoding="utf-8"))
+    _set_links(
+        graph_path,
+        data,
+        a_id,
+        b_id,
+        [
+            {"source": a_id, "target": b_id, "relation": "calls", "confidence": "EXTRACTED",
+             "source_file": "file1.py", "source_location": "L1", "key": "k_f1_a"},
+            {"source": a_id, "target": b_id, "relation": "imports", "confidence": "EXTRACTED",
+             "source_file": "file1.py", "source_location": "L2", "key": "k_f1_b"},
+            {"source": a_id, "target": b_id, "relation": "calls", "confidence": "EXTRACTED",
+             "source_file": "file2.py", "source_location": "L9", "key": "k_f2_a"},
+        ],
+    )
+
+    cwd = os.getcwd()
+    try:
+        os.chdir(tmp_path)
+        (tmp_path / "file1.py").write_text("x1 = 99\n", encoding="utf-8")
+        assert _rebuild_code(tmp_path, changed_paths=[Path("file1.py")], no_cluster=True)
+    finally:
+        os.chdir(cwd)
+
+    assert _ab_links(graph_path, a_id, b_id, source_file="file1.py") == [], (
+        "file1's parallel A->B edges must be evicted when file1 changes"
+    )
+    file2_survivors = _ab_links(graph_path, a_id, b_id, source_file="file2.py")
+    assert len(file2_survivors) == 1, "file2's parallel A->B edge must survive selectively"
+    assert file2_survivors[0]["relation"] == "calls"
+
+
+@pytest.mark.skipif(sys.platform == "win32", reason="git CLI behaviour varies on Windows runners")
+def test_watch_multigraph_changed_file_evicts_stale_cross_file_edge(tmp_path):
+    """The FIX 3 gap: an edge between two SURVIVING nodes that was CONTRIBUTED by
+    the changed file must be evicted. The old endpoints-only check wrongly kept
+    it because both A and B (defined in unchanged files) still exist."""
+    from graphify.watch import _rebuild_code
+
+    graph_path, a_id, b_id = _build_multigraph_repo(tmp_path)
+    data = json.loads(graph_path.read_text(encoding="utf-8"))
+    # A single stale cross-file edge contributed by file1.py between A and B,
+    # both of which live in amod.py / bmod.py and therefore survive the change.
+    _set_links(
+        graph_path,
+        data,
+        a_id,
+        b_id,
+        [
+            {"source": a_id, "target": b_id, "relation": "calls", "confidence": "EXTRACTED",
+             "source_file": "file1.py", "source_location": "L1", "key": "stale"},
+        ],
+    )
+
+    cwd = os.getcwd()
+    try:
+        os.chdir(tmp_path)
+        (tmp_path / "file1.py").write_text("x1 = 99\n", encoding="utf-8")
+        assert _rebuild_code(tmp_path, changed_paths=[Path("file1.py")], no_cluster=True)
+    finally:
+        os.chdir(cwd)
+
+    assert _ab_links(graph_path, a_id, b_id, source_file="file1.py") == [], (
+        "stale cross-file edge contributed by the changed file must be evicted "
+        "even though both endpoints survive (FIX 3)"
+    )
+
+
+@pytest.mark.skipif(sys.platform == "win32", reason="git CLI behaviour varies on Windows runners")
+def test_watch_multigraph_deleted_file_removes_all_its_edge_records(tmp_path):
+    """Deleting a file must remove ALL its edge records, including parallels,
+    while leaving another file's parallel between the same pair intact."""
+    from graphify.watch import _rebuild_code
+
+    graph_path, a_id, b_id = _build_multigraph_repo(tmp_path)
+    data = json.loads(graph_path.read_text(encoding="utf-8"))
+    _set_links(
+        graph_path,
+        data,
+        a_id,
+        b_id,
+        [
+            {"source": a_id, "target": b_id, "relation": "calls", "confidence": "EXTRACTED",
+             "source_file": "file1.py", "source_location": "L1", "key": "d1"},
+            {"source": a_id, "target": b_id, "relation": "imports", "confidence": "EXTRACTED",
+             "source_file": "file1.py", "source_location": "L2", "key": "d2"},
+            {"source": a_id, "target": b_id, "relation": "calls", "confidence": "EXTRACTED",
+             "source_file": "file2.py", "source_location": "L3", "key": "keep"},
+        ],
+    )
+
+    cwd = os.getcwd()
+    try:
+        os.chdir(tmp_path)
+        # Delete file1.py and pass it in changed_paths (post-commit-hook style).
+        (tmp_path / "file1.py").unlink()
+        assert _rebuild_code(tmp_path, changed_paths=[Path("file1.py")], no_cluster=True)
+    finally:
+        os.chdir(cwd)
+
+    assert _ab_links(graph_path, a_id, b_id, source_file="file1.py") == [], (
+        "all of the deleted file's edge records (incl. parallels) must be removed"
+    )
+    assert len(_ab_links(graph_path, a_id, b_id, source_file="file2.py")) == 1, (
+        "the surviving file's parallel edge between the same pair must be kept"
+    )
+
+
+def test_watch_canonical_comparison_distinguishes_parallel_edges():
+    """Two multigraphs differing ONLY in a parallel edge's presence must canonical-
+    compare as DIFFERENT (FIX 2). Identical multigraphs must compare EQUAL, and
+    two parallels that differ ONLY in `key` must stay distinct (key is the
+    load-bearing identity field that keeps parallels from collapsing)."""
+    from graphify.watch import _canonical_topology_for_compare
+
+    nodes = [{"id": "A", "label": "A"}, {"id": "B", "label": "B"}]
+    e1 = {"source": "A", "target": "B", "relation": "calls",
+          "source_file": "f1.py", "source_location": "L1", "key": "k1"}
+    e2 = {"source": "A", "target": "B", "relation": "calls",
+          "source_file": "f1.py", "source_location": "L2", "key": "k2"}
+    profile = {"graphify_profile": {"graph_type": "multidigraph"}}
+
+    two = {"nodes": nodes, "links": [dict(e1), dict(e2)], "graph": dict(profile)}
+    one = {"nodes": nodes, "links": [dict(e1)], "graph": dict(profile)}
+    two_again = {"nodes": nodes, "links": [dict(e1), dict(e2)], "graph": dict(profile)}
+
+    def canon(g: dict) -> str:
+        return json.dumps(_canonical_topology_for_compare(g), sort_keys=True)
+
+    assert canon(two) != canon(one), "adding a parallel edge must register as a change"
+    assert canon(two) == canon(two_again), "identical multigraphs must compare equal"
+
+    # Two parallels identical in every field EXCEPT key must remain distinct.
+    twin_a = {"source": "A", "target": "B", "relation": "calls",
+              "source_file": "f1.py", "source_location": "L1", "key": "ka"}
+    twin_b = {"source": "A", "target": "B", "relation": "calls",
+              "source_file": "f1.py", "source_location": "L1", "key": "kb"}
+    twins = {"nodes": nodes, "links": [twin_a, twin_b]}
+    canon_twins = _canonical_topology_for_compare(twins)
+    assert len(canon_twins["links"]) == 2, "key-only-different parallels must not collapse"
+    assert all("key" in e for e in canon_twins["links"]), "canonical edge must retain `key`"
+    single_twin = {"nodes": nodes, "links": [dict(twin_a)]}
+    assert canon(twins) != canon(single_twin), (
+        "removing a key-only-different parallel must register as a change"
+    )
+
+
+def test_watch_simple_mode_unchanged_regression(tmp_path, monkeypatch):
+    """Simple-graph watch rebuild behaves as before: a topology-unchanged second
+    pass still skips cluster(). Guards the FIX 1 regression (graph-level
+    graphify_profile metadata must not be read as a topology change)."""
+    from graphify import cluster as cluster_mod
+    from graphify.watch import _rebuild_code
+
+    (tmp_path / "app.py").write_text(
+        "def alpha():\n    return 1\n\ndef beta():\n    return alpha()\n", encoding="utf-8"
+    )
+
+    calls = {"n": 0}
+
+    def cluster_once(G):
+        calls["n"] += 1
+        if calls["n"] > 1:
+            raise AssertionError("cluster() must be skipped when topology is unchanged")
+        return {0: sorted(G.nodes())}
+
+    monkeypatch.setattr(cluster_mod, "cluster", cluster_once)
+    monkeypatch.setattr(cluster_mod, "score_all", lambda _G, comm: {cid: 1.0 for cid in comm})
+
+    assert _rebuild_code(tmp_path)
+    assert _rebuild_code(tmp_path)
+    assert calls["n"] == 1, "topology-unchanged simple-graph rebuild must not re-cluster"
+
+
+@pytest.mark.skipif(sys.platform == "win32", reason="git CLI behaviour varies on Windows runners")
+def test_watch_multigraph_full_rebuild_preserves_profile_flag(tmp_path):
+    """Regression for the DEFERRED silent collapse to simple graph.
+
+    A MultiDiGraph graph.json with keyed parallel edges, put through a
+    TOPOLOGY-CHANGING rebuild (a new file is added so _rebuild_code does NOT
+    early-return on the unchanged-topology check and actually rewrites
+    graph.json via to_json), must rewrite a graph.json that still:
+      - declares ``multigraph == true`` (the flag load_graph keys on),
+      - carries ``graphify_profile.graph_type == "multidigraph"``,
+      - keeps the parallel A->B edge records, and
+      - reloads via the production loader as a MultiDiGraph with all parallels
+        intact (NOT collapsed to one edge per pair).
+
+    Before the inherit-multigraph fix, _rebuild_code built a simple DiGraph, so
+    to_json wrote a graph.json with no multigraph flag and a "simple" profile —
+    the next load_graph would collapse the preserved parallel links to a single
+    edge (the PR 7 go/no-go violation: "no silent fallback to simple graph
+    behavior"). This test fails on that regression and passes once _rebuild_code
+    inherits the saved multigraph class.
+    """
+    from graphify.watch import _rebuild_code
+    from graphify.graph_loader import load_graph, GRAPHIFY_PROFILE_KEY
+    import networkx as nx
+
+    graph_path, a_id, b_id = _build_multigraph_repo(tmp_path)
+    data = json.loads(graph_path.read_text(encoding="utf-8"))
+    # Three keyed parallel A->B edges contributed by amod.py (a file that is NOT
+    # changed, so the edges are preserved across the rebuild). amod.py/bmod.py
+    # define A/B, so both endpoints survive.
+    _set_links(
+        graph_path,
+        data,
+        a_id,
+        b_id,
+        [
+            {"source": a_id, "target": b_id, "relation": "calls", "confidence": "EXTRACTED",
+             "source_file": "amod.py", "source_location": "L1", "key": "mk1"},
+            {"source": a_id, "target": b_id, "relation": "imports", "confidence": "EXTRACTED",
+             "source_file": "amod.py", "source_location": "L2", "key": "mk2"},
+            {"source": a_id, "target": b_id, "relation": "references", "confidence": "EXTRACTED",
+             "source_file": "amod.py", "source_location": "L3", "key": "mk3"},
+        ],
+    )
+
+    cwd = os.getcwd()
+    try:
+        os.chdir(tmp_path)
+        # Add a NEW file: this changes topology so the rebuild does NOT hit the
+        # "no topology change" early return and genuinely rewrites graph.json via
+        # the clustered to_json path. no_viz keeps the test fast.
+        (tmp_path / "newmod.py").write_text("def newfn():\n    return 9\n", encoding="utf-8")
+        assert _rebuild_code(tmp_path, no_viz=True) is True
+    finally:
+        os.chdir(cwd)
+
+    rewritten = json.loads(graph_path.read_text(encoding="utf-8"))
+    # 1. The multigraph flag survived the rewrite.
+    assert rewritten.get("multigraph") is True, (
+        "rewritten graph.json must keep multigraph=true (else next load collapses parallels)"
+    )
+    # 2. The multidigraph profile survived (Phase A persists it from the instance).
+    profile = (rewritten.get("graph") or {}).get(GRAPHIFY_PROFILE_KEY) or {}
+    assert profile.get("graph_type") == "multidigraph", (
+        f"rewritten graphify_profile must be multidigraph, got {profile!r}"
+    )
+    # 3. The parallel edge records are still present (3 A->B records from amod.py).
+    ab = _ab_links(graph_path, a_id, b_id, source_file="amod.py")
+    assert len(ab) == 3, f"all 3 parallel A->B edge records must persist, got {len(ab)}"
+    # 4. The new file's node landed (proves the rebuild actually re-ran, not a no-op).
+    new_labels = {n.get("label", "") for n in rewritten.get("nodes", [])}
+    assert any(lbl.startswith("newfn(") for lbl in new_labels), (
+        "the topology-changing new file must have been extracted into the rewrite"
+    )
+    # 5. The production loader reloads it as a MultiDiGraph with parallels intact —
+    #    the definitive proof there is no deferred collapse to simple.
+    reloaded = load_graph(rewritten)
+    assert isinstance(reloaded, nx.MultiDiGraph), (
+        f"reloaded graph must be a MultiDiGraph, got {type(reloaded).__name__}"
+    )
+    assert reloaded.number_of_edges(a_id, b_id) == 3, (
+        "reloaded MultiDiGraph must keep all 3 parallel A->B edges (NOT collapsed to 1)"
+    )

From 5c64c925c692dd549aef5c0246c68091c463493f Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Fri, 29 May 2026 22:36:28 -0500
Subject: [PATCH 15/21] fix(multigraph): keep watch rebuild links idempotent

---
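Note on the dedupe rules added to graphify/watch.py: the sketch below is a
simplified, hypothetical restatement of what _dedupe_rebuilt_edge_records is
meant to do, not the function from this patch. The names `fingerprint` and
`dedupe` and the sample records are assumptions made for illustration. It
shows the three cases: a keyed record replaces a keyless twin, an exact
keyed duplicate is dropped, and keyed parallels with different keys stay
distinct.

```python
import json


def fingerprint(edge: dict) -> str:
    """Identity of an edge record, ignoring `key` and `confidence_score`."""
    comparable = {k: v for k, v in edge.items() if k not in ("key", "confidence_score")}
    return json.dumps(comparable, sort_keys=True, default=str)


def dedupe(edges: list[dict]) -> list[dict]:
    """Hypothetical, simplified restatement of the dedupe rules."""
    kept: list[dict] = []
    slots: dict[str, list[int]] = {}  # fingerprint -> indexes into `kept`
    for edge in edges:
        indexes = slots.setdefault(fingerprint(edge), [])
        key = edge.get("key")
        if not indexes:
            indexes.append(len(kept))
            kept.append(edge)
        elif key is None:
            continue  # keyless repeat: the record already kept wins
        elif any(kept[i].get("key") == key for i in indexes):
            continue  # same fingerprint and same key: exact duplicate
        else:
            keyless = next((i for i in indexes if kept[i].get("key") is None), None)
            if keyless is not None:
                kept[keyless] = edge  # upgrade the keyless record in place
            else:
                indexes.append(len(kept))
                kept.append(edge)  # distinct keyed parallel
    return kept


ast_edge = {"source": "A", "target": "B", "relation": "calls", "source_file": "f.py"}
records = [
    ast_edge,                  # fresh AST record, no key
    dict(ast_edge, key="k1"),  # preserved keyed record: replaces the keyless one
    dict(ast_edge, key="k1"),  # exact duplicate: dropped
    dict(ast_edge, key="k2"),  # distinct keyed parallel: kept
    dict(ast_edge),            # keyless repeat: dropped
]
result = dedupe(records)
```

Running the pass a second time over `result` should leave it unchanged,
which is the idempotence property the new watch test checks on disk.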
 graphify/cluster.py |  25 +++-
 graphify/watch.py   |  61 ++++++++++
 tests/test_build.py |  10 +-
 tests/test_watch.py | 280 ++++++++++++++++++++++++++++++++++++++------
 4 files changed, 330 insertions(+), 46 deletions(-)
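
Note on the cluster.py change: the sketch below illustrates the scoped
warning-filter pattern behind _suppress_graspologic_dependency_warnings,
using only the standard library. The helper names `suppress_known_warning`
and `noisy_optional_import` are hypothetical, and the sketch filters on the
message text only (the patch additionally filters on the emitting module).
Filters installed inside warnings.catch_warnings() are discarded on exit,
so unrelated warnings still surface.

```python
import contextlib
import warnings


@contextlib.contextmanager
def suppress_known_warning(message_regex: str, category: type = DeprecationWarning):
    """Ignore one known warning inside the block; filters are restored on exit."""
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", message=message_regex, category=category)
        yield


def noisy_optional_import() -> str:
    """Stand-in for an import that emits a known deprecation plus something else."""
    warnings.warn("Please import `random` from the `scipy.sparse` namespace", DeprecationWarning)
    warnings.warn("something unexpected", UserWarning)
    return "imported"


# Record everything that escapes the suppression block.
with warnings.catch_warnings(record=True) as seen:
    warnings.simplefilter("always")
    with suppress_known_warning(r"Please import `random` from the `scipy\.sparse` namespace.*"):
        value = noisy_optional_import()

remaining = [str(w.message) for w in seen]
```

Only the warning that matches the regex is dropped; the unexpected one is
still recorded, which is why a narrow message filter is preferable to a
blanket "ignore".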

diff --git a/graphify/cluster.py b/graphify/cluster.py
index 97edb4b04..4ba9e782c 100644
--- a/graphify/cluster.py
+++ b/graphify/cluster.py
@@ -6,6 +6,7 @@
 import io
 import json
 import sys
+import warnings
 import networkx as nx
 
 from graphify.projections import project_for_community
@@ -22,6 +23,25 @@ def _suppress_output():
     return contextlib.redirect_stdout(io.StringIO())
 
 
+@contextlib.contextmanager
+def _suppress_graspologic_dependency_warnings():
+    """Suppress known optional-dependency deprecations emitted by graspologic imports."""
+    with warnings.catch_warnings():
+        warnings.filterwarnings(
+            "ignore",
+            message=r"Please import `random` from the `scipy\.sparse` namespace.*",
+            category=DeprecationWarning,
+            module=r"hyppo\.independence\.hhg",
+        )
+        warnings.filterwarnings(
+            "ignore",
+            message=r"The keyword argument 'nopython=False' was supplied.*",
+            category=Warning,
+            module=r"numba\.core\.decorators",
+        )
+        yield
+
+
 def _partition(G: nx.Graph, resolution: float = 1.0) -> dict[str, int]:
     """Run community detection. Returns {node_id: community_id}.
 
@@ -48,7 +68,8 @@ def _partition(G: nx.Graph, resolution: float = 1.0) -> dict[str, int]:
         stable.add_edge(src, tgt, **attrs)
 
     try:
-        from graspologic.partition import leiden
+        with _suppress_graspologic_dependency_warnings():
+            from graspologic.partition import leiden
 
         lsig = inspect.signature(leiden).parameters
         kwargs: dict = {}
@@ -63,7 +84,7 @@ def _partition(G: nx.Graph, resolution: float = 1.0) -> dict[str, int]:
         old_stderr = sys.stderr
         try:
             sys.stderr = io.StringIO()
-            with _suppress_output():
+            with _suppress_graspologic_dependency_warnings(), _suppress_output():
                 result = leiden(stable, **kwargs)
         finally:
             sys.stderr = old_stderr
diff --git a/graphify/watch.py b/graphify/watch.py
index 70cf23bc8..b342e092b 100644
--- a/graphify/watch.py
+++ b/graphify/watch.py
@@ -349,6 +349,66 @@ def _canonical_topology_for_compare(graph_data: dict) -> dict:
     return canonical
 
 
+def _dedupe_rebuilt_edge_records(edges: list[dict]) -> list[dict]:
+    """Remove duplicate edge records introduced by preserving + re-extracting.
+
+    A full AST rebuild re-extracts all code edges and also preserves existing
+    graph links so semantic/non-code relationships survive. Without this pass,
+    raw ``--no-cluster`` rebuilds append the same AST links on every run.
+
+    Distinct keyed parallels remain distinct. When the same relationship appears
+    once with a key and once without one, prefer the keyed record because it
+    carries the stable MultiDiGraph identity from the previous graph.
+    """
+
+    def fingerprint(edge: dict) -> str:
+        comparable = dict(edge)
+        comparable.pop("key", None)
+        comparable.pop("confidence_score", None)
+        return json.dumps(comparable, sort_keys=True, ensure_ascii=False, default=str)
+
+    kept: list[dict] = []
+    by_fingerprint: dict[str, list[int]] = {}
+    for edge in edges:
+        if not isinstance(edge, dict):
+            kept.append(edge)
+            continue
+        fp = fingerprint(edge)
+        key = edge.get("key")
+        existing_indexes = by_fingerprint.get(fp, [])
+        if not existing_indexes:
+            by_fingerprint[fp] = [len(kept)]
+            kept.append(edge)
+            continue
+
+        if key is None:
+            # A keyless duplicate is either an exact repeat from a previous raw
+            # no-cluster rebuild or a fresh AST record that matches a preserved
+            # keyed edge. In both cases the existing record is authoritative.
+            continue
+
+        same_key = False
+        replace_keyless_index: int | None = None
+        for idx in existing_indexes:
+            existing = kept[idx]
+            if not isinstance(existing, dict):
+                continue
+            existing_key = existing.get("key")
+            if existing_key == key:
+                same_key = True
+                break
+            if existing_key is None and replace_keyless_index is None:
+                replace_keyless_index = idx
+        if same_key:
+            continue
+        if replace_keyless_index is not None:
+            kept[replace_keyless_index] = edge
+            continue
+        existing_indexes.append(len(kept))
+        kept.append(edge)
+    return kept
+
+
 def _topology_from_graph(G) -> dict:
     from networkx.readwrite import json_graph
 
@@ -683,6 +743,7 @@ def _rebuild_code(
                 )
 
         _relativize_source_files(result, project_root)
+        result["edges"] = _dedupe_rebuilt_edge_records(result.get("edges", []))
         out.mkdir(exist_ok=True)
         (out / ".graphify_root").write_text(str(watch_root), encoding="utf-8")
 
diff --git a/tests/test_build.py b/tests/test_build.py
index b43029afa..71c0eff1b 100644
--- a/tests/test_build.py
+++ b/tests/test_build.py
@@ -1294,9 +1294,9 @@ def test_build_merge_multigraph_deleted_file_removes_all_its_edge_records(tmp_pa
 
     assert type(G) is nx.MultiDiGraph
     # All gone.py edge records removed across all pairs.
-    assert all(
-        d.get("source_file") != "gone.py" for _u, _v, d in G.edges(data=True)
-    ), "no edge record from the deleted file may survive"
+    assert all(d.get("source_file") != "gone.py" for _u, _v, d in G.edges(data=True)), (
+        "no edge record from the deleted file may survive"
+    )
     # The keep.py A→B parallel edge survives even though gone.py shared that pair.
     assert G.number_of_edges("A", "B") == 1
     assert next(iter(G["A"]["B"].values())).get("source_file") == "keep.py"
@@ -1352,9 +1352,7 @@ def test_build_merge_simple_graph_unchanged_regression(tmp_path):
     }
     G0 = build([chunk], dedup=False)
     graph_path = tmp_path / "graph.json"
-    graph_path.write_text(
-        json.dumps(nx_local.node_link_data(G0, edges="edges")), encoding="utf-8"
-    )
+    graph_path.write_text(json.dumps(nx_local.node_link_data(G0, edges="edges")), encoding="utf-8")
 
     # No new chunks, default args → must inherit simple (non-multigraph) type and
     # remain a plain undirected Graph here (saved directed flag is false).
diff --git a/tests/test_watch.py b/tests/test_watch.py
index a25179bf3..71a04350f 100644
--- a/tests/test_watch.py
+++ b/tests/test_watch.py
@@ -755,9 +755,7 @@ def test_merge_changed_paths_dedupes_in_order():
 def _git_init(path: Path) -> None:
     """Initialise a throwaway git repo so detect() treats `path` as a real corpus."""
     subprocess.run(["git", "init", "-q", str(path)], check=True)
-    subprocess.run(
-        ["git", "-C", str(path), "config", "user.email", "test@example.com"], check=True
-    )
+    subprocess.run(["git", "-C", str(path), "config", "user.email", "test@example.com"], check=True)
     subprocess.run(["git", "-C", str(path), "config", "user.name", "Test"], check=True)
 
 
@@ -828,12 +826,33 @@ def test_watch_multigraph_unchanged_file_parallel_edges_persist(tmp_path):
         a_id,
         b_id,
         [
-            {"source": a_id, "target": b_id, "relation": "calls", "confidence": "EXTRACTED",
-             "source_file": "edgesrc.py", "source_location": "L1", "key": "k1"},
-            {"source": a_id, "target": b_id, "relation": "imports", "confidence": "EXTRACTED",
-             "source_file": "edgesrc.py", "source_location": "L2", "key": "k2"},
-            {"source": a_id, "target": b_id, "relation": "references", "confidence": "EXTRACTED",
-             "source_file": "edgesrc.py", "source_location": "L3", "key": "k3"},
+            {
+                "source": a_id,
+                "target": b_id,
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "edgesrc.py",
+                "source_location": "L1",
+                "key": "k1",
+            },
+            {
+                "source": a_id,
+                "target": b_id,
+                "relation": "imports",
+                "confidence": "EXTRACTED",
+                "source_file": "edgesrc.py",
+                "source_location": "L2",
+                "key": "k2",
+            },
+            {
+                "source": a_id,
+                "target": b_id,
+                "relation": "references",
+                "confidence": "EXTRACTED",
+                "source_file": "edgesrc.py",
+                "source_location": "L3",
+                "key": "k3",
+            },
         ],
     )
 
@@ -866,12 +885,33 @@ def test_watch_multigraph_changed_file_evicts_its_parallel_edges(tmp_path):
         a_id,
         b_id,
         [
-            {"source": a_id, "target": b_id, "relation": "calls", "confidence": "EXTRACTED",
-             "source_file": "file1.py", "source_location": "L1", "key": "k_f1_a"},
-            {"source": a_id, "target": b_id, "relation": "imports", "confidence": "EXTRACTED",
-             "source_file": "file1.py", "source_location": "L2", "key": "k_f1_b"},
-            {"source": a_id, "target": b_id, "relation": "calls", "confidence": "EXTRACTED",
-             "source_file": "file2.py", "source_location": "L9", "key": "k_f2_a"},
+            {
+                "source": a_id,
+                "target": b_id,
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "file1.py",
+                "source_location": "L1",
+                "key": "k_f1_a",
+            },
+            {
+                "source": a_id,
+                "target": b_id,
+                "relation": "imports",
+                "confidence": "EXTRACTED",
+                "source_file": "file1.py",
+                "source_location": "L2",
+                "key": "k_f1_b",
+            },
+            {
+                "source": a_id,
+                "target": b_id,
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "file2.py",
+                "source_location": "L9",
+                "key": "k_f2_a",
+            },
         ],
     )
 
@@ -908,8 +948,15 @@ def test_watch_multigraph_changed_file_evicts_stale_cross_file_edge(tmp_path):
         a_id,
         b_id,
         [
-            {"source": a_id, "target": b_id, "relation": "calls", "confidence": "EXTRACTED",
-             "source_file": "file1.py", "source_location": "L1", "key": "stale"},
+            {
+                "source": a_id,
+                "target": b_id,
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "file1.py",
+                "source_location": "L1",
+                "key": "stale",
+            },
         ],
     )
 
@@ -941,12 +988,33 @@ def test_watch_multigraph_deleted_file_removes_all_its_edge_records(tmp_path):
         a_id,
         b_id,
         [
-            {"source": a_id, "target": b_id, "relation": "calls", "confidence": "EXTRACTED",
-             "source_file": "file1.py", "source_location": "L1", "key": "d1"},
-            {"source": a_id, "target": b_id, "relation": "imports", "confidence": "EXTRACTED",
-             "source_file": "file1.py", "source_location": "L2", "key": "d2"},
-            {"source": a_id, "target": b_id, "relation": "calls", "confidence": "EXTRACTED",
-             "source_file": "file2.py", "source_location": "L3", "key": "keep"},
+            {
+                "source": a_id,
+                "target": b_id,
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "file1.py",
+                "source_location": "L1",
+                "key": "d1",
+            },
+            {
+                "source": a_id,
+                "target": b_id,
+                "relation": "imports",
+                "confidence": "EXTRACTED",
+                "source_file": "file1.py",
+                "source_location": "L2",
+                "key": "d2",
+            },
+            {
+                "source": a_id,
+                "target": b_id,
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "file2.py",
+                "source_location": "L3",
+                "key": "keep",
+            },
         ],
     )
 
@@ -975,10 +1043,22 @@ def test_watch_canonical_comparison_distinguishes_parallel_edges():
     from graphify.watch import _canonical_topology_for_compare
 
     nodes = [{"id": "A", "label": "A"}, {"id": "B", "label": "B"}]
-    e1 = {"source": "A", "target": "B", "relation": "calls",
-          "source_file": "f1.py", "source_location": "L1", "key": "k1"}
-    e2 = {"source": "A", "target": "B", "relation": "calls",
-          "source_file": "f1.py", "source_location": "L2", "key": "k2"}
+    e1 = {
+        "source": "A",
+        "target": "B",
+        "relation": "calls",
+        "source_file": "f1.py",
+        "source_location": "L1",
+        "key": "k1",
+    }
+    e2 = {
+        "source": "A",
+        "target": "B",
+        "relation": "calls",
+        "source_file": "f1.py",
+        "source_location": "L2",
+        "key": "k2",
+    }
     profile = {"graphify_profile": {"graph_type": "multidigraph"}}
 
     two = {"nodes": nodes, "links": [dict(e1), dict(e2)], "graph": dict(profile)}
@@ -992,10 +1072,22 @@ def canon(g: dict) -> str:
     assert canon(two) == canon(two_again), "identical multigraphs must compare equal"
 
     # Two parallels identical in every field EXCEPT key must remain distinct.
-    twin_a = {"source": "A", "target": "B", "relation": "calls",
-              "source_file": "f1.py", "source_location": "L1", "key": "ka"}
-    twin_b = {"source": "A", "target": "B", "relation": "calls",
-              "source_file": "f1.py", "source_location": "L1", "key": "kb"}
+    twin_a = {
+        "source": "A",
+        "target": "B",
+        "relation": "calls",
+        "source_file": "f1.py",
+        "source_location": "L1",
+        "key": "ka",
+    }
+    twin_b = {
+        "source": "A",
+        "target": "B",
+        "relation": "calls",
+        "source_file": "f1.py",
+        "source_location": "L1",
+        "key": "kb",
+    }
     twins = {"nodes": nodes, "links": [twin_a, twin_b]}
     canon_twins = _canonical_topology_for_compare(twins)
     assert len(canon_twins["links"]) == 2, "key-only-different parallels must not collapse"
@@ -1034,7 +1126,7 @@ def cluster_once(G):
 
 
 @pytest.mark.skipif(sys.platform == "win32", reason="git CLI behaviour varies on Windows runners")
-def test_watch_multigraph_full_rebuild_preserves_profile_flag(tmp_path):
+def test_watch_multigraph_full_rebuild_preserves_profile_flag(tmp_path, monkeypatch):
     """Regression for the DEFERRED silent collapse to simple graph.
 
     A MultiDiGraph graph.json with keyed parallel edges, put through a
@@ -1056,8 +1148,16 @@ def test_watch_multigraph_full_rebuild_preserves_profile_flag(tmp_path):
     """
     from graphify.watch import _rebuild_code
     from graphify.graph_loader import load_graph, GRAPHIFY_PROFILE_KEY
+    from graphify import cluster as cluster_mod
     import networkx as nx
 
+    monkeypatch.setattr(
+        cluster_mod,
+        "cluster",
+        lambda G: {0: sorted(G.nodes(), key=str)},
+    )
+    monkeypatch.setattr(cluster_mod, "score_all", lambda _G, comm: {cid: 1.0 for cid in comm})
+
     graph_path, a_id, b_id = _build_multigraph_repo(tmp_path)
     data = json.loads(graph_path.read_text(encoding="utf-8"))
     # Three keyed parallel A->B edges contributed by amod.py (a file that is NOT
@@ -1069,12 +1169,33 @@ def test_watch_multigraph_full_rebuild_preserves_profile_flag(tmp_path):
         a_id,
         b_id,
         [
-            {"source": a_id, "target": b_id, "relation": "calls", "confidence": "EXTRACTED",
-             "source_file": "amod.py", "source_location": "L1", "key": "mk1"},
-            {"source": a_id, "target": b_id, "relation": "imports", "confidence": "EXTRACTED",
-             "source_file": "amod.py", "source_location": "L2", "key": "mk2"},
-            {"source": a_id, "target": b_id, "relation": "references", "confidence": "EXTRACTED",
-             "source_file": "amod.py", "source_location": "L3", "key": "mk3"},
+            {
+                "source": a_id,
+                "target": b_id,
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "source_file": "amod.py",
+                "source_location": "L1",
+                "key": "mk1",
+            },
+            {
+                "source": a_id,
+                "target": b_id,
+                "relation": "imports",
+                "confidence": "EXTRACTED",
+                "source_file": "amod.py",
+                "source_location": "L2",
+                "key": "mk2",
+            },
+            {
+                "source": a_id,
+                "target": b_id,
+                "relation": "references",
+                "confidence": "EXTRACTED",
+                "source_file": "amod.py",
+                "source_location": "L3",
+                "key": "mk3",
+            },
         ],
     )
 
@@ -1116,3 +1237,86 @@ def test_watch_multigraph_full_rebuild_preserves_profile_flag(tmp_path):
     assert reloaded.number_of_edges(a_id, b_id) == 3, (
         "reloaded MultiDiGraph must keep all 3 parallel A->B edges (NOT collapsed to 1)"
     )
+
+
+@pytest.mark.skipif(sys.platform == "win32", reason="git CLI behaviour varies on Windows runners")
+def test_watch_no_cluster_full_rebuild_does_not_duplicate_links(tmp_path):
+    """A full raw rebuild must be idempotent for links.
+
+    The full no-cluster path re-extracts every code file and also preserves
+    existing links. Without a dedupe pass, each full rebuild appends another copy
+    of the same AST edge records.
+    """
+    from graphify.watch import _rebuild_code
+
+    (tmp_path / "app.py").write_text(
+        "def alpha():\n    return 1\n\ndef beta():\n    return alpha()\n",
+        encoding="utf-8",
+    )
+
+    cwd = os.getcwd()
+    try:
+        os.chdir(tmp_path)
+        assert _rebuild_code(tmp_path, no_cluster=True, acquire_lock=False)
+        graph_path = tmp_path / "graphify-out" / "graph.json"
+        first = json.loads(graph_path.read_text(encoding="utf-8"))
+        first_links = first.get("links", [])
+
+        assert _rebuild_code(tmp_path, no_cluster=True, acquire_lock=False)
+        second = json.loads(graph_path.read_text(encoding="utf-8"))
+    finally:
+        os.chdir(cwd)
+
+    second_links = second.get("links", [])
+    assert len(second_links) == len(first_links)
+
+    def fingerprint(edge: dict) -> str:
+        comparable = dict(edge)
+        comparable.pop("key", None)
+        comparable.pop("confidence_score", None)
+        return json.dumps(comparable, sort_keys=True, ensure_ascii=False)
+
+    assert len({fingerprint(edge) for edge in second_links}) == len(second_links)
+
+
+@pytest.mark.skipif(sys.platform == "win32", reason="git CLI behaviour varies on Windows runners")
+def test_watch_no_cluster_full_rebuild_keeps_distinct_keyed_parallels(tmp_path):
+    """Dedupe removes the fresh keyless duplicate, not the keyed parallels."""
+    from graphify.watch import _rebuild_code
+
+    (tmp_path / "app.py").write_text(
+        "def alpha():\n    return 1\n\ndef beta():\n    return alpha()\n",
+        encoding="utf-8",
+    )
+
+    cwd = os.getcwd()
+    try:
+        os.chdir(tmp_path)
+        assert _rebuild_code(tmp_path, no_cluster=True, acquire_lock=False)
+        graph_path = tmp_path / "graphify-out" / "graph.json"
+        data = json.loads(graph_path.read_text(encoding="utf-8"))
+        base_links = data["links"]
+        selected = dict(base_links[0])
+        selected_a = dict(selected, key="parallel-a")
+        selected_b = dict(selected, key="parallel-b")
+        data["links"] = [selected_a, selected_b]
+        data["multigraph"] = True
+        data["directed"] = True
+        graph_path.write_text(json.dumps(data, indent=2), encoding="utf-8")
+
+        assert _rebuild_code(tmp_path, no_cluster=True, acquire_lock=False)
+        rebuilt = json.loads(graph_path.read_text(encoding="utf-8"))
+    finally:
+        os.chdir(cwd)
+
+    def same_relationship(edge: dict) -> bool:
+        comparable = dict(edge)
+        comparable.pop("key", None)
+        comparable.pop("confidence_score", None)
+        expected = dict(selected)
+        expected.pop("confidence_score", None)
+        return comparable == expected
+
+    matching = [edge for edge in rebuilt["links"] if same_relationship(edge)]
+    assert {edge.get("key") for edge in matching} == {"parallel-a", "parallel-b"}
+    assert len(matching) == 2

From 5ff5f5464285c83b29a39390fe5947aba782dda2 Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Fri, 29 May 2026 23:43:41 -0500
Subject: [PATCH 16/21] feat(multigraph): PR 8 global graph/merge/recovery
 preserve keyed parallel edges
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Cross-repo global composition and merge are now MultiDiGraph-safe, with an
explicit recovery and backup policy. Gate met: mixed simple/multi inputs never
crash on a NetworkX class mismatch, and a multigraph is never collapsed
silently; collapse happens only when a simple target is requested explicitly.

global_graph.py:
- global_add normalizes the existing and incoming graphs to an inferred target
  class, then composes them key-aware, so nothing collapses silently; re-adding
  the same repo is idempotent (no edge drift).
- Reusable helpers: normalize_graphs_for_global (infers target; warns+projects
  only on explicit simple target over multi input), detect_pre_profile,
  refuse_pre_profile_upgrade, backup_global_graph.
- Stores graphify_profile; pre-profile files refuse silent multidigraph
  "upgrade" (unreconstructable parallels) with a recovery message; dated
  backup before any overwrite.
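
A rough sketch of the pre-profile guard described in the last bullet, on plain
node-link dicts. It is illustrative only: PROFILE_KEY's literal value and the
names RecoveryError, is_pre_profile, and refuse_upgrade are assumptions standing
in for GRAPHIFY_PROFILE_KEY, GlobalGraphRecoveryError, detect_pre_profile, and
refuse_pre_profile_upgrade.

```python
# Sketch of the pre-profile guard on raw node-link dicts (stdlib only).
# Assumption: the profile key's literal value is "graphify_profile".
PROFILE_KEY = "graphify_profile"


class RecoveryError(RuntimeError):
    """Hypothetical stand-in for GlobalGraphRecoveryError."""


def is_pre_profile(data: object) -> bool:
    """True when the JSON carries none of the four class markers."""
    if not isinstance(data, dict):
        return False
    nested = data.get("graph")
    has_profile = PROFILE_KEY in data or (
        isinstance(nested, dict) and PROFILE_KEY in nested
    )
    has_flags = "multigraph" in data or "directed" in data
    return not (has_profile or has_flags)


def refuse_upgrade(data: dict, target_type: str) -> None:
    """Refuse only the irreversible case: pre-profile input, multidigraph target."""
    if target_type == "multidigraph" and is_pre_profile(data):
        raise RecoveryError("pre-profile graph: rebuild from source instead")


legacy = {"nodes": [], "links": []}  # no markers at all
flagged = {"nodes": [], "links": [], "multigraph": True}  # writer knew the class
profiled = {"nodes": [], "links": [], "graph": {PROFILE_KEY: {"graph_type": "simple"}}}

refuse_upgrade(legacy, "simple")         # allowed: simple-in, simple-out
refuse_upgrade(flagged, "multidigraph")  # allowed: not pre-profile
try:
    refuse_upgrade(legacy, "multidigraph")
    refused = False
except RecoveryError:
    refused = True
```

Only the multidigraph upgrade is refused, since that is the one case where
lost parallel edges cannot be recovered in place.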

__main__.py merge-driver / merge-graphs:
- Consume the same helpers: normalize-then-keyed-compose (no nx.compose class
  mismatch crash), backup before write, pre-profile refusal. merge-graphs gains
  --multigraph/--simple flags; default infers (no silent collapse).
- merge-driver pre-profile refusal scoped to the OVERWRITTEN current file only
  (other is read-only — never blocks a valid merge).
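
The "backup before write" step can be sketched as follows. This is a minimal
approximation of the dated-snapshot policy, not the helper itself: the function
name backup_before_overwrite is hypothetical, and the sketch leaves out the
GRAPHIFY_NO_BACKUP opt-out and the never-raise error handling.

```python
import hashlib
import shutil
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional


def backup_before_overwrite(target: Path) -> Optional[Path]:
    """Copy *target* to <stem>.<YYYY-MM-DD>.bak unless today's copy is identical."""
    if not target.exists():
        return None  # nothing to protect yet
    backup = target.with_name(f"{target.stem}.{date.today().isoformat()}.bak")
    if backup.exists():
        same = (
            hashlib.sha256(target.read_bytes()).digest()
            == hashlib.sha256(backup.read_bytes()).digest()
        )
        if same:
            return backup  # same-day snapshot is already current
    shutil.copy2(target, backup)  # first snapshot today, or a refresh
    return backup


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "merged-graph.json"
    missing = backup_before_overwrite(out)  # target absent

    out.write_text('{"v": 1}', encoding="utf-8")
    first = backup_before_overwrite(out)  # creates today's snapshot
    out.write_text('{"v": 2}', encoding="utf-8")  # a merge overwrites the target

    second = backup_before_overwrite(out)  # refreshes the same-day snapshot
    snapshot_text = second.read_text(encoding="utf-8")
    backups = sorted(p.name for p in Path(tmp).glob("*.bak"))
```

One backup file per day keeps the directory small while still holding the most
recent pre-overwrite state.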

Verification (PR 7 lesson): every merge/recovery path is tested under REPEATED
application. global_add and both merge commands are each applied 3x, asserting
stable records, keys, counts, and profile. An independent coordinator probe
confirmed that global_add 3x keeps parallels with no drift.
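
Why repeated application is the right check can be shown with a toy model.
The dict-based edge store below is an assumption standing in for a
MultiDiGraph (it is not graphify code); its keyless branch is modeled on the
fresh auto-int keys that a keyless add_edge allocates. The mk1..mk3 keys are
borrowed from the test fixture.

```python
# Toy model of a multigraph edge store: {(u, v): {key: attrs}}.
# Assumption: this stands in for nx.MultiDiGraph; it is not graphify code.


def add_edge(store, u, v, key=None, **attrs):
    """Add an edge; with no key, take the first unused int at or above len()."""
    slots = store.setdefault((u, v), {})
    if key is None:
        key = len(slots)
        while key in slots:
            key += 1
    slots.setdefault(key, {}).update(attrs)
    return key


def replay(store, records, keyed):
    """Replay edge records into *store*, with or without their stable keys."""
    for u, v, key, attrs in records:
        add_edge(store, u, v, key=key if keyed else None, **attrs)


def edge_count(store):
    """Total number of edge records across all (u, v) pairs."""
    return sum(len(slots) for slots in store.values())


incoming = [
    ("A", "B", "mk1", {"relation": "calls"}),
    ("A", "B", "mk2", {"relation": "imports"}),
    ("A", "B", "mk3", {"relation": "references"}),
]

keyed_store, keyless_store = {}, {}
keyed_counts, keyless_counts = [], []
for _ in range(3):  # the same merge applied three times
    replay(keyed_store, incoming, keyed=True)
    replay(keyless_store, incoming, keyed=False)
    keyed_counts.append(edge_count(keyed_store))
    keyless_counts.append(edge_count(keyless_store))
```

The keyed replay rewrites the same (u, v, key) slots on every pass, while the
keyless replay grows by three records per pass; a single-application test
would not see the difference.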

23 new tests incl. idempotence-under-repeat, mixed-input no-crash, pre-profile
refusal scope (both directions), backup + NO_BACKUP, simple-only regression.
Full suite 1707 passed; ruff + pyright clean.

---
 graphify/__main__.py       | 205 ++++++++-
 graphify/global_graph.py   | 356 +++++++++++++-
 tests/test_global_graph.py | 919 +++++++++++++++++++++++++++++++++++++
 3 files changed, 1443 insertions(+), 37 deletions(-)

diff --git a/graphify/__main__.py b/graphify/__main__.py
index 70786568f..8a558dec5 100644
--- a/graphify/__main__.py
+++ b/graphify/__main__.py
@@ -45,6 +45,50 @@ def _enforce_graph_size_cap_or_exit(gp: Path) -> None:
         sys.exit(1)
 
 
+def _backup_merge_target(target: Path) -> "Path | None":
+    """Snapshot an existing merge target to a dated ``.bak`` sibling before overwrite.
+
+    Mirrors :func:`graphify.global_graph.backup_global_graph`'s dated-snapshot
+    contract, parameterized for an arbitrary merge output path (the merge-driver
+    writes to ``current``; ``merge-graphs`` writes to ``--out``). Neither of the
+    existing backup helpers fits here: ``export.backup_if_protected`` keys off a
+    ``graphify-out`` layout (semantic marker / curated labels) and
+    ``backup_global_graph`` is hard-wired to the global-graph path.
+
+    The backup is written next to *target* as ``<stem>.<YYYY-MM-DD>.bak``.
+    Idempotent within a day — identical content is not re-copied; a changed
+    target refreshes the same-day snapshot in place (one backup per day, always
+    the latest pre-overwrite state). Returns the backup path, or None when there
+    is nothing to back up (target absent) or backups are disabled via
+    ``GRAPHIFY_NO_BACKUP``. Never raises: a backup failure prints a warning and
+    returns None so it can never block the write it protects.
+    """
+    import hashlib
+    from datetime import date
+
+    if os.environ.get("GRAPHIFY_NO_BACKUP"):
+        return None
+    if not target.exists():
+        return None
+
+    today = date.today().isoformat()
+    backup_path = target.with_name(f"{target.stem}.{today}.bak")
+    try:
+        if backup_path.exists():
+            src_hash = hashlib.sha256(target.read_bytes()).hexdigest()
+            bak_hash = hashlib.sha256(backup_path.read_bytes()).hexdigest()
+            if src_hash == bak_hash:
+                return backup_path  # identical content, nothing to do
+        shutil.copy2(target, backup_path)
+        return backup_path
+    except Exception as exc:
+        print(
+            f"[graphify merge] warning: backup failed ({exc}) — continuing with overwrite",
+            file=sys.stderr,
+        )
+        return None
+
+
 def _check_skill_version(skill_dst: Path) -> None:
     """Warn if the installed skill is from an older graphify version."""
     version_file = skill_dst.parent / ".graphify_version"
@@ -1539,7 +1583,7 @@ def main() -> None:
             "  merge-driver <base> <current> <other>  git merge driver: union-merge two graph.json files (set up via hook install)"
         )
         print(
-            "  merge-graphs <g1> <g2>  merge two or more graph.json files into one cross-repo graph"
+            "  merge-graphs <g1> <g2> [--multigraph|--simple]  merge two or more graph.json files into one cross-repo graph"
         )
         print("    --out <path>            output path (default: graphify-out/merged-graph.json)")
         print("    --branch <branch>       checkout a specific branch (default: repo default)")
@@ -2870,12 +2914,78 @@ def _load_graph(p: str):
                 return _jg.node_link_graph(data), data
 
         try:
-            current_graph, _ = _load_graph(_current_path)
-            other_graph, _ = _load_graph(_other_path)
+            current_graph, current_data = _load_graph(_current_path)
+            other_graph, other_data = _load_graph(_other_path)
         except Exception as exc:
             print(f"[graphify merge-driver] error loading graphs: {exc}", file=sys.stderr)
             sys.exit(1)  # surface the conflict so git doesn't accept a corrupt merge
-        merged_graph = _nx.compose(current_graph, other_graph)
+
+        # Class-safe union. Reuse the Phase A normalizer so a class mismatch
+        # (e.g. simple current + MultiDiGraph other) can never reach
+        # ``nx.compose`` and raise NetworkXError. ``normalize_graphs_for_global``
+        # with no explicit target infers the target (multidigraph if either
+        # side is multi; else digraph if either directed; else simple) and
+        # returns BOTH inputs realized as that one class, in order.
+        from graphify.global_graph import (
+            GlobalGraphRecoveryError,
+            detect_pre_profile,
+            normalize_graphs_for_global,
+            refuse_pre_profile_upgrade,
+        )
+
+        try:
+            (current_norm, other_norm), target_type = normalize_graphs_for_global(
+                [current_graph, other_graph]
+            )
+        except Exception as exc:
+            print(f"[graphify merge-driver] error normalizing graphs: {exc}", file=sys.stderr)
+            sys.exit(1)
+
+        # Recovery refusal: refuse to UPGRADE a pre-profile input (one with no
+        # graphify_profile / multigraph / directed markers — possibly already a
+        # silently-collapsed simple graph) to a multidigraph target, since the
+        # lost parallel edges cannot be reconstructed in place. Leave both files
+        # unmutated. Only the irreversible multidigraph upgrade is refused;
+        # simple/digraph merges of a pre-profile graph proceed normally.
+        # Only the OVERWRITTEN file (current) is protected: an in-place
+        # multidigraph upgrade of a pre-profile current graph is irreversible.
+        # `other` is read-only (merged in, never rewritten), so its pre-profile
+        # status implies no unreconstructable in-place loss and must not block.
+        if target_type == "multidigraph":
+            try:
+                refuse_pre_profile_upgrade(current_data, target_type)
+            except GlobalGraphRecoveryError as exc:
+                print(f"[graphify merge-driver] {exc}", file=sys.stderr)
+                sys.exit(1)
+            if detect_pre_profile(current_data):
+                # Defensive: detect_pre_profile is the same predicate
+                # refuse_pre_profile_upgrade uses, so the branch above
+                # already exited; this guards against future divergence.
+                print(
+                    f"[graphify merge-driver] refusing to upgrade pre-profile graph "
+                    f"{_current_path} to multidigraph",
+                    file=sys.stderr,
+                )
+                sys.exit(1)
+
+        # KEY-AWARE compose, mirroring graphify.global_graph.global_add: start
+        # from the normalized current graph and replay the normalized other
+        # graph. For a multidigraph target iterate ``edges(keys=True)`` and
+        # ``add_edge(u, v, key=key, ...)`` so parallel edges survive distinctly
+        # AND a repeated merge of the same inputs overwrites the same
+        # ``(u, v, key)`` slots instead of accumulating fresh auto-int keys —
+        # that keyless drift is exactly what makes a naive ``nx.compose`` merge
+        # non-idempotent. Simple/digraph targets keep one edge per pair.
+        merged_graph = current_norm
+        for _node, _ndata in other_norm.nodes(data=True):
+            merged_graph.add_node(_node, **_ndata)
+        if isinstance(other_norm, (_nx.MultiGraph, _nx.MultiDiGraph)):
+            for _u, _v, _key, _edata in other_norm.edges(keys=True, data=True):
+                merged_graph.add_edge(_u, _v, key=_key, **_edata)
+        else:
+            for _u, _v, _edata in other_norm.edges(data=True):
+                merged_graph.add_edge(_u, _v, **_edata)
+
         if merged_graph.number_of_nodes() > _MERGE_MAX_NODES:
             print(
                 f"[graphify merge-driver] merged graph has {merged_graph.number_of_nodes()} nodes, "
@@ -2883,29 +2993,50 @@ def _load_graph(p: str):
                 file=sys.stderr,
             )
             sys.exit(1)
-        try:
-            out_data = _jg.node_link_data(merged_graph, edges="links")
-        except TypeError:
-            out_data = _jg.node_link_data(merged_graph)
-        Path(_current_path).write_text(json.dumps(out_data, indent=2), encoding="utf-8")
+
+        # Back up the existing target before the irreversible overwrite, then
+        # write via ``to_json`` so the graphify_profile (graph_type) is persisted
+        # and round-trips on the next load. ``force=True`` because a merge
+        # legitimately overwrites ``current`` (the shrink-guard would otherwise
+        # block a merge that drops nodes via dedup); ``communities={}`` because a
+        # merged graph is not (re)clustered here.
+        from graphify.export import to_json as _to_json
+
+        _backup_merge_target(Path(_current_path))
+        _to_json(merged_graph, {}, _current_path, force=True)
         sys.exit(0)
 
     elif cmd == "merge-graphs":
         # graphify merge-graphs graph1.json graph2.json ... --out merged.json
+        #   [--multigraph | --simple]
+        # Optional target flag controls the merged graph class. By DEFAULT the
+        # target is INFERRED (multidigraph if any input is multi; else digraph if
+        # any directed; else simple) so multigraph inputs never silently
+        # collapse. --simple forces a simple projection (the normalizer warns and
+        # collapses parallel edges — an explicit, audible choice); --multigraph
+        # forces a keyed multidigraph.
         args = sys.argv[2:]
         graph_paths: list[Path] = []
         out_path = Path(_GRAPHIFY_OUT) / "merged-graph.json"
+        explicit_target: str | None = None
         i = 0
         while i < len(args):
             if args[i] == "--out" and i + 1 < len(args):
                 out_path = Path(args[i + 1])
                 i += 2
+            elif args[i] == "--multigraph":
+                explicit_target = "multidigraph"
+                i += 1
+            elif args[i] == "--simple":
+                explicit_target = "simple"
+                i += 1
             else:
                 graph_paths.append(Path(args[i]))
                 i += 1
         if len(graph_paths) < 2:
             print(
-                "Usage: graphify merge-graphs <graph1.json> <graph2.json> [...] [--out merged.json]",
+                "Usage: graphify merge-graphs <graph1.json> <graph2.json> [...] "
+                "[--out merged.json] [--multigraph | --simple]",
                 file=sys.stderr,
             )
             sys.exit(1)
@@ -2929,19 +3060,57 @@ def _load_graph(p: str):
             except TypeError:
                 input_graph = _jg.node_link_graph(data)
             loaded_graphs.append(input_graph)
-        merged_graph = _nx.Graph()
+        # Prefix every input for cross-repo isolation FIRST (relabel preserves
+        # multigraph keys), then normalize the whole batch to one common class
+        # via the Phase A helper. Replacing the old hard-coded ``_nx.Graph()``
+        # start removes the silent collapse: the resolved class is multidigraph
+        # if any input is multi (unless --simple is given, which warns + projects).
+        from graphify.global_graph import normalize_graphs_for_global
+
+        prefixed_graphs = []
         for input_graph, gp in zip(loaded_graphs, graph_paths):
             repo_tag = gp.parent.parent.name  # graphify-out/../ → repo dir name
-            prefixed = _prefix(input_graph, repo_tag)
-            merged_graph = _nx.compose(merged_graph, prefixed)
+            prefixed_graphs.append(_prefix(input_graph, repo_tag))
+
         try:
-            out_data = _jg.node_link_data(merged_graph, edges="links")
-        except TypeError:
-            out_data = _jg.node_link_data(merged_graph)
+            normalized_graphs, target_type = normalize_graphs_for_global(
+                prefixed_graphs, target_type=explicit_target
+            )
+        except Exception as exc:
+            print(f"[graphify merge-graphs] error normalizing graphs: {exc}", file=sys.stderr)
+            sys.exit(1)
+
+        # KEY-AWARE compose into the resolved class (mirrors global_add). For a
+        # multidigraph target replay ``edges(keys=True)`` so parallel edges keep
+        # distinct keys and a repeated merge of the same inputs overwrites the
+        # same ``(u, v, key)`` slots (idempotent) instead of drifting on fresh
+        # auto-int keys. Prefixing isolates repos whose tags differ, so their
+        # nodes do not collide; same-class composition is safe.
+        target_cls = type(normalized_graphs[0]) if normalized_graphs else _nx.Graph
+        merged_graph = target_cls()
+        for ng in normalized_graphs:
+            merged_graph.graph.update(ng.graph)
+            for _node, _ndata in ng.nodes(data=True):
+                merged_graph.add_node(_node, **_ndata)
+            if isinstance(ng, (_nx.MultiGraph, _nx.MultiDiGraph)):
+                for _u, _v, _key, _edata in ng.edges(keys=True, data=True):
+                    merged_graph.add_edge(_u, _v, key=_key, **_edata)
+            else:
+                for _u, _v, _edata in ng.edges(data=True):
+                    merged_graph.add_edge(_u, _v, **_edata)
+
+        # Back up an existing target before overwrite, then write via ``to_json``
+        # so the graphify_profile (graph_type=target_type) persists and the class
+        # round-trips on reload. ``force=True``: merge-graphs deliberately
+        # (re)writes the merged output; ``communities={}``: not clustered here.
+        from graphify.export import to_json as _to_json
+
         out_path.parent.mkdir(parents=True, exist_ok=True)
-        out_path.write_text(json.dumps(out_data, indent=2), encoding="utf-8")
+        _backup_merge_target(out_path)
+        _to_json(merged_graph, {}, str(out_path), force=True)
         print(
-            f"Merged {len(loaded_graphs)} graphs -> {merged_graph.number_of_nodes()} nodes, {merged_graph.number_of_edges()} edges"
+            f"Merged {len(loaded_graphs)} graphs -> {merged_graph.number_of_nodes()} nodes, "
+            f"{merged_graph.number_of_edges()} edges ({target_type})"
         )
         print(f"Written to: {out_path}")
 
diff --git a/graphify/global_graph.py b/graphify/global_graph.py
index dde2cdaa6..99fea850c 100644
--- a/graphify/global_graph.py
+++ b/graphify/global_graph.py
@@ -1,17 +1,248 @@
 from __future__ import annotations
 import json
 import hashlib
+import shutil
 import sys
+import warnings
 from contextlib import suppress
-from datetime import datetime, timezone
+from datetime import date, datetime, timezone
 from pathlib import Path
 import networkx as nx
 from networkx.readwrite import json_graph as _jg
 
+from graphify.graph_loader import GRAPHIFY_PROFILE_KEY
+from graphify.projections import normalize_to_multidigraph
+
 _GLOBAL_DIR = Path.home() / ".graphify"
 _GLOBAL_GRAPH = _GLOBAL_DIR / "global-graph.json"
 _GLOBAL_MANIFEST = _GLOBAL_DIR / "global-manifest.json"
 
+# Graphify graph_type vocabulary (kept byte-identical to graph_loader /
+# export so the global graph profile round-trips through node_link_data).
+_GRAPH_TYPE_SIMPLE = "simple"
+_GRAPH_TYPE_DIGRAPH = "digraph"
+_GRAPH_TYPE_MULTIDIGRAPH = "multidigraph"
+_GRAPH_TYPES = frozenset({_GRAPH_TYPE_SIMPLE, _GRAPH_TYPE_DIGRAPH, _GRAPH_TYPE_MULTIDIGRAPH})
+
+
+def _graph_type_for_instance(G: nx.Graph) -> str:
+    """Return the graphify ``graph_type`` token for a live NetworkX instance.
+
+    The instance is authoritative: classify from ``is_multigraph()`` /
+    ``is_directed()`` rather than from any stored profile. Mirrors
+    :func:`graphify.export._graph_type_for_instance` and the loader's
+    :func:`~graphify.graph_loader._set_graph_profile` vocabulary so a
+    save/load round-trip is stable.
+    """
+    if G.is_multigraph():
+        return _GRAPH_TYPE_MULTIDIGRAPH
+    if G.is_directed():
+        return _GRAPH_TYPE_DIGRAPH
+    return _GRAPH_TYPE_SIMPLE
+
+
+def _graph_class_for_type(graph_type: str) -> type[nx.Graph]:
+    """Map a graphify ``graph_type`` token to the NetworkX class that realizes it."""
+    if graph_type == _GRAPH_TYPE_MULTIDIGRAPH:
+        return nx.MultiDiGraph
+    if graph_type == _GRAPH_TYPE_DIGRAPH:
+        return nx.DiGraph
+    return nx.Graph
+
+
+def _project_to_class(G: nx.Graph, graph_type: str) -> nx.Graph:
+    """Return a copy of *G* realized as the NetworkX class for *graph_type*.
+
+    Multigraph targets reuse :func:`normalize_to_multidigraph` so parallel
+    keys survive. Simple/digraph targets rebuild the skeleton and replay
+    edges with keyless ``add_edge``; when *G* is itself a multigraph this is
+    an intentional, caller-warned collapse (parallel edges fold onto one
+    ``(u, v)`` pair). Already-correct classes are still copied so callers can
+    mutate the result without aliasing the input.
+    """
+    if graph_type == _GRAPH_TYPE_MULTIDIGRAPH:
+        return normalize_to_multidigraph(G)
+    target_cls = _graph_class_for_type(graph_type)
+    H = target_cls()
+    H.graph.update(G.graph)
+    H.add_nodes_from((node, attrs.copy()) for node, attrs in G.nodes(data=True))
+    if isinstance(G, (nx.MultiGraph, nx.MultiDiGraph)):
+        for u, v, _key, data in G.edges(keys=True, data=True):
+            H.add_edge(u, v, **data)
+    else:
+        for u, v, data in G.edges(data=True):
+            H.add_edge(u, v, **data)
+    return H
+
+
+def _infer_target_type(graphs: list[nx.Graph]) -> str:
+    """Infer the composition target type from a list of graphs.
+
+    Multidigraph if ANY input is a multigraph; else digraph if ANY input is
+    directed; else simple. This is the no-explicit-target precedence the
+    global compose and the merge driver both rely on.
+    """
+    if any(G.is_multigraph() for G in graphs):
+        return _GRAPH_TYPE_MULTIDIGRAPH
+    if any(G.is_directed() for G in graphs):
+        return _GRAPH_TYPE_DIGRAPH
+    return _GRAPH_TYPE_SIMPLE
+
+
+def normalize_graphs_for_global(
+    graphs: list[nx.Graph], *, target_type: str | None = None
+) -> tuple[list[nx.Graph], str]:
+    """Normalize a list of graphs to one common class for global composition.
+
+    Reusable by both :func:`global_add` and the ``__main__`` merge driver /
+    merge-graphs path so class normalization lives in exactly one place.
+
+    - When *target_type* is ``None`` it is inferred via :func:`_infer_target_type`
+      (multidigraph if any input is multi; else digraph if any directed; else
+      simple). An inferred multidigraph target never loses data.
+    - When *target_type* is an EXPLICIT ``"simple"`` / ``"digraph"`` and any
+      input is a multigraph, that input is projected down to the target class
+      with an explicit :func:`warnings.warn` + stderr WARNING, so a multigraph
+      is never collapsed silently, only under an explicit non-multi target.
+    - Returns ``(normalized_graphs, resolved_target_type)`` where every graph
+      is the same class and ``resolved_target_type`` is in the graph_type
+      vocabulary.
+
+    Raises:
+        ValueError - *target_type* is not a recognized graph_type token.
+    """
+    if target_type is not None and target_type not in _GRAPH_TYPES:
+        raise ValueError(
+            f"target_type must be one of {sorted(_GRAPH_TYPES)}, got {target_type!r}"
+        )
+
+    explicit = target_type is not None
+    resolved = target_type if explicit else _infer_target_type(graphs)
+
+    if explicit and resolved != _GRAPH_TYPE_MULTIDIGRAPH:
+        # Down-projecting an explicit simple/digraph target: warn loudly for
+        # every multigraph input whose parallel edges are about to collapse.
+        for G in graphs:
+            if G.is_multigraph():
+                msg = (
+                    f"global compose: projecting multigraph input "
+                    f"({G.number_of_edges()} edges) to '{resolved}' target — "
+                    f"parallel edges will be collapsed onto single (u, v) pairs. "
+                    f"Omit the explicit simple/digraph target to preserve them."
+                )
+                warnings.warn(msg, stacklevel=2)
+                print(f"[graphify global] WARNING: {msg}", file=sys.stderr)
+
+    normalized = [_project_to_class(G, resolved) for G in graphs]
+    return normalized, resolved
+
+
+def detect_pre_profile(data: object) -> bool:
+    """Return True when a global-graph JSON dict predates profile/class metadata.
+
+    A "pre-profile" graph JSON LACKS ``graphify_profile`` (at the top level and
+    nested under ``"graph"``) AND lacks BOTH explicit ``multigraph`` / ``directed``
+    flags. Such a file was written before class normalization existed, so it may
+    already be a silently-collapsed simple graph whose lost parallel edges cannot
+    be reconstructed. The presence of ANY of those four markers means the writer
+    knew the graph class, so it is NOT pre-profile.
+    """
+    if not isinstance(data, dict):
+        return False
+    if GRAPHIFY_PROFILE_KEY in data:
+        return False
+    nested = data.get("graph")
+    if isinstance(nested, dict) and GRAPHIFY_PROFILE_KEY in nested:
+        return False
+    if "multigraph" in data or "directed" in data:
+        return False
+    return True
+
+
+class GlobalGraphRecoveryError(RuntimeError):
+    """Raised when a global operation would irreversibly upgrade a pre-profile graph."""
+
+
+def refuse_pre_profile_upgrade(
+    data: dict,
+    target_type: str,
+    *,
+    backup_hint: Path | None = None,
+) -> None:
+    """Refuse to upgrade a pre-profile global graph to multigraph.
+
+    Reusable guard (callable by the merge driver) that enforces the recovery
+    policy: a pre-profile global graph (see :func:`detect_pre_profile`) may
+    already be collapsed, so "upgrading" it to a multidigraph target would
+    fabricate a keyed graph from data that can no longer carry the lost
+    parallel edges. In that case raise :class:`GlobalGraphRecoveryError` with a
+    clear recovery message pointing at the backup and the rebuild-from-source
+    path (``global remove`` + ``global add``).
+
+    Simple-in -> simple-out (or digraph) operation on a pre-profile graph is
+    NOT refused — only an upgrade to ``multidigraph`` is irreversible.
+    """
+    if target_type != _GRAPH_TYPE_MULTIDIGRAPH:
+        return
+    if not detect_pre_profile(data):
+        return
+    backup_line = (
+        f" A pre-overwrite backup was saved at {backup_hint}."
+        if backup_hint is not None
+        else " Check ~/.graphify for a dated .bak snapshot of the previous graph."
+    )
+    raise GlobalGraphRecoveryError(
+        "refusing to upgrade a pre-profile global graph to a multidigraph: the "
+        "existing global-graph.json has no graphify_profile or multigraph/directed "
+        "flags, so it predates class tracking and may already have collapsed "
+        "parallel edges that cannot be reconstructed by upgrading in place."
+        + backup_line
+        + " To rebuild safely, remove the affected repos and re-add them from source "
+        "(`graphify global remove <tag>` then `graphify global add`), which "
+        "regenerates keyed parallel edges from the per-repo graph.json."
+    )
+
+
+def backup_global_graph() -> Path | None:
+    """Snapshot the existing global-graph.json to a dated ``.bak`` before overwrite.
+
+    Mirrors :func:`graphify.export.backup_if_protected`'s dated-snapshot pattern,
+    adapted for the single global-graph.json file: the backup is written next to
+    it as ``global-graph.<YYYY-MM-DD>.bak``. Idempotent within a day — if today's
+    backup already holds byte-identical content the copy is skipped; if the live
+    graph changed since the last backup today the snapshot is refreshed in place
+    (one backup per day, always the latest pre-overwrite state).
+
+    Returns the backup path, or None when there is nothing to back up (no
+    existing global graph) or backup is disabled via ``GRAPHIFY_NO_BACKUP``.
+    Never raises — a backup failure prints a warning and returns None so it can
+    never block the write it protects.
+    """
+    import os
+
+    if os.environ.get("GRAPHIFY_NO_BACKUP"):
+        return None
+    if not _GLOBAL_GRAPH.exists():
+        return None
+
+    today = date.today().isoformat()
+    backup_path = _GLOBAL_GRAPH.with_name(f"{_GLOBAL_GRAPH.stem}.{today}.bak")
+    try:
+        if backup_path.exists():
+            src_hash = hashlib.sha256(_GLOBAL_GRAPH.read_bytes()).hexdigest()
+            bak_hash = hashlib.sha256(backup_path.read_bytes()).hexdigest()
+            if src_hash == bak_hash:
+                return backup_path  # identical content, nothing to do
+        _GLOBAL_DIR.mkdir(parents=True, exist_ok=True)
+        shutil.copy2(_GLOBAL_GRAPH, backup_path)
+        return backup_path
+    except Exception as exc:
+        print(
+            f"[graphify global] warning: backup failed ({exc}) — continuing with overwrite",
+            file=sys.stderr,
+        )
+        return None
+
 
 def _load_manifest() -> dict:
     if _GLOBAL_MANIFEST.exists():
@@ -25,27 +256,69 @@ def _save_manifest(manifest: dict) -> None:
     _GLOBAL_MANIFEST.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
 
 
-def _load_global_graph() -> nx.Graph:
-    if _GLOBAL_GRAPH.exists():
-        from graphify.security import check_graph_file_size_cap
+def _read_global_graph_data() -> dict | None:
+    """Return the raw global-graph.json dict (size-capped), or None if absent.
+
+    Reads the on-disk JSON WITHOUT rebuilding the NetworkX graph so callers can
+    inspect pre-profile markers (:func:`detect_pre_profile`) before deciding
+    whether an operation is safe. The ``edges``->``links`` alias is normalized
+    so downstream node_link_graph rehydration is consistent with
+    :func:`_load_global_graph`.
+    """
+    if not _GLOBAL_GRAPH.exists():
+        return None
+    from graphify.security import check_graph_file_size_cap
+
+    check_graph_file_size_cap(_GLOBAL_GRAPH)
+    data = json.loads(_GLOBAL_GRAPH.read_text(encoding="utf-8"))
+    if "links" not in data and "edges" in data:
+        data = dict(data, links=data["edges"])
+    return data
 
-        check_graph_file_size_cap(_GLOBAL_GRAPH)
-        data = json.loads(_GLOBAL_GRAPH.read_text(encoding="utf-8"))
-        if "links" not in data and "edges" in data:
-            data = dict(data, links=data["edges"])
+
+def _load_global_graph() -> nx.Graph:
+    data = _read_global_graph_data()
+    if data is not None:
         try:
-            return _jg.node_link_graph(data, edges="links")
+            G = _jg.node_link_graph(data, edges="links")
         except TypeError:
-            return _jg.node_link_graph(data)
+            G = _jg.node_link_graph(data)
+        # Surface the persisted profile (if any) on G.graph and reconcile its
+        # graph_type with the live instance so a later save round-trips stably.
+        _stamp_global_profile(G)
+        return G
     return nx.Graph()
 
 
+def _stamp_global_profile(G: nx.Graph) -> None:
+    """Stamp ``G.graph[GRAPHIFY_PROFILE_KEY]`` with the instance graph_type.
+
+    Existing profile fields are preserved; ``graph_type`` is always overwritten
+    to match the live instance (the instance is authoritative), mirroring
+    :func:`graphify.export._ensure_graph_profile`. This guarantees the global
+    graph JSON always carries an accurate, round-trippable profile.
+    """
+    existing = G.graph.get(GRAPHIFY_PROFILE_KEY)
+    profile = dict(existing) if isinstance(existing, dict) else {}
+    profile["graph_type"] = _graph_type_for_instance(G)
+    G.graph[GRAPHIFY_PROFILE_KEY] = profile
+
+
 def _save_global_graph(G: nx.Graph) -> None:
     _GLOBAL_DIR.mkdir(parents=True, exist_ok=True)
+    _stamp_global_profile(G)
     try:
         data = _jg.node_link_data(G, edges="links")
     except TypeError:
         data = _jg.node_link_data(G)
+    # Defensively guarantee the profile is present under data["graph"] even if a
+    # backend did not surface G.graph (node_link_data normally does).
+    graph_meta = data.get("graph")
+    if not isinstance(graph_meta, dict):
+        graph_meta = {}
+        data["graph"] = graph_meta
+    if GRAPHIFY_PROFILE_KEY not in graph_meta:
+        graph_meta[GRAPHIFY_PROFILE_KEY] = dict(G.graph[GRAPHIFY_PROFILE_KEY])
     _GLOBAL_GRAPH.write_text(json.dumps(data, indent=2), encoding="utf-8")
 
 
@@ -93,13 +366,42 @@ def global_add(source_path: Path, repo_tag: str) -> dict:
     except TypeError:
         src_G = _jg.node_link_graph(data)
 
-    # Prefix IDs for cross-project isolation
+    # Prefix IDs for cross-project isolation (relabel_nodes preserves multigraph
+    # keys, so a MultiDiGraph source keeps its parallel edges through this step).
     prefixed = prefix_graph_for_global(src_G, repo_tag)
 
-    # Load global graph and prune stale nodes for this repo
+    # Inspect the on-disk global graph BEFORE rehydrating, so the recovery
+    # policy can see whether it is a pre-profile file that may already have
+    # collapsed parallel edges.
+    existing_data = _read_global_graph_data()
+
+    # Load global graph and prune stale nodes for this repo. Pruning happens on
+    # the loaded class; the surviving (other-repo) subgraph is what we compose
+    # the incoming repo into.
     G = _load_global_graph()
     removed = prune_repo_from_graph(G, repo_tag)
 
+    # Resolve the composition target class: multidigraph if EITHER the existing
+    # global graph OR the incoming source is multi; else digraph if either is
+    # directed; else simple. An inferred target never collapses silently: a
+    # simple+multi mix upgrades to multidigraph, which satisfies the go/no-go
+    # gate (no class-mismatch crash, no silent collapse).
+    target_type = _infer_target_type([G, prefixed])
+
+    # Recovery refusal: if composing would UPGRADE a pre-profile global graph to
+    # multidigraph (lost parallel edges unreconstructable), back up first so the
+    # refusal can point at the snapshot, then refuse without mutating the file.
+    if existing_data is not None and target_type == _GRAPH_TYPE_MULTIDIGRAPH:
+        if detect_pre_profile(existing_data):
+            backup_hint = backup_global_graph()
+            refuse_pre_profile_upgrade(existing_data, target_type, backup_hint=backup_hint)
+
+    # Normalize the surviving global graph and the prefixed source to the common
+    # target class. normalize_graphs_for_global returns them in the same order.
+    (G, prefixed), target_type = normalize_graphs_for_global(
+        [G, prefixed], target_type=target_type
+    )
+
     # Merge external-library nodes (no source_file) by label to avoid duplication
     external_labels = {
         d.get("label", ""): n
@@ -107,19 +409,34 @@ def global_add(source_path: Path, repo_tag: str) -> dict:
         if not d.get("source_file") and d.get("label")
     }
     nodes_to_skip = set()
-    for node, data in prefixed.nodes(data=True):
-        if not data.get("source_file") and data.get("label") in external_labels:
+    for node, ndata in prefixed.nodes(data=True):
+        if not ndata.get("source_file") and ndata.get("label") in external_labels:
             nodes_to_skip.add(node)
 
     # Compose: add prefixed nodes (except deduplicated externals) into global graph
-    for node, data in prefixed.nodes(data=True):
+    for node, ndata in prefixed.nodes(data=True):
         if node not in nodes_to_skip:
-            G.add_node(node, **data)
-    for u, v, data in prefixed.edges(data=True):
-        if u not in nodes_to_skip and v not in nodes_to_skip:
-            G.add_edge(u, v, **data)
+            G.add_node(node, **ndata)
+    # KEY-AWARE edge compose. For a multigraph target, iterate keys=True and
+    # replay G.add_edge(u, v, key=key, ...) so parallel edges are preserved
+    # distinctly AND re-adding the same (pruned-then-readded) repo overwrites the
+    # same (u, v, key) slots instead of accumulating fresh auto-int keys — that
+    # keyless drift is the bug this fixes. Simple/digraph targets keep the
+    # historical keyless behavior (one edge per pair), unchanged byte-for-byte.
+    if isinstance(prefixed, (nx.MultiGraph, nx.MultiDiGraph)):
+        for u, v, key, edata in prefixed.edges(keys=True, data=True):
+            if u not in nodes_to_skip and v not in nodes_to_skip:
+                G.add_edge(u, v, key=key, **edata)
+    else:
+        for u, v, edata in prefixed.edges(data=True):
+            if u not in nodes_to_skip and v not in nodes_to_skip:
+                G.add_edge(u, v, **edata)
 
     added = prefixed.number_of_nodes() - len(nodes_to_skip)
+
+    # Back up the existing global graph before the irreversible overwrite. Cheap
+    # and idempotent within a day; no-op when there is nothing to back up.
+    backup_global_graph()
     _save_global_graph(G)
 
     manifest["repos"][repo_tag] = {
@@ -128,6 +445,7 @@ def global_add(source_path: Path, repo_tag: str) -> dict:
         "node_count": added,
         "edge_count": prefixed.number_of_edges(),
         "source_hash": src_hash,
+        "graph_type": target_type,
     }
     _save_manifest(manifest)
 
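Illustration only, not part of the patch: the key-aware compose in
global_add above replays every edge with its original key, so repeated
composition overwrites the same (u, v, key) slot, while a keyless replay
allocates a fresh integer key each time. The sketch below assumes only the
standard NetworkX MultiDiGraph.add_edge behaviour; compose_edges is a
hypothetical helper written for this note and does not exist in graphify.

```python
import networkx as nx


def compose_edges(target: nx.MultiDiGraph, source: nx.MultiDiGraph, keyed: bool) -> None:
    """Copy edges from source into target (hypothetical helper, not graphify API)."""
    if keyed:
        # Replaying the explicit key updates the existing (u, v, key) slot.
        for u, v, key, attrs in source.edges(keys=True, data=True):
            target.add_edge(u, v, key=key, **attrs)
    else:
        # Without a key, NetworkX picks an unused integer key on every call.
        for u, v, attrs in source.edges(data=True):
            target.add_edge(u, v, **attrs)


source = nx.MultiDiGraph()
source.add_edge("a", "b", key="calls:L1", relation="calls")
source.add_edge("a", "b", key="imports:L2", relation="imports")

keyed_target = nx.MultiDiGraph()
keyless_target = nx.MultiDiGraph()
for _ in range(3):  # compose the same source three times
    compose_edges(keyed_target, source, keyed=True)
    compose_edges(keyless_target, source, keyed=False)

keyed_count = keyed_target.number_of_edges("a", "b")      # stays at 2
keyless_count = keyless_target.number_of_edges("a", "b")  # grows to 6
```

The keyed target holds exactly the two original edges after any number of
passes, which is the property the repeat-run tests in this patch assert.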
diff --git a/tests/test_global_graph.py b/tests/test_global_graph.py
index c09b457f1..2f273157a 100644
--- a/tests/test_global_graph.py
+++ b/tests/test_global_graph.py
@@ -5,10 +5,14 @@
 from __future__ import annotations
 
 import json
+from contextlib import contextmanager
+
 import pytest
 import networkx as nx
 from unittest.mock import patch
 
+import graphify.__main__ as mainmod
+
 
 # ── helpers ──────────────────────────────────────────────────────────────────
 
@@ -38,6 +42,45 @@ def _graph_to_json(G, path):
     path.write_text(json.dumps(data), encoding="utf-8")
 
 
+def _make_multidigraph(nodes, edges):
+    """Build an nx.MultiDiGraph from node dicts and keyed edge dicts.
+
+    Each edge dict must carry a ``key`` so parallel edges between the same
+    (source, target) survive the build and the node_link round-trip.
+    """
+    G = nx.MultiDiGraph()
+    for n in nodes:
+        nid = n["id"]
+        G.add_node(nid, **{k: v for k, v in n.items() if k != "id"})
+    for e in edges:
+        G.add_edge(
+            e["source"],
+            e["target"],
+            key=e["key"],
+            **{k: v for k, v in e.items() if k not in ("source", "target", "key")},
+        )
+    return G
+
+
+@contextmanager
+def _patch_global(global_dir):
+    """Single context manager that points global_graph at a temp dir.
+
+    Patches ``_GLOBAL_DIR`` / ``_GLOBAL_GRAPH`` / ``_GLOBAL_MANIFEST`` for the
+    duration of the ``with`` block, mirroring the inline triple-patch the older
+    tests use, so the PR 8 tests can ``with _patch_global(tmp / ".graphify"):``.
+    """
+    with (
+        patch("graphify.global_graph._GLOBAL_DIR", global_dir),
+        patch("graphify.global_graph._GLOBAL_GRAPH", global_dir / "global-graph.json"),
+        patch(
+            "graphify.global_graph._GLOBAL_MANIFEST",
+            global_dir / "global-manifest.json",
+        ),
+    ):
+        yield
+
+
 # ── build.py helpers ──────────────────────────────────────────────────────────
 
 
@@ -335,3 +378,879 @@ def test_global_add_rejects_oversized_source_graph(monkeypatch, tmp_path):
 
         with pytest.raises(ValueError, match="exceeds"):
             global_add(src_graph, "repoA")
+
+
+# ── PR 8: keyed/class-normalized composition, recovery, backup ─────────────────
+
+
+def _PARALLEL_NODES():
+    return [
+        {"id": "a", "label": "A", "source_file": "src/a.py"},
+        {"id": "b", "label": "B", "source_file": "src/b.py"},
+    ]
+
+
+def _PARALLEL_EDGES():
+    return [
+        {"source": "a", "target": "b", "key": "calls:L1", "relation": "calls"},
+        {"source": "a", "target": "b", "key": "imports:L2", "relation": "imports"},
+    ]
+
+
+def test_global_add_multidigraph_preserves_parallel_edges(tmp_path):
+    """A MultiDiGraph source with parallel edges keeps every keyed edge in the
+    global graph, which reloads as a MultiDiGraph (no keyless collapse)."""
+    src_graph = tmp_path / "graph.json"
+    M = _make_multidigraph(_PARALLEL_NODES(), _PARALLEL_EDGES())
+    _graph_to_json(M, src_graph)
+
+    global_dir = tmp_path / ".graphify"
+    with _patch_global(global_dir):
+        from graphify.global_graph import global_add, _load_global_graph
+
+        result = global_add(src_graph, "repoA")
+        G = _load_global_graph()
+
+    assert result["skipped"] is False
+    assert isinstance(G, nx.MultiDiGraph)
+    assert G.number_of_edges("repoA::a", "repoA::b") == 2
+    assert sorted(G["repoA::a"]["repoA::b"].keys()) == ["calls:L1", "imports:L2"]
+    assert G.graph["graphify_profile"]["graph_type"] == "multidigraph"
+
+
+def test_global_add_multidigraph_idempotent_under_repeat(tmp_path):
+    """THE PR-7-lesson test: running global_add of the SAME multigraph repo 3
+    times keeps the parallel-edge count, edge keys, and stored profile identical
+    after every run — no duplication, no drift, no re-collapse.
+
+    To prove the KEYED COMPOSE itself is idempotent (not merely the hash-skip
+    short-circuit), repoB's source is rewritten with a fresh hash on every
+    iteration, so repoB is pruned and re-added through the keyed edge loop each
+    time. repoA is added once and must survive every prune, class normalization,
+    and save/reload round-trip with its parallel edges unchanged."""
+    repo_a_src = tmp_path / "a.json"
+    M_a = _make_multidigraph(_PARALLEL_NODES(), _PARALLEL_EDGES())
+    _graph_to_json(M_a, repo_a_src)
+
+    global_dir = tmp_path / ".graphify"
+    with _patch_global(global_dir):
+        from graphify.global_graph import global_add, _load_global_graph
+
+        global_add(repo_a_src, "repoA")
+
+        observed = []
+        for i in range(3):
+            # Distinct repoB source each iteration: forces a real prune + keyed
+            # re-compose on top of the surviving repoA subgraph.
+            churn_src = tmp_path / f"churn_{i}.json"
+            M_b = _make_multidigraph(
+                [
+                    {"id": f"c{i}", "label": f"C{i}", "source_file": "src/c.py"},
+                    {"id": f"d{i}", "label": f"D{i}", "source_file": "src/d.py"},
+                ],
+                [
+                    {"source": f"c{i}", "target": f"d{i}", "key": "j1", "relation": "calls"},
+                    {"source": f"c{i}", "target": f"d{i}", "key": "j2", "relation": "uses"},
+                ],
+            )
+            _graph_to_json(M_b, churn_src)
+            global_add(churn_src, "repoB")
+
+            G = _load_global_graph()
+            observed.append(
+                (
+                    G.number_of_edges("repoA::a", "repoA::b"),
+                    tuple(sorted(G["repoA::a"]["repoA::b"].keys())),
+                    G.graph["graphify_profile"]["graph_type"],
+                )
+            )
+
+    # IDEMPOTENCE ASSERTION: parallel-edge count (2), keys, and profile identical
+    # after each of the three repeated global_add re-composes — no drift.
+    assert observed == [
+        (2, ("calls:L1", "imports:L2"), "multidigraph"),
+        (2, ("calls:L1", "imports:L2"), "multidigraph"),
+        (2, ("calls:L1", "imports:L2"), "multidigraph"),
+    ]
+
+
+def test_global_add_mixed_simple_and_multi_no_crash(tmp_path):
+    """One simple repo + one multi repo must not crash through a NetworkX class
+    mismatch; the global target upgrades to multidigraph and both repos' edges
+    are present (the multi repo keyed)."""
+    simple_src = tmp_path / "simple.json"
+    S = _make_graph(
+        [
+            {"id": "x", "label": "X", "source_file": "src/x.py"},
+            {"id": "y", "label": "Y", "source_file": "src/y.py"},
+        ],
+        [{"source": "x", "target": "y", "relation": "calls"}],
+    )
+    _graph_to_json(S, simple_src)
+
+    multi_src = tmp_path / "multi.json"
+    M = _make_multidigraph(_PARALLEL_NODES(), _PARALLEL_EDGES())
+    _graph_to_json(M, multi_src)
+
+    global_dir = tmp_path / ".graphify"
+    with _patch_global(global_dir):
+        from graphify.global_graph import global_add, _load_global_graph
+
+        global_add(simple_src, "repoSimple")
+        # Composing a multi repo into the existing simple global graph must not
+        # raise "All graphs must be directed or undirected."
+        global_add(multi_src, "repoMulti")
+        G = _load_global_graph()
+
+    assert isinstance(G, nx.MultiDiGraph)
+    assert G.graph["graphify_profile"]["graph_type"] == "multidigraph"
+    # simple repo's single edge survives (folded into the multigraph)
+    assert G.has_edge("repoSimple::x", "repoSimple::y")
+    # multi repo's parallel edges survive distinctly, keyed
+    assert G.number_of_edges("repoMulti::a", "repoMulti::b") == 2
+    assert sorted(G["repoMulti::a"]["repoMulti::b"].keys()) == ["calls:L1", "imports:L2"]
+
+
+def test_global_add_simple_only_regression(tmp_path):
+    """Pure simple inputs produce a simple global graph whose output is unchanged
+    apart from the new graphify_profile metadata. Repeated adds are byte-identical."""
+    g1 = tmp_path / "g1.json"
+    g2 = tmp_path / "g2.json"
+    _graph_to_json(
+        _make_graph(
+            [
+                {"id": "u", "label": "U", "source_file": "src/u.py"},
+                {"id": "v", "label": "V", "source_file": "src/v.py"},
+            ],
+            [{"source": "u", "target": "v", "relation": "calls"}],
+        ),
+        g1,
+    )
+    _graph_to_json(
+        _make_graph([{"id": "w", "label": "W", "source_file": "src/w.py"}]),
+        g2,
+    )
+
+    global_dir = tmp_path / ".graphify"
+    global_graph_path = global_dir / "global-graph.json"
+    with _patch_global(global_dir):
+        from graphify.global_graph import global_add, _load_global_graph
+
+        global_add(g1, "repoA")
+        global_add(g2, "repoB")
+        G = _load_global_graph()
+        first_bytes = global_graph_path.read_text(encoding="utf-8")
+
+        # Repeat the same two adds (hash-skip path) → byte-identical output.
+        global_add(g1, "repoA")
+        global_add(g2, "repoB")
+        second_bytes = global_graph_path.read_text(encoding="utf-8")
+
+    # Simple-only stays a simple Graph (not upgraded), profile is "simple".
+    assert isinstance(G, nx.Graph)
+    assert not G.is_multigraph()
+    assert not G.is_directed()
+    assert G.graph["graphify_profile"]["graph_type"] == "simple"
+    assert G.has_edge("repoA::u", "repoA::v")
+    assert "repoB::w" in G.nodes
+    # Byte-stable across repeated adds (idempotent simple output).
+    assert first_bytes == second_bytes
+
+
+def test_normalize_graphs_for_global_infers_target(recwarn):
+    """Mixed inputs infer multidigraph; an explicit simple target on a multi
+    input warns and projects the multigraph down to simple."""
+    from graphify.global_graph import normalize_graphs_for_global
+
+    simple = _make_graph(
+        [{"id": "x"}, {"id": "y"}], [{"source": "x", "target": "y"}]
+    )
+    multi = _make_multidigraph(
+        [{"id": "a"}, {"id": "b"}],
+        [
+            {"source": "a", "target": "b", "key": "k1"},
+            {"source": "a", "target": "b", "key": "k2"},
+        ],
+    )
+
+    # Inference: any multi input → multidigraph target, no warning, no collapse.
+    normalized, target = normalize_graphs_for_global([simple, multi])
+    assert target == "multidigraph"
+    assert all(isinstance(g, nx.MultiDiGraph) for g in normalized)
+    assert normalized[1].number_of_edges("a", "b") == 2
+    assert len(recwarn.list) == 0
+
+    # Explicit simple target with a multi input → WARNING + projection to simple.
+    with pytest.warns(UserWarning, match="collaps"):
+        normalized2, target2 = normalize_graphs_for_global(
+            [simple, multi], target_type="simple"
+        )
+    assert target2 == "simple"
+    assert all(type(g) is nx.Graph for g in normalized2)
+    # Parallel edges collapse to a single (a, b) pair on the simple projection.
+    assert normalized2[1].number_of_edges() == 1
+
+    # Unknown target token is rejected.
+    with pytest.raises(ValueError, match="target_type"):
+        normalize_graphs_for_global([simple], target_type="bogus")
+
+
+def test_detect_pre_profile_global_graph():
+    """A JSON without graphify_profile and without multigraph/directed flags is
+    detected as pre-profile; any of those markers clears the flag."""
+    from graphify.global_graph import detect_pre_profile
+
+    assert detect_pre_profile({"nodes": [{"id": "a"}], "links": []}) is True
+    # Top-level profile present → not pre-profile.
+    assert (
+        detect_pre_profile({"nodes": [], "links": [], "graphify_profile": {}}) is False
+    )
+    # Nested profile under "graph" → not pre-profile.
+    assert (
+        detect_pre_profile(
+            {"nodes": [], "links": [], "graph": {"graphify_profile": {"graph_type": "simple"}}}
+        )
+        is False
+    )
+    # Explicit class flags → writer knew the class → not pre-profile.
+    assert detect_pre_profile({"nodes": [], "links": [], "multigraph": False}) is False
+    assert detect_pre_profile({"nodes": [], "links": [], "directed": True}) is False
+    assert detect_pre_profile("not a dict") is False
+
+
+def test_pre_profile_upgrade_refused_with_recovery_message(tmp_path):
+    """Upgrading a pre-profile global graph to multidigraph refuses with a clear
+    recovery message and does NOT mutate/destroy the existing global-graph.json."""
+    from graphify.global_graph import (
+        GlobalGraphRecoveryError,
+        refuse_pre_profile_upgrade,
+    )
+
+    # Direct helper contract: pre-profile + multidigraph target → raises.
+    pre_profile = {"nodes": [{"id": "a"}], "links": []}
+    with pytest.raises(GlobalGraphRecoveryError, match="rebuild|remove|backup|pre-profile"):
+        refuse_pre_profile_upgrade(pre_profile, "multidigraph")
+    # Non-upgrade targets are allowed (no raise).
+    refuse_pre_profile_upgrade(pre_profile, "simple")
+    refuse_pre_profile_upgrade(pre_profile, "digraph")
+
+    # End-to-end through global_add: seed a pre-profile global graph (no profile,
+    # no flags), then add a multigraph repo → upgrade refused, file untouched.
+    global_dir = tmp_path / ".graphify"
+    global_dir.mkdir(parents=True)
+    global_graph_path = global_dir / "global-graph.json"
+    pre_profile_disk = {
+        "nodes": [{"id": "legacy::old", "repo": "legacy", "label": "Old"}],
+        "links": [],
+    }
+    original = json.dumps(pre_profile_disk, indent=2)
+    global_graph_path.write_text(original, encoding="utf-8")
+
+    multi_src = tmp_path / "multi.json"
+    _graph_to_json(_make_multidigraph(_PARALLEL_NODES(), _PARALLEL_EDGES()), multi_src)
+
+    with _patch_global(global_dir):
+        from graphify.global_graph import global_add
+
+        with pytest.raises(GlobalGraphRecoveryError):
+            global_add(multi_src, "repoMulti")
+
+    # The original pre-profile global-graph.json must be intact (not overwritten).
+    assert global_graph_path.read_text(encoding="utf-8") == original
+    # A recovery backup may have been taken alongside it; that is allowed.
+
+
+def test_global_add_backs_up_before_overwrite(tmp_path):
+    """A backup snapshot of the prior global-graph.json is created before an
+    overwrite, and the original content is recoverable from the backup."""
+    global_dir = tmp_path / ".graphify"
+
+    g1 = tmp_path / "g1.json"
+    _graph_to_json(
+        _make_graph([{"id": "u", "label": "U", "source_file": "src/u.py"}]), g1
+    )
+    g2 = tmp_path / "g2.json"
+    _graph_to_json(
+        _make_graph([{"id": "w", "label": "W", "source_file": "src/w.py"}]), g2
+    )
+
+    with _patch_global(global_dir):
+        from graphify.global_graph import global_add
+
+        global_add(g1, "repoA")
+        first = (global_dir / "global-graph.json").read_text(encoding="utf-8")
+        # Second add (different repo, different hash) overwrites → backup taken.
+        global_add(g2, "repoB")
+
+    backups = list(global_dir.glob("global-graph.*.bak"))
+    assert backups, "expected a dated .bak snapshot before overwrite"
+    # The backup holds the pre-overwrite (first-add) state, recoverable verbatim.
+    assert backups[0].read_text(encoding="utf-8") == first
+
+
+def test_backup_global_graph_idempotent(tmp_path):
+    """Repeated backup_global_graph() calls in the same run do not error and do
+    not corrupt the snapshot (one dated backup, byte-stable)."""
+    global_dir = tmp_path / ".graphify"
+    global_dir.mkdir(parents=True)
+    global_graph_path = global_dir / "global-graph.json"
+    content = json.dumps({"nodes": [{"id": "a"}], "links": []}, indent=2)
+    global_graph_path.write_text(content, encoding="utf-8")
+
+    with _patch_global(global_dir):
+        from graphify.global_graph import backup_global_graph
+
+        p1 = backup_global_graph()
+        p2 = backup_global_graph()
+        p3 = backup_global_graph()
+
+    assert p1 is not None
+    assert p1 == p2 == p3  # same dated backup path
+    assert p1.read_text(encoding="utf-8") == content
+    # Exactly one backup file (no proliferation across repeated calls).
+    assert len(list(global_dir.glob("global-graph.*.bak"))) == 1
+
+
+def test_backup_global_graph_none_when_absent(tmp_path):
+    """backup_global_graph() returns None when there is no global graph to back
+    up (nothing to snapshot, never raises)."""
+    global_dir = tmp_path / ".graphify"
+    with _patch_global(global_dir):
+        from graphify.global_graph import backup_global_graph
+
+        assert backup_global_graph() is None
+
+
+# ── merge-driver / merge-graphs class normalization (PR 8) ─────────────────────
+#
+# Both commands run in-process through ``graphify.__main__.main`` with argv
+# monkeypatched (env-isolated, mirroring test_extract_cli / test_query_cli). The
+# go/no-go gate for PR 8: mixed graph inputs never crash through a NetworkX class
+# mismatch AND never silently collapse multigraph input without an explicit
+# simple target. Merge is STATEFUL, so every path is also asserted under REPEATED
+# application (run 2-3×) to prove idempotence — no duplicated edges, no key drift,
+# no profile drift, no re-collapse.
+
+
+def _reload_graph(path):
+    """Rehydrate a graph.json written by a merge command (handles edges/links)."""
+    from networkx.readwrite import json_graph as jg
+
+    data = json.loads(path.read_text(encoding="utf-8"))
+    if "links" not in data and "edges" in data:
+        data = dict(data, links=data["edges"])
+    try:
+        return jg.node_link_graph(data, edges="links"), data
+    except TypeError:
+        return jg.node_link_graph(data), data
+
+
+def _edge_keys(G):
+    """Stable, comparable edge identity: keyed triples for multigraphs, else pairs."""
+    if G.is_multigraph():
+        return sorted((u, v, k) for u, v, k in G.edges(keys=True))
+    return sorted(G.edges())
+
+
+def _run_merge_driver(monkeypatch, base_p, current_p, other_p):
+    """Invoke `graphify merge-driver` in-process; return the exit code (0 on ok)."""
+    monkeypatch.setattr(mainmod, "_check_skill_version", lambda _: None)
+    monkeypatch.setattr(
+        mainmod.sys,
+        "argv",
+        ["graphify", "merge-driver", str(base_p), str(current_p), str(other_p)],
+    )
+    try:
+        mainmod.main()
+        return 0
+    except SystemExit as exc:
+        return 0 if exc.code is None else (exc.code if isinstance(exc.code, int) else 1)
+
+
+def _run_merge_graphs(monkeypatch, paths, out_path, *flags):
+    """Invoke `graphify merge-graphs` in-process; return the exit code (0 on ok)."""
+    monkeypatch.setattr(mainmod, "_check_skill_version", lambda _: None)
+    argv = ["graphify", "merge-graphs", *[str(p) for p in paths], "--out", str(out_path), *flags]
+    monkeypatch.setattr(mainmod.sys, "argv", argv)
+    try:
+        mainmod.main()
+        return 0
+    except SystemExit as exc:
+        return 0 if exc.code is None else (exc.code if isinstance(exc.code, int) else 1)
+
+
+def _repo_graph(root, repo, G):
+    """Write *G* to <root>/<repo>/graphify-out/graph.json (merge-graphs layout)."""
+    out_dir = root / repo / "graphify-out"
+    out_dir.mkdir(parents=True)
+    gp = out_dir / "graph.json"
+    _graph_to_json(G, gp)
+    return gp
+
+
+def test_merge_driver_mixed_classes_no_crash(monkeypatch, tmp_path):
+    """merge-driver: simple `current` + MultiDiGraph `other` must NOT crash through
+    a NetworkX class mismatch; the result is a keyed multidigraph that preserves
+    both sides' edges. This is the core go/no-go gate."""
+    current = _make_graph(
+        [
+            {"id": "a", "label": "A", "source_file": "a.py"},
+            {"id": "b", "label": "B", "source_file": "b.py"},
+        ],
+        [{"source": "a", "target": "b", "relation": "calls", "confidence": "EXTRACTED"}],
+    )
+    other = _make_multidigraph(
+        [
+            {"id": "a", "label": "A", "source_file": "a.py"},
+            {"id": "c", "label": "C", "source_file": "c.py"},
+        ],
+        [
+            {"source": "a", "target": "c", "key": 0, "relation": "imports"},
+            {"source": "a", "target": "c", "key": 1, "relation": "calls"},
+        ],
+    )
+    base_p = tmp_path / "base.json"
+    current_p = tmp_path / "current.json"
+    other_p = tmp_path / "other.json"
+    _graph_to_json(_make_graph([]), base_p)
+    _graph_to_json(current, current_p)
+    _graph_to_json(other, other_p)
+
+    code = _run_merge_driver(monkeypatch, base_p, current_p, other_p)
+    assert code == 0  # no class-mismatch crash → clean exit, not a surfaced conflict
+
+    merged, data = _reload_graph(current_p)
+    assert merged.is_multigraph()  # upgraded to the multi target, not collapsed
+    assert data["graph"]["graphify_profile"]["graph_type"] == "multidigraph"
+    # Both the simple edge and BOTH parallel multi edges survive.
+    assert ("a", "b", 0) in _edge_keys(merged)
+    assert ("a", "c", 0) in _edge_keys(merged)
+    assert ("a", "c", 1) in _edge_keys(merged)
+    assert merged.number_of_edges() == 3
+
+
+def test_merge_driver_idempotent_under_repeat(monkeypatch, tmp_path):
+    """STATEFUL idempotence: running merge-driver on the SAME inputs 3× must keep
+    the edge count, edge KEYS and stored profile identical every time — no
+    duplicated edges, no key drift, no re-collapse. The merge-driver writes back
+    to `current`, so each rerun re-loads its own multidigraph output as `current`;
+    the keyed compose must overwrite the same (u, v, key) slots, not accumulate."""
+    current = _make_graph(
+        [
+            {"id": "a", "label": "A", "source_file": "a.py"},
+            {"id": "b", "label": "B", "source_file": "b.py"},
+        ],
+        [{"source": "a", "target": "b", "relation": "calls", "confidence": "EXTRACTED"}],
+    )
+    other = _make_multidigraph(
+        [
+            {"id": "a", "label": "A", "source_file": "a.py"},
+            {"id": "c", "label": "C", "source_file": "c.py"},
+        ],
+        [
+            {"source": "a", "target": "c", "key": 0, "relation": "imports"},
+            {"source": "a", "target": "c", "key": 1, "relation": "calls"},
+        ],
+    )
+    base_p = tmp_path / "base.json"
+    current_p = tmp_path / "current.json"
+    other_p = tmp_path / "other.json"
+    _graph_to_json(_make_graph([]), base_p)
+    _graph_to_json(current, current_p)
+    _graph_to_json(other, other_p)
+
+    snapshots = []
+    for _ in range(3):
+        assert _run_merge_driver(monkeypatch, base_p, current_p, other_p) == 0
+        merged, data = _reload_graph(current_p)
+        snapshots.append(
+            (
+                merged.number_of_edges(),
+                _edge_keys(merged),
+                data["graph"]["graphify_profile"]["graph_type"],
+            )
+        )
+
+    # The exact stability assertion: edges + keys + profile identical across all 3 runs.
+    assert snapshots[0] == snapshots[1] == snapshots[2]
+    assert snapshots[0][0] == 3  # 1 simple + 2 parallel, never duplicated
+    assert snapshots[0][2] == "multidigraph"
+
+
+def test_merge_graphs_multidigraph_preserves_parallel_edges(monkeypatch, tmp_path):
+    """merge-graphs over a multigraph + a simple input keeps parallel edges with
+    distinct keys (resolved target inferred as multidigraph, not collapsed)."""
+    multi = _make_multidigraph(
+        [
+            {"id": "x", "label": "X", "source_file": "x.py"},
+            {"id": "y", "label": "Y", "source_file": "y.py"},
+        ],
+        [
+            {"source": "x", "target": "y", "key": 0, "relation": "calls"},
+            {"source": "x", "target": "y", "key": 1, "relation": "imports"},
+        ],
+    )
+    simple = _make_graph(
+        [
+            {"id": "z", "label": "Z", "source_file": "z.py"},
+            {"id": "w", "label": "W", "source_file": "w.py"},
+        ],
+        [{"source": "z", "target": "w", "relation": "uses"}],
+    )
+    g1 = _repo_graph(tmp_path, "repo1", multi)
+    g2 = _repo_graph(tmp_path, "repo2", simple)
+    out_p = tmp_path / "merged.json"
+
+    assert _run_merge_graphs(monkeypatch, [g1, g2], out_p) == 0
+    merged, data = _reload_graph(out_p)
+    assert merged.is_multigraph()
+    assert data["graph"]["graphify_profile"]["graph_type"] == "multidigraph"
+    # Both parallel edges survive (prefixed by repo tag), distinct keys retained.
+    keys = _edge_keys(merged)
+    assert ("repo1::x", "repo1::y", 0) in keys
+    assert ("repo1::x", "repo1::y", 1) in keys
+    assert merged.number_of_edges() == 3  # 2 parallel + 1 simple
+
+
+def test_merge_graphs_idempotent_under_repeat(monkeypatch, tmp_path):
+    """STATEFUL idempotence: the SAME merge-graphs run repeated 3× yields a stable
+    output — edge count, keys and profile unchanged (no duplicated parallel edges,
+    no key drift, no re-collapse). Inputs are read fresh each run; only the output
+    is overwritten, so stability proves the keyed compose is deterministic."""
+    multi = _make_multidigraph(
+        [
+            {"id": "x", "label": "X", "source_file": "x.py"},
+            {"id": "y", "label": "Y", "source_file": "y.py"},
+        ],
+        [
+            {"source": "x", "target": "y", "key": 0, "relation": "calls"},
+            {"source": "x", "target": "y", "key": 1, "relation": "imports"},
+        ],
+    )
+    simple = _make_graph(
+        [{"id": "z", "label": "Z", "source_file": "z.py"}],
+        [],
+    )
+    g1 = _repo_graph(tmp_path, "repo1", multi)
+    g2 = _repo_graph(tmp_path, "repo2", simple)
+    out_p = tmp_path / "merged.json"
+
+    snapshots = []
+    for _ in range(3):
+        assert _run_merge_graphs(monkeypatch, [g1, g2], out_p) == 0
+        merged, data = _reload_graph(out_p)
+        snapshots.append(
+            (
+                merged.number_of_edges(),
+                _edge_keys(merged),
+                data["graph"]["graphify_profile"]["graph_type"],
+            )
+        )
+    assert snapshots[0] == snapshots[1] == snapshots[2]
+    assert snapshots[0][0] == 2  # 2 parallel edges, never duplicated to 4
+    assert snapshots[0][2] == "multidigraph"
+
+
+def test_merge_graphs_explicit_simple_target_warns_on_multi(monkeypatch, tmp_path, capsys):
+    """An EXPLICIT --simple target over a multigraph input projects DOWN to simple
+    WITH a warning (intentional, audible collapse) — never a silent collapse."""
+    multi = _make_multidigraph(
+        [
+            {"id": "x", "label": "X", "source_file": "x.py"},
+            {"id": "y", "label": "Y", "source_file": "y.py"},
+        ],
+        [
+            {"source": "x", "target": "y", "key": 0, "relation": "calls"},
+            {"source": "x", "target": "y", "key": 1, "relation": "imports"},
+        ],
+    )
+    simple = _make_graph(
+        [{"id": "z", "label": "Z", "source_file": "z.py"}],
+        [],
+    )
+    g1 = _repo_graph(tmp_path, "repo1", multi)
+    g2 = _repo_graph(tmp_path, "repo2", simple)
+    out_p = tmp_path / "merged.json"
+
+    with pytest.warns(UserWarning, match="multigraph"):
+        assert _run_merge_graphs(monkeypatch, [g1, g2], out_p, "--simple") == 0
+
+    # Loud collapse: a WARNING is also emitted on stderr, and the result is simple.
+    err = capsys.readouterr().err
+    assert "WARNING" in err and "multigraph" in err
+    merged, data = _reload_graph(out_p)
+    assert not merged.is_multigraph() and not merged.is_directed()
+    assert data["graph"]["graphify_profile"]["graph_type"] == "simple"
+    # Parallel edges folded onto a single (x, y) pair (the explicit, warned choice).
+    assert merged.number_of_edges() == 1
+
+
+def test_merge_simple_only_regression(monkeypatch, tmp_path):
+    """Pure simple inputs → simple output, byte-stable across repeated runs (the
+    default no-flag path must not upgrade or perturb a simple-only merge)."""
+    s1 = _make_graph(
+        [
+            {"id": "a", "label": "A", "source_file": "a.py"},
+            {"id": "b", "label": "B", "source_file": "b.py"},
+        ],
+        [{"source": "a", "target": "b", "relation": "calls", "confidence": "EXTRACTED"}],
+    )
+    s2 = _make_graph(
+        [
+            {"id": "c", "label": "C", "source_file": "c.py"},
+            {"id": "d", "label": "D", "source_file": "d.py"},
+        ],
+        [{"source": "c", "target": "d", "relation": "uses", "confidence": "EXTRACTED"}],
+    )
+    g1 = _repo_graph(tmp_path, "repo1", s1)
+    g2 = _repo_graph(tmp_path, "repo2", s2)
+    out_p = tmp_path / "merged.json"
+
+    assert _run_merge_graphs(monkeypatch, [g1, g2], out_p) == 0
+    first = out_p.read_text(encoding="utf-8")
+    assert _run_merge_graphs(monkeypatch, [g1, g2], out_p) == 0
+    second = out_p.read_text(encoding="utf-8")
+
+    assert first == second  # byte-stable under repeat
+    merged, data = _reload_graph(out_p)
+    assert not merged.is_multigraph() and not merged.is_directed()  # stays simple
+    assert data["graph"]["graphify_profile"]["graph_type"] == "simple"
+    assert merged.number_of_edges() == 2  # no silent multi upgrade
+
+
+def test_merge_backs_up_before_overwrite(monkeypatch, tmp_path):
+    """An overwriting merge writes a dated .bak sibling of the pre-merge target
+    first, so the previous state is recoverable."""
+    s1 = _make_graph([{"id": "a", "label": "A", "source_file": "a.py"}], [])
+    s2 = _make_graph([{"id": "b", "label": "B", "source_file": "b.py"}], [])
+    g1 = _repo_graph(tmp_path, "repo1", s1)
+    g2 = _repo_graph(tmp_path, "repo2", s2)
+    out_p = tmp_path / "merged.json"
+
+    # Pre-seed an existing output so the merge OVERWRITES it (triggers backup).
+    sentinel = _make_graph([{"id": "old", "label": "OLD", "source_file": "old.py"}], [])
+    _graph_to_json(sentinel, out_p)
+    sentinel_bytes = out_p.read_bytes()
+
+    monkeypatch.delenv("GRAPHIFY_NO_BACKUP", raising=False)
+    assert _run_merge_graphs(monkeypatch, [g1, g2], out_p) == 0
+
+    backups = list(tmp_path.glob("merged.*.bak"))
+    assert len(backups) == 1  # exactly one dated backup sibling, no proliferation
+    assert backups[0].read_bytes() == sentinel_bytes  # holds the PRE-overwrite state
+
+
+def test_merge_pre_profile_refused(monkeypatch, tmp_path):
+    """Merging that would UPGRADE a pre-profile graph (no graphify_profile /
+    multigraph / directed markers) to a multidigraph target is refused with a
+    recovery message, leaving the target file unmutated — its lost parallel edges
+    cannot be reconstructed by an in-place upgrade."""
+    from networkx.readwrite import json_graph as jg
+
+    # A pre-profile `current`: strip the multigraph/directed flags AND any profile
+    # so detect_pre_profile() classifies it as predating class tracking.
+    current = _make_graph(
+        [
+            {"id": "a", "label": "A", "source_file": "a.py"},
+            {"id": "b", "label": "B", "source_file": "b.py"},
+        ],
+        [{"source": "a", "target": "b", "relation": "calls"}],
+    )
+    current_p = tmp_path / "current.json"
+    raw = jg.node_link_data(current, edges="links")
+    raw.pop("multigraph", None)
+    raw.pop("directed", None)
+    raw.pop("graph", None)  # no graphify_profile anywhere → pre-profile
+    current_p.write_text(json.dumps(raw), encoding="utf-8")
+    pre_bytes = current_p.read_bytes()
+
+    # `other` is a multigraph → the merge would upgrade `current` to multidigraph.
+    other = _make_multidigraph(
+        [
+            {"id": "a", "label": "A", "source_file": "a.py"},
+            {"id": "c", "label": "C", "source_file": "c.py"},
+        ],
+        [
+            {"source": "a", "target": "c", "key": 0, "relation": "imports"},
+            {"source": "a", "target": "c", "key": 1, "relation": "calls"},
+        ],
+    )
+    base_p = tmp_path / "base.json"
+    other_p = tmp_path / "other.json"
+    _graph_to_json(_make_graph([]), base_p)
+    _graph_to_json(other, other_p)
+
+    code = _run_merge_driver(monkeypatch, base_p, current_p, other_p)
+    assert code == 1  # refused → surfaced as a conflict, not silently upgraded
+    assert current_p.read_bytes() == pre_bytes  # target left unmutated
+
+
+def _write_pre_profile_graph(path, nodes, edges):
+    """Write a LEGACY pre-profile graph.json: bare nodes + links, NO graphify_profile
+    and NO multigraph/directed flags, so detect_pre_profile() treats it as predating
+    class tracking (it may already be a silently-collapsed simple graph)."""
+    from networkx.readwrite import json_graph as jg
+
+    G = _make_graph(nodes, edges)
+    raw = jg.node_link_data(G, edges="links")
+    raw.pop("multigraph", None)
+    raw.pop("directed", None)
+    raw.pop("graph", None)
+    path.write_text(json.dumps(raw), encoding="utf-8")
+
+
+def test_merge_driver_pre_profile_other_does_not_block(monkeypatch, tmp_path):
+    """REGRESSION for the narrowed pre-profile refusal: when `current` is a real
+    MultiDiGraph and `other` is a LEGACY pre-profile simple graph, the merge must
+    SUCCEED — `other` is read-only (merged in, never rewritten), so its pre-profile
+    status implies no unreconstructable in-place loss. Before the fix the refusal
+    loop also inspected `other`, false-positive-blocking this valid merge with a
+    misleading 'global-graph.json / rebuild from source' recovery message."""
+    # `current`: a genuine MultiDiGraph (carries multigraph:true + parallel edges).
+    current = _make_multidigraph(
+        [
+            {"id": "a", "label": "A", "source_file": "a.py"},
+            {"id": "b", "label": "B", "source_file": "b.py"},
+        ],
+        [
+            {"source": "a", "target": "b", "key": 0, "relation": "calls"},
+            {"source": "a", "target": "b", "key": 1, "relation": "imports"},
+        ],
+    )
+    current_p = tmp_path / "current.json"
+    _graph_to_json(current, current_p)
+
+    # `other`: a legacy pre-profile simple graph (no profile / multigraph flags).
+    other_p = tmp_path / "other.json"
+    _write_pre_profile_graph(
+        other_p,
+        [
+            {"id": "a", "label": "A", "source_file": "a.py"},
+            {"id": "c", "label": "C", "source_file": "c.py"},
+        ],
+        [{"source": "a", "target": "c", "relation": "uses"}],
+    )
+
+    base_p = tmp_path / "base.json"
+    _graph_to_json(_make_graph([]), base_p)
+
+    code = _run_merge_driver(monkeypatch, base_p, current_p, other_p)
+    assert code == 0  # NOT refused — a pre-profile `other` must not block the merge
+
+    merged, data = _reload_graph(current_p)
+    assert merged.is_multigraph()  # current stays multidigraph, not collapsed
+    assert data["graph"]["graphify_profile"]["graph_type"] == "multidigraph"
+    keys = _edge_keys(merged)
+    # current's parallel edges preserved...
+    assert ("a", "b", 0) in keys
+    assert ("a", "b", 1) in keys
+    # ...and other's edge is merged in (keyed onto the multi target).
+    assert ("a", "c", 0) in keys
+    assert merged.number_of_edges() == 3
+
+
+def test_merge_driver_pre_profile_current_still_refused(monkeypatch, tmp_path):
+    """Confirm the guard STILL fires for the legitimate case after the fix narrowed
+    its scope: `current` is a pre-profile simple graph and `other` is a MultiDiGraph,
+    so the inferred target is multidigraph and the merge would upgrade the
+    OVERWRITTEN current in place (its lost parallels unreconstructable). merge-driver
+    must REFUSE (exit 1, recovery message) and leave current unmutated — proving the
+    fix did not disable the real protection."""
+    current_p = tmp_path / "current.json"
+    _write_pre_profile_graph(
+        current_p,
+        [
+            {"id": "a", "label": "A", "source_file": "a.py"},
+            {"id": "b", "label": "B", "source_file": "b.py"},
+        ],
+        [{"source": "a", "target": "b", "relation": "calls"}],
+    )
+    pre_bytes = current_p.read_bytes()
+
+    other = _make_multidigraph(
+        [
+            {"id": "a", "label": "A", "source_file": "a.py"},
+            {"id": "c", "label": "C", "source_file": "c.py"},
+        ],
+        [
+            {"source": "a", "target": "c", "key": 0, "relation": "imports"},
+            {"source": "a", "target": "c", "key": 1, "relation": "calls"},
+        ],
+    )
+    base_p = tmp_path / "base.json"
+    other_p = tmp_path / "other.json"
+    _graph_to_json(_make_graph([]), base_p)
+    _graph_to_json(other, other_p)
+
+    monkeypatch.setattr(mainmod, "_check_skill_version", lambda _: None)
+    monkeypatch.setattr(
+        mainmod.sys,
+        "argv",
+        ["graphify", "merge-driver", str(base_p), str(current_p), str(other_p)],
+    )
+
+    # A refused merge exits through sys.exit(1), so capture SystemExit here.
+    with pytest.raises(SystemExit) as exc_info:
+        mainmod.main()
+    assert exc_info.value.code == 1  # the real protection still fires
+    assert current_p.read_bytes() == pre_bytes  # current left unmutated
+
+
+def test_merge_driver_pre_profile_current_refusal_message(monkeypatch, tmp_path, capsys):
+    """Companion to the refusal test: the refusal prints the recovery message
+    (rebuild-from-source guidance), not a silent failure."""
+    current_p = tmp_path / "current.json"
+    _write_pre_profile_graph(
+        current_p,
+        [{"id": "a", "label": "A", "source_file": "a.py"}],
+        [],
+    )
+    other = _make_multidigraph(
+        [
+            {"id": "a", "label": "A", "source_file": "a.py"},
+            {"id": "c", "label": "C", "source_file": "c.py"},
+        ],
+        [
+            {"source": "a", "target": "c", "key": 0, "relation": "imports"},
+            {"source": "a", "target": "c", "key": 1, "relation": "calls"},
+        ],
+    )
+    base_p = tmp_path / "base.json"
+    other_p = tmp_path / "other.json"
+    _graph_to_json(_make_graph([]), base_p)
+    _graph_to_json(other, other_p)
+
+    code = _run_merge_driver(monkeypatch, base_p, current_p, other_p)
+    assert code == 1
+    err = capsys.readouterr().err
+    assert "pre-profile" in err
+    assert "multidigraph" in err
+    assert "remove" in err and "add" in err  # rebuild-from-source recovery guidance
+
+
+def test_merge_backup_suppressed_by_env(monkeypatch, tmp_path):
+    """`_backup_merge_target` honors the GRAPHIFY_NO_BACKUP env var: with it set, an
+    overwriting merge writes NO .bak; without it, the dated .bak sibling is created.
+    Confirms the env-suppression path Copilot flagged as only indirectly exercised."""
+    s1 = _make_graph([{"id": "a", "label": "A", "source_file": "a.py"}], [])
+    s2 = _make_graph([{"id": "b", "label": "B", "source_file": "b.py"}], [])
+    g1 = _repo_graph(tmp_path, "repo1", s1)
+    g2 = _repo_graph(tmp_path, "repo2", s2)
+
+    # --- with GRAPHIFY_NO_BACKUP=1: overwrite an existing target, expect NO .bak ---
+    out_suppressed = tmp_path / "merged_suppressed.json"
+    _graph_to_json(_make_graph([{"id": "old", "label": "OLD", "source_file": "old.py"}], []), out_suppressed)
+    monkeypatch.setenv("GRAPHIFY_NO_BACKUP", "1")
+    assert _run_merge_graphs(monkeypatch, [g1, g2], out_suppressed) == 0
+    assert list(tmp_path.glob("merged_suppressed.*.bak")) == []  # suppressed → no backup
+
+    # --- without it: overwrite an existing target, expect the .bak to appear ---
+    out_enabled = tmp_path / "merged_enabled.json"
+    sentinel = _make_graph([{"id": "old", "label": "OLD", "source_file": "old.py"}], [])
+    _graph_to_json(sentinel, out_enabled)
+    sentinel_bytes = out_enabled.read_bytes()
+    monkeypatch.delenv("GRAPHIFY_NO_BACKUP", raising=False)
+    assert _run_merge_graphs(monkeypatch, [g1, g2], out_enabled) == 0
+    backups = list(tmp_path.glob("merged_enabled.*.bak"))
+    assert len(backups) == 1  # backup created when env is unset
+    assert backups[0].read_bytes() == sentinel_bytes  # holds the PRE-overwrite state

From 96dc491222c4e0de4db56c5595fab6edfbbbebfe Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Sat, 30 May 2026 04:04:59 -0500
Subject: [PATCH 17/21] fix(multigraph): harden PR 8 recovery paths

---
 graphify/__main__.py       | 11 +++++++-
 graphify/global_graph.py   | 38 ++++++++++++++++-----------
 tests/test_global_graph.py | 54 ++++++++++++++++++++++++++------------
 3 files changed, 70 insertions(+), 33 deletions(-)

diff --git a/graphify/__main__.py b/graphify/__main__.py
index 8a558dec5..c52cf2eef 100644
--- a/graphify/__main__.py
+++ b/graphify/__main__.py
@@ -2953,7 +2953,16 @@ def _load_graph(p: str):
         # status implies no unreconstructable in-place loss and must not block.
         if target_type == "multidigraph":
             try:
-                refuse_pre_profile_upgrade(current_data, target_type)
+                refuse_pre_profile_upgrade(
+                    current_data,
+                    target_type,
+                    graph_label="merge target graph",
+                    graph_path=_current_path,
+                    recovery_hint=(
+                        "Regenerate or recreate this graph.json from source before retrying "
+                        "the merge, or resolve the file manually from source-backed inputs"
+                    ),
+                )
             except GlobalGraphRecoveryError as exc:
                 print(f"[graphify merge-driver] {exc}", file=sys.stderr)
                 sys.exit(1)
diff --git a/graphify/global_graph.py b/graphify/global_graph.py
index 99fea850c..90bd1ec5b 100644
--- a/graphify/global_graph.py
+++ b/graphify/global_graph.py
@@ -112,9 +112,7 @@ def normalize_graphs_for_global(
         ValueError - *target_type* is not a recognized graph_type token.
     """
     if target_type is not None and target_type not in _GRAPH_TYPES:
-        raise ValueError(
-            f"target_type must be one of {sorted(_GRAPH_TYPES)}, got {target_type!r}"
-        )
+        raise ValueError(f"target_type must be one of {sorted(_GRAPH_TYPES)}, got {target_type!r}")
 
     explicit = target_type is not None
     resolved = target_type if explicit else _infer_target_type(graphs)
@@ -168,6 +166,9 @@ def refuse_pre_profile_upgrade(
     target_type: str,
     *,
     backup_hint: Path | None = None,
+    graph_label: str = "global graph",
+    graph_path: str = "global-graph.json",
+    recovery_hint: str | None = None,
 ) -> None:
     """Refuse to upgrade a pre-profile global graph to multigraph.
 
@@ -189,17 +190,25 @@ def refuse_pre_profile_upgrade(
     backup_line = (
         f" A pre-overwrite backup was saved at {backup_hint}."
         if backup_hint is not None
-        else " Check ~/.graphify for a dated .bak snapshot of the previous graph."
+        else " Check for a dated .bak snapshot of the previous graph."
     )
+    if recovery_hint is None:
+        recovery_hint = (
+            "To rebuild safely, remove the affected repos and re-add them from source "
+            "(`graphify global remove <tag>` then `graphify global add`), which "
+            "regenerates keyed parallel edges from the per-repo graph.json."
+        )
+    else:
+        recovery_hint = recovery_hint.strip()
+        if recovery_hint and not recovery_hint.endswith("."):
+            recovery_hint += "."
+    recovery_line = f" {recovery_hint}" if recovery_hint else ""
+    article = "an" if graph_label[:1].lower() in {"a", "e", "i", "o", "u"} else "a"
     raise GlobalGraphRecoveryError(
-        "refusing to upgrade a pre-profile global graph to a multidigraph: the "
-        "existing global-graph.json has no graphify_profile or multigraph/directed "
-        "flags, so it predates class tracking and may already have collapsed "
-        "parallel edges that cannot be reconstructed by upgrading in place."
-        + backup_line
-        + " To rebuild safely, remove the affected repos and re-add them from source "
-        "(`graphify global remove <tag>` then `graphify global add`), which "
-        "regenerates keyed parallel edges from the per-repo graph.json."
+        f"refusing to upgrade {article} pre-profile {graph_label} to a multidigraph: "
+        f"{graph_path} has no graphify_profile or multigraph/directed flags, "
+        "so it predates class tracking and may already have collapsed parallel "
+        "edges that cannot be reconstructed by upgrading in place." + backup_line + recovery_line
     )
 
 
@@ -398,9 +407,7 @@ def global_add(source_path: Path, repo_tag: str) -> dict:
 
     # Normalize the surviving global graph and the prefixed source to the common
     # target class. normalize_graphs_for_global returns them in the same order.
-    (G, prefixed), target_type = normalize_graphs_for_global(
-        [G, prefixed], target_type=target_type
-    )
+    (G, prefixed), target_type = normalize_graphs_for_global([G, prefixed], target_type=target_type)
 
     # Merge external-library nodes (no source_file) by label to avoid duplication
     external_labels = {
@@ -462,6 +469,7 @@ def global_remove(repo_tag: str) -> int:
 
     G = _load_global_graph()
     removed = prune_repo_from_graph(G, repo_tag)
+    backup_global_graph()
     _save_global_graph(G)
 
     del manifest["repos"][repo_tag]
diff --git a/tests/test_global_graph.py b/tests/test_global_graph.py
index 2f273157a..ee8176946 100644
--- a/tests/test_global_graph.py
+++ b/tests/test_global_graph.py
@@ -242,6 +242,31 @@ def test_global_remove(tmp_path):
     assert "repoA" not in repos
 
 
+def test_global_remove_backs_up_before_overwrite(tmp_path):
+    """Removing a repo mutates global-graph.json, so recovery policy requires a backup."""
+    src_graph = tmp_path / "graph.json"
+    G = _make_graph([{"id": "userservice", "label": "UserService", "source_file": "src/user.py"}])
+    _graph_to_json(G, src_graph)
+
+    global_dir = tmp_path / ".graphify"
+    global_graph_path = global_dir / "global-graph.json"
+    with (
+        patch("graphify.global_graph._GLOBAL_DIR", global_dir),
+        patch("graphify.global_graph._GLOBAL_GRAPH", global_graph_path),
+        patch("graphify.global_graph._GLOBAL_MANIFEST", global_dir / "global-manifest.json"),
+    ):
+        from graphify.global_graph import global_add, global_remove
+
+        global_add(src_graph, "repoA")
+        before_remove = global_graph_path.read_bytes()
+        removed = global_remove("repoA")
+
+    assert removed > 0
+    backups = list(global_dir.glob("global-graph.*.bak"))
+    assert len(backups) == 1
+    assert backups[0].read_bytes() == before_remove
+
+
 def test_global_remove_unknown_tag_raises(tmp_path):
     global_dir = tmp_path / ".graphify"
     with (
@@ -562,9 +587,7 @@ def test_normalize_graphs_for_global_infers_target(recwarn):
     input warns and projects the multigraph down to simple."""
     from graphify.global_graph import normalize_graphs_for_global
 
-    simple = _make_graph(
-        [{"id": "x"}, {"id": "y"}], [{"source": "x", "target": "y"}]
-    )
+    simple = _make_graph([{"id": "x"}, {"id": "y"}], [{"source": "x", "target": "y"}])
     multi = _make_multidigraph(
         [{"id": "a"}, {"id": "b"}],
         [
@@ -582,9 +605,7 @@ def test_normalize_graphs_for_global_infers_target(recwarn):
 
     # Explicit simple target with a multi input → WARNING + projection to simple.
     with pytest.warns(UserWarning, match="collaps"):
-        normalized2, target2 = normalize_graphs_for_global(
-            [simple, multi], target_type="simple"
-        )
+        normalized2, target2 = normalize_graphs_for_global([simple, multi], target_type="simple")
     assert target2 == "simple"
     assert all(type(g) is nx.Graph for g in normalized2)
     # Parallel edges collapse to a single (a, b) pair on the simple projection.
@@ -602,9 +623,7 @@ def test_detect_pre_profile_global_graph():
 
     assert detect_pre_profile({"nodes": [{"id": "a"}], "links": []}) is True
     # Top-level profile present → not pre-profile.
-    assert (
-        detect_pre_profile({"nodes": [], "links": [], "graphify_profile": {}}) is False
-    )
+    assert detect_pre_profile({"nodes": [], "links": [], "graphify_profile": {}}) is False
     # Nested profile under "graph" → not pre-profile.
     assert (
         detect_pre_profile(
@@ -666,13 +685,9 @@ def test_global_add_backs_up_before_overwrite(tmp_path):
     global_dir = tmp_path / ".graphify"
 
     g1 = tmp_path / "g1.json"
-    _graph_to_json(
-        _make_graph([{"id": "u", "label": "U", "source_file": "src/u.py"}]), g1
-    )
+    _graph_to_json(_make_graph([{"id": "u", "label": "U", "source_file": "src/u.py"}]), g1)
     g2 = tmp_path / "g2.json"
-    _graph_to_json(
-        _make_graph([{"id": "w", "label": "W", "source_file": "src/w.py"}]), g2
-    )
+    _graph_to_json(_make_graph([{"id": "w", "label": "W", "source_file": "src/w.py"}]), g2)
 
     with _patch_global(global_dir):
         from graphify.global_graph import global_add
@@ -1225,7 +1240,10 @@ def test_merge_driver_pre_profile_current_refusal_message(monkeypatch, tmp_path,
     err = capsys.readouterr().err
     assert "pre-profile" in err
     assert "multidigraph" in err
-    assert "remove" in err and "add" in err  # rebuild-from-source recovery guidance
+    assert str(current_p) in err
+    assert "regenerate" in err or "recreate" in err
+    assert "global-graph.json" not in err
+    assert "graphify global remove" not in err
 
 
 def test_merge_backup_suppressed_by_env(monkeypatch, tmp_path):
@@ -1239,7 +1257,9 @@ def test_merge_backup_suppressed_by_env(monkeypatch, tmp_path):
 
     # --- with GRAPHIFY_NO_BACKUP=1: overwrite an existing target, expect NO .bak ---
     out_suppressed = tmp_path / "merged_suppressed.json"
-    _graph_to_json(_make_graph([{"id": "old", "label": "OLD", "source_file": "old.py"}], []), out_suppressed)
+    _graph_to_json(
+        _make_graph([{"id": "old", "label": "OLD", "source_file": "old.py"}], []), out_suppressed
+    )
     monkeypatch.setenv("GRAPHIFY_NO_BACKUP", "1")
     assert _run_merge_graphs(monkeypatch, [g1, g2], out_suppressed) == 0
     assert list(tmp_path.glob("merged_suppressed.*.bak")) == []  # suppressed → no backup

From 19b00e7f2b3070d7e969ac5d4c00aabf29ad54a5 Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Sat, 30 May 2026 05:08:07 -0500
Subject: [PATCH 18/21] feat(multigraph): PR 9 public --multigraph CLI, sticky
 profile, docs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Exposes the MultiDiGraph feature publicly now that every stateful path
maintains it safely (PR 1-8). End-to-end public workflow is green.

graphify extract:
- New --multigraph / --simple flags (mutually exclusive). --multigraph builds
  a keyed MultiDiGraph preserving parallel edges; the capability is gated by
  require_multigraph_capabilities(), whose failure is surfaced as a clean CLI
  error (no traceback).
- STICKY profile: with NO flag, extract/update inherit the existing graph.json
  profile (via watch._existing_is_multigraph), so a graph built --multigraph
  stays multigraph across later commands without restating the flag.
- --simple is the explicit downgrade; over an existing multigraph it WARNS
  before collapsing parallel edges (never silent).
- Help text lists the flags + the sticky default.
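
The flag resolution described above can be sketched as follows. This is a
minimal illustration, not the code in graphify/__main__.py: the helper name
resolve_graph_class and its return shape are hypothetical, and only the
tri-state semantics (None = sticky, True = --multigraph, False = --simple)
are taken from this patch.

```python
from __future__ import annotations


def resolve_graph_class(
    flag: bool | None, existing_is_multigraph: bool
) -> tuple[bool, bool]:
    """Hypothetical helper: return (build_multigraph, warn_lossy).

    flag is None when neither option was given (sticky default),
    True for --multigraph, False for --simple.
    """
    # Sticky default: with no flag, inherit the class of the existing graph.json.
    build_multigraph = existing_is_multigraph if flag is None else flag
    # Only an explicit --simple over an existing multigraph is a lossy downgrade.
    warn_lossy = flag is False and existing_is_multigraph
    return build_multigraph, warn_lossy


if __name__ == "__main__":
    # No flag over an existing multigraph: stays multigraph, no warning.
    print(resolve_graph_class(None, True))
    # Explicit --simple over an existing multigraph: downgrade with a warning.
    print(resolve_graph_class(False, True))
```

Keeping the flag as None rather than defaulting it to False is what lets the
no-flag path be told apart from an explicit --simple.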

Docs (now that the behavior exists): README, skill.md and skill-windows.md
document --multigraph / --simple and the sticky default.

Tests: full CLI lifecycle — simple default, --multigraph (real subprocess),
sticky stays-multigraph across repeated default extract+update (idempotence-
under-repeat), explicit --simple lossy-downgrade warning, capability-failure
message, and a multigraph extract->query parallel-relationship roundtrip.
Full suite 1714 passed; ruff + pyright clean. No private testing-tool refs.

---
 README.md                 |   2 +
 graphify/__main__.py      | 124 ++++++++++++++-
 graphify/skill-windows.md |   7 +
 graphify/skill.md         |   7 +-
 tests/test_extract_cli.py | 320 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 456 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index a25195af2..525b61415 100644
--- a/README.md
+++ b/README.md
@@ -496,6 +496,8 @@ graphify extract ./docs --mode deep            # richer semantic extraction via
 graphify extract ./docs --no-cluster           # raw extraction only, skip clustering
 graphify extract ./docs --force                # overwrite graph.json even if new graph has fewer nodes (use after refactors or to clear ghost duplicates)
 graphify extract ./docs --dedup-llm            # LLM tiebreaker for ambiguous entity pairs (uses same API key)
+graphify extract ./docs --multigraph           # build a MultiDiGraph that preserves parallel edges (e.g. A calls B AND A imports B)
+graphify extract ./docs --simple               # force a simple graph even over an existing multigraph (lossy — warns on collapse)
 graphify extract ./docs --global --as myrepo   # extract and register into the cross-project global graph
 GRAPHIFY_MAX_OUTPUT_TOKENS=32768 graphify extract ./docs --backend claude  # raise output cap for dense corpora
 
diff --git a/graphify/__main__.py b/graphify/__main__.py
index c52cf2eef..9a442e153 100644
--- a/graphify/__main__.py
+++ b/graphify/__main__.py
@@ -1598,6 +1598,9 @@ def main() -> None:
         print(
             "  update <path>           re-extract code files and update the graph (no LLM needed)"
         )
+        print(
+            "                            (inherits the existing graph.json profile — a multigraph stays a multigraph)"
+        )
         print(
             "    --force                 overwrite graph.json even if the rebuild has fewer nodes"
         )
@@ -1666,6 +1669,15 @@ def main() -> None:
             "    --google-workspace      export .gdoc/.gsheet/.gslides shortcuts via gws before extraction"
         )
         print("    --no-cluster            skip clustering, write raw extraction only")
+        print(
+            "    --multigraph            build a keyed MultiDiGraph (preserves parallel edges between the same pair)"
+        )
+        print(
+            "    --simple                force a simple graph even over an existing multigraph (lossy downgrade)"
+        )
+        print(
+            "                            (default: STICKY — inherit the existing graph.json profile)"
+        )
         print("    --global                also merge the resulting graph into the global graph")
         print("    --as <tag>              repo tag for --global (default: target directory name)")
         print(
@@ -3564,6 +3576,7 @@ def _load_graph(p: str):
             print(
                 "Usage: graphify extract <path> [--backend gemini|kimi|claude|openai|deepseek|ollama] "
                 "[--model M] [--mode deep] [--out DIR] [--google-workspace] [--no-cluster] "
+                "[--multigraph|--simple] "
                 "[--max-workers N] [--token-budget N] [--max-concurrency N] "
                 "[--api-timeout S]",
                 file=sys.stderr,
@@ -3584,6 +3597,13 @@ def _load_graph(p: str):
         google_workspace = False
         global_merge = False
         global_repo_tag: str | None = None
+        # Graph class selection (PR 9). None = STICKY: inherit the existing
+        # graphify-out/graph.json profile (multidigraph stays multidigraph,
+        # otherwise the historical simple/directed default). True = force a
+        # keyed MultiDiGraph (parallel edges). False = explicit downgrade to a
+        # simple graph even when the existing graph.json is a multigraph.
+        # --multigraph and --simple are mutually exclusive.
+        multigraph_flag: bool | None = None
         # Performance/tuning knobs (issue #792). None means "use library default".
         cli_max_workers: int | None = None
         cli_token_budget: int | None = None
@@ -3656,6 +3676,24 @@ def _parse_float(name: str, raw: str) -> float:
             elif a == "--global":
                 global_merge = True
                 i += 1
+            elif a == "--multigraph":
+                if multigraph_flag is False:
+                    print(
+                        "error: --multigraph and --simple are mutually exclusive",
+                        file=sys.stderr,
+                    )
+                    sys.exit(2)
+                multigraph_flag = True
+                i += 1
+            elif a == "--simple":
+                if multigraph_flag is True:
+                    print(
+                        "error: --multigraph and --simple are mutually exclusive",
+                        file=sys.stderr,
+                    )
+                    sys.exit(2)
+                multigraph_flag = False
+                i += 1
             elif a == "--as" and i + 1 < len(args):
                 global_repo_tag = args[i + 1]
                 i += 2
@@ -4021,17 +4059,83 @@ def _progress(idx: int, total: int, _result: dict) -> None:
             for ftype, flist in files_by_type.items()
         }
 
+        # Resolve the effective graph class (PR 9 sticky profile). When neither
+        # --multigraph nor --simple is given the build must INHERIT the existing
+        # graph.json profile so a multigraph never silently downgrades to a
+        # simple graph on a default re-extract (mirrors watch._rebuild_code and
+        # build_merge's inherit-on-None contract). --multigraph forces multi,
+        # --simple forces a simple downgrade.
+        from graphify.watch import _existing_is_multigraph as _detect_multigraph
+
+        _existing_multigraph = False
+        if existing_graph_path.exists():
+            try:
+                _existing_data = json.loads(existing_graph_path.read_text(encoding="utf-8"))
+                _existing_multigraph = _detect_multigraph(_existing_data)
+            except Exception as exc:
+                print(
+                    f"[graphify extract] warning: could not inspect existing graph.json "
+                    f"profile ({exc}); treating as simple.",
+                    file=sys.stderr,
+                )
+        if multigraph_flag is None:
+            resolved_multigraph = _existing_multigraph
+        else:
+            resolved_multigraph = multigraph_flag
+
+        # Lossy-projection warning: an EXPLICIT --simple over an existing
+        # multigraph graph.json collapses keyed parallel edges. This is an
+        # intentional downgrade, so warn loudly (not silently) and proceed.
+        if multigraph_flag is False and _existing_multigraph:
+            print(
+                "[graphify extract] WARNING: --simple requested over an existing "
+                "multigraph graph.json; parallel edges between the same pair will be "
+                "collapsed onto a single edge (lossy downgrade). Omit --simple to "
+                "preserve them, or re-extract with --multigraph.",
+                file=sys.stderr,
+            )
+
+        # Capability gate: surface the MultiDiGraph capability probe failure as a
+        # clean CLI error (exit 1) instead of letting the RuntimeError raised deep
+        # inside build_from_json escape as a traceback. The probe is cheap and
+        # lru_cached, so running it up front costs nothing on the happy path.
+        if resolved_multigraph:
+            from graphify.multigraph_compat import require_multigraph_capabilities
+
+            try:
+                require_multigraph_capabilities()
+            except RuntimeError as exc:
+                print(str(exc), file=sys.stderr)
+                sys.exit(1)
+
         if no_cluster:
             # --no-cluster: dump the raw merged extraction as graph.json.
             # No NetworkX, no community detection, no analysis sidecar.
             from graphify.export import backup_if_protected as _backup
 
             _backup(graphify_out)
-            graph_json_path.write_text(json.dumps(merged, indent=2), encoding="utf-8")
+            if resolved_multigraph:
+                # A multigraph profile (sticky-inherited or explicit --multigraph)
+                # cannot be expressed by the raw-merged dump: parallel edges would
+                # be written without keys and the file would lack the multigraph
+                # flag + graphify_profile, silently collapsing on the next load.
+                # Build a keyed MultiDiGraph and serialize it via to_json (with no
+                # communities) so the no-cluster file still round-trips losslessly.
+                from graphify.build import build_from_json as _build_from_json
+                from graphify.export import to_json as _nc_to_json
+
+                _nc_graph = _build_from_json(merged, multigraph=True, root=target)
+                _nc_to_json(_nc_graph, {}, str(graph_json_path), force=True)
+                n_nodes = _nc_graph.number_of_nodes()
+                n_edges = _nc_graph.number_of_edges()
+            else:
+                graph_json_path.write_text(json.dumps(merged, indent=2), encoding="utf-8")
+                n_nodes = len(merged["nodes"])
+                n_edges = len(merged["edges"])
             cost = _estimate_cost(backend, merged["input_tokens"], merged["output_tokens"])
             print(
                 f"[graphify extract] wrote {graph_json_path} — "
-                f"{len(merged['nodes'])} nodes, {len(merged['edges'])} edges "
+                f"{n_nodes} nodes, {n_edges} edges "
                 f"(no clustering)"
             )
             if merged["input_tokens"] or merged["output_tokens"]:
@@ -4078,6 +4182,10 @@ def _progress(idx: int, total: int, _result: dict) -> None:
 
         dedup_backend = backend if dedup_llm else None
         if incremental_mode:
+            # Pass multigraph_flag straight through: None lets build_merge
+            # INHERIT the saved graph.json profile (the sticky default), while an
+            # explicit --multigraph/--simple overrides it (build_merge warns on
+            # an explicit override of the saved flag).
             G = _build_merge(
                 [merged],
                 graph_path=existing_graph_path,
@@ -4085,9 +4193,19 @@ def _progress(idx: int, total: int, _result: dict) -> None:
                 dedup=True,
                 dedup_llm_backend=dedup_backend,
                 root=target,
+                multigraph=multigraph_flag,
             )
         else:
-            G = _build([merged], dedup=True, dedup_llm_backend=dedup_backend, root=target)
+            # Non-incremental build: nothing is merged from a saved graph.json, so
+            # pass resolved_multigraph, which is the explicit flag when given, else
+            # the existing profile (simple when there is no graph.json).
+            G = _build(
+                [merged],
+                dedup=True,
+                dedup_llm_backend=dedup_backend,
+                root=target,
+                multigraph=resolved_multigraph,
+            )
         if G.number_of_nodes() == 0:
             print(
                 "[graphify extract] graph is empty — extraction produced no nodes. "
diff --git a/graphify/skill-windows.md b/graphify/skill-windows.md
index 24d6800a0..41ca18329 100644
--- a/graphify/skill-windows.md
+++ b/graphify/skill-windows.md
@@ -16,6 +16,8 @@ Turn any folder of files into a navigable knowledge graph with community detecti
 /graphify <path> --mode deep                          # thorough extraction, richer INFERRED edges
 /graphify <path> --update                             # incremental - re-extract only new/changed files
 /graphify <path> --directed                            # build directed graph (preserves edge direction: source→target)
+/graphify <path> --multigraph                          # build a MultiDiGraph preserving parallel edges (multiple distinct relationships between the same pair)
+/graphify <path> --simple                              # force a simple graph even over an existing multigraph (lossy downgrade — warns on collapse)
 /graphify <path> --cluster-only                       # rerun clustering on existing graph
 /graphify <path> --no-viz                             # skip visualization, just report + JSON
 /graphify <path> --html                               # (HTML is generated by default - this flag is a no-op)
@@ -484,6 +486,11 @@ Remove-Item -ErrorAction SilentlyContinue graphify-out\.graphify_step_3_extract_
 
 ### Step 4 - Build graph, cluster, analyze, generate outputs
 
+**Before starting:** note whether `--directed`, `--multigraph`, or `--simple` was given.
+- `--directed`: pass `directed=True` to `build_from_json()` — builds a `DiGraph` (source→target direction preserved).
+- `--multigraph`: pass `multigraph=True` to `build_from_json()`, which builds a `MultiDiGraph` that keeps every distinct relationship between the same pair as a separate keyed edge (e.g. A both calls and imports B). Once built with `--multigraph`, subsequent `graphify extract` or `graphify update` calls without any flag stay multigraph (sticky profile). Pass `--simple` to deliberately downgrade.
+- `--simple`: forces a simple graph even over an existing multigraph (lossy: parallel edges collapse and a warning is printed). Mutually exclusive with `--multigraph`.
+
 ```powershell
 New-Item -ItemType Directory -Force -Path graphify-out | Out-Null
 @'
diff --git a/graphify/skill.md b/graphify/skill.md
index 50ecd0c8a..1ae7b66bb 100644
--- a/graphify/skill.md
+++ b/graphify/skill.md
@@ -19,6 +19,8 @@ Turn any folder of files into a navigable knowledge graph with community detecti
 /graphify <path> --mode deep                          # thorough extraction, richer INFERRED edges
 /graphify <path> --update                             # incremental - re-extract only new/changed files
 /graphify <path> --directed                            # build directed graph (preserves edge direction: source→target)
+/graphify <path> --multigraph                          # build a MultiDiGraph preserving parallel edges (multiple distinct relationships between the same pair)
+/graphify <path> --simple                              # force a simple graph even over an existing multigraph (lossy downgrade — warns on collapse)
 /graphify <path> --whisper-model medium                # use a larger Whisper model for better transcription accuracy
 /graphify <path> --cluster-only                       # rerun clustering on existing graph
 /graphify <path> --no-viz                             # skip visualization, just report + JSON
@@ -511,7 +513,10 @@ print(f'Merged: {total} nodes, {edges} edges ({len(ast[\"nodes\"])} AST + {len(s
 
 ### Step 4 - Build graph, cluster, analyze, generate outputs
 
-**Before starting:** note whether `--directed` was given. If so, pass `directed=True` to `build_from_json()` in the code block below. This builds a `DiGraph` that preserves edge direction (source→target) instead of the default undirected `Graph`.
+**Before starting:** note whether `--directed`, `--multigraph`, or `--simple` was given.
+- `--directed`: pass `directed=True` to `build_from_json()` — builds a `DiGraph` that preserves edge direction (source→target) instead of the default undirected `Graph`.
+- `--multigraph`: pass `multigraph=True` to `build_from_json()`, which builds a `MultiDiGraph` that keeps every distinct relationship between the same pair of nodes as a separate keyed edge (e.g. node A both calls and imports node B). Use when you need all relationships between two nodes, not just the first. Once built with `--multigraph`, subsequent `graphify extract` or `graphify update` calls without any flag stay multigraph (sticky profile). Pass `--simple` to deliberately downgrade.
+- `--simple`: forces a simple graph even when the existing `graph.json` was built as a multigraph. This is a lossy downgrade: parallel edges collapse and a warning is printed. `--multigraph` and `--simple` are mutually exclusive.
 
 ```bash
 mkdir -p graphify-out
diff --git a/tests/test_extract_cli.py b/tests/test_extract_cli.py
index a3ca1e4ed..79fa8132f 100644
--- a/tests/test_extract_cli.py
+++ b/tests/test_extract_cli.py
@@ -2,9 +2,329 @@
 
 from __future__ import annotations
 
+import json
+import os
+import subprocess
+import sys
+from pathlib import Path
+
+import networkx as nx
 import pytest
 
 import graphify.__main__ as mainmod
+from graphify.build import build_from_json
+from graphify.export import to_json
+from graphify.graph_loader import load_graph
+from graphify.llm import BACKENDS, _backend_env_keys
+
+
+PYTHON = sys.executable
+
+
+def _clean_env() -> dict:
+    """Return os.environ with every backend API key stripped out.
+
+    Mirrors tests/test_incremental._clean_env so subprocess runs do not pick up
+    a real key from the developer's shell and accidentally hit a live LLM.
+    """
+    env = dict(os.environ)
+    for backend in BACKENDS:
+        for env_key in _backend_env_keys(backend):
+            env.pop(env_key, None)
+    for extra in (
+        "AWS_PROFILE",
+        "AWS_REGION",
+        "AWS_DEFAULT_REGION",
+        "OLLAMA_BASE_URL",
+        "OLLAMA_API_KEY",
+    ):
+        env.pop(extra, None)
+    return env
+
+
+def _run(args: list[str], cwd: Path, *, env: dict | None = None) -> subprocess.CompletedProcess:
+    """Run `python -m graphify <args>` as a sanitized subprocess."""
+    return subprocess.run(
+        [PYTHON, "-m", "graphify"] + args,
+        cwd=cwd,
+        capture_output=True,
+        text=True,
+        env=env if env is not None else _clean_env(),
+    )
+
+
+def _make_code_corpus(tmp_path: Path) -> Path:
+    """A tiny AST-only code corpus — no docs, so semantic/LLM extraction never runs.
+
+    The functions reference each other so AST extraction produces real edges.
+    """
+    corpus = tmp_path / "corpus"
+    corpus.mkdir()
+    (corpus / "app.py").write_text(
+        "def helper():\n    return 1\n\n\n"
+        "def main():\n    return helper()\n\n\n"
+        "def extra():\n    return main()\n",
+        encoding="utf-8",
+    )
+    return corpus
+
+
+def _write_multidigraph_graph_json(corpus: Path) -> Path:
+    """Seed corpus/graphify-out/graph.json as a multidigraph with parallel edges.
+
+    Built exactly the way the pipeline persists it (build_from_json multigraph=True
+    -> export.to_json), so the file carries the top-level ``multigraph: true`` flag
+    and ``graphify_profile.graph_type == multidigraph``. Two parallel main->helper
+    edges (different relations) prove parallels survive a sticky re-extract.
+    """
+    nodes = [
+        {
+            "id": n,
+            "label": f"{n}()",
+            "file_type": "code",
+            "source_file": "app.py",
+            "source_location": "L1",
+        }
+        for n in ("main", "helper")
+    ]
+    edges = [
+        {
+            "source": "main",
+            "target": "helper",
+            "relation": rel,
+            "confidence": "EXTRACTED",
+            "source_file": "app.py",
+            "source_location": f"L{i}",
+        }
+        for i, rel in enumerate(["calls", "imports"])
+    ]
+    G = build_from_json({"nodes": nodes, "edges": edges}, multigraph=True)
+    assert isinstance(G, nx.MultiDiGraph)
+    assert G.number_of_edges("main", "helper") == 2
+    out = corpus / "graphify-out"
+    out.mkdir(exist_ok=True)
+    graph_json = out / "graph.json"
+    to_json(G, {0: ["main", "helper"]}, str(graph_json), force=True)
+    # Persist the scan root so a later `update` (no path arg) can recover it.
+    (out / ".graphify_root").write_text(str(corpus), encoding="utf-8")
+    return graph_json
+
+
+def _graph_type(graph_data: dict) -> str | None:
+    return graph_data.get("graph", {}).get("graphify_profile", {}).get("graph_type")
+
+
+def _parallel_edges(graph_data: dict, src: str, tgt: str) -> list[dict]:
+    links = graph_data.get("links", graph_data.get("edges", []))
+    return [e for e in links if e.get("source") == src and e.get("target") == tgt]
+
+
+# ───────────────────────────── PR 9: public --multigraph / --simple ─────────────
+#
+# extract exposes the MultiDiGraph build publicly. Default is STICKY: a default
+# re-extract inherits the existing graph.json profile (a multigraph stays a
+# multigraph). --multigraph forces a keyed MultiDiGraph; --simple is the explicit,
+# warned, lossy downgrade. Capability failures surface as a clean CLI error.
+
+
+def test_extract_simple_default(tmp_path):
+    """No flag on a fresh corpus → a simple graph (historical behavior).
+
+    A fresh corpus has no existing graph.json to inherit, so the sticky default
+    collapses to the historical simple build: multigraph:false / graph_type simple.
+    """
+    corpus = _make_code_corpus(tmp_path)
+    env = _clean_env()
+    env["ANTHROPIC_API_KEY"] = "sk-test-fake-key"  # code-only corpus never calls the LLM
+    r = _run(["extract", str(corpus), "--backend", "claude"], tmp_path, env=env)
+    assert r.returncode == 0, f"fresh simple extract should succeed: {r.stderr}"
+
+    graph_json = corpus / "graphify-out" / "graph.json"
+    assert graph_json.exists(), f"graph.json must be written: {r.stderr}"
+    data = json.loads(graph_json.read_text(encoding="utf-8"))
+    assert data.get("multigraph") is False, "default fresh build must be a simple graph"
+    assert _graph_type(data) == "simple"
+
+
+def test_extract_multigraph_flag(tmp_path):
+    """`extract --multigraph` → graph.json is a keyed MultiDiGraph.
+
+    Real end-to-end CLI subprocess: multigraph:true + graphify_profile.graph_type
+    == "multidigraph", and it reloads as an actual nx.MultiDiGraph.
+    """
+    corpus = _make_code_corpus(tmp_path)
+    env = _clean_env()
+    env["ANTHROPIC_API_KEY"] = "sk-test-fake-key"
+    r = _run(["extract", str(corpus), "--backend", "claude", "--multigraph"], tmp_path, env=env)
+    assert r.returncode == 0, f"extract --multigraph should succeed: {r.stderr}"
+
+    graph_json = corpus / "graphify-out" / "graph.json"
+    data = json.loads(graph_json.read_text(encoding="utf-8"))
+    assert data.get("multigraph") is True, "--multigraph must produce a multigraph graph.json"
+    assert data.get("directed") is True, "a MultiDiGraph is always directed"
+    assert _graph_type(data) == "multidigraph"
+    # Reloads as a real MultiDiGraph.
+    G = load_graph(data)
+    assert G.is_multigraph(), "graph.json must reload as a MultiDiGraph"
+
+
+def test_extract_multigraph_then_update_sticky(tmp_path):
+    """`extract --multigraph`, then default re-extract/update STAYS multigraph.
+
+    The second build is run WITHOUT any flag 3 times in a row; the profile must
+    stay multidigraph each time (idempotence-under-repeat), with the keyed
+    parallel-edge capability intact — never a silent collapse to simple.
+    """
+    corpus = _make_code_corpus(tmp_path)
+    env = _clean_env()
+    env["ANTHROPIC_API_KEY"] = "sk-test-fake-key"
+
+    r0 = _run(["extract", str(corpus), "--backend", "claude", "--multigraph"], tmp_path, env=env)
+    assert r0.returncode == 0, f"initial --multigraph extract failed: {r0.stderr}"
+    graph_json = corpus / "graphify-out" / "graph.json"
+
+    # Seed two parallel main->helper edges so we can prove parallels persist.
+    _write_multidigraph_graph_json(corpus)
+    seeded = json.loads(graph_json.read_text(encoding="utf-8"))
+    assert seeded.get("multigraph") is True
+    assert len(_parallel_edges(seeded, "main", "helper")) == 2
+
+    # Default re-extract (NO flag) 3×; sticky must keep it multigraph every time.
+    for attempt in range(1, 4):
+        r = _run(["extract", str(corpus), "--backend", "claude"], tmp_path, env=env)
+        assert r.returncode == 0, f"sticky re-extract #{attempt} failed: {r.stderr}"
+        data = json.loads(graph_json.read_text(encoding="utf-8"))
+        assert data.get("multigraph") is True, (
+            f"re-extract #{attempt} must STAY multigraph (sticky), "
+            f"got multigraph={data.get('multigraph')!r}"
+        )
+        assert _graph_type(data) == "multidigraph", f"re-extract #{attempt} profile drifted"
+        # Parallel edges are not collapsed away by the sticky rebuild.
+        par = _parallel_edges(data, "main", "helper")
+        assert len(par) == 2, f"re-extract #{attempt} must preserve keyed parallel edges"
+        assert sorted(e["relation"] for e in par) == ["calls", "imports"]
+        # Reloads as a MultiDiGraph with the parallels intact.
+        G = load_graph(data)
+        assert G.is_multigraph()
+        assert G.number_of_edges("main", "helper") == 2
+
+    # A default `update` (the watch entrypoint) also stays multigraph.
+    ru = _run(["update", str(corpus)], tmp_path, env=env)
+    assert ru.returncode == 0, f"sticky update failed: {ru.stderr}"
+    after_update = json.loads(graph_json.read_text(encoding="utf-8"))
+    assert after_update.get("multigraph") is True, "update must inherit the multigraph profile"
+    assert _graph_type(after_update) == "multidigraph"
+
+
+def test_extract_explicit_simple_downgrade_warns(tmp_path):
+    """Existing multigraph graph.json + `extract --simple` → builds simple AND warns.
+
+    The downgrade collapses parallel edges, so it requires explicit intent and a
+    loud lossy-collapse WARNING — never a silent collapse. A manifest is seeded so
+    the run takes the incremental (preserve+merge) path, where the existing
+    multigraph's parallel edges are loaded and then collapsed under the simple
+    target — the real lossy projection we want to prove.
+    """
+    from graphify.detect import save_manifest
+
+    corpus = _make_code_corpus(tmp_path)
+    graph_json = _write_multidigraph_graph_json(corpus)
+    out = corpus / "graphify-out"
+    save_manifest(
+        {"code": [str(corpus / "app.py")]},
+        manifest_path=str(out / "manifest.json"),
+        kind="both",
+    )
+    before = json.loads(graph_json.read_text(encoding="utf-8"))
+    assert before.get("multigraph") is True
+    assert len(_parallel_edges(before, "main", "helper")) == 2
+
+    env = _clean_env()
+    env["ANTHROPIC_API_KEY"] = "sk-test-fake-key"
+    r = _run(["extract", str(corpus), "--backend", "claude", "--simple"], tmp_path, env=env)
+    assert r.returncode == 0, f"--simple downgrade should succeed: {r.stderr}"
+    # Lossy-collapse WARNING must be printed (explicit, audible downgrade).
+    assert "WARNING" in r.stderr and "--simple" in r.stderr, (
+        f"explicit --simple downgrade must warn about lossy collapse, got: {r.stderr}"
+    )
+    assert "collaps" in r.stderr.lower()
+
+    after = json.loads(graph_json.read_text(encoding="utf-8"))
+    assert after.get("multigraph") is False, "--simple must produce a non-multigraph graph"
+    assert _graph_type(after) != "multidigraph"
+    # The two parallel edges from the seeded multigraph collapse onto a single
+    # main->helper edge (the lossy projection — one survivor, not two parallels).
+    assert len(_parallel_edges(after, "main", "helper")) == 1, (
+        "explicit --simple must collapse the existing parallel edges onto one"
+    )
+
+
+def test_extract_multigraph_capability_failure_message(monkeypatch, tmp_path, capsys):
+    """A MultiDiGraph capability failure surfaces as a clean CLI error, exit 1.
+
+    The probe RuntimeError must be caught and printed (no traceback), and no
+    graph.json may be written. Run in-process so we can monkeypatch the probe.
+    """
+    corpus = _make_code_corpus(tmp_path)
+    monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-test-fake-key")
+
+    def _boom():
+        raise RuntimeError(
+            "error: --multigraph requires NetworkX keyed MultiDiGraph node-link "
+            "round-trip support. Simulated capability failure."
+        )
+
+    # Patch where the extract handler imports it from.
+    monkeypatch.setattr("graphify.multigraph_compat.require_multigraph_capabilities", _boom)
+    monkeypatch.setattr(mainmod, "_check_skill_version", lambda _: None)
+    monkeypatch.setattr(
+        mainmod.sys,
+        "argv",
+        ["graphify", "extract", str(corpus), "--backend", "claude", "--multigraph"],
+    )
+
+    with pytest.raises(SystemExit) as exc_info:
+        mainmod.main()
+    assert exc_info.value.code == 1, (
+        f"capability failure must exit 1, got {exc_info.value.code}"
+    )
+
+    err = capsys.readouterr().err
+    assert "--multigraph requires" in err, f"clean capability message expected, got: {err}"
+    assert "Traceback" not in err, "capability failure must not leak a traceback"
+    assert not (corpus / "graphify-out" / "graph.json").exists(), (
+        "no graph.json may be written when the capability gate fails"
+    )
+
+
+def test_extract_multigraph_query_roundtrip(tmp_path, capsys, monkeypatch):
+    """End-to-end public workflow: a multigraph corpus with same-endpoint different
+    relations exposes the parallel relationships through the public `graphify path` query.
+
+    Builds the multigraph graph.json the way `extract --multigraph` persists it,
+    then runs `graphify path` (a public query surface) and asserts BOTH parallel
+    relations show — the parallel relationships are visible, not collapsed.
+    """
+    corpus = _make_code_corpus(tmp_path)
+    graph_json = _write_multidigraph_graph_json(corpus)
+
+    # Sanity: the persisted file is a multidigraph with both parallel relations.
+    data = json.loads(graph_json.read_text(encoding="utf-8"))
+    assert data.get("multigraph") is True
+    G = load_graph(data)
+    assert G.is_multigraph() and G.number_of_edges("main", "helper") == 2
+
+    # Public query surface: `graphify path main helper` bundles all relations.
+    monkeypatch.setattr(mainmod, "_check_skill_version", lambda _: None)
+    monkeypatch.setattr(
+        mainmod.sys,
+        "argv",
+        ["graphify", "path", "main", "helper", "--graph", str(graph_json)],
+    )
+    mainmod.main()
+    out = capsys.readouterr().out
+    assert "calls" in out, f"parallel 'calls' relation must appear in path output: {out}"
+    assert "imports" in out, f"parallel 'imports' relation must appear in path output: {out}"
 
 
 def _make_corpus(tmp_path):

From 814718f62612b8fec876c8bfdf783e8cf0ac73df Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Sat, 30 May 2026 05:29:43 -0500
Subject: [PATCH 19/21] fix(multigraph): preserve no-cluster sticky extracts

---
 graphify/__main__.py      | 23 +++++++++++++++++++-
 tests/test_extract_cli.py | 45 ++++++++++++++++++++++++++++++++++++---
 2 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/graphify/__main__.py b/graphify/__main__.py
index 9a442e153..3c2f37d50 100644
--- a/graphify/__main__.py
+++ b/graphify/__main__.py
@@ -4114,7 +4114,28 @@ def _progress(idx: int, total: int, _result: dict) -> None:
             from graphify.export import backup_if_protected as _backup
 
             _backup(graphify_out)
-            if resolved_multigraph:
+            if incremental_mode:
+                # Incremental no-cluster scans are still deltas. If no files
+                # changed, ``merged`` is empty; writing it directly would erase
+                # the saved graph. Reuse build_merge so unchanged nodes/edges,
+                # deleted-file pruning, and sticky multigraph profile handling
+                # match the clustered path while still skipping clustering.
+                from graphify.build import build_merge as _nc_build_merge
+                from graphify.export import to_json as _nc_to_json
+
+                _nc_graph = _nc_build_merge(
+                    [merged],
+                    graph_path=existing_graph_path,
+                    prune_sources=deleted_files or None,
+                    dedup=True,
+                    dedup_llm_backend=backend if dedup_llm else None,
+                    root=target,
+                    multigraph=multigraph_flag,
+                )
+                _nc_to_json(_nc_graph, {}, str(graph_json_path), force=True)
+                n_nodes = _nc_graph.number_of_nodes()
+                n_edges = _nc_graph.number_of_edges()
+            elif resolved_multigraph:
                 # A multigraph profile (sticky-inherited or explicit --multigraph)
                 # cannot be expressed by the raw-merged dump: parallel edges would
                 # be written without keys and the file would lack the multigraph
diff --git a/tests/test_extract_cli.py b/tests/test_extract_cli.py
index 79fa8132f..36116a5d9 100644
--- a/tests/test_extract_cli.py
+++ b/tests/test_extract_cli.py
@@ -216,6 +216,47 @@ def test_extract_multigraph_then_update_sticky(tmp_path):
     assert _graph_type(after_update) == "multidigraph"
 
 
+def test_extract_multigraph_no_cluster_sticky_idempotent(tmp_path):
+    """`--no-cluster` still preserves a sticky multigraph across no-op re-runs.
+
+    A no-cluster incremental scan with no changed files produces an empty fresh
+    extraction. The command must merge that empty delta with the saved graph,
+    not overwrite graph.json with zero nodes/edges.
+    """
+    corpus = _make_code_corpus(tmp_path)
+    env = _clean_env()
+    env["ANTHROPIC_API_KEY"] = "sk-test-fake-key"
+
+    r0 = _run(
+        ["extract", str(corpus), "--backend", "claude", "--multigraph", "--no-cluster"],
+        tmp_path,
+        env=env,
+    )
+    assert r0.returncode == 0, f"initial no-cluster --multigraph failed: {r0.stderr}"
+
+    graph_json = corpus / "graphify-out" / "graph.json"
+    first = json.loads(graph_json.read_text(encoding="utf-8"))
+    first_nodes = len(first.get("nodes", []))
+    first_edges = len(first.get("links", first.get("edges", [])))
+    assert first.get("multigraph") is True
+    assert _graph_type(first) == "multidigraph"
+    assert first_nodes > 0
+    assert first_edges > 0
+
+    for attempt in range(1, 4):
+        r = _run(
+            ["extract", str(corpus), "--backend", "claude", "--no-cluster"],
+            tmp_path,
+            env=env,
+        )
+        assert r.returncode == 0, f"sticky no-cluster re-extract #{attempt} failed: {r.stderr}"
+        data = json.loads(graph_json.read_text(encoding="utf-8"))
+        assert data.get("multigraph") is True
+        assert _graph_type(data) == "multidigraph"
+        assert len(data.get("nodes", [])) == first_nodes
+        assert len(data.get("links", data.get("edges", []))) == first_edges
+
+
 def test_extract_explicit_simple_downgrade_warns(tmp_path):
     """Existing multigraph graph.json + `extract --simple` → builds simple AND warns.
 
@@ -285,9 +326,7 @@ def _boom():
 
     with pytest.raises(SystemExit) as exc_info:
         mainmod.main()
-    assert exc_info.value.code == 1, (
-        f"capability failure must exit 1, got {exc_info.value.code}"
-    )
+    assert exc_info.value.code == 1, f"capability failure must exit 1, got {exc_info.value.code}"
 
     err = capsys.readouterr().err
     assert "--multigraph requires" in err, f"clean capability message expected, got: {err}"

From 526f8480a3de25c38fd294b7e42147ed5a8bf8de Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Sat, 30 May 2026 21:56:38 -0500
Subject: [PATCH 20/21] fix(multigraph): preserve graph on empty rebuilds

Normalize legacy edges-keyed graph JSON during watch/update comparisons and add a zero-node overwrite floor across export, watch, and no-cluster extract write paths. Preserve upstream provider registry behavior while clearing post-rebase lint/type/security gates.
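
Illustration only, not the code in this patch: a minimal sketch of what a zero-node overwrite floor could look like. The helper name `write_graph_with_floor`, its `force` parameter, and the assumption that graph.json is a node-link dict with a top-level `nodes` list are all hypothetical; the actual checks in the export, watch, and no-cluster extract write paths may differ.

```python
import json
from pathlib import Path


def write_graph_with_floor(path: Path, graph: dict, *, force: bool = False) -> bool:
    """Write graph JSON unless that would replace a non-empty graph with an empty one.

    Hypothetical helper for illustration. Returns True when the file was
    written and False when the write was skipped to preserve the saved graph.
    """
    if not force and not graph.get("nodes") and path.exists():
        try:
            saved = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # Unreadable or malformed file: nothing worth preserving.
            saved = {}
        if isinstance(saved, dict) and saved.get("nodes"):
            # The rebuild produced zero nodes but the saved graph has some,
            # so keep the saved graph instead of erasing it.
            return False
    path.write_text(json.dumps(graph, indent=2), encoding="utf-8")
    return True
```

Under these assumptions an empty rebuild leaves a populated graph.json untouched, while passing `force=True` still allows a deliberate reset.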
---
 graphify/__main__.py            | 305 ++++++++++++++++++++------------
 graphify/analyze.py             |  13 +-
 graphify/export.py              |  24 +++
 graphify/extract.py             |   1 +
 graphify/llm.py                 |  15 +-
 graphify/report.py              |   1 +
 graphify/watch.py               |  93 +++++++++-
 tests/test_analyze.py           |   6 +-
 tests/test_detect.py            |  18 +-
 tests/test_export.py            | 114 ++++++++++++
 tests/test_extract_cli.py       | 208 ++++++++++++++++++++++
 tests/test_provider_registry.py | 118 +++++++-----
 tests/test_watch.py             | 219 +++++++++++++++++++++++
 uv.lock                         |   2 +-
 14 files changed, 955 insertions(+), 182 deletions(-)

diff --git a/graphify/__main__.py b/graphify/__main__.py
index 3c2f37d50..b19355518 100644
--- a/graphify/__main__.py
+++ b/graphify/__main__.py
@@ -1533,13 +1533,138 @@ def _clone_repo(url: str, branch: str | None = None, out_dir: Path | None = None
     return dest
 
 
+def _read_provider_registry(path: Path) -> dict:
+    if not path.is_file():
+        return {}
+    try:
+        data = json.loads(path.read_text(encoding="utf-8"))
+    except Exception:
+        return {}
+    if isinstance(data, dict):
+        return data
+    return {}
+
+
+def _provider_cmd(args: list[str]) -> None:
+    from graphify.llm import BACKENDS, _custom_providers_path
+
+    subcmd = args[0] if args else ""
+    global_path = _custom_providers_path(global_=True)
+
+    if subcmd == "list":
+        global_path.parent.mkdir(parents=True, exist_ok=True)
+        existing = _read_provider_registry(global_path)
+        if not existing:
+            print("No custom providers registered.")
+        else:
+            for name in existing:
+                print(f"  {name}  ({existing[name].get('base_url', '')})")
+
+    elif subcmd == "show":
+        name = args[1] if len(args) > 1 else ""
+        if not name:
+            print("Usage: graphify provider show <name>", file=sys.stderr)
+            sys.exit(1)
+        existing = _read_provider_registry(global_path)
+        if name not in existing:
+            print(f"Provider '{name}' not found.", file=sys.stderr)
+            sys.exit(1)
+        print(json.dumps({name: existing[name]}, indent=2))
+
+    elif subcmd == "add":
+        add_args = args[1:]
+        name = add_args[0] if add_args and not add_args[0].startswith("-") else ""
+        if not name:
+            print(
+                "Usage: graphify provider add <name> --base-url URL --default-model MODEL --env-key KEY",
+                file=sys.stderr,
+            )
+            sys.exit(1)
+        if name in BACKENDS:
+            print(
+                f"Error: '{name}' is a built-in provider and cannot be overridden.",
+                file=sys.stderr,
+            )
+            sys.exit(1)
+        base_url = ""
+        default_model = ""
+        env_key = ""
+        pricing_input = 0.0
+        pricing_output = 0.0
+        i = 1
+        while i < len(add_args):
+            a = add_args[i]
+            if a == "--base-url" and i + 1 < len(add_args):
+                base_url = add_args[i + 1]
+                i += 2
+            elif a.startswith("--base-url="):
+                base_url = a.split("=", 1)[1]
+                i += 1
+            elif a == "--default-model" and i + 1 < len(add_args):
+                default_model = add_args[i + 1]
+                i += 2
+            elif a.startswith("--default-model="):
+                default_model = a.split("=", 1)[1]
+                i += 1
+            elif a == "--env-key" and i + 1 < len(add_args):
+                env_key = add_args[i + 1]
+                i += 2
+            elif a.startswith("--env-key="):
+                env_key = a.split("=", 1)[1]
+                i += 1
+            elif a == "--pricing-input" and i + 1 < len(add_args):
+                pricing_input = float(add_args[i + 1])
+                i += 2
+            elif a == "--pricing-output" and i + 1 < len(add_args):
+                pricing_output = float(add_args[i + 1])
+                i += 2
+            else:
+                i += 1
+        if not base_url or not default_model or not env_key:
+            print(
+                "Error: --base-url, --default-model, and --env-key are required.",
+                file=sys.stderr,
+            )
+            sys.exit(1)
+        global_path.parent.mkdir(parents=True, exist_ok=True)
+        existing = _read_provider_registry(global_path)
+        existing[name] = {
+            "base_url": base_url,
+            "default_model": default_model,
+            "env_key": env_key,
+            "pricing": {"input": pricing_input, "output": pricing_output},
+            "temperature": 0,
+        }
+        global_path.write_text(json.dumps(existing, indent=2) + "\n", encoding="utf-8")
+        print(f"Provider '{name}' added. Use with: graphify extract . --backend {name}")
+
+    elif subcmd == "remove":
+        name = args[1] if len(args) > 1 else ""
+        if not name:
+            print("Usage: graphify provider remove <name>", file=sys.stderr)
+            sys.exit(1)
+        existing = _read_provider_registry(global_path)
+        if name not in existing:
+            print(f"Provider '{name}' not found.", file=sys.stderr)
+            sys.exit(1)
+        del existing[name]
+        global_path.write_text(json.dumps(existing, indent=2) + "\n", encoding="utf-8")
+        print(f"Provider '{name}' removed.")
+
+    else:
+        print("Usage: graphify provider [add|list|show|remove]", file=sys.stderr)
+        if subcmd:
+            sys.exit(1)
+
+
 def main() -> None:
     for _stream in (sys.stdout, sys.stderr):
-        if _stream is not None and hasattr(_stream, "reconfigure"):
+        reconfigure = getattr(_stream, "reconfigure", None) if _stream is not None else None
+        if callable(reconfigure):
             try:
-                _stream.reconfigure(encoding="utf-8", errors="replace")
-            except Exception:
-                pass
+                reconfigure(encoding="utf-8", errors="replace")
+            except Exception as exc:
+                _ = exc  # best-effort: ignore a stream that cannot be reconfigured
     # Check all known skill install locations for a stale version stamp.
     # Skip during install/uninstall (hook writes trigger a fresh check anyway).
     # Skip during hook-check — it runs on every editor tool use and must be silent.
@@ -1968,118 +2093,7 @@ def main() -> None:
             print("Usage: graphify antigravity [install|uninstall]", file=sys.stderr)
             sys.exit(1)
     elif cmd == "provider":
-        from graphify.llm import _custom_providers_path, BACKENDS
-        import json as _json
-        subcmd = sys.argv[2] if len(sys.argv) > 2 else ""
-        global_path = _custom_providers_path(global_=True)
-
-        if subcmd == "list":
-            global_path.parent.mkdir(parents=True, exist_ok=True)
-            existing: dict = {}
-            if global_path.is_file():
-                try:
-                    existing = _json.loads(global_path.read_text(encoding="utf-8"))
-                except Exception:
-                    pass
-            if not existing:
-                print("No custom providers registered.")
-            else:
-                for name in existing:
-                    print(f"  {name}  ({existing[name].get('base_url', '')})")
-
-        elif subcmd == "show":
-            name = sys.argv[3] if len(sys.argv) > 3 else ""
-            if not name:
-                print("Usage: graphify provider show <name>", file=sys.stderr)
-                sys.exit(1)
-            existing = {}
-            if global_path.is_file():
-                try:
-                    existing = _json.loads(global_path.read_text(encoding="utf-8"))
-                except Exception:
-                    pass
-            if name not in existing:
-                print(f"Provider '{name}' not found.", file=sys.stderr)
-                sys.exit(1)
-            print(_json.dumps({name: existing[name]}, indent=2))
-
-        elif subcmd == "add":
-            args = sys.argv[3:]
-            name = args[0] if args and not args[0].startswith("-") else ""
-            if not name:
-                print("Usage: graphify provider add <name> --base-url URL --default-model MODEL --env-key KEY", file=sys.stderr)
-                sys.exit(1)
-            if name in BACKENDS:
-                print(f"Error: '{name}' is a built-in provider and cannot be overridden.", file=sys.stderr)
-                sys.exit(1)
-            base_url = ""
-            default_model = ""
-            env_key = ""
-            pricing_input = 0.0
-            pricing_output = 0.0
-            i = 1
-            while i < len(args):
-                a = args[i]
-                if a == "--base-url" and i + 1 < len(args):
-                    base_url = args[i + 1]; i += 2
-                elif a.startswith("--base-url="):
-                    base_url = a.split("=", 1)[1]; i += 1
-                elif a == "--default-model" and i + 1 < len(args):
-                    default_model = args[i + 1]; i += 2
-                elif a.startswith("--default-model="):
-                    default_model = a.split("=", 1)[1]; i += 1
-                elif a == "--env-key" and i + 1 < len(args):
-                    env_key = args[i + 1]; i += 2
-                elif a.startswith("--env-key="):
-                    env_key = a.split("=", 1)[1]; i += 1
-                elif a == "--pricing-input" and i + 1 < len(args):
-                    pricing_input = float(args[i + 1]); i += 2
-                elif a == "--pricing-output" and i + 1 < len(args):
-                    pricing_output = float(args[i + 1]); i += 2
-                else:
-                    i += 1
-            if not base_url or not default_model or not env_key:
-                print("Error: --base-url, --default-model, and --env-key are required.", file=sys.stderr)
-                sys.exit(1)
-            global_path.parent.mkdir(parents=True, exist_ok=True)
-            existing = {}
-            if global_path.is_file():
-                try:
-                    existing = _json.loads(global_path.read_text(encoding="utf-8"))
-                except Exception:
-                    pass
-            existing[name] = {
-                "base_url": base_url,
-                "default_model": default_model,
-                "env_key": env_key,
-                "pricing": {"input": pricing_input, "output": pricing_output},
-                "temperature": 0,
-            }
-            global_path.write_text(_json.dumps(existing, indent=2) + "\n", encoding="utf-8")
-            print(f"Provider '{name}' added. Use with: graphify extract . --backend {name}")
-
-        elif subcmd == "remove":
-            name = sys.argv[3] if len(sys.argv) > 3 else ""
-            if not name:
-                print("Usage: graphify provider remove <name>", file=sys.stderr)
-                sys.exit(1)
-            existing = {}
-            if global_path.is_file():
-                try:
-                    existing = _json.loads(global_path.read_text(encoding="utf-8"))
-                except Exception:
-                    pass
-            if name not in existing:
-                print(f"Provider '{name}' not found.", file=sys.stderr)
-                sys.exit(1)
-            del existing[name]
-            global_path.write_text(_json.dumps(existing, indent=2) + "\n", encoding="utf-8")
-            print(f"Provider '{name}' removed.")
-
-        else:
-            print("Usage: graphify provider [add|list|show|remove]", file=sys.stderr)
-            if subcmd:
-                sys.exit(1)
+        _provider_cmd(sys.argv[2:])
     elif cmd == "prs":
         from graphify.prs import cmd_prs
 
@@ -4132,7 +4146,21 @@ def _progress(idx: int, total: int, _result: dict) -> None:
                     root=target,
                     multigraph=multigraph_flag,
                 )
-                _nc_to_json(_nc_graph, {}, str(graph_json_path), force=True)
+                # RISK 4 — Guard 1 signaling: to_json's empty-merge floor returns
+                # False (and PRESERVES the populated graph.json) when the merged
+                # graph has 0 nodes over a populated file. force=True bypasses the
+                # shrink guard (Guard 2), so under force the ONLY False return is
+                # that 0-node floor — never a legitimate non-zero shrink. Honor the
+                # refusal: do NOT fall through to the success line. A 0-node merge
+                # over a populated graph is an aborted extraction, so signal it
+                # (exit 1) instead of falsely reporting "wrote ... 0 nodes".
+                if not _nc_to_json(_nc_graph, {}, str(graph_json_path), force=True):
+                    print(
+                        "[graphify extract] extraction aborted: the merge produced an "
+                        "empty (0-node) graph; the previous graph.json was preserved.",
+                        file=sys.stderr,
+                    )
+                    sys.exit(1)
                 n_nodes = _nc_graph.number_of_nodes()
                 n_edges = _nc_graph.number_of_edges()
             elif resolved_multigraph:
@@ -4146,10 +4174,51 @@ def _progress(idx: int, total: int, _result: dict) -> None:
                 from graphify.export import to_json as _nc_to_json
 
                 _nc_graph = _build_from_json(merged, multigraph=True, root=target)
-                _nc_to_json(_nc_graph, {}, str(graph_json_path), force=True)
+                # RISK 4 — Guard 1 signaling (multigraph sibling): identical to the
+                # incremental site above. A non-incremental run can still see a
+                # populated graph.json on disk (graph.json present, manifest.json
+                # absent), so to_json's 0-node floor can refuse and preserve it.
+                # Honor the False return — exit 1 rather than print the misleading
+                # "wrote ... 0 nodes" success line. Under force=True the only False
+                # return is the 0-node floor, so a legitimate non-zero multigraph
+                # build (True) is completely unaffected.
+                if not _nc_to_json(_nc_graph, {}, str(graph_json_path), force=True):
+                    print(
+                        "[graphify extract] extraction aborted: the merge produced an "
+                        "empty (0-node) graph; the previous graph.json was preserved.",
+                        file=sys.stderr,
+                    )
+                    sys.exit(1)
                 n_nodes = _nc_graph.number_of_nodes()
                 n_edges = _nc_graph.number_of_edges()
             else:
+                # Empty-merge floor (RISK 4 — Guard 3): this raw write is the one
+                # no-cluster path that does NOT route through to_json (Guard 1) or
+                # _check_shrink (Guard 2), so a 0-node ``merged`` here would silently
+                # overwrite a populated graph.json — the exact failed/aborted-
+                # extraction wipe the clustered sibling already blocks via its
+                # ``if G.number_of_nodes() == 0`` exit. Refuse the overwrite when the
+                # merged extraction is empty AND an existing graph.json on disk is
+                # populated. Read the existing node count defensively: any error
+                # (missing/corrupt file) is treated as 0 nodes so a fresh or
+                # unreadable target leaves the floor inert and the write proceeds
+                # exactly as before (no new exit on a legitimately empty fresh run).
+                if len(merged.get("nodes", [])) == 0 and graph_json_path.exists():
+                    try:
+                        _existing_n = len(
+                            json.loads(graph_json_path.read_text(encoding="utf-8")).get("nodes", [])
+                        )
+                    except Exception:
+                        _existing_n = 0
+                    if _existing_n > 0:
+                        print(
+                            f"[graphify] ERROR: refusing to overwrite a populated "
+                            f"graph.json ({_existing_n} nodes) with an EMPTY (0-node) "
+                            f"graph - this is a failed/aborted extraction, not a real "
+                            f"result. The previous graph is preserved.",
+                            file=sys.stderr,
+                        )
+                        sys.exit(1)
                 graph_json_path.write_text(json.dumps(merged, indent=2), encoding="utf-8")
                 n_nodes = len(merged["nodes"])
                 n_edges = len(merged["edges"])
diff --git a/graphify/analyze.py b/graphify/analyze.py
index c94406027..5004dc412 100644
--- a/graphify/analyze.py
+++ b/graphify/analyze.py
@@ -689,6 +689,7 @@ def find_import_cycles(
           "why": "circular dependency"
         }
     """
+
     def _endpoint_source_file(node_id: str) -> str:
         attrs = G.nodes.get(node_id, {})
         src_file = attrs.get("source_file", "")
@@ -760,10 +761,12 @@ def _endpoint_source_file(node_id: str) -> str:
 
     result: list[dict] = []
     for cycle in unique_cycles:
-        result.append({
-            "cycle": cycle,
-            "length": len(cycle),
-            "why": "circular dependency",
-        })
+        result.append(
+            {
+                "cycle": cycle,
+                "length": len(cycle),
+                "why": "circular dependency",
+            }
+        )
 
     return result
diff --git a/graphify/export.py b/graphify/export.py
index 302becc9d..e8344c698 100644
--- a/graphify/export.py
+++ b/graphify/export.py
@@ -559,6 +559,30 @@ def to_json(
 ) -> bool:
     # Safety check: refuse to silently shrink an existing graph (#479)
     existing_path = Path(output_path)
+    # Empty-merge floor (RISK 4): refuse to overwrite a populated graph.json
+    # (>0 nodes) with an EMPTY (0-node) graph. A 0-node write over a populated
+    # graph is a failed/aborted extraction, never a real result, so this floor
+    # engages REGARDLESS of force — no legitimate caller writes 0 nodes over a
+    # populated graph, and force=True is exactly the bug enabler. Read the
+    # existing node count defensively: any error (missing/corrupt file) is
+    # treated as 0 nodes so a corrupt existing file cannot crash the write
+    # (the floor then stays inert, which is acceptable — there is no verified
+    # populated graph to protect).
+    if existing_path.exists() and G.number_of_nodes() == 0:
+        try:
+            existing_data = json.loads(existing_path.read_text(encoding="utf-8"))
+            existing_n = len(existing_data.get("nodes", []))
+        except Exception:
+            existing_n = 0
+        if existing_n > 0:
+            print(
+                f"[graphify] ERROR: refusing to overwrite a populated graph.json "
+                f"({existing_n} nodes) with an EMPTY (0-node) graph - this is a "
+                f"failed/aborted extraction, not a real result. The previous "
+                f"graph is preserved.",
+                file=sys.stderr,
+            )
+            return False
     if not force and existing_path.exists():
         try:
             from graphify.security import check_graph_file_size_cap
diff --git a/graphify/extract.py b/graphify/extract.py
index 96413d6bf..0f4ad60f0 100644
--- a/graphify/extract.py
+++ b/graphify/extract.py
@@ -134,6 +134,7 @@ def _safe_extract(extractor: Callable, path: Path) -> dict:
     except Exception as e:
         if os.environ.get("GRAPHIFY_DEBUG"):
             import traceback
+
             traceback.print_exc(file=sys.stderr)
         print(f"  warning: skipped {path} ({type(e).__name__}: {e})", file=sys.stderr, flush=True)
         return {"nodes": [], "edges": [], "error": f"{type(e).__name__}: {e}"}
diff --git a/graphify/llm.py b/graphify/llm.py
index 743594401..7268059d9 100644
--- a/graphify/llm.py
+++ b/graphify/llm.py
@@ -137,8 +137,8 @@ def _load_custom_providers() -> dict[str, dict]:
                             if "pricing" not in cfg:
                                 cfg = dict(cfg, pricing={"input": 0.0, "output": 0.0})
                             providers[name] = cfg
-            except Exception:
-                pass
+            except Exception as exc:
+                _ = exc
     return providers
 
 
@@ -1284,7 +1284,16 @@ def detect_backend() -> str | None:
         _validate_ollama_base_url(ollama_url)
         return "ollama"
     for name in BACKENDS:
-        if name not in ("gemini", "kimi", "claude", "openai", "deepseek", "bedrock", "ollama", "claude-cli"):
+        if name not in (
+            "gemini",
+            "kimi",
+            "claude",
+            "openai",
+            "deepseek",
+            "bedrock",
+            "ollama",
+            "claude-cli",
+        ):
             if _get_backend_api_key(name):
                 return name
     return None
diff --git a/graphify/report.py b/graphify/report.py
index 80c214417..4445f0d5e 100644
--- a/graphify/report.py
+++ b/graphify/report.py
@@ -146,6 +146,7 @@ def generate(
 
     # Circular imports surfaced from file-level dependency graph.
     from .analyze import find_import_cycles
+
     cycles = find_import_cycles(G)
     lines += ["", "## Import Cycles"]
     if cycles:
diff --git a/graphify/watch.py b/graphify/watch.py
index b342e092b..0cf43dbdc 100644
--- a/graphify/watch.py
+++ b/graphify/watch.py
@@ -258,7 +258,53 @@ def _canonical_graph_for_compare(graph_data: dict) -> dict:
     # no-change short-circuit. Hyperedge topology is preserved via the
     # authoritative top-level "hyperedges" key sorted below.
     canonical.pop("graph", None)
-    for key in ("nodes", "links", "edges", "hyperedges"):
+    # Fold the legacy edge-list key so a graph.json written under the old "edges"
+    # key compares equal to a modern candidate keyed "links". Without this fold a
+    # legacy file ({"edges": [...]}) and the fresh candidate ({"links": [...]})
+    # canonicalise to dicts with DIFFERENT keys and the no-change short-circuit
+    # flaps — every watcher tick rewrites graph.json forever (RISK 3). Mirrors the
+    # edge handling in _canonical_topology_for_compare; the deliberate difference
+    # is that node fields (community/norm_label) are NOT stripped here so existing
+    # watch graph-compare tests keep passing.
+    if "links" not in canonical and "edges" in canonical:
+        canonical["links"] = canonical.pop("edges")
+    # Treat a missing hyperedges key as an empty list so a null-vs-[] history
+    # (older writers omitted the key; modern ones persist []) does not register
+    # as a change.
+    if "hyperedges" not in canonical:
+        canonical["hyperedges"] = []
+
+    links = canonical.get("links")
+    if isinstance(links, list):
+        norm_links = []
+        for edge in links:
+            if not isinstance(edge, dict):
+                norm_links.append(edge)
+                continue
+            e = dict(edge)
+            # to_json overwrites source/target with the canonical _src/_tgt before
+            # serialising, so the on-disk graph has no _src/_tgt while a candidate
+            # fresh from node_link_data still does. Pop and reassign so both sides
+            # compare on the same directed endpoints (existing gets no-op pops).
+            true_src = e.pop("_src", None)
+            true_tgt = e.pop("_tgt", None)
+            if true_src is not None and true_tgt is not None:
+                e["source"] = true_src
+                e["target"] = true_tgt
+            # VOLATILE: confidence_score is recomputed from confidence on every
+            # export, so a legacy file that stamped it must not differ from a
+            # candidate that has not yet had it recomputed.
+            e.pop("confidence_score", None)
+            # PRESERVE key: NetworkX guarantees `key` is unique within a
+            # (source, target) pair, so parallel edges differ only by key and must
+            # stay distinct in the sorted comparison. The json.dumps sort key below
+            # already includes it because we never strip it; this is explicit.
+            if "key" in edge:
+                e["key"] = edge["key"]
+            norm_links.append(e)
+        canonical["links"] = norm_links
+
+    for key in ("nodes", "links", "hyperedges"):
         if key in canonical and isinstance(canonical[key], list):
             canonical[key] = sorted(
                 canonical[key],
@@ -279,6 +325,19 @@ def _canonical_topology_for_compare(graph_data: dict) -> dict:
     # built_at_commit. Hyperedge topology is still compared via the authoritative
     # top-level "hyperedges" key normalised below.
     canonical.pop("graph", None)
+    # Fold the legacy edge-list key so a graph.json written under the old "edges"
+    # key compares equal to a modern candidate keyed "links". The clustered
+    # compare otherwise normalises "edges" and "links" under their OWN keys, so a
+    # legacy file ({"edges": [...]}) and a fresh candidate ({"links": [...]})
+    # canonicalise to dicts with DIFFERENT keys and the topology compare flaps —
+    # needlessly re-running cluster() on every tick (same root cause as the
+    # no-cluster RISK 3 flap fixed in _canonical_graph_for_compare).
+    if "links" not in canonical and "edges" in canonical:
+        canonical["links"] = canonical.pop("edges")
+    # Treat a missing hyperedges key as an empty list so a legacy file that
+    # dropped the key does not differ from a candidate carrying [].
+    if "hyperedges" not in canonical:
+        canonical["hyperedges"] = []
 
     nodes = canonical.get("nodes")
     if isinstance(nodes, list):
@@ -295,7 +354,7 @@ def _canonical_topology_for_compare(graph_data: dict) -> dict:
             key=lambda item: json.dumps(item, sort_keys=True, ensure_ascii=False, default=str),
         )
 
-    for key in ("links", "edges"):
+    for key in ("links",):
         items = canonical.get(key)
         if not isinstance(items, list):
             continue
@@ -470,10 +529,28 @@ def _check_shrink(
     a ``D`` in ``git diff --name-only``) and a smaller graph is the expected
     outcome — skip the guard so legitimate refactors don't require ``--force``.
     """
+    existing_n = len(existing_data.get("nodes", [])) if existing_data else 0
+    new_n = len(new_data.get("nodes", []))
+    # ABSOLUTE 0-floor: a populated graph must NEVER be overwritten with an empty
+    # (0-node) one. A 0-node candidate over a populated graph is the signature of
+    # a failed/aborted extraction (a crashed or half-written pass), not a real
+    # result — even a total delete-all leaves the corpus build at the "no code
+    # files" early return rather than a 0-node write. This floor sits BEFORE the
+    # force/had_explicit_deletions short-circuit on purpose: neither --force nor a
+    # declared deletion is a license to wipe a populated graph to nothing.
+    if existing_n > 0 and new_n == 0:
+        if tmp is not None:
+            tmp.unlink(missing_ok=True)
+        print(
+            f"[graphify] ERROR: refusing to overwrite a populated graph.json "
+            f"({existing_n} nodes) with an EMPTY (0-node) graph - this is a "
+            f"failed/aborted extraction, not a real result. The previous graph "
+            f"is preserved.",
+            file=sys.stderr,
+        )
+        return False
     if force or not existing_data or had_explicit_deletions:
         return True
-    existing_n = len(existing_data.get("nodes", []))
-    new_n = len(new_data.get("nodes", []))
     if new_n < existing_n:
         if tmp is not None:
             tmp.unlink(missing_ok=True)
@@ -913,6 +990,14 @@ def _rebuild_code(
             + "\n"
         )
         graph_tmp = out / ".graph.tmp.json"
+        # NOTE: Guard 1 in to_json (the empty-merge floor that refuses to
+        # overwrite a populated graph.json with 0 nodes) is INERT here.
+        # graph_tmp is a not-yet-existing temp file, so existing_path.exists()
+        # is False and the guard never engages.  The RISK 4 protection on this
+        # code path comes from Guard 2 (_check_shrink, called below), which
+        # fires before the force/had_explicit_deletions short-circuit and
+        # refuses the graph_tmp.replace(existing_graph) when the candidate has
+        # 0 nodes and the on-disk graph is populated.
         json_written = to_json(G, communities, str(graph_tmp), force=True, built_at_commit=commit)
         if not json_written:
             return False
diff --git a/tests/test_analyze.py b/tests/test_analyze.py
index e93cd573a..478d9a924 100644
--- a/tests/test_analyze.py
+++ b/tests/test_analyze.py
@@ -812,6 +812,8 @@ def test_god_nodes_filter_is_case_insensitive():
     labels = [r["label"] for r in result]
     for variant in ("Start", "START", "Name", "ID"):
         assert variant not in labels, f"`{variant}` should be filtered as JSON-key noise"
+
+
 # ── find_import_cycles tests ──────────────────────────────────────────────────
 
 
@@ -854,7 +856,9 @@ def _make_cycle_graph_directed() -> nx.DiGraph:
     G.add_edge(a_id, ext_id, relation="contains", source_file="src/a.ts", confidence="EXTRACTED")
 
     # Edge whose target has no source_file: must be skipped, no garbage label fallback
-    G.add_edge(a_id, ext_id, relation="imports_from", source_file="src/a.ts", confidence="EXTRACTED")
+    G.add_edge(
+        a_id, ext_id, relation="imports_from", source_file="src/a.ts", confidence="EXTRACTED"
+    )
 
     return G
 
diff --git a/tests/test_detect.py b/tests/test_detect.py
index 0357bab57..cdc0708f9 100644
--- a/tests/test_detect.py
+++ b/tests/test_detect.py
@@ -531,9 +531,11 @@ def test_negation_ancestor_itself_reincluded(tmp_path):
 
 # Regression tests for #1087 - anchored patterns must not match basename deep in tree
 
+
 def test_anchored_dir_not_matched_at_depth(tmp_path):
     """/inbox/ must not match src/inbox/ — only inbox/ at the anchor root."""
     from graphify.detect import _is_ignored, _load_graphifyignore
+
     src_inbox = tmp_path / "src" / "inbox"
     src_inbox.mkdir(parents=True)
     f = src_inbox / "main.rs"
@@ -551,35 +553,32 @@ def test_anchored_dir_not_matched_at_depth(tmp_path):
 def test_anchored_dir_matches_at_root(tmp_path):
     """/inbox/ must still match inbox/ at the anchor root (positive case)."""
     from graphify.detect import _is_ignored, _load_graphifyignore
+
     inbox = tmp_path / "inbox"
     inbox.mkdir()
     f = inbox / "data.json"
     f.write_text("{}")
     (tmp_path / ".graphifyignore").write_text("/inbox/\n")
     patterns = _load_graphifyignore(tmp_path)
-    assert _is_ignored(f, tmp_path, patterns), (
-        "inbox/data.json must be ignored by /inbox/"
-    )
-    assert _is_ignored(inbox, tmp_path, patterns), (
-        "inbox/ must be ignored by /inbox/"
-    )
+    assert _is_ignored(f, tmp_path, patterns), "inbox/data.json must be ignored by /inbox/"
+    assert _is_ignored(inbox, tmp_path, patterns), "inbox/ must be ignored by /inbox/"
 
 
 def test_anchored_file_not_matched_at_depth(tmp_path):
     """/build must not match src/build."""
     from graphify.detect import _is_ignored, _load_graphifyignore
+
     src_build = tmp_path / "src" / "build"
     src_build.mkdir(parents=True)
     (tmp_path / ".graphifyignore").write_text("/build\n")
     patterns = _load_graphifyignore(tmp_path)
-    assert not _is_ignored(src_build, tmp_path, patterns), (
-        "src/build must NOT be ignored by /build"
-    )
+    assert not _is_ignored(src_build, tmp_path, patterns), "src/build must NOT be ignored by /build"
 
 
 def test_unanchored_dir_still_matches_at_depth(tmp_path):
     """inbox/ (no leading /) must still match src/inbox/ anywhere in the tree."""
     from graphify.detect import _is_ignored, _load_graphifyignore
+
     src_inbox = tmp_path / "src" / "inbox"
     src_inbox.mkdir(parents=True)
     f = src_inbox / "main.rs"
@@ -594,6 +593,7 @@ def test_unanchored_dir_still_matches_at_depth(tmp_path):
 def test_anchored_multi_segment_pattern(tmp_path):
     """/src/inbox/ must match src/inbox/ but not x/src/inbox/."""
     from graphify.detect import _is_ignored, _load_graphifyignore
+
     (tmp_path / "src" / "inbox").mkdir(parents=True)
     (tmp_path / "x" / "src" / "inbox").mkdir(parents=True)
     target_ok = tmp_path / "src" / "inbox" / "a.py"
diff --git a/tests/test_export.py b/tests/test_export.py
index 7fe740729..7e1269c2e 100644
--- a/tests/test_export.py
+++ b/tests/test_export.py
@@ -404,3 +404,117 @@ def test_to_json_simple_graph_regression():
     assert data["multigraph"] is False
     for node in data["nodes"]:
         assert "id" in node and "community" in node
+
+
+# ── RISK 4: empty-merge floor in to_json (Guard 1) ───────────────────────────
+#
+# to_json must refuse to overwrite a populated on-disk graph.json (>0 nodes)
+# with an EMPTY (0-node) graph — a 0-node write over a populated graph is a
+# failed/aborted extraction, never a real result. This floor engages
+# REGARDLESS of force=True (force is the bug enabler here), and only when the
+# *new* graph has 0 nodes AND the existing file is populated. It must NOT block
+# a fresh empty write (no existing file), a non-zero dedup shrink, or a
+# 0-over-0 write (nothing populated to protect).
+
+
+def test_to_json_floor_blocks_zero_over_populated_even_with_force(tmp_path):
+    """Existing populated graph.json + a 0-node graph with force=True must be
+    refused (return False) and leave the on-disk graph untouched. This is the
+    RED-before-fix case: without the floor, force=True wipes 4 nodes to 0."""
+    out = tmp_path / "graph.json"
+
+    # Seed a populated graph.json (4 nodes) via the real write path.
+    populated = build_from_json(_build_extraction())
+    assert populated.number_of_nodes() == 4
+    assert to_json(populated, cluster(populated), str(out), force=True) is True
+    assert len(json.loads(out.read_text())["nodes"]) == 4
+
+    # Attempt to overwrite with a 0-node graph, force=True.
+    empty = nx.Graph()
+    assert empty.number_of_nodes() == 0
+    assert to_json(empty, {}, str(out), force=True) is False
+
+    # The previous populated graph is preserved on disk.
+    assert len(json.loads(out.read_text())["nodes"]) == 4
+
+
+def test_to_json_floor_blocks_zero_over_populated_without_force(tmp_path, capsys):
+    """Guard 1 (not the pre-existing shrink guard) fires for force=False + 0-node
+    over populated. Pre-fix, the shrink guard fired and emitted a WARNING; Guard 1
+    emits a distinct ERROR message. Asserting the exact Guard-1 text makes this
+    test red-before-fix / green-after-fix, so it cannot pass vacuously on the
+    shrink guard's refusal alone."""
+    out = tmp_path / "graph.json"
+
+    populated = build_from_json(_build_extraction())
+    assert populated.number_of_nodes() == 4
+    assert to_json(populated, cluster(populated), str(out), force=True) is True
+    assert len(json.loads(out.read_text())["nodes"]) == 4
+
+    empty = nx.Graph()
+    result = to_json(empty, {}, str(out), force=False)
+
+    # Guard 1 must have fired: return False and preserve the on-disk graph.
+    assert result is False
+    assert len(json.loads(out.read_text())["nodes"]) == 4
+
+    # The exact Guard-1 ERROR message must appear on stderr.  Pre-fix the shrink
+    # guard fires instead and emits a WARNING with different text, making the
+    # assertion below fail on unfixed code.
+    captured = capsys.readouterr()
+    assert (
+        "[graphify] ERROR: refusing to overwrite a populated graph.json "
+        "(4 nodes) with an EMPTY (0-node) graph - this is a "
+        "failed/aborted extraction, not a real result. The previous "
+        "graph is preserved."
+    ) in captured.err
+
+
+def test_to_json_allows_fresh_empty_no_existing_file(tmp_path):
+    """A7: no existing file + 0-node graph + force=True is allowed — the floor
+    must NOT engage when existing_path.exists() is False. Writes a valid
+    0-node graph.json."""
+    out = tmp_path / "graph.json"
+    assert not out.exists()
+
+    empty = nx.Graph()
+    assert to_json(empty, {}, str(out), force=True) is True
+
+    data = json.loads(out.read_text())
+    assert data["nodes"] == []
+
+
+def test_to_json_allows_nonzero_dedup_shrink_with_force(tmp_path):
+    """A10: existing 4 nodes, new 2-node graph, force=True is allowed — only a
+    new graph with 0 nodes trips the floor. A non-zero dedup/shrink under force
+    is a legitimate result."""
+    out = tmp_path / "graph.json"
+
+    populated = build_from_json(_build_extraction())
+    assert populated.number_of_nodes() == 4
+    assert to_json(populated, cluster(populated), str(out), force=True) is True
+    assert len(json.loads(out.read_text())["nodes"]) == 4
+
+    smaller = nx.Graph()
+    smaller.add_node("a")
+    smaller.add_node("b")
+    assert smaller.number_of_nodes() == 2
+    assert to_json(smaller, {}, str(out), force=True) is True
+
+    assert len(json.loads(out.read_text())["nodes"]) == 2
+
+
+def test_to_json_allows_zero_over_empty_existing(tmp_path):
+    """An existing file with 0 nodes + a new 0-node graph is allowed — there is
+    nothing populated to protect, so the floor must NOT engage."""
+    out = tmp_path / "graph.json"
+
+    # Seed a 0-node graph.json (no existing file → floor inert on first write).
+    first_empty = nx.Graph()
+    assert to_json(first_empty, {}, str(out), force=True) is True
+    assert json.loads(out.read_text())["nodes"] == []
+
+    # Overwrite 0-over-0: allowed.
+    second_empty = nx.Graph()
+    assert to_json(second_empty, {}, str(out), force=True) is True
+    assert json.loads(out.read_text())["nodes"] == []
diff --git a/tests/test_extract_cli.py b/tests/test_extract_cli.py
index 36116a5d9..4d4be4087 100644
--- a/tests/test_extract_cli.py
+++ b/tests/test_extract_cli.py
@@ -470,3 +470,211 @@ def _one_chunk_succeeded(paths, **kwargs):
     assert (out_dir / "graphify-out" / "graph.json").exists(), (
         "graph.json must be written on the happy path"
     )
+
+
+def test_extract_no_cluster_refuses_to_zero_populated_graph(monkeypatch, tmp_path, capsys):
+    """RISK 4 — Guard 3: the non-incremental no-cluster simple path must NOT wipe a
+    populated graph.json with a 0-node extraction.
+
+    The bug: with an existing populated (simple) graph.json but NO manifest.json
+    (so the run is non-incremental) the ``--no-cluster`` branch falls to the raw
+    ``graph_json_path.write_text(json.dumps(merged, ...))`` ``else`` case. That raw
+    write bypasses both existing empty-merge guards (``export.to_json`` /
+    ``watch._check_shrink``). When AST extraction aborts (returns 0 nodes) the raw
+    write overwrites the saved graph with an EMPTY one — a failed extraction
+    silently destroys real data. The clustered sibling already guards this with
+    ``if G.number_of_nodes() == 0: ... sys.exit(1)``; the no-cluster simple path
+    must do the same: exit non-zero, print the byte-identical guard message, and
+    leave the populated graph.json untouched instead of overwriting it.
+    """
+    corpus = _make_code_corpus(tmp_path)
+    out = corpus / "graphify-out"
+    out.mkdir(exist_ok=True)
+    graph_json = out / "graph.json"
+
+    # Seed a POPULATED *simple* graph.json the way the pipeline persists it
+    # (build_from_json default-simple -> to_json). Simple (not multigraph) so the
+    # sticky profile resolves to non-multigraph and the run takes the raw-write
+    # ``else`` branch — exactly the unguarded site. NO manifest.json is written,
+    # so the run is non-incremental (the path the incremental build_merge floor
+    # never protects).
+    seed_nodes = [
+        {
+            "id": n,
+            "label": f"{n}()",
+            "file_type": "code",
+            "source_file": "app.py",
+            "source_location": "L1",
+        }
+        for n in ("main", "helper", "extra")
+    ]
+    seed_edges = [
+        {
+            "source": "main",
+            "target": "helper",
+            "relation": "calls",
+            "confidence": "EXTRACTED",
+            "source_file": "app.py",
+            "source_location": "L5",
+        }
+    ]
+    G_seed = build_from_json({"nodes": seed_nodes, "edges": seed_edges})
+    assert not G_seed.is_multigraph(), "seed must be a simple graph (non-multigraph)"
+    to_json(G_seed, {0: ["main", "helper", "extra"]}, str(graph_json), force=True)
+    before = json.loads(graph_json.read_text(encoding="utf-8"))
+    seeded_n = len(before.get("nodes", []))
+    assert seeded_n == 3, "seed graph.json must start populated with 3 nodes"
+    assert before.get("multigraph") is False, "seed graph.json must be simple"
+    assert not (out / "manifest.json").exists(), "no manifest → non-incremental run"
+
+    # Force the AST extraction to abort so the merged extraction yields 0 nodes.
+    # This mirrors the real trigger (a parser/extractor blowing up): the extract
+    # handler's ``except`` resets ast_result to an empty dict, and a code-only
+    # corpus has no semantic pass, so ``merged`` collapses to 0 nodes. The extract
+    # handler imports ``extract`` from graphify.extract at call time, so patching
+    # the source symbol is picked up.
+    def _ast_boom(paths, **kwargs):
+        raise RuntimeError("simulated AST extractor failure (parser crash)")
+
+    import graphify.extract as _extract_mod
+
+    monkeypatch.setattr(_extract_mod, "extract", _ast_boom)
+    monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-test-fake-key")  # code-only: LLM never called
+    monkeypatch.setattr(mainmod, "_check_skill_version", lambda _: None)
+    monkeypatch.setattr(
+        mainmod.sys,
+        "argv",
+        ["graphify", "extract", str(corpus), "--backend", "claude", "--no-cluster"],
+    )
+
+    with pytest.raises(SystemExit) as exc_info:
+        mainmod.main()
+    assert exc_info.value.code == 1, (
+        f"a 0-node no-cluster extraction over a populated graph must exit 1, "
+        f"got {exc_info.value.code}"
+    )
+
+    err = capsys.readouterr().err
+    # Byte-identical to the Guard 1 / Guard 2 message.
+    assert (
+        f"[graphify] ERROR: refusing to overwrite a populated graph.json "
+        f"({seeded_n} nodes) with an EMPTY (0-node) graph - this is a "
+        f"failed/aborted extraction, not a real result. The previous graph "
+        f"is preserved." in err
+    ), f"guard message must match Guards 1/2 byte-for-byte, got: {err!r}"
+
+    # The populated graph.json must be PRESERVED — not wiped to an empty graph.
+    after = json.loads(graph_json.read_text(encoding="utf-8"))
+    assert len(after.get("nodes", [])) == seeded_n, (
+        "the populated graph.json must NOT be overwritten with a 0-node graph"
+    )
+
+
+def test_extract_no_cluster_incremental_zero_merge_exits_nonzero_and_preserves_graph(
+    monkeypatch, tmp_path, capsys
+):
+    """RISK 4 — Guard 1 signaling gap: the INCREMENTAL no-cluster path must SIGNAL
+    failure (exit non-zero, no false-success line) when the merge yields 0 nodes.
+
+    The incremental no-cluster branch writes through
+    ``to_json(_nc_graph, {}, ..., force=True)`` (Guard 1). When ``build_merge``
+    collapses to a 0-node graph over a populated graph.json, Guard 1's empty-merge
+    floor correctly *returns False and PRESERVES the data* — but the caller ignored
+    that return value: it fell through, printed the success line
+    ``[graphify extract] wrote ... graph.json — 0 nodes, 0 edges (no clustering)``
+    and exited 0. The data was safe, but a failed/aborted extraction reported a
+    misleading false success (wrong exit code + message).
+
+    The fix captures Guard 1's ``False`` return at the no-cluster incremental write
+    site and, on refusal only, emits an aborted-extraction stderr note and exits 1
+    — never the bogus "wrote ... 0 nodes" success line. A populated graph.json plus
+    a manifest.json makes the run incremental; ``build_merge`` is forced to yield an
+    empty graph to model the aborted/pruned-to-empty merge. The legitimate sticky
+    no-cluster case (``test_extract_multigraph_no_cluster_sticky_idempotent``) keeps
+    exit 0 because ``build_merge`` preserves the existing nodes there (True return).
+    """
+    from graphify.detect import save_manifest
+
+    corpus = _make_code_corpus(tmp_path)
+    out = corpus / "graphify-out"
+    out.mkdir(exist_ok=True)
+    graph_json = out / "graph.json"
+
+    # Seed a POPULATED *simple* graph.json the way the pipeline persists it.
+    seed_nodes = [
+        {
+            "id": n,
+            "label": f"{n}()",
+            "file_type": "code",
+            "source_file": "app.py",
+            "source_location": "L1",
+        }
+        for n in ("main", "helper", "extra")
+    ]
+    seed_edges = [
+        {
+            "source": "main",
+            "target": "helper",
+            "relation": "calls",
+            "confidence": "EXTRACTED",
+            "source_file": "app.py",
+            "source_location": "L5",
+        }
+    ]
+    G_seed = build_from_json({"nodes": seed_nodes, "edges": seed_edges})
+    assert not G_seed.is_multigraph(), "seed must be a simple graph (non-multigraph)"
+    to_json(G_seed, {0: ["main", "helper", "extra"]}, str(graph_json), force=True)
+
+    # A manifest.json alongside the populated graph.json makes the run INCREMENTAL
+    # (incremental_mode = manifest.exists() and graph.json.exists()), so the write
+    # routes through the incremental ``to_json(..., force=True)`` site, not the
+    # raw-write else-branch the Guard 3 sibling covers.
+    save_manifest(
+        {"code": [str(corpus / "app.py")]},
+        manifest_path=str(out / "manifest.json"),
+        kind="both",
+    )
+
+    before = json.loads(graph_json.read_text(encoding="utf-8"))
+    seeded_n = len(before.get("nodes", []))
+    assert seeded_n == 3, "seed graph.json must start populated with 3 nodes"
+    assert before.get("multigraph") is False, "seed graph.json must be simple"
+    assert (out / "manifest.json").exists(), "manifest → incremental run"
+
+    # Force the incremental merge to yield a 0-node graph (aborted / pruned-to-empty
+    # extraction). The no-cluster incremental branch imports build_merge from
+    # graphify.build at call time, so patching the source symbol is picked up.
+    def _empty_merge(*args, **kwargs):
+        return build_from_json({"nodes": [], "edges": []})
+
+    import graphify.build as _build_mod
+
+    monkeypatch.setattr(_build_mod, "build_merge", _empty_merge)
+    monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-test-fake-key")  # code-only: LLM never called
+    monkeypatch.setattr(mainmod, "_check_skill_version", lambda _: None)
+    monkeypatch.setattr(
+        mainmod.sys,
+        "argv",
+        ["graphify", "extract", str(corpus), "--backend", "claude", "--no-cluster"],
+    )
+
+    with pytest.raises(SystemExit) as exc_info:
+        mainmod.main()
+    assert exc_info.value.code == 1, (
+        f"a 0-node incremental no-cluster merge over a populated graph must exit 1, "
+        f"got {exc_info.value.code}"
+    )
+
+    captured = capsys.readouterr()
+    # The misleading false-success line must NOT be printed.
+    assert "0 nodes, 0 edges" not in captured.out, (
+        f"a 0-node aborted merge must NOT print the 'wrote ... 0 nodes' success "
+        f"line, got stdout: {captured.out!r}"
+    )
+
+    # The populated graph.json must be PRESERVED — not wiped to an empty graph.
+    after = json.loads(graph_json.read_text(encoding="utf-8"))
+    assert len(after.get("nodes", [])) == seeded_n, (
+        "the populated graph.json must NOT be overwritten with a 0-node graph"
+    )
+    assert after == before, "graph.json content must be unchanged after the refused write"
diff --git a/tests/test_provider_registry.py b/tests/test_provider_registry.py
index bbf082ca8..f2a8b2fe4 100644
--- a/tests/test_provider_registry.py
+++ b/tests/test_provider_registry.py
@@ -1,6 +1,4 @@
 import json
-import pytest
-from pathlib import Path
 
 
 def test_custom_provider_add_list_show_remove(tmp_path, monkeypatch):
@@ -9,18 +7,28 @@ def test_custom_provider_add_list_show_remove(tmp_path, monkeypatch):
     providers_file.write_text("{}", encoding="utf-8")
 
     from graphify import llm
-    monkeypatch.setattr(llm, "_custom_providers_path", lambda global_=True: providers_file if global_ else tmp_path / "local.json")
+
+    monkeypatch.setattr(
+        llm,
+        "_custom_providers_path",
+        lambda global_=True: providers_file if global_ else tmp_path / "local.json",
+    )
     monkeypatch.setattr(llm, "BACKENDS", {**llm.BACKENDS})
 
-    providers_file.write_text(json.dumps({
-        "nvidia": {
-            "base_url": "https://integrate.api.nvidia.com/v1",
-            "default_model": "minimaxai/minimax-m2.7",
-            "env_key": "NVIDIA_API_KEY",
-            "pricing": {"input": 0.0, "output": 0.0},
-            "temperature": 0,
-        }
-    }), encoding="utf-8")
+    providers_file.write_text(
+        json.dumps(
+            {
+                "nvidia": {
+                    "base_url": "https://integrate.api.nvidia.com/v1",
+                    "default_model": "minimaxai/minimax-m2.7",
+                    "env_key": "NVIDIA_API_KEY",
+                    "pricing": {"input": 0.0, "output": 0.0},
+                    "temperature": 0,
+                }
+            }
+        ),
+        encoding="utf-8",
+    )
 
     loaded = llm._load_custom_providers()
     assert "nvidia" in loaded
@@ -30,19 +38,27 @@ def test_custom_provider_add_list_show_remove(tmp_path, monkeypatch):
 def test_custom_provider_pricing_defaults_to_zero(tmp_path):
     """Missing pricing field defaults to zero so estimate_cost doesn't blow up."""
     providers_file = tmp_path / "providers.json"
-    providers_file.write_text(json.dumps({
-        "mymodel": {
-            "base_url": "http://localhost:8080/v1",
-            "default_model": "llama3",
-            "env_key": "MY_API_KEY",
-        }
-    }), encoding="utf-8")
+    providers_file.write_text(
+        json.dumps(
+            {
+                "mymodel": {
+                    "base_url": "http://localhost:8080/v1",
+                    "default_model": "llama3",
+                    "env_key": "MY_API_KEY",
+                }
+            }
+        ),
+        encoding="utf-8",
+    )
 
     from graphify import llm
-    import importlib
     from unittest.mock import patch
 
-    with patch.object(llm, "_custom_providers_path", side_effect=lambda global_=True: providers_file if global_ else tmp_path / "local.json"):
+    with patch.object(
+        llm,
+        "_custom_providers_path",
+        side_effect=lambda global_=True: providers_file if global_ else tmp_path / "local.json",
+    ):
         loaded = llm._load_custom_providers()
 
     assert "mymodel" in loaded
@@ -52,18 +68,27 @@ def test_custom_provider_pricing_defaults_to_zero(tmp_path):
 def test_custom_provider_cannot_shadow_builtin(tmp_path):
     """Built-in provider names are protected from being overridden."""
     providers_file = tmp_path / "providers.json"
-    providers_file.write_text(json.dumps({
-        "claude": {
-            "base_url": "http://evil.example.com/v1",
-            "default_model": "evil-model",
-            "env_key": "EVIL_KEY",
-        }
-    }), encoding="utf-8")
+    providers_file.write_text(
+        json.dumps(
+            {
+                "claude": {
+                    "base_url": "http://evil.example.com/v1",
+                    "default_model": "evil-model",
+                    "env_key": "EVIL_KEY",
+                }
+            }
+        ),
+        encoding="utf-8",
+    )
 
     from graphify import llm
     from unittest.mock import patch
 
-    with patch.object(llm, "_custom_providers_path", side_effect=lambda global_=True: providers_file if global_ else tmp_path / "local.json"):
+    with patch.object(
+        llm,
+        "_custom_providers_path",
+        side_effect=lambda global_=True: providers_file if global_ else tmp_path / "local.json",
+    ):
         loaded = llm._load_custom_providers()
 
     assert "claude" not in loaded
@@ -73,19 +98,30 @@ def test_detect_backend_custom_provider_after_builtins(monkeypatch):
     """Custom providers appear after all built-ins in detect_backend() priority."""
     from graphify import llm
 
-    monkeypatch.setattr(llm, "BACKENDS", {
-        **llm.BACKENDS,
-        "myprovider": {
-            "base_url": "http://example.com/v1",
-            "default_model": "mymodel",
-            "env_key": "MY_CUSTOM_KEY",
-            "pricing": {"input": 0.0, "output": 0.0},
-            "temperature": 0,
-        }
-    })
+    monkeypatch.setattr(
+        llm,
+        "BACKENDS",
+        {
+            **llm.BACKENDS,
+            "myprovider": {
+                "base_url": "http://example.com/v1",
+                "default_model": "mymodel",
+                "env_key": "MY_CUSTOM_KEY",
+                "pricing": {"input": 0.0, "output": 0.0},
+                "temperature": 0,
+            },
+        },
+    )
     monkeypatch.setenv("MY_CUSTOM_KEY", "test-key")
-    for key in ("GEMINI_API_KEY", "GOOGLE_API_KEY", "MOONSHOT_API_KEY", "ANTHROPIC_API_KEY",
-                 "OPENAI_API_KEY", "DEEPSEEK_API_KEY", "OLLAMA_BASE_URL"):
+    for key in (
+        "GEMINI_API_KEY",
+        "GOOGLE_API_KEY",
+        "MOONSHOT_API_KEY",
+        "ANTHROPIC_API_KEY",
+        "OPENAI_API_KEY",
+        "DEEPSEEK_API_KEY",
+        "OLLAMA_BASE_URL",
+    ):
         monkeypatch.delenv(key, raising=False)
     monkeypatch.delenv("AWS_PROFILE", raising=False)
     monkeypatch.delenv("AWS_REGION", raising=False)
diff --git a/tests/test_watch.py b/tests/test_watch.py
index 71a04350f..6dcc7d4a6 100644
--- a/tests/test_watch.py
+++ b/tests/test_watch.py
@@ -1320,3 +1320,222 @@ def same_relationship(edge: dict) -> bool:
     matching = [edge for edge in rebuilt["links"] if same_relationship(edge)]
     assert {edge.get("key") for edge in matching} == {"parallel-a", "parallel-b"}
     assert len(matching) == 2
+
+
+# --- RISK 3: no-cluster compare must not flap on a legacy edges-keyed graph.json ---
+
+
+def _downgrade_to_legacy_edges(graph_path: Path) -> None:
+    """Rewrite ``graph_path`` in the pre-modern on-disk shape that triggered the
+    no-cluster flap: the edge list keyed as ``edges`` (not ``links``), a
+    ``confidence_score`` stamped on every edge (a recomputed/volatile field), and
+    the top-level ``hyperedges`` key dropped entirely (null-vs-[] history).
+
+    All three deviations are required to reproduce the full bug: a fixture that
+    only renames ``links``->``edges`` (without injecting ``confidence_score`` and
+    without dropping ``hyperedges``) would falsely pass once the key fold lands,
+    masking the volatile-field and missing-hyperedges legs of the same flap.
+    """
+    data = json.loads(graph_path.read_text(encoding="utf-8"))
+    links = data.pop("links", data.pop("edges", []))
+    data["edges"] = [{**edge, "confidence_score": 0.9} for edge in links]
+    data.pop("hyperedges", None)
+    graph_path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
+
+
+@pytest.mark.skipif(sys.platform == "win32", reason="git CLI behaviour varies on Windows runners")
+def test_rebuild_code_no_cluster_does_not_flap_on_legacy_edges_key(tmp_path):
+    """RISK 3: a no-op ``--no-cluster`` rebuild over a graph.json written in the
+    legacy ``edges``-keyed shape must detect "no change" and leave graph.json
+    byte-for-byte untouched (no flap).
+
+    The legacy downgrade renames ``links``->``edges``, stamps a volatile
+    ``confidence_score`` on each edge, and drops the top-level ``hyperedges``
+    key. ``_canonical_graph_for_compare`` must fold all three back so the
+    on-disk legacy graph compares EQUAL to the freshly-extracted candidate;
+    otherwise every watcher tick rewrites graph.json forever.
+    """
+    from graphify.watch import _rebuild_code
+
+    repo = tmp_path / "corpus"
+    repo.mkdir()
+    _git_init(repo)
+    (repo / "app.py").write_text(
+        "def alpha():\n    return 1\n\ndef beta():\n    return alpha()\n",
+        encoding="utf-8",
+    )
+
+    cwd = os.getcwd()
+    try:
+        os.chdir(repo)
+        # Real build to get an authentic no-cluster graph.json for this corpus.
+        assert _rebuild_code(repo, no_cluster=True, acquire_lock=False) is True
+        graph_path = repo / "graphify-out" / "graph.json"
+        assert graph_path.exists()
+
+        # The idempotence requirement is >=3 consecutive no-op rebuilds.  We
+        # re-apply the legacy downgrade IMMEDIATELY before each measured run
+        # because a buggy _rebuild_code rewrites edges->links on the first
+        # flap; without re-downgrading, subsequent runs would compare links vs
+        # links and falsely pass (masking the regression).
+        #
+        # Each run captures its own before-state (mtime + bytes) AFTER the
+        # downgrade but BEFORE the sleep+rebuild, then asserts the rebuild
+        # leaves the file untouched.  Comparing within each run (not across
+        # runs) is correct because _downgrade_to_legacy_edges itself writes the
+        # file and therefore changes its mtime — only the rebuild must be a
+        # no-op.
+        for run_idx in range(3):
+            _downgrade_to_legacy_edges(graph_path)
+            pre_bytes = graph_path.read_bytes()
+            pre_mtime = graph_path.stat().st_mtime_ns
+
+            # No source change — a correct compare must short-circuit to "no change".
+            time.sleep(0.01)  # ensure any rewrite would move mtime measurably
+            assert _rebuild_code(repo, no_cluster=True, acquire_lock=False) is True
+
+            post_bytes = graph_path.read_bytes()
+            post_mtime = graph_path.stat().st_mtime_ns
+
+            assert post_bytes == pre_bytes, (
+                f"run {run_idx + 1}: legacy edges-keyed graph.json was rewritten "
+                "on a no-op no-cluster rebuild — the canonical compare flapped "
+                "(edges->links / confidence_score / missing-hyperedges not folded)"
+            )
+            assert post_mtime == pre_mtime, (
+                f"run {run_idx + 1}: graph.json mtime changed on a no-op rebuild (flap)"
+            )
+    finally:
+        os.chdir(cwd)
+
+
+# --- RISK 4 Guard 2: a failed/aborted extraction must not wipe a populated graph ---
+
+
+@pytest.mark.skipif(sys.platform == "win32", reason="git CLI behaviour varies on Windows runners")
+def test_watch_no_cluster_delete_all_preserves_graph(tmp_path, monkeypatch):
+    """RISK 4: when a declared-deletion rebuild ends with 0 nodes because the
+    remaining files' extraction aborted (a failed/half-written extraction, not a
+    real empty result), the no-cluster raw-write site must REFUSE to overwrite a
+    populated graph.json and preserve the previous graph.
+
+    Reproduction: a two-file corpus is built, ``a.py`` is deleted (declared via
+    ``changed_paths`` so ``had_explicit_deletions`` is True and the existing
+    shrink guard is bypassed), ``b.py`` stays on disk (so the "no code files"
+    early return does NOT fire), and ``extract`` is stubbed to return nothing
+    (the aborted extraction). Without the 0-floor the graph is wiped to 0 nodes.
+    """
+    import graphify.extract as extract_mod
+    from graphify.watch import _rebuild_code
+
+    repo = tmp_path / "corpus"
+    repo.mkdir()
+    _git_init(repo)
+    (repo / "a.py").write_text("def f():\n    return 1\n", encoding="utf-8")
+    (repo / "b.py").write_text("def g():\n    return 2\n", encoding="utf-8")
+
+    cwd = os.getcwd()
+    try:
+        os.chdir(repo)
+        assert _rebuild_code(repo, no_cluster=True, acquire_lock=False) is True
+        graph_path = repo / "graphify-out" / "graph.json"
+        before = json.loads(graph_path.read_text(encoding="utf-8"))
+        nodes_before = len(before.get("nodes", []))
+        assert nodes_before > 0
+        before_bytes = graph_path.read_bytes()
+
+        # Delete a.py (declared deletion -> had_explicit_deletions=True). Keep
+        # b.py on disk so detect() still returns code files, but make extraction
+        # abort to empty so the merged candidate has 0 nodes.
+        (repo / "a.py").unlink()
+
+        def aborted_extract(_targets, cache_root=None):
+            return {
+                "nodes": [],
+                "edges": [],
+                "hyperedges": [],
+                "input_tokens": 0,
+                "output_tokens": 0,
+            }
+
+        monkeypatch.setattr(extract_mod, "extract", aborted_extract)
+        result = _rebuild_code(
+            repo,
+            changed_paths=[Path("a.py"), Path("b.py")],
+            no_cluster=True,
+            acquire_lock=False,
+        )
+
+        after = json.loads(graph_path.read_text(encoding="utf-8"))
+        after_bytes = graph_path.read_bytes()
+    finally:
+        os.chdir(cwd)
+
+    assert result is False, "rebuild must refuse the empty overwrite"
+    assert len(after.get("nodes", [])) == nodes_before, (
+        "populated graph.json must be preserved when a failed extraction yields 0 nodes"
+    )
+    assert after_bytes == before_bytes, "graph.json must be byte-for-byte untouched"
+
+
+@pytest.mark.skipif(sys.platform == "win32", reason="git CLI behaviour varies on Windows runners")
+def test_watch_clustered_delete_all_preserves_graph(tmp_path, monkeypatch):
+    """RISK 4: the clustered ``tmp.replace`` write site must likewise refuse to
+    overwrite a populated graph.json with an empty (0-node) graph produced by a
+    failed/aborted extraction during a declared-deletion rebuild.
+
+    Same reproduction as the no-cluster sibling, exercising the clustered path
+    (the ``graph_tmp.replace(existing_graph)`` write guarded by ``_check_shrink``).
+    """
+    import graphify.extract as extract_mod
+    from graphify.watch import _rebuild_code
+
+    repo = tmp_path / "corpus"
+    repo.mkdir()
+    _git_init(repo)
+    (repo / "a.py").write_text("def f():\n    return 1\n", encoding="utf-8")
+    (repo / "b.py").write_text("def g():\n    return 2\n", encoding="utf-8")
+
+    cwd = os.getcwd()
+    try:
+        os.chdir(repo)
+        # no_viz keeps the clustered path fast (skips graph.html generation).
+        assert _rebuild_code(repo, no_viz=True, acquire_lock=False) is True
+        graph_path = repo / "graphify-out" / "graph.json"
+        before = json.loads(graph_path.read_text(encoding="utf-8"))
+        nodes_before = len(before.get("nodes", []))
+        assert nodes_before > 0
+        before_bytes = graph_path.read_bytes()
+
+        (repo / "a.py").unlink()
+
+        def aborted_extract(_targets, cache_root=None):
+            return {
+                "nodes": [],
+                "edges": [],
+                "hyperedges": [],
+                "input_tokens": 0,
+                "output_tokens": 0,
+            }
+
+        monkeypatch.setattr(extract_mod, "extract", aborted_extract)
+        result = _rebuild_code(
+            repo,
+            changed_paths=[Path("a.py"), Path("b.py")],
+            no_viz=True,
+            acquire_lock=False,
+        )
+
+        after = json.loads(graph_path.read_text(encoding="utf-8"))
+        after_bytes = graph_path.read_bytes()
+    finally:
+        os.chdir(cwd)
+
+    assert result is False, "clustered rebuild must refuse the empty overwrite"
+    assert len(after.get("nodes", [])) == nodes_before, (
+        "populated graph.json must be preserved when a failed extraction yields 0 nodes "
+        "(clustered path)"
+    )
+    assert after_bytes == before_bytes, (
+        "graph.json must be byte-for-byte untouched (clustered path)"
+    )
diff --git a/uv.lock b/uv.lock
index 8e24a8df1..3f1dfb760 100644
--- a/uv.lock
+++ b/uv.lock
@@ -1109,7 +1109,7 @@ wheels = [
 
 [[package]]
 name = "graphifyy"
-version = "0.8.25"
+version = "0.8.26"
 source = { editable = "." }
 dependencies = [
     { name = "datasketch" },

From 322c0744d32203889c1629b745f790bcdec06616 Mon Sep 17 00:00:00 2001
From: hypnwtykvmpr <narcolepticsun@gmail.com>
Date: Sat, 30 May 2026 23:47:47 -0500
Subject: [PATCH 21/21] fix(multigraph): preserve saved graph during stateful
 dedup

---
 graphify/build.py   |  74 ++++++++++++++-----
 tests/test_build.py | 168 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 224 insertions(+), 18 deletions(-)

diff --git a/graphify/build.py b/graphify/build.py
index fc955ea34..0fe4a9695 100644
--- a/graphify/build.py
+++ b/graphify/build.py
@@ -520,19 +520,7 @@ def build(
     """
     from graphify.dedup import deduplicate_entities
 
-    combined: dict = {
-        "nodes": [],
-        "edges": [],
-        "hyperedges": [],
-        "input_tokens": 0,
-        "output_tokens": 0,
-    }
-    for ext in extractions:
-        combined["nodes"].extend(ext.get("nodes", []))
-        combined["edges"].extend(ext.get("edges", []))
-        combined["hyperedges"].extend(ext.get("hyperedges", []))
-        combined["input_tokens"] += ext.get("input_tokens", 0)
-        combined["output_tokens"] += ext.get("output_tokens", 0)
+    combined = _combine_extractions(extractions)
     dedup_diagnostics: dict = {}
     if dedup and combined["nodes"]:
         combined["nodes"], combined["edges"] = deduplicate_entities(
@@ -550,6 +538,23 @@ def build(
     return G
 
 
+def _combine_extractions(extractions: list[dict]) -> dict:
+    combined: dict = {
+        "nodes": [],
+        "edges": [],
+        "hyperedges": [],
+        "input_tokens": 0,
+        "output_tokens": 0,
+    }
+    for ext in extractions:
+        combined["nodes"].extend(ext.get("nodes", []))
+        combined["edges"].extend(ext.get("edges", []))
+        combined["hyperedges"].extend(ext.get("hyperedges", []))
+        combined["input_tokens"] += ext.get("input_tokens", 0)
+        combined["output_tokens"] += ext.get("output_tokens", 0)
+    return combined
+
+
 def _norm_label(label: str) -> str:
     """Canonical dedup key — Unicode-aware, preserves CJK/word characters."""
     label = unicodedata.normalize("NFKC", label)
@@ -602,6 +607,12 @@ def deduplicate_by_label(nodes: list[dict], edges: list[dict]) -> tuple[list[dic
     return deduped_nodes, deduped_edges
 
 
+def _chunk_has_graph_records(chunk: dict) -> bool:
+    return bool(
+        chunk.get("nodes") or chunk.get("edges") or chunk.get("links") or chunk.get("hyperedges")
+    )
+
+
 def build_merge(
     new_chunks: list[dict],
     graph_path: str | Path = "graphify-out/graph.json",
@@ -700,15 +711,41 @@ def build_merge(
         existing_nodes = []
         base = []
 
-    all_chunks = base + list(new_chunks)
+    incoming_chunks = list(new_chunks)
+    incoming_has_records = any(_chunk_has_graph_records(chunk) for chunk in incoming_chunks)
+    dedup_diagnostics: dict = {}
+    if graph_path.exists() and dedup:
+        effective_dedup = False
+        if incoming_has_records:
+            from graphify.dedup import deduplicate_entities
+
+            incoming = _combine_extractions(incoming_chunks)
+            if incoming["nodes"]:
+                incoming["nodes"], incoming["edges"] = deduplicate_entities(
+                    incoming["nodes"],
+                    incoming["edges"],
+                    communities={},
+                    dedup_llm_backend=dedup_llm_backend,
+                    diagnostics=dedup_diagnostics,
+                )
+            all_chunks = base + [incoming]
+        else:
+            all_chunks = base + incoming_chunks
+    else:
+        effective_dedup = dedup
+        all_chunks = base + incoming_chunks
     G = build(
         all_chunks,
         directed=directed,
-        dedup=dedup,
+        dedup=effective_dedup,
         dedup_llm_backend=dedup_llm_backend,
         root=root,
         multigraph=multigraph,
     )
+    if multigraph and dedup_diagnostics:
+        existing = G.graph.get("graphify_multigraph_diagnostics", {})
+        existing.update(dedup_diagnostics)
+        G.graph["graphify_multigraph_diagnostics"] = existing
 
     # Prune nodes and edges from deleted source files
     if prune_sources:
@@ -775,9 +812,10 @@ def build_merge(
                 file=sys.stderr,
             )
 
-    # Safety check: refuse to shrink the graph silently (#479)
-    # Skip when dedup or prune_sources is active — shrinkage is intentional there.
-    if graph_path.exists() and not dedup and not prune_sources:
+    # Safety check: refuse to shrink the graph silently (#479).
+    # Stateful dedup applies only to incoming chunks, so only explicit pruning
+    # may reduce the saved graph's node count.
+    if graph_path.exists() and not prune_sources:
         existing_n = len(existing_nodes)
         new_n = G.number_of_nodes()
         if new_n < existing_n:
diff --git a/tests/test_build.py b/tests/test_build.py
index 71c0eff1b..d97180efd 100644
--- a/tests/test_build.py
+++ b/tests/test_build.py
@@ -1142,6 +1142,174 @@ def _three_parallel_edges_one_pair() -> dict:
     }
 
 
+def _dedup_pressure_multigraph_extraction() -> dict:
+    """A saved graph containing legitimate same-label nodes.
+
+    Stateful no-op merges must preserve these records exactly. Re-running entity
+    deduplication over the already-saved base graph would collapse ``worker_a``
+    and ``worker_b`` because they share a label and source_file, even though no
+    new extraction data asked for a merge.
+    """
+    return {
+        "nodes": [
+            {
+                "id": "worker_a",
+                "label": "Worker",
+                "file_type": "code",
+                "source_file": "keep_shared.py",
+            },
+            {
+                "id": "worker_b",
+                "label": "Worker",
+                "file_type": "code",
+                "source_file": "keep_shared.py",
+            },
+            {
+                "id": "sink",
+                "label": "Sink",
+                "file_type": "code",
+                "source_file": "sink.py",
+            },
+            {
+                "id": "gone",
+                "label": "Gone",
+                "file_type": "code",
+                "source_file": "gone.py",
+            },
+        ],
+        "edges": [
+            {
+                "source": "worker_a",
+                "target": "sink",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "keep_a.py",
+                "source_location": "L1",
+            },
+            {
+                "source": "worker_b",
+                "target": "sink",
+                "relation": "calls",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "keep_b.py",
+                "source_location": "L2",
+            },
+            {
+                "source": "gone",
+                "target": "sink",
+                "relation": "imports",
+                "confidence": "EXTRACTED",
+                "confidence_score": 1.0,
+                "source_file": "gone.py",
+                "source_location": "L3",
+            },
+        ],
+    }
+
+
+def _multigraph_edge_signature(
+    G: nx.MultiDiGraph,
+) -> list[tuple[object, object, str, object, object]]:
+    return sorted(
+        (
+            u,
+            v,
+            str(k),
+            d.get("source_file"),
+            d.get("source_location"),
+        )
+        for u, v, k, d in G.edges(keys=True, data=True)
+    )
+
+
+def test_build_merge_empty_delta_with_dedup_preserves_saved_multigraph(tmp_path):
+    graph_path = tmp_path / "graph.json"
+    _write_multigraph_graph_json(graph_path, _dedup_pressure_multigraph_extraction())
+    before = build_merge([], graph_path=graph_path, dedup=False)
+
+    after = build_merge([], graph_path=graph_path, dedup=True)
+
+    assert isinstance(before, nx.MultiDiGraph)
+    assert isinstance(after, nx.MultiDiGraph)
+    assert type(after) is nx.MultiDiGraph
+    assert set(after.nodes) == set(before.nodes)
+    assert dict(after.nodes["worker_a"]) == dict(before.nodes["worker_a"])
+    assert dict(after.nodes["worker_b"]) == dict(before.nodes["worker_b"])
+    assert _multigraph_edge_signature(after) == _multigraph_edge_signature(before)
+
+
+def test_build_merge_nonempty_delta_with_dedup_preserves_saved_base(tmp_path):
+    graph_path = tmp_path / "graph.json"
+    _write_multigraph_graph_json(graph_path, _dedup_pressure_multigraph_extraction())
+    before = build_merge([], graph_path=graph_path, dedup=False)
+
+    after = build_merge(
+        [
+            {
+                "nodes": [
+                    {
+                        "id": "new_node",
+                        "label": "New Node",
+                        "file_type": "code",
+                        "source_file": "new.py",
+                    }
+                ],
+                "edges": [
+                    {
+                        "source": "new_node",
+                        "target": "sink",
+                        "relation": "calls",
+                        "confidence": "EXTRACTED",
+                        "source_file": "new.py",
+                        "source_location": "L9",
+                    }
+                ],
+                "hyperedges": [],
+            }
+        ],
+        graph_path=graph_path,
+        dedup=True,
+    )
+
+    assert isinstance(before, nx.MultiDiGraph)
+    assert isinstance(after, nx.MultiDiGraph)
+    assert type(after) is nx.MultiDiGraph
+    assert set(before.nodes).issubset(set(after.nodes))
+    assert after.has_node("new_node")
+    assert dict(after.nodes["worker_a"]) == dict(before.nodes["worker_a"])
+    assert dict(after.nodes["worker_b"]) == dict(before.nodes["worker_b"])
+    assert after.has_edge("new_node", "sink")
+    assert set(_multigraph_edge_signature(before)).issubset(set(_multigraph_edge_signature(after)))
+
+
+def test_build_merge_delete_only_delta_with_dedup_prunes_without_rededuplicating_base(
+    tmp_path,
+):
+    graph_path = tmp_path / "graph.json"
+    _write_multigraph_graph_json(graph_path, _dedup_pressure_multigraph_extraction())
+
+    G = build_merge(
+        [{"nodes": [], "edges": [], "hyperedges": []}],
+        graph_path=graph_path,
+        prune_sources=["gone.py"],
+        dedup=True,
+    )
+
+    assert type(G) is nx.MultiDiGraph
+    assert set(G.nodes) == {"worker_a", "worker_b", "sink"}
+    assert all(d.get("source_file") != "gone.py" for _u, _v, d in G.edges(data=True))
+    assert G.has_edge("worker_a", "sink")
+    assert G.has_edge("worker_b", "sink")
+    assert sorted(
+        (u, v, d.get("source_file"), d.get("source_location")) for u, v, d in G.edges(data=True)
+    ) == [
+        ("worker_a", "sink", "keep_a.py", "L1"),
+        ("worker_b", "sink", "keep_b.py", "L2"),
+    ]
+
+
 def test_build_merge_multigraph_unchanged_file_preserves_parallel_edges(tmp_path):
     """PR 7 gate: merging a new chunk that does not touch A/B's files must
     preserve every keyed parallel edge on the existing A→B pair (no silent