Review Summary by Qodo
Optimize ContextGraph pagination with lazy evaluation and edge ID mapping
Walkthrough
Description
• Optimize pagination with lazy evaluation using generators
• Accept both naming conventions for edge source/target IDs
• Filter invalid edges and nodes during data loading
• Ensure deterministic pagination with sorted node IDs

Diagram
```mermaid
flowchart LR
    A["Edge data with mixed naming"] -->|Accept source/target OR source_id/target_id| B["Normalize edge IDs"]
    B -->|Filter invalid edges| C["Valid edges only"]
    D["Node queries"] -->|Lazy generator evaluation| E["Deterministic sorted results"]
    E -->|Pagination with islice| F["Paginated output"]
    G["Active node filtering"] -->|Generator-based| H["Memory efficient results"]
```
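The generator-plus-`islice` pattern described in the walkthrough can be sketched as a minimal free-standing example. The `paginate` helper below is hypothetical (not the repository's actual API); it only illustrates how lazy evaluation keeps pagination memory-efficient: invalid items are filtered in a generator, and `itertools.islice` stops iteration at `skip + limit`.

```python
import itertools

def paginate(items, skip=0, limit=None):
    """Lazily paginate an iterable.

    Items are filtered and consumed one at a time; nothing past
    skip + limit is ever evaluated when a limit is given.
    """
    # Filter invalid entries lazily (here: drop None as a stand-in
    # for the review's "missing node_id / missing endpoints" checks)
    gen = (item for item in items if item is not None)
    stop = skip + limit if limit is not None else None
    return list(itertools.islice(gen, skip, stop))

print(paginate(range(10), skip=2, limit=3))   # → [2, 3, 4]
print(paginate([1, None, 2, 3], limit=2))     # → [1, 2]
```

Because `islice` works on any iterator, the same call handles an unbounded `limit=None` case by slicing to the end of the stream.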
File Changes
1. semantica/context/context_graph.py
Code Review by Qodo
1. Sorted IDs TypeError
```python
# Sets are unordered, sort IDs for deterministic pagination
raw_ids = sorted(self.node_type_index.get(node_type, set()))
source = (self.nodes[nid] for nid in raw_ids if nid in self.nodes)
```
1. Sorted IDs TypeError (Bug, Correctness)
find_nodes()/find_active_nodes() call sorted() on node_type_index IDs; if malformed node_ids (e.g., None/int) exist alongside strings, sorting raises TypeError and crashes node-type pagination. Malformed node_ids can enter via add_nodes() and are indexed verbatim in _add_internal_node().
Agent Prompt
### Issue description
`find_nodes()`/`find_active_nodes()` sort IDs from `node_type_index` without validating ID types. If malformed node IDs (e.g., `None`, `int`) are present, `sorted()` raises `TypeError` and breaks pagination.
### Issue Context
Malformed IDs can enter through `add_nodes()` (no validation) and are indexed verbatim in `_add_internal_node()`.
### Fix Focus Areas
- semantica/context/context_graph.py[783-821]
- semantica/context/context_graph.py[349-392]
- semantica/context/context_graph.py[1058-1067]
### Implementation notes
- When building `raw_ids`, filter to non-empty strings only (e.g., `isinstance(nid, str) and nid`) before sorting.
- Preferably also prevent bad IDs at ingestion time: in `add_nodes()`, skip/reject nodes with missing/invalid `id` (and optionally log/count them).
Copy this prompt and use it to remediate the issue with your preferred AI generation tools
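The implementation note above (filter to non-empty strings before sorting) can be illustrated with a small sketch. `safe_sorted_ids` and its arguments are hypothetical names, not the repository's real signatures; the point is that `isinstance(nid, str) and nid` removes the mixed-type values that would make `sorted()` raise `TypeError`.

```python
def safe_sorted_ids(node_type_index, nodes, node_type):
    # Hypothetical sketch: keep only non-empty string IDs so sorted()
    # never compares incompatible types (None, int, str)
    raw = node_type_index.get(node_type, set())
    raw_ids = sorted(nid for nid in raw if isinstance(nid, str) and nid)
    return [nodes[nid] for nid in raw_ids if nid in nodes]

# A malformed index containing None, an int, and an empty string
index = {"person": {"b", None, 3, "a", ""}}
nodes = {"a": {"id": "a"}, "b": {"id": "b"}}
print(safe_sorted_ids(index, nodes, "person"))
# → [{'id': 'a'}, {'id': 'b'}]
```

Without the `isinstance` guard, `sorted({"b", None, 3, "a"})` would crash, which is exactly the pagination failure the review describes.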
```python
    for e in source if e.source_id and e.target_id
)
stop = skip + limit if limit is not None else None
return list(itertools.islice(gen, skip, stop))
```
2. Totals mismatch pagination (Bug, Correctness)
find_nodes()/find_edges() now filter out invalid items (missing node_id / missing endpoints), but stats() still counts all stored nodes/edges and index entries. Explorer pagination uses stats() totals, so the API can report totals larger than the number of items find_* can ever return (empty/incorrect pages).
Agent Prompt
### Issue description
`stats()` totals can disagree with what `find_nodes()`/`find_edges()` return because `find_*` filters invalid entities but `stats()` counts everything. Explorer pagination uses `stats()` for totals, producing incorrect totals/empty pages.
### Issue Context
- `find_nodes()` filters `if n.node_id`
- `find_edges()` filters `if e.source_id and e.target_id`
- `stats()` counts `len(self.nodes)` / `len(self.edges)` and raw index sizes
- ExplorerSession uses `stats()` totals with `find_*` paging.
### Fix Focus Areas
- semantica/context/context_graph.py[783-807]
- semantica/context/context_graph.py[970-999]
- semantica/explorer/session.py[150-194]
### Implementation notes
Pick one consistent approach:
1) **Preferred:** prevent invalid nodes/edges from being stored/indexed (validate in `add_nodes()` similar to `add_edges()`), and optionally clean existing invalid entries on load/import.
2) Or compute `stats()` counts using the same validity criteria as `find_*` (and ensure `node_types`/`edge_types` reflect only valid IDs/edges).
Copy this prompt and use it to remediate the issue with your preferred AI generation tools
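Option (2) from the implementation notes, computing `stats()` with the same validity criteria that `find_*` uses, can be sketched as below. `consistent_stats` is a hypothetical stand-in for the real method, and the dict-based node/edge shapes are assumptions for illustration; the key idea is that both the counts and the listings apply identical filters, so pagination totals can never exceed what the finders return.

```python
def consistent_stats(nodes, edges):
    # Hypothetical sketch: count only entities that pass the same
    # checks find_nodes()/find_edges() apply when filtering
    valid_nodes = [n for n in nodes.values() if n.get("node_id")]
    valid_edges = [e for e in edges
                   if e.get("source_id") and e.get("target_id")]
    return {"total_nodes": len(valid_nodes), "total_edges": len(valid_edges)}

nodes = {"a": {"node_id": "a"}, "ghost": {"node_id": None}}
edges = [
    {"source_id": "a", "target_id": "a"},   # valid
    {"source_id": "a", "target_id": ""},    # missing endpoint, filtered
]
print(consistent_stats(nodes, edges))
# → {'total_nodes': 1, 'total_edges': 1}
```

A naive `{"total_nodes": len(nodes), "total_edges": len(edges)}` would report 2/2 here, which is the totals mismatch the Explorer pagination hits.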
Summary
Fixes a critical data pipeline bug that caused valid edges to be dropped during the `load_from_file` phase, and patches a vulnerability that caused the frontend layout engine to crash due to ghost nodes.

Changes Made:
• Updated `add_edges` to gracefully accept both naming conventions ("source" OR "source_id").
• Added `if not source_id or not target_id: continue` in `add_edges` to prevent structurally broken data from entering RAM.
• Added an active-node filter (`if n.node_id and n.is_active(now):`) which guarantees well-formed data for the UI physics engine.
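The edge-normalization behavior described in the changes can be sketched as a standalone generator. `normalize_edges` is a hypothetical helper, not the PR's actual `add_edges` implementation; it shows the two documented behaviors, accepting either naming convention and skipping edges with missing endpoints.

```python
def normalize_edges(raw_edges):
    # Hypothetical sketch: accept "source"/"target" OR
    # "source_id"/"target_id", and drop structurally broken
    # edges before they enter memory
    for raw in raw_edges:
        source_id = raw.get("source_id") or raw.get("source")
        target_id = raw.get("target_id") or raw.get("target")
        if not source_id or not target_id:
            continue  # skip edges with a missing endpoint
        yield {"source_id": source_id, "target_id": target_id}

edges = [
    {"source": "a", "target": "b"},        # legacy naming
    {"source_id": "b", "target_id": "c"},  # new naming
    {"source": "a"},                       # broken: no target, dropped
]
print(list(normalize_edges(edges)))
# → [{'source_id': 'a', 'target_id': 'b'}, {'source_id': 'b', 'target_id': 'c'}]
```

Because it is a generator, normalization stays lazy and composes with the `islice`-based pagination elsewhere in the file.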