Fix/cg pagination by ZohaibHassan16 · Pull Request #431 · Hawksight-AI/semantica

ZohaibHassan16 · 2026-04-01T18:48:47Z

Summary

Fixes critical data pipeline bug that caused valid edges to be dropped during the load_from_file phase and patches a vulnerability that caused the frontend layout engine to crash due to ghost nodes.

Changes Made:

Updated the add_edges to gracefully accept both naming conventions ("source" OR "source_id")
Added if not source_id or not target_id: continue in add_edges to prevent structurally broken data from entering RAM.
Added if n.node_id and n.is_active(now):) which guarantees mathematically sound data for UI physical engine.

qodo-code-review · 2026-04-01T18:49:03Z

Review Summary by Qodo

Optimize ContextGraph pagination with lazy evaluation and edge ID mapping

🐞 Bug fix ✨ Enhancement

Walkthroughs

Description

• Optimize pagination with lazy evaluation using generators
• Accept both naming conventions for edge source/target IDs
• Filter invalid edges and nodes during data loading
• Ensure deterministic pagination with sorted node IDs

Diagram

flowchart LR
  A["Edge data with mixed naming"] -->|Accept source/target OR source_id/target_id| B["Normalize edge IDs"]
  B -->|Filter invalid edges| C["Valid edges only"]
  D["Node queries"] -->|Lazy generator evaluation| E["Deterministic sorted results"]
  E -->|Pagination with islice| F["Paginated output"]
  G["Active node filtering"] -->|Generator-based| H["Memory efficient results"]

File Changes

1. semantica/context/context_graph.py Enhancement, bug fix +53/-69

Lazy pagination and edge ID mapping normalization

• Added itertools import for lazy pagination evaluation
• Enhanced add_edges() to accept both "source"/"target" and "source_id"/"target_id" naming
 conventions
• Added validation to skip edges with missing source_id or target_id
• Refactored find_nodes() to use lazy generators with sorted deterministic ordering
• Refactored find_active_nodes() to use generator-based filtering with node_id validation
• Refactored find_edges() to use lazy generators with null-safety defaults
• Replaced list slicing with itertools.islice() for memory-efficient pagination

semantica/context/context_graph.py

qodo-code-review · 2026-04-01T18:49:04Z

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0) 📎 Requirement gaps (0)

1. Sorted IDs TypeError 🐞 Bug ≡ Correctness

Description

find_nodes()/find_active_nodes() call sorted() on node_type_index IDs; if malformed node_ids (e.g.,
None/int) exist alongside strings, sorting raises TypeError and crashes node-type pagination.
Malformed node_ids can enter via add_nodes() and are indexed verbatim in _add_internal_node().

Code

semantica/context/context_graph.py[R789-791]

+                # Sets are unordered, sort IDs for deterministic pagination
+                raw_ids = sorted(self.node_type_index.get(node_type, set()))
+                source = (self.nodes[nid] for nid in raw_ids if nid in self.nodes)

Evidence
add_nodes() forwards node.get("id") into ContextNode without validating type/emptiness, and
_add_internal_node() inserts that node_id into node_type_index. find_nodes()/find_active_nodes()
then sort() that set, which will raise at runtime if it contains mixed/non-comparable types (common
with malformed imports).
semantica/context/context_graph.py[349-392]
semantica/context/context_graph.py[1058-1067]
semantica/context/context_graph.py[783-806]
semantica/context/context_graph.py[808-821]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`find_nodes()`/`find_active_nodes()` sort IDs from `node_type_index` without validating ID types. If malformed node IDs (e.g., `None`, `int`) are present, `sorted()` raises `TypeError` and breaks pagination.

### Issue Context
Malformed IDs can enter through `add_nodes()` (no validation) and are indexed verbatim in `_add_internal_node()`.

### Fix Focus Areas
- semantica/context/context_graph.py[783-821]
- semantica/context/context_graph.py[349-392]
- semantica/context/context_graph.py[1058-1067]

### Implementation notes
- When building `raw_ids`, filter to non-empty strings only (e.g., `isinstance(nid, str) and nid`) before sorting.
- Preferably also prevent bad IDs at ingestion time: in `add_nodes()`, skip/reject nodes with missing/invalid `id` (and optionally log/count them).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. Totals mismatch pagination 🐞 Bug ≡ Correctness

Description

find_nodes()/find_edges() now filter out invalid items (missing node_id / missing endpoints), but
stats() still counts all stored nodes/edges and index entries. Explorer pagination uses stats()
totals, so the API can report totals larger than the number of items find_* can ever return
(empty/incorrect pages).

Code

semantica/context/context_graph.py[R985-988]

+                for e in source if e.source_id and e.target_id
+            )
+            stop = skip + limit if limit is not None else None
+            return list(itertools.islice(gen, skip, stop))

Evidence

find_nodes filters out nodes with falsy node_id and find_edges filters out edges missing endpoints,
while stats() counts len(self.nodes)/len(self.edges) and raw index sizes. ExplorerSession uses those
stats totals for pagination when no in-memory filtering is applied, so totals can become
inconsistent with the actual page source.

semantica/context/context_graph.py[783-806]
semantica/context/context_graph.py[970-999]
semantica/explorer/session.py[150-194]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`stats()` totals can disagree with what `find_nodes()`/`find_edges()` return because `find_*` filters invalid entities but `stats()` counts everything. Explorer pagination uses `stats()` for totals, producing incorrect totals/empty pages.

### Issue Context
- `find_nodes()` filters `if n.node_id`
- `find_edges()` filters `if e.source_id and e.target_id`
- `stats()` counts `len(self.nodes)` / `len(self.edges)` and raw index sizes
- ExplorerSession uses `stats()` totals with `find_*` paging.

### Fix Focus Areas
- semantica/context/context_graph.py[783-807]
- semantica/context/context_graph.py[970-999]
- semantica/explorer/session.py[150-194]

### Implementation notes
Pick one consistent approach:
1) **Preferred:** prevent invalid nodes/edges from being stored/indexed (validate in `add_nodes()` similar to `add_edges()`), and optionally clean existing invalid entries on load/import.
2) Or compute `stats()` counts using the same validity criteria as `find_*` (and ensure `node_types`/`edge_types` reflect only valid IDs/edges).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

3. Skipped edges unreported 🐞 Bug ✧ Quality

Description

add_edges() now silently skips edges missing source/target IDs with no logging or accounting, making
malformed imports/load_from_file partially apply without any diagnostics. This reduces debuggability
when edges_added is unexpectedly lower than input size.

Code

semantica/context/context_graph.py[R412-416]

+                source_id = edge.get("source_id") or edge.get("source")
+                target_id = edge.get("target_id") or edge.get("target")
+
+                if not source_id or not target_id:
+                    continue

Evidence
add_edges() drops edges via continue when endpoints are missing, and load_from_file() feeds file
edges directly into add_edges(); no warning is emitted identifying rejected edges or counts.
semantica/context/context_graph.py[394-430]
semantica/context/context_graph.py[722-758]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Edges with missing endpoints are skipped silently in `add_edges()`. When ingesting data (e.g., via `load_from_file()` or Explorer import), this produces partial graphs without any diagnostic output.

### Issue Context
Skipping invalid edges is reasonable, but callers need visibility into how many/which edges were rejected.

### Fix Focus Areas
- semantica/context/context_graph.py[394-430]
- semantica/context/context_graph.py[722-758]

### Implementation notes
- Track `skipped` count inside `add_edges()`.
- After processing the batch, emit a single `logger.warning` when `skipped > 0` (optionally include a small sample of rejected edges, redacting large metadata).
- Consider returning only `count` (to keep API stable) but logging the skipped count; or expose skipped count via an optional callback/diagnostics hook.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

ⓘ The new review experience is currently in Beta. Learn more

qodo-code-review · 2026-04-01T18:57:34Z

semantica/context/context_graph.py

+                # Sets are unordered, sort IDs for deterministic pagination
+                raw_ids = sorted(self.node_type_index.get(node_type, set()))
+                source = (self.nodes[nid] for nid in raw_ids if nid in self.nodes)


1. Sorted ids typeerror 🐞 Bug ≡ Correctness

find_nodes()/find_active_nodes() call sorted() on node_type_index IDs; if malformed node_ids (e.g., None/int) exist alongside strings, sorting raises TypeError and crashes node-type pagination. Malformed node_ids can enter via add_nodes() and are indexed verbatim in _add_internal_node().

Agent Prompt

### Issue description `find_nodes()`/`find_active_nodes()` sort IDs from `node_type_index` without validating ID types. If malformed node IDs (e.g., `None`, `int`) are present, `sorted()` raises `TypeError` and breaks pagination. ### Issue Context Malformed IDs can enter through `add_nodes()` (no validation) and are indexed verbatim in `_add_internal_node()`. ### Fix Focus Areas - semantica/context/context_graph.py[783-821] - semantica/context/context_graph.py[349-392] - semantica/context/context_graph.py[1058-1067] ### Implementation notes - When building `raw_ids`, filter to non-empty strings only (e.g., `isinstance(nid, str) and nid`) before sorting. - Preferably also prevent bad IDs at ingestion time: in `add_nodes()`, skip/reject nodes with missing/invalid `id` (and optionally log/count them).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

qodo-code-review · 2026-04-01T18:57:34Z

semantica/context/context_graph.py

+                for e in source if e.source_id and e.target_id
+            )
+            stop = skip + limit if limit is not None else None
+            return list(itertools.islice(gen, skip, stop))


2. Totals mismatch pagination 🐞 Bug ≡ Correctness

find_nodes()/find_edges() now filter out invalid items (missing node_id / missing endpoints), but stats() still counts all stored nodes/edges and index entries. Explorer pagination uses stats() totals, so the API can report totals larger than the number of items find_* can ever return (empty/incorrect pages).

Agent Prompt

### Issue description `stats()` totals can disagree with what `find_nodes()`/`find_edges()` return because `find_*` filters invalid entities but `stats()` counts everything. Explorer pagination uses `stats()` for totals, producing incorrect totals/empty pages. ### Issue Context - `find_nodes()` filters `if n.node_id` - `find_edges()` filters `if e.source_id and e.target_id` - `stats()` counts `len(self.nodes)` / `len(self.edges)` and raw index sizes - ExplorerSession uses `stats()` totals with `find_*` paging. ### Fix Focus Areas - semantica/context/context_graph.py[783-807] - semantica/context/context_graph.py[970-999] - semantica/explorer/session.py[150-194] ### Implementation notes Pick one consistent approach: 1) **Preferred:** prevent invalid nodes/edges from being stored/indexed (validate in `add_nodes()` similar to `add_edges()`), and optionally clean existing invalid entries on load/import. 2) Or compute `stats()` counts using the same validity criteria as `find_*` (and ensure `node_types`/`edge_types` reflect only valid IDs/edges).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

ZohaibHassan16 added 2 commits April 1, 2026 12:13

fix(context): optimize ContextGraph pagination with lazy evaluation

b0947df

fix(graph): resolve edge ID mapping

88309da

ZohaibHassan16 requested a review from KaifAhmad1 April 1, 2026 18:48

qodo-code-review bot reviewed Apr 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix/cg pagination#431

Fix/cg pagination#431
ZohaibHassan16 wants to merge 2 commits intoHawksight-AI:mainfrom
ZohaibHassan16:fix/cg-pagination

ZohaibHassan16 commented Apr 1, 2026

Uh oh!

qodo-code-review bot commented Apr 1, 2026

Uh oh!

qodo-code-review bot commented Apr 1, 2026 •

edited

Loading

Uh oh!

qodo-code-review bot Apr 1, 2026

Uh oh!

qodo-code-review bot Apr 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

ZohaibHassan16 commented Apr 1, 2026

Summary

Uh oh!

qodo-code-review bot commented Apr 1, 2026

Review Summary by Qodo

Walkthroughs

File Changes

Uh oh!

qodo-code-review bot commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Uh oh!

qodo-code-review bot Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

qodo-code-review bot Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

qodo-code-review bot commented Apr 1, 2026 •

edited

Loading