[BUG] ContextGraph pagination does O(N) memory allocation, causing 502 timeouts on large graphs #430
Bug Description
While testing the integration of the Semantica Knowledge Explorer with a 50k node / 100k edge graph, the FastAPI backend consistently locks up and Vite's proxy returns a 502 Bad Gateway timeout.
Root Cause
The bottleneck is inside semantica/context/context_graph.py, within the find_nodes and find_edges methods. Pagination (skip and limit) is currently applied to the result list after the entire graph has been iterated and mapped into memory.
```python
# Current implementation in find_nodes:
if node_type:
    node_ids = self.node_type_index.get(node_type, set())
    nodes = [self.nodes[nid] for nid in node_ids]
else:
    nodes = list(self.nodes.values())
results = [
    {
        "id": n.node_id,
        "type": n.node_type,
        "content": n.content,
        "metadata": {**(getattr(n, "metadata", {}) or {}), **(getattr(n, "properties", {}) or {})},
    }
    for n in nodes  # <--- allocates 50,000 dictionaries in memory
]
if limit is not None:
    return results[skip: skip + limit]  # <--- discards 49,000 of them
```
Because the frontend streams the dataset into Graphology in chunks of 1,000, loading a 50k graph issues 50 sequential network requests, and each request rebuilds all 50,000 dictionaries before slicing, roughly 2.5 million allocations in total. While that synchronous work runs, the asyncio event loop is starved and the server drops the connection.
Proposed Solution:
Apply the skip and limit offsets directly to the dictionary iterator (using itertools.islice or generator expressions) before executing the heavy dictionary comprehension. This drops memory from O(N) to O(limit), and per-request work from O(N) to O(skip + limit), since islice still has to step past the skipped items.
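A minimal sketch of the proposed fix. Note this is not the actual Semantica code: the Node dataclass, the node_type_index dict, and the free-standing find_nodes signature below are stand-ins for the real ContextGraph internals, which are assumed to look roughly like this.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical stand-in for the real node class in context_graph.py.
@dataclass
class Node:
    node_id: str
    node_type: str
    content: str
    metadata: dict = field(default_factory=dict)
    properties: dict = field(default_factory=dict)

def find_nodes(nodes, node_type_index, node_type=None, skip=0, limit=None):
    """Paginate lazily: nodes is a dict of node_id -> Node."""
    if node_type:
        ids = node_type_index.get(node_type, set())
        candidates = (nodes[nid] for nid in ids)  # generator, no list copy
    else:
        candidates = iter(nodes.values())
    stop = None if limit is None else skip + limit
    # Apply skip/limit BEFORE the dict comprehension, so at most
    # `limit` result dictionaries are ever allocated.
    page = itertools.islice(candidates, skip, stop)
    return [
        {
            "id": n.node_id,
            "type": n.node_type,
            "content": n.content,
            "metadata": {**(n.metadata or {}), **(n.properties or {})},
        }
        for n in page
    ]
```

With a 50k node dict, a request for skip=49000, limit=1000 now builds only 1,000 dictionaries instead of 50,000, and the sliced-off 49,000 are never materialized at all.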