The current MCP server uses a traditional request-response model in which the entire payload is assembled in memory before it is sent to the client. For large knowledge graphs, this creates several problems:
- High memory usage: read_graph materializes all entities and relations into a single response
- Latency: clients must wait for the entire graph to be serialized before receiving any data
- Out-of-memory risk: large graph reads (read_graph) may fail with an out-of-memory (OOM) error
- Poor LLM experience: LLM clients could start processing partial results immediately, but instead must wait for the full response
Proposal
Introduce a streaming API using Spring WebFlux reactive types (Flux) to stream graph data directly to LLM clients as it is read from LadybugDB, instead of buffering the entire result set.
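To make the difference between buffering and streaming concrete, here is a minimal sketch of the idea. It deliberately uses a plain `java.util.stream.Stream` rather than Reactor's `Flux` so that it runs without any dependencies; in a WebFlux version the lazy source would be a `Flux` instead. Everything named here is hypothetical: the `Entity` record, the `readEntities` method (a stand-in for a LadybugDB cursor, whose real API is not shown in this proposal), and the newline-delimited JSON output format.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class GraphStreamingSketch {

    // Hypothetical shape of one graph entity; the real schema may differ.
    public record Entity(String name, String type) {}

    // Hypothetical lazy source standing in for a LadybugDB cursor.
    // Entities are produced one at a time, only when the consumer pulls them.
    public static Stream<Entity> readEntities(int count) {
        return IntStream.range(0, count)
                .mapToObj(i -> new Entity("entity-" + i, "demo"));
    }

    // Writes each entity as one JSON line as soon as it is read, instead of
    // collecting the whole graph into a single in-memory response first.
    // Assumption: names and types need no JSON escaping in this demo.
    public static long streamAsNdjson(Stream<Entity> entities, Writer out) {
        long[] written = {0};
        entities.forEach(e -> {
            try {
                out.write("{\"name\":\"" + e.name()
                        + "\",\"type\":\"" + e.type() + "\"}\n");
                out.flush(); // hand the line to the client right away
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
            written[0]++;
        });
        return written[0];
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        long n = streamAsNdjson(readEntities(3), out);
        System.out.print(out);
        System.out.println("streamed " + n + " entities");
    }
}
```

At any moment only one entity is held by the writer loop, so memory use does not grow with graph size, and the client receives the first line before the last entity has been read. A reactive version would additionally give the client backpressure control, which this pull-based sketch does not model.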