Problem Statement
Today, when the product calls an LLM and the model produces a long or slow reply, users do not see any of that text until the full response is finished. There is no way in the UI to read the model's output as it is being generated — only the final, complete result is surfaced.
That creates a poor experience for longer answers: the interface looks idle or stuck while work is happening, and users cannot start reading or scanning partial results early, which is what they expect from modern LLM-powered interfaces.
Proposed Solution
Stream LLM output incrementally to the UI so users can read tokens as they are generated, instead of waiting for the full response.
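As a rough sketch of the desired behavior (all names here are illustrative, not existing APIs in this codebase), a streaming node would yield tokens as they arrive and the UI would repaint on each partial, while the final complete output is still produced at the end:

```typescript
// Hypothetical sketch: streamCompletion and renderIncrementally are
// illustrative names, not real APIs in this repository.

// A provider stream that yields tokens as they are generated (simulated
// here with a fixed array; a real node would read the provider's stream).
async function* streamCompletion(tokens: string[]): AsyncGenerator<string> {
  for (const token of tokens) {
    yield token;
  }
}

// The consumer surfaces each partial result immediately instead of
// buffering until end-of-execution.
async function renderIncrementally(
  stream: AsyncGenerator<string>,
  onChunk: (partial: string) => void
): Promise<string> {
  let text = "";
  for await (const token of stream) {
    text += token;
    onChunk(text); // the UI can repaint after every chunk
  }
  return text; // the complete output remains the canonical final result
}
```

The key contrast with today's behavior is that `onChunk` fires during generation; the returned promise still resolves to the same full text the current non-streaming path would produce.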
This is a substantial cross-module change, not a localized fix. Any solution will have to address:
- All LLM nodes under `nodes/src/nodes/llm_*` (16+ providers: `llm_anthropic`, `llm_openai`, `llm_openai_api`, `llm_vertex`, `llm_gemini`, `llm_mistral`, `llm_vision_mistral`, `llm_perplexity`, `llm_xai`, `llm_ollama`, `llm_vision_ollama`, `llm_bedrock`, `llm_deepseek`, `llm_qwen`, `llm_gmi_cloud`, `llm_ibm_watson`). Each currently returns a single complete result; each will need to support partial output.
- A streaming execution pattern for non-agent nodes, which does not exist today. Generic nodes return once at end-of-execution; only agents emit partial state. This is a new node-execution capability.
- The event/notification contract between server and clients (`client-typescript`, `client-python`, `client-mcp`) needs a new event shape for incremental output (chunk identity, ordering, completion, error semantics).
- `chat-ui` and `vscode` message rendering, which today renders messages atomically — incremental rendering, reconciliation, and cancellation behavior are new UX work.
- Engine throughput considerations — partial-output notifications happen at much higher frequency than current event types, so backpressure and ordering guarantees need to be validated.
- Pipeline contract — a decision is needed about how partial output relates to lane output consumed by downstream nodes, and that decision must be documented in `ROCKETRIDE_PIPELINE_RULES.md` and `ROCKETRIDE_COMPONENT_REFERENCE.md` so pipeline authors aren't surprised.
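To make the "chunk identity, ordering, completion, error semantics" requirement concrete, here is one possible event shape and a client-side reconciliation step, sketched purely for illustration (all field and function names are assumptions, not a proposed final contract):

```typescript
// Illustrative sketch only: field names are assumptions, not the final contract.
// A chunk event carries identity (which logical stream it belongs to),
// ordering (a monotonically increasing sequence number), and
// completion/error semantics.
interface OutputChunkEvent {
  streamId: string; // identifies one logical output stream (e.g. one node run)
  seq: number;      // strictly increasing per stream; lets clients detect gaps
  delta: string;    // newly generated text since the previous chunk
  done: boolean;    // true on the final event for this stream
  error?: string;   // set instead of delta when the stream fails mid-flight
}

// Clients can tolerate out-of-order delivery by sorting on seq before
// concatenating deltas into the displayed text.
function reconcile(events: OutputChunkEvent[]): string {
  return events
    .filter((e) => !e.error)
    .sort((a, b) => a.seq - b.seq)
    .map((e) => e.delta)
    .join("");
}
```

A per-stream `seq` also gives clients a cheap way to detect dropped chunks (a gap in the sequence) and decide whether to wait, re-request, or fall back to the final output.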
The `sendSSE()` notification path used today by agents (`agent_crewai`, `agent_deepagent`, `agent_langchain`) is a useful reference point but is not a drop-in solution: it was designed for low-frequency agent state events (`thinking`, `acting`), not per-token streams from generic nodes.
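One common way to bridge that frequency gap is to coalesce tokens before notifying, so the wire sees batches rather than every token. A minimal sketch (all names hypothetical; the real batching policy and flush interval would need tuning against the engine's throughput limits):

```typescript
// Hypothetical sketch of token coalescing: buffer incoming tokens and
// flush at most once per intervalMs, plus a final flush on close so the
// last partial text is never lost.
function makeCoalescer(
  flush: (batch: string) => void,
  intervalMs: number
): { push: (token: string) => void; close: () => void } {
  let buffer = "";
  let timer: ReturnType<typeof setTimeout> | null = null;

  const emit = () => {
    if (buffer.length > 0) {
      flush(buffer); // one notification carries many tokens
      buffer = "";
    }
    timer = null;
  };

  return {
    push(token: string) {
      buffer += token;
      // Schedule a flush only if one is not already pending.
      if (timer === null) timer = setTimeout(emit, intervalMs);
    },
    close() {
      if (timer !== null) clearTimeout(timer);
      emit(); // flush whatever remains so the final text is complete
    },
  };
}
```

This keeps notification frequency bounded by the flush interval regardless of token rate, which is one way to address the backpressure concern above without changing the event contract.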
Out of Scope
- Streaming tool-call deltas (tool argument streaming).
- Persisting chunk-level history — the final node output remains the canonical record.
- Adding a new HTTP transport endpoint parallel to existing notifications.
Alternatives Considered
No response
Affected Modules