Stream LLM output incrementally to the UI #730

@dsapandora

Description

Problem Statement

Today, when the product calls an LLM and the model produces a long or slow reply, users do not see any of that text until the full response is finished. There is no way in the UI to read the model's output as it is being generated — only the final, complete result is surfaced.

That creates a poor experience for longer answers: the interface looks idle or stuck while work is happening, and users cannot start reading or scanning partial results early, which is what they expect from modern LLM-powered interfaces.

Proposed Solution

Stream LLM output incrementally to the UI so users can read tokens as they are generated, instead of waiting for the full response.

This is a substantial cross-module change, not a localized fix. Any solution will have to address:

  • All LLM nodes under nodes/src/nodes/llm_* (16+ providers: llm_anthropic, llm_openai, llm_openai_api, llm_vertex, llm_gemini, llm_mistral, llm_vision_mistral, llm_perplexity, llm_xai, llm_ollama, llm_vision_ollama, llm_bedrock, llm_deepseek, llm_qwen, llm_gmi_cloud, llm_ibm_watson). Each currently returns a single complete result; each will need to support partial output.
  • A streaming execution pattern for non-agent nodes, which does not exist today. Generic nodes return once at end-of-execution; only agents emit partial state. This is a new node-execution capability (a minimal sketch of one possible pattern appears after the sendSSE note below).
  • The event/notification contract between server and clients (client-typescript, client-python, client-mcp) needs a new event shape for incremental output (chunk identity, ordering, completion, error semantics); one possible shape is sketched right after this list.
  • chat-ui and vscode message rendering, which today renders messages atomically — incremental rendering, reconciliation, and cancellation behavior are new UX work.
  • Engine throughput considerations — partial-output notifications happen at much higher frequency than current event types, so backpressure and ordering guarantees need to be validated.
  • Pipeline contract — a decision is needed about how partial output relates to lane output consumed by downstream nodes, and that decision must be documented in ROCKETRIDE_PIPELINE_RULES.md and ROCKETRIDE_COMPONENT_REFERENCE.md so pipeline authors aren't surprised.
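
To make the contract question in the third bullet concrete, below is a minimal sketch of what an incremental-output event could look like. It is written in Python purely for illustration and assumes the notification payloads are JSON-serializable; the event name and every field are placeholders, not a decided contract.

```python
from typing import Literal, Optional, TypedDict


class LlmOutputChunkEvent(TypedDict):
    """One incremental-output notification (hypothetical shape)."""

    type: Literal["llm_output_chunk"]  # distinguishes it from existing event types
    run_id: str          # identifies the pipeline execution
    node_id: str         # which llm_* node produced the chunk
    seq: int             # monotonically increasing per node; lets clients detect gaps or reordering
    text: str            # the incremental text delta
    done: bool           # True on the final event for this stream
    error: Optional[str]  # set instead of further text if the stream aborts
```

Whatever shape is ultimately chosen, the seq/done/error trio is the important part: clients need enough information to render in order, to know when a message is complete, and to reconcile or discard partial text on failure or cancellation.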

The sendSSE() notification path used today by agents (agent_crewai, agent_deepagent, agent_langchain) is a useful reference point but is not a drop-in solution: it was designed for low-frequency agent state events (thinking, acting), not per-token streams from generic nodes.
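
To make the per-node change concrete, here is a minimal sketch of the streaming pattern for a single provider. It assumes the llm_* nodes are Python and uses the public OpenAI SDK streaming API; emit_partial is a hypothetical callback standing in for whatever notification hook the engine ends up exposing to non-agent nodes (it is not sendSSE), and the model name is illustrative.

```python
import time
from typing import Callable

from openai import OpenAI


def run_llm_openai_streaming(
    prompt: str,
    emit_partial: Callable[[str, int, bool], None],  # (text_delta, seq, done) -- hypothetical hook
    flush_interval_s: float = 0.05,
) -> str:
    """Stream a completion, forwarding coalesced chunks as partial output.

    Deltas are buffered and flushed at most every `flush_interval_s` seconds so
    the notification path is not hit once per token (the backpressure concern
    listed above). The joined text is still returned as the node's final,
    canonical output, matching the "Out of Scope" note on chunk persistence.
    """
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )

    full_text: list[str] = []
    buffer: list[str] = []
    seq = 0
    last_flush = time.monotonic()

    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content or ""
        if not delta:
            continue
        full_text.append(delta)
        buffer.append(delta)
        if time.monotonic() - last_flush >= flush_interval_s:
            emit_partial("".join(buffer), seq, False)  # partial, not final
            buffer.clear()
            seq += 1
            last_flush = time.monotonic()

    emit_partial("".join(buffer), seq, True)  # final flush marks completion
    return "".join(full_text)
```

Each of the 16+ providers would need an equivalent adaptation against its own SDK (Anthropic, Vertex, Bedrock, Ollama, and so on), which is why the per-provider work dominates the node-side effort.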

Out of Scope

  • Streaming tool-call deltas (tool argument streaming).
  • Persisting chunk-level history — the final node output remains the canonical record.
  • Adding a new HTTP transport endpoint parallel to existing notifications.

Alternatives Considered

No response

Affected Modules

  • server (C++ engine)
  • client-typescript
  • client-python
  • client-mcp
  • nodes (pipeline)
  • ai
  • chat-ui
  • dropper-ui
  • vscode
  • tika

Metadata

Labels

feature (New feature or enhancement)
