State, and long-lived vs. short-lived connections #102
Replies: 25 comments 38 replies
-
Claude's thoughts:
-
Also, although my post almost entirely focuses on protocol state, we could also imagine servers that have application state, like stateful agents. Those might naturally lend themselves to a long-lived connection anyway, even aside from the stateful features in the protocol.
-
Speaking from Vercel's perspective, option 1 is probably best. You could consider a hybrid version where
I do agree that statefulness is hard to avoid in general (even a trivial stream could be interrupted, and ideally the client could resume it), so I would not shy away from it as a general feature; just implement it in a way that is easy for distributed systems to achieve.
-
What are some stateful use cases that you've seen?
-
Some thoughts that come to mind are:
My take is that it is not clear this is worth solving given the current motivation/problem statement.
-
Progressive enhancement could be an option. At the base level, keep a very simple mental model: tool call === procedure call === JSON-RPC. If a client needs real-time notifications, it would call some sort of "subscribe" method to get back an SSE URL (the presence of which can be negotiated with MCP's capability negotiation model). That is, don't tightly couple JSON-RPC with SSE. JSON-RPC is very simple to implement; don't sabotage this simplicity by coupling it with SSE.
AAA can be an envelope/tunnel around MCP and not overcomplicate MCP itself. JSON-RPC and SSE URLs could be signed URLs generated by off-MCP API requests that establish AAA. Apologies if all this is too terse / abstract. Just wanted to quickly dump things at the top of my mind.
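As a rough illustration of that optional "subscribe" step from the client side, here is a minimal sketch. The notifications/subscribe method name and the sseUrl result field are assumptions for illustration only; nothing like them exists in the current spec.
```typescript
// Sketch only: "notifications/subscribe" and the sseUrl result field are hypothetical.
// Base level: tool call === procedure call === JSON-RPC over a single POST.
async function callTool(endpoint: string, name: string, args: unknown) {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/call", params: { name, arguments: args } }),
  });
  return (await res.json()).result;
}

// Progressive enhancement: only clients that need real-time notifications ask for an SSE URL.
// Servers that don't support it omit the capability (or return an error) and stay plain JSON-RPC.
async function maybeSubscribe(endpoint: string): Promise<EventSource | undefined> {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 2, method: "notifications/subscribe" }),
  });
  const { result } = await res.json();
  if (!result?.sseUrl) return undefined; // "MCP Lite" server: no streaming, and that's fine
  const stream = new EventSource(result.sseUrl); // could be a signed URL per the AAA note above
  stream.onmessage = (e) => console.log("notification", e.data); // EventSource is the browser API; Node needs a polyfill
  return stream;
}
```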
-
Big fan here 👋 Option 2 feels right and is in line with the other conversations around authorization that enable multiple paths depending on the server's capabilities. The trade-off is the additional complexity in the architecture itself. That said, this feels solvable by treating the protocol payloads as separate from the delivery mechanism and letting the delivery mechanisms abide by a separate contract layer, which itself could be decoupled from clients as SDKs. With that, the protocol shouldn't bifurcate, assuming we solve for sufficient parity.
-
Thank you @jspahrsummers and others for this discussion. From my perspective (I'm quite new to MCP, so please correct me if I'm wrong), the most interesting would be
Why?
I think stateless enables easier scaling, federation of MCP servers, "tools discovery", and MCP proxies. If we want to create some "HuggingFace" for MCP tools, "MCP Stateless" would make it easier. I currently see the whole AAA layer as completely independent from the stateful / stateless discussion. I'll be happy to help / contribute if "Stateless MCP" becomes a thing. Just out of curiosity, what does the decision process to change / improve the MCP protocol look like? I mean, other than creating a change in the spec and SDK code changes, how are these proposals reviewed and approved / rejected?
-
I'm building a hosting platform for deploying MCPs and SSE makes it hard to scale remote MCPs because we can't use serverless. I did more research into this, and it seems like there's no way to properly route a connection under a protocol like SSE because all the POST requests are independent (REST is stateless, after all). So if you scale up any server to multiple replicas (even in a non-serverless way using VMs or Kubernetes), it's a pain to figure out which spun-up instance to route the messages to. Actually, statefulness isn't the issue here - it's SSE. One way to side-step this is via gRPC or WebSockets due to how they retain the connection on subsequent requests (there's a sense of session affinity). Is there a reason why WS or gRPC wasn't chosen as the primary transport and SSE was chosen instead? Just want to fully understand the motivations.
-
I've been mulling this over a bit and wanted to share my (candid and somewhat rambly) thoughts on this.
A bit of a recap of the problem
The key issue with statefulness is the scaling characteristics of long-lived connections and the inability to use serverless deployments. There is also an issue with the SSE transport where the "side channel" POST requests need to be routed to the server instance holding open the SSE stream. The reason we have a stateful bidirectional protocol is to enable some really nice features (quoting Justin):
I think these (+ future bidirectional) features will be important in the long run to achieve great UX in user-facing apps and rich, efficient communication between agents (somewhat speculative, but I can definitely imagine graphs of agents being well served by stateful bidirectional communication). It's still very early days, but most servers and clients aren't properly leveraging these features. I suspect this is because they are harder to implement, and there aren't many good examples of clients in the wild that support them. It's important for adoption that we don't add undue complexity/friction for client and server developers early on, but it's also important that we don't close doors on the aspects of the protocol that will enable the long tail of great features.
The direction I'm currently leaning in
I really like @atesgoral's approach of progressive enhancement:
I feel like we could update the SSE transport (or just make a new transport) where all client->server messages go through HTTP POST requests (including initialization) and the responses come back directly in the HTTP response, i.e.:
(Note: in the current SSE implementation, all server->client messages come through the open SSE channel.) And all server-initiated messages (i.e. notifications and sampling requests) come through an SSE stream that the client can optionally subscribe to. The implementation of the SSE channel is optional for servers, allowing server implementers to get some value from MCP (tool calls, read resources, evaluate prompts, resource/prompt completions) without needing to support long-lived connections. Then, when server implementers and clients decide to implement the richer stateful features, they can implement the SSE channel and tackle the scaling implications. These SSE channels could also be best-effort, and it's okay for them to occasionally disconnect (e.g. when a deployment occurs).
Pros:
Cons:
There are probably other issues with this that I haven't thought through.
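To make the shape of that split concrete, here is a minimal server-side sketch under assumed names: a /mcp POST endpoint for plain request/response JSON-RPC and an optional /mcp/events SSE endpoint for server-initiated messages. The routes, Express framing, and runTool helper are all illustrative, not part of any spec.
```typescript
import express from "express";

// Illustrative only: route names and handler shapes are assumptions, not spec.
const app = express();
app.use(express.json());

// All client->server traffic is plain request/response JSON-RPC over POST,
// so this handler can run on serverless platforms with no session affinity.
app.post("/mcp", async (req, res) => {
  const { id, method, params } = req.body;
  if (method === "tools/call") {
    const result = await runTool(params); // hypothetical tool dispatcher
    return res.json({ jsonrpc: "2.0", id, result });
  }
  res.json({ jsonrpc: "2.0", id, error: { code: -32601, message: "Method not found" } });
});

// Optional, best-effort SSE channel for server-initiated messages (notifications,
// sampling requests). Servers that can't hold connections open simply don't expose this route.
app.get("/mcp/events", (req, res) => {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  const timer = setInterval(() => {
    res.write(`data: ${JSON.stringify({ jsonrpc: "2.0", method: "notifications/resources/list_changed" })}\n\n`);
  }, 30_000);
  req.on("close", () => clearInterval(timer)); // it's okay for this stream to drop occasionally
});

async function runTool(params: unknown) {
  // Stand-in for real tool execution.
  return { content: [{ type: "text", text: `called with ${JSON.stringify(params)}` }] };
}

app.listen(3000);
```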
-
Thank you for this discussion! If I'm understanding the current spec correctly, I think there are two categories of server->client communication to solve for over short-lived and/or interruptible connections, but today they are not distinguished from each other in the spec. I'm wondering if they should be, and if they should happen over distinct connections between client and server, instead of over one monolithic streaming connection. My rough stab at how that might look, without perfectly understanding the spec today:
Category 1. Notifications about changes to what the server can provide to the client
Examples: Resource/prompt/tool list changes, resource content changes
Use case: As the client application, I need to keep track of the resources/prompts/tools that a server can provide to me, so I can reason about using those resources/prompts/tools and/or present that list to the user. Streaming notifications from the server help me keep my local list of resources/prompts/tools up-to-date in real time. If I get disconnected, I can rebuild my local list by calling the server's List/Get APIs, and then connect to a stream for updates. If a server does not support streaming updates, I can poll the server's List/Get APIs periodically to keep my local list up-to-date.
For servers that don't support streaming (or clients who don't want to stream):
For servers that support streaming:
Category 2. Requests/notifications that are (hopefully?) directly related to some work that the client requested
Examples: Sampling requests, tool progress notifications, (logging?), (roots?)
Use case: As the client application, I want to use prompts, tools, and agents from a server. In the course of completing my request to the server, the server may need additional information from me (like LLM samples). Or it may want to send me occasional updates like progress notifications and logs. I establish a bidirectional communication stream with the server, so that the server can send me the information and requests it needs to complete my work. If the stream is disconnected midway, the server may not be able to complete my request and I may need to start a new request.
In the spec today, there doesn't seem to be any kind of "session ID" or "job ID" associated with a request that might take a while to complete and might require some back-and-forth communication. For example, sampling requests and progress notifications from server->client don't seem to be directly associated with the original tool call request initiated from client->server. It seems like today it is technically valid for a server to spam the client with sampling requests and root requests over the long-running connection, without the client ever actually using the server.
Let's assume that some kind of session ID is introduced that is assigned to requests from the client for using prompts/tools/agents. Certain types of server->client requests must then happen within the context of a session ID. The original request from the client can be upgraded to a stream for bidirectional communication for that session only. The server completes the session when it has completed the requested work. (I think this pattern is similar to the "transactions" @pcingola was describing in his comment above.)
For servers that don't support streaming:
For servers that support streaming:
If the connection breaks in the middle, the client must send a new request.
Optional: For servers that persist session state (for example, by session ID): if the connection is broken (for either streaming or non-streaming servers), the client can make a request to get the results for a session ID. The result comes back immediately if the session is already complete. The request is resumed if the session ID exists. The behavior then depends on whether the server supports streaming, as above: either the HTTP request is held open until the result is ready, or a stream is started for bidirectional communication.
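To illustrate the recovery path, here is a hypothetical client-side sketch. The sessionId field, the sessions/get method, and the status value are invented for illustration; nothing like them exists in the spec today.
```typescript
// Hypothetical illustration of session-scoped requests; method and field names are invented.
type JsonRpcResponse = { jsonrpc: "2.0"; id: number; result?: any; error?: any };

async function rpc(endpoint: string, method: string, params?: unknown): Promise<JsonRpcResponse> {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: Date.now(), method, params }),
  });
  return res.json();
}

async function callToolWithRecovery(endpoint: string, name: string, args: unknown) {
  // Initial request: the server assigns a session ID to this unit of work.
  const first = await rpc(endpoint, "tools/call", { name, arguments: args });
  if (first.result?.content) return first.result; // completed in one round trip
  const sessionId: string | undefined = first.result?.sessionId;
  if (!sessionId) throw new Error("no result and no session ID to resume from");

  // If the connection broke or the work is still running, resume/poll by session ID.
  for (;;) {
    const next = await rpc(endpoint, "sessions/get", { sessionId });
    if (next.result?.status === "complete") return next.result;
    await new Promise((r) => setTimeout(r, 1000)); // crude polling interval for the sketch
  }
}
```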
-
We have a solution: we manage a transport and a server instance per connection. This way, we can handle multiple remote SSE sessions.
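For reference, a sketch of that per-connection pattern using the TypeScript SDK's SSEServerTransport, roughly as shown in the SDK examples. Exact class names, method signatures, and the sessionId query parameter may differ across SDK versions, so treat this as an assumption-laden outline rather than canonical usage.
```typescript
import express from "express";
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";

const app = express();
// One transport and one Server instance per SSE connection, keyed by session ID.
const transports = new Map<string, SSEServerTransport>();

app.get("/sse", async (_req, res) => {
  const transport = new SSEServerTransport("/messages", res);
  const server = new Server({ name: "example", version: "0.1.0" }, { capabilities: { tools: {} } }); // tool handlers omitted
  transports.set(transport.sessionId, transport);
  res.on("close", () => transports.delete(transport.sessionId));
  await server.connect(transport); // each connection gets its own server + transport state
});

app.post("/messages", async (req, res) => {
  // The "side channel" POST must reach the same transport that holds the SSE stream open.
  const transport = transports.get(String(req.query.sessionId));
  if (!transport) return res.status(404).end("Unknown session");
  await transport.handlePostMessage(req, res);
});

app.listen(3000);
```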
-
At Shopify, we're so far mostly using what we call "MCP Lite": just regular, transactional (POST and get the result in the HTTP response) JSON-RPC, and often just implementing the MCP. We have in fact done a PoC implementation of the JSON-RPC-SSE transport when it first came out, but as others in this thread have pointed out, it's awkward to implement: in podded deployments we are forced to use an inter-process message-passing mechanism to link the JSON-RPC POST request to the SSE stream. I proposed progressive enhancement above, without pictures. Time for some pictures.
Selective notification subscription
"MCP Lite", using plain JSON-RPC. No SSE in sight. Very simple for adoption:
```mermaid
sequenceDiagram
    participant C as MCP Client
    participant S as MCP Server
    C->>+S: POST JSON-RPC tools/call
    S-->>-C: tool result
```
Important points:
Discovering notification URLs during initialization, splitting the MCP Server's JSON-RPC and notification endpoints for clarity:
```mermaid
sequenceDiagram
    participant C as MCP Client
    box MCP Server
    participant J as JSON-RPC Endpoint
    participant N as Notification Endpoint
    end
    C->>+J: POST JSON-RPC initialize
    J-->>-C: Notification URLs
    C->>N: Start streaming from a notification URL above
    N-->>C: event 1
    C->>+J: POST JSON-RPC tools/call
    J-->>-C: tool result
    N-->>C: event 2
```
Important points:
-
Sampling without streaming (borderline crazy idea)
In an "MCP Lite" world (see above), how can MCP-server-initiated sampling work? Borrowing from HTTP, where servers can emit different response codes to ask clients to take certain actions (e.g. provide credentials, redirect away and forget this URL, I'm busy so back off, etc.), the server can answer a tool call with a sampling request:
```mermaid
sequenceDiagram
    participant C as MCP Client
    participant S as MCP Server
    participant U as User
    participant L as LLM
    C->>+S: POST JSON-RPC tools/call
    S-->>-C: sampling request, continuation payload
    C->>+U: Get user approval
    U-->>-C: Go ahead
    C->>+L: Perform completion
    L-->>-C: Completion
    C->>+U: Get user approval
    U-->>-C: Go ahead
    C->>+S: POST JSON-RPC tools/continue
    S-->>-C: tool result
```
Assumption: the MCP Server will never send an unsolicited sampling request to the client; these will all come as responses to tool calls. Abstractly, this treats the tool as a finite state machine. When sampling is needed, the state of the tool is bounced back to the client, and the client can progress the tool by passing back the state + completion to transition it back to running. This "state" could simply be a tool call reference if the MCP Server is stateful and can persist the paused tool state on its side.
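A client-side sketch of that continuation loop, following the diagram above. The tools/continue method is the commenter's proposal (not the spec), and the samplingRequest / continuation field names plus the helper functions are invented for illustration.
```typescript
// Sketch of the "tool as finite state machine" continuation idea from the diagram above.
// Field names (samplingRequest, continuation) and the helpers are illustrative assumptions.
async function rpc(endpoint: string, method: string, params: unknown) {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method, params }),
  });
  return (await res.json()).result;
}

async function callToolWithSampling(endpoint: string, name: string, args: unknown) {
  let result = await rpc(endpoint, "tools/call", { name, arguments: args });

  // The tool "bounces" its paused state back to the client whenever it needs a completion.
  while (result?.samplingRequest) {
    await getUserApproval(result.samplingRequest); // human in the loop before sampling
    const completion = await runCompletion(result.samplingRequest); // client-side LLM call
    await getUserApproval(completion); // and again before sending the completion back
    result = await rpc(endpoint, "tools/continue", {
      continuation: result.continuation, // opaque paused-tool state, or a reference to it
      completion,
    });
  }
  return result;
}

// Placeholders so the sketch is self-contained.
async function getUserApproval(_: unknown) {}
async function runCompletion(_: unknown) {
  return { role: "assistant", content: "..." };
}
```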
-
Short-lived SSE as JSON-RPC response
I think others might have suggested or alluded to this already. Focusing on tool calling only: POST to the JSON-RPC endpoint, get back an SSE response. The stream only lasts for the duration of a tool call. Simple tool response over a single SSE event:
```mermaid
sequenceDiagram
    participant C as MCP Client
    participant S as MCP Server
    C->>+S: POST JSON-RPC tools/call
    S-->>-C: tool result over SSE
```
Certain implementations may support tools emitting intermediate diagnostic or progress events, usually meant for rendering in the UI:
```mermaid
sequenceDiagram
    participant C as MCP Client
    participant S as MCP Server
    C->>+S: POST JSON-RPC tools/call
    S-->>C: reticulating splines
    S-->>C: modulating frequencies
    S-->>-C: tool result over SSE
```
A tool can also emit one or more sampling requests over SSE (even at different times in its processing cycle), and the same continuation mechanism from my post above can be used to resume the tool when all sampling is completed.
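A minimal client-side sketch of consuming such a short-lived SSE response from a single POST. It assumes the server answers the tools/call POST with Content-Type: text/event-stream and that each SSE event carries one JSON-RPC message; the parsing here is deliberately loose and illustrative.
```typescript
// Sketch: POST a tools/call and read a short-lived SSE stream from the same response.
async function callToolStreaming(endpoint: string, name: string, args: unknown) {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json", Accept: "text/event-stream" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/call", params: { name, arguments: args } }),
  });

  const reader = res.body!.pipeThrough(new TextDecoderStream()).getReader();
  let buffer = "";
  let finalResult: unknown;

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;
    // Very loose SSE parsing for the sketch: events are separated by blank lines.
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? "";
    for (const event of events) {
      const data = event
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).trim())
        .join("\n");
      if (!data) continue; // ignore comments/keep-alives
      const msg = JSON.parse(data);
      if (msg.method?.startsWith("notifications/")) {
        console.log("progress:", msg.params); // intermediate diagnostic/progress event
      } else if ("result" in msg) {
        finalResult = msg.result; // the tool result; the stream ends shortly after
      }
    }
  }
  return finalResult;
}
```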
-
We've been struggling with this too. Long-lived connections are problematic for the reasons others have listed. It seems like robust tool calling needs to satisfy two constraints:
Most cloud APIs solve this by having two types of endpoints:
Good examples of this pattern are Google's AIP-151 for Long-Running Operations and Fal AI's Queue Endpoint. Fal's Queue API is a good reference implementation of long-running operations for models and tools that have streaming output. Adapting this to MCP's JSON-RPC protocol would be relatively straightforward. For simple tools:
For long-running tools:
This gives you a stable job ID that you can cancel and reconnect to regardless of connection stability. This is slightly more complicated than just upgrading to SSE on the initial call, as proposed in previous comments, but is easy to understand. I guess you could also support upgrading to SSE directly if optimizing were a priority, but conceptually there is a job. Sequence diagrams end up like:
```mermaid
sequenceDiagram
    participant Client
    participant MCP as MCP (Job Manager)
    participant Tool
    %% Simple Tool Flow
    Client->>MCP: tool/call (simple tool)
    MCP->>Tool: Execute simple tool
    Tool-->>MCP: Result
    MCP-->>Client: Immediate response
    %% Long-running Tool Flow
    Client->>MCP: tool/call (long-running tool)
    MCP->>MCP: Create job record
    MCP->>Tool: Start job execution
    Note right of MCP: MCP tracks job state
    MCP-->>Client: Return Operation reference (job_id)
    Client->>MCP: operation/stream?id=xxx
    Tool-->>MCP: Job progress updates
    MCP-->>Client: Stream updates via SSE
    %% Optional Get/Cancel Flow
    opt Get Operation State
        Client->>MCP: operation/get?id=xxx
        MCP-->>Client: Current state/result
    end
    opt Cancel Operation
        Client->>MCP: operation/cancel?id=xxx
        MCP->>Tool: Cancel job execution
        Tool-->>MCP: Execution cancelled
        MCP-->>Client: Cancellation confirmed
    end
```
If there is a need for other types of notifications beyond job progress updates, that seems like a separate Events API. I'd lean toward handling that via reliable webhook delivery rather than a single long-lived SSE connection.
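As a toy illustration of the job-manager idea, here is an in-memory sketch that maps the flows in the diagram onto JSON-RPC method names. The operation/get and operation/cancel names mirror the diagram, and the Job shape and runTool stand-in are assumptions; none of this is part of the MCP spec.
```typescript
import { randomUUID } from "node:crypto";

// Toy in-memory job manager for the long-running-operation pattern in the diagram above.
// Method names (operation/get, operation/cancel) mirror the diagram, not the MCP spec.
type Job = { id: string; status: "running" | "complete" | "cancelled"; result?: unknown };
const jobs = new Map<string, Job>();

async function handleRpc(method: string, params: any): Promise<unknown> {
  switch (method) {
    case "tools/call": {
      // Long-running tools get a job record and return an operation reference immediately.
      const job: Job = { id: randomUUID(), status: "running" };
      jobs.set(job.id, job);
      runTool(params).then((result) => {
        if (job.status === "running") {
          job.status = "complete";
          job.result = result;
        }
      });
      return { operation: { job_id: job.id, status: job.status } };
    }
    case "operation/get": {
      const job = jobs.get(params.job_id);
      return job ? { job_id: job.id, status: job.status, result: job.result } : { error: "unknown job" };
    }
    case "operation/cancel": {
      const job = jobs.get(params.job_id);
      if (job?.status === "running") job.status = "cancelled";
      return { job_id: params.job_id, status: job?.status ?? "unknown job" };
    }
    default:
      return { error: `unsupported method: ${method}` };
  }
}

async function runTool(_params: unknown) {
  // Stand-in for real tool execution; a real implementation would also check for cancellation.
  return { content: [{ type: "text", text: "done" }] };
}
```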
-
I also believe Option 1 makes sense as a way to disconnect sockets from sessions. From the discussion post, I would argue that we can leave the management of session context and state up to the server to decide. In terms of extensibility, a few additions that could be great but are not required to solve long-running sessions:
-
Coming to this thread a bit late, but speaking for Cloudflare Workers: Statefulness is just fine for us. Durable Objects are all about handling stateful protocols. The original stateful MCP protocol over a WebSocket transport should be a great fit for MCP servers built on Workers. A protocol involving session IDs would also be OK -- it's trivial for Workers to route requests with the same session ID to the same Durable Object, where its state is tracked. The main problem is lifecycle: if the MCP client disappears without explicitly ending the session, how does the MCP server decide when it can clean up? WebSockets are nice because you naturally clean up when the connection is closed. So MCP servers built on Workers would probably prefer a stateful WebSocket-based protocol, but could also live with session IDs. I am not sure how a session token that "Encodes all session state itself" would work exactly, but it sounds like complexity that wouldn't benefit Workers users.
-
we're working on solving internal operations things over at SST for our users, and letting them ship tools in a lambda is super important. it becomes a no-brainer vs something they have to think about if it has to be containerized. option 2 is obviously the simplest for us - and we actually already built this in the short term so we can get moving: a bridge mcp server that can talk to a stateless implementation of the mcp protocol hosted at some url
-
Following.
-
For client -> server - just remove the SSE transport from the spec and have everyone use stdio. Developers are free to implement any protocol they wish to connect to their web service and then expose the client as an MCP server. This is the "paving the cowpaths" way, it's what most MCP servers in the wild are already doing (e.g. Dax's comment), and it leaves developers to come up with the best solution for their needs. This also leaves the door open to future standardization on (possibly multiple) protocols more suited to client -> server.
The SSE transport could still be used - but now via a standard client. I think server -> server is a completely different problem (i.e. the host application/MCP client is a web app) - but here, tbh, I think a completely different protocol would make more sense, so you can take advantage of standard conventions like HTTP callbacks.
-
Good morning, folks! Maybe I’m too unfamiliar with this subject to offer a fully informed opinion, but I can share my experience with MCP as a developer user. From my perspective, I’d go all in with HTTP requests. It could significantly increase the number of available servers since it opens up opportunities for people to monetize them. In my experience with MCP, a single request is usually enough to get what I need—I don’t have to listen for ongoing updates. This makes synchronous communication simple to implement and straightforward to use. I suggest keeping the current SSE approach but adding this new HTTP-based option, each with its own pros and cons. The server’s developer can then decide which protocol best suits their needs. Just sharing my two cents—keep rocking!
-
From my understanding, the biggest issue with supporting standard HTTP endpoint calls is that there isn't a means for the server to do sampling. I can only think of a handful of use cases that would want to support sampling.
-
Personally, I'd go with Option 3. MCP is supposed to make it easy for AI agents to integrate with tools and resources. This is a data integration problem. The industry standard for integrating data across platforms is REST APIs. This is what 99% of companies will already have up and running. The burden of integration for MCP is largely on the server developers - and expecting them to not only create a new set of endpoints but to run their software in an entirely different way (requiring long-running servers) feels absurd to me. You could argue that it is to support additional capabilities. But the two main capabilities I am seeing above are 'sampling' and the server informing the client about updated resources/capabilities. The latter is easily solved 90% of the time by the client polling the server - and for the last 10%, the server can simply reply with a 400-level error. As far as 'sampling' goes - I believe this is an anti-pattern and should be out of scope for MCP. If servers need AI capabilities to properly respond to tool/resource requests, they should implement that behind their API. They shouldn't have to depend on unpredictable AI capabilities of an unknown client. I don't think this capability should even be something that servers should be able to do. It creates security issues where servers can covertly request sensitive data that clients may have. It also adds unnecessary risk for client developers since servers can effectively utilize the client's AI tokens. I'm not sure why a client developer would even build support for sampling given these concerns (what do they really have to gain?) - speaking of which, none of the currently documented clients have support for sampling: https://modelcontextprotocol.io/clients Any other, more complex server-client interactions should be handled by multiple separate tool/resource calls. In my opinion, a stateless version is an absolute must. Many developers are using serverless solutions, and long-running servers/connections are a non-option for them. So at a minimum, we should go with Option 2. But I would go a step further and simplify the protocol by removing features which (in my opinion) shouldn't be there in the first place.
-
Context
MCP is currently a stateful protocol, with a long-lived connection between client and server. This allows us to support behaviors like:
The connection is restartable with fairly little recovery cost (it's not catastrophic, like losing data), but the protocol is definitely not designed around repeatedly opening a connection, issuing one semantic request, then closing.
Problem
This is fairly limiting for serverless deployments, which frequently autoscale up and down, and generally aren't designed around long-lived requests (for example, typically there's a max request lifetime measured in minutes).
Deploying to a Platform-as-a-Service is really nice and convenient as a developer, so not being very compatible with this model creates an impediment to broader MCP adoption.
Possible solutions
I can imagine a few different answers here, each with their own tradeoffs:
Option 1: encapsulate state into a state or session token
Any stateful interaction over a long-lived connection could instead be modeled as independent requests (e.g., webhooks) by passing back and forth some sort of token that either:
Pros:
Cons:
Option 2: offer "stateless" and "stateful" variants of the protocol
Continue supporting all the behaviors I listed up top, but only when used in "stateful" mode. Offer a "stateless" mode that doesn't have those things.
It's possible that some transports could implement this in a fairly gradated way—e.g., HTTP could be stateful if server -> client can use SSE, but gracefully degrade to stateless by just using POSTed webhooks.
Pros:
Cons:
Option 3: make all of MCP "stateless"
Make sweeping changes to completely revamp MCP into a fully stateless protocol. Drop all features that require statefulness, like those mentioned up top.
Pros:
Cons:
Thoughts?
I'd welcome all of: