State, and long-lived vs. short-lived connections #102
Replies: 25 comments 38 replies
-
Claude's thoughts:
-
Also, although my post almost entirely focuses on protocol state, we could also imagine servers that have application state, like stateful agents. Those might naturally lend themselves to a long-lived connection anyway, even aside from the stateful features in the protocol.
-
Speaking from Vercel's perspective, option 1 is probably best. You could consider a hybrid version where
I do agree that statefulness is hard to avoid in general (even a trivial stream could be interrupted, and ideally the client could resume it), so I would not shy away from it as a general feature; just implement it in a way that is easy for distributed systems to achieve.
-
What are some stateful use cases that you've seen?
-
Some thoughts that come to mind are:
My take is that it is not clear this is worth solving given the current motivation/problem statement.
-
Progressive enhancement could be an option. At the base level, keep a very simple mental model: tool call === procedure call === JSON-RPC. If a client needs real-time notifications, it would call some sort of "subscribe" method to get back an SSE URL (the presence of which can be negotiated with MCP's capability negotiation model). That is, don't tightly couple JSON-RPC with SSE. JSON-RPC is very simple to implement; don't sabotage this simplicity by coupling it with SSE.
AAA can be an envelope/tunnel around MCP and not overcomplicate MCP itself. JSON-RPC and SSE URLs could be signed URLs generated by off-MCP API requests that establish AAA. Apologies if all this is too terse / abstract. Just wanted to quickly dump things at the top of my mind.
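As a rough illustration of that optional "subscribe" step from the client side, here is a minimal sketch. The notifications/subscribe method name and the sseUrl result field are assumptions for illustration only; nothing like them exists in the current spec.
```typescript
// Sketch only: "notifications/subscribe" and the sseUrl result field are hypothetical.
// Base level: tool call === procedure call === JSON-RPC over a single POST.
async function callTool(endpoint: string, name: string, args: unknown) {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/call", params: { name, arguments: args } }),
  });
  return (await res.json()).result;
}

// Progressive enhancement: only clients that need real-time notifications ask for an SSE URL.
// Servers that don't support it omit the capability (or return an error) and stay plain JSON-RPC.
async function maybeSubscribe(endpoint: string): Promise<EventSource | undefined> {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 2, method: "notifications/subscribe" }),
  });
  const { result } = await res.json();
  if (!result?.sseUrl) return undefined; // "MCP Lite" server: no streaming, and that's fine
  const stream = new EventSource(result.sseUrl); // could be a signed URL per the AAA note above
  stream.onmessage = (e) => console.log("notification", e.data); // EventSource is the browser API; Node needs a polyfill
  return stream;
}
```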
-
Big fan here 👋 Option 2 feels right and is in line with the other conversations around authorization that enable multiple paths depending on the server's capabilities. The trade-off is the additional complexity in the architecture itself. That said, this feels solvable by treating the protocol payloads as separate from the delivery mechanism and letting the delivery mechanisms abide by a separate contract layer, which itself could be decoupled from clients as SDKs. With that, the protocol shouldn't bifurcate, assuming we solve for sufficient parity.
-
Thank you @jspahrsummers and others for this discussion. From my perspective (I'm quite new to MCP, so please correct me if I'm wrong), the most interesting would be
Why?
I think stateless enables easier scaling, federation of MCP servers, "tools discovery", and MCP proxies. If we want to create some "HuggingFace" for MCP tools, "MCP Stateless" would make it easier. I currently see the whole AAA layer as completely independent from the stateful / stateless discussion. I'll be happy to help / contribute if "Stateless MCP" becomes a thing. Just out of curiosity, what does the decision process to change / improve the MCP protocol look like? I mean, other than creating a change in the spec and SDK code changes, how are these proposals reviewed and approved / rejected?
-
I'm building a hosting platform for deploying MCPs and SSE makes it hard to scale remote MCPs because we can't use serverless. I did more research into this, and it seems like there's no way to properly route a connection under a protocol like SSE because all the POST requests are independent (REST is stateless, after all). So if you scale up any server to multiple replicas (even in a non-serverless way using VMs or Kubernetes), it's a pain to figure out which spun-up instance to route the messages to. Actually, statefulness isn't the issue here - it's SSE. One way to side-step this is via gRPC or WebSockets due to how they retain the connection on subsequent requests (there's a sense of session affinity). Is there a reason why WS or gRPC wasn't chosen as the primary transport and SSE was chosen instead? Just want to fully understand the motivations.
-
I've been mulling this over a bit and wanted to share my (candid and somewhat rambly) thoughts on this.
A bit of a recap of the problem
The key issue with statefulness is the scaling characteristics of long-lived connections and the inability to use serverless deployments. There is also an issue with the SSE transport where the "side channel" POST requests need to be routed to the server instance holding open the SSE stream. The reason we have a stateful bidirectional protocol is to enable some really nice features (quoting Justin):
I think these (+ future bidirectional) features will be important in the long run to achieve great UX in user-facing apps and rich, efficient communication between agents (somewhat speculative, but I can definitely imagine graphs of agents being well served by stateful bidirectional communication). It's still very early days, but most servers and clients aren't properly leveraging these features. I suspect this is because they are harder to implement, and there aren't many good examples of clients in the wild that support them. It's important for adoption that we don't add undue complexity/friction for client and server developers early on, but it's also important that we don't close doors on the aspects of the protocol that will enable the long tail of great features.
The direction I'm currently leaning in
I really like @atesgoral's approach of progressive enhancement:
I feel like we could update the SSE transport (or just make a new transport) where all client->server messages go through HTTP POST requests (including initialization) and the responses come back directly in the HTTP response, i.e.:
(Note: in the current SSE implementation, all server->client messages come through the open SSE channel.) And all server-initiated messages (i.e. notifications and sampling requests) come through an SSE stream that the client can optionally subscribe to. The implementation of the SSE channel is optional for servers, allowing server implementers to get some value from MCP (tool calls, read resources, evaluate prompts, resource/prompt completions) without needing to support long-lived connections. Then, when server implementers and clients decide to implement the richer stateful features, they can implement the SSE channel and tackle the scaling implications. These SSE channels could also be best-effort, and it's okay for them to occasionally disconnect (e.g. when a deployment occurs).
Pros:
Cons:
There are probably other issues with this that I haven't thought through.
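To make the shape of that split concrete, here is a minimal server-side sketch under assumed names: a /mcp POST endpoint for plain request/response JSON-RPC and an optional /mcp/events SSE endpoint for server-initiated messages. The routes, Express framing, and runTool helper are all illustrative, not part of any spec.
```typescript
import express from "express";

// Illustrative only: route names and handler shapes are assumptions, not spec.
const app = express();
app.use(express.json());

// All client->server traffic is plain request/response JSON-RPC over POST,
// so this handler can run on serverless platforms with no session affinity.
app.post("/mcp", async (req, res) => {
  const { id, method, params } = req.body;
  if (method === "tools/call") {
    const result = await runTool(params); // hypothetical tool dispatcher
    return res.json({ jsonrpc: "2.0", id, result });
  }
  res.json({ jsonrpc: "2.0", id, error: { code: -32601, message: "Method not found" } });
});

// Optional, best-effort SSE channel for server-initiated messages (notifications,
// sampling requests). Servers that can't hold connections open simply don't expose this route.
app.get("/mcp/events", (req, res) => {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  const timer = setInterval(() => {
    res.write(`data: ${JSON.stringify({ jsonrpc: "2.0", method: "notifications/resources/list_changed" })}\n\n`);
  }, 30_000);
  req.on("close", () => clearInterval(timer)); // it's okay for this stream to drop occasionally
});

async function runTool(params: unknown) {
  // Stand-in for real tool execution.
  return { content: [{ type: "text", text: `called with ${JSON.stringify(params)}` }] };
}

app.listen(3000);
```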
-
Thank you for this discussion! If I'm understanding the current spec correctly, I think there are two categories of server->client communication to solve for over short-lived and/or interruptible connections, but today they are not distinguished from each other in the spec. I'm wondering if they should be, and if they should happen over distinct connections between client and server, instead of over one monolithic streaming connection. My rough stab at how that might look, without perfectly understanding the spec today:
Category 1. Notifications about changes to what the server can provide to the client
Examples: Resource/prompt/tool list changes, resource content changes
Use case: As the client application, I need to keep track of the resources/prompts/tools that a server can provide to me, so I can reason about using those resources/prompts/tools and/or present that list to the user. Streaming notifications from the server help me keep my local list of resources/prompts/tools up-to-date in real time. If I get disconnected, I can rebuild my local list by calling the server's List/Get APIs, and then connect to a stream for updates. If a server does not support streaming updates, I can poll the server's List/Get APIs periodically to keep my local list up-to-date.
For servers that don't support streaming (or clients who don't want to stream):
For servers that support streaming:
Category 2. Requests/notifications that are (hopefully?) directly related to some work that the client requested
Examples: Sampling requests, tool progress notifications, (logging?), (roots?)
Use case: As the client application, I want to use prompts, tools, and agents from a server. In the course of completing my request to the server, the server may need additional information from me (like LLM samples). Or it may want to send me occasional updates like progress notifications and logs. I establish a bidirectional communication stream with the server, so that the server can send me the information and requests it needs to complete my work. If the stream is disconnected midway, the server may not be able to complete my request and I may need to start a new request.
In the spec today, there doesn't seem to be any kind of "session ID" or "job ID" associated with a request that might take a while to complete and might require some back-and-forth communication. For example, sampling requests and progress notifications from server->client don't seem to be directly associated with the original tool call request initiated from client->server. It seems like today it is technically valid for a server to spam the client with sampling requests and root requests over the long-running connection, without the client ever actually using the server.
Let's assume that some kind of session ID is introduced that is assigned to requests from the client for using prompts/tools/agents. Certain types of server->client requests must then happen within the context of a session ID. The original request from the client can be upgraded to a stream for bidirectional communication for that session only. The server completes the session when it has completed the requested work. (I think this pattern is similar to the "transactions" @pcingola was describing in his comment above.)
For servers that don't support streaming:
For servers that support streaming:
If the connection breaks in the middle, the client must send a new request.
Optional: For servers that persist session state (for example, by session ID): if the connection is broken (for either streaming or non-streaming servers), the client can make a request to get the results for a session ID. The result comes back immediately if the session is already complete. The request is resumed if the session ID exists. The behavior then depends on whether the server supports streaming, as above: either the HTTP request is held open until the result is ready, or a stream is started for bidirectional communication.
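To illustrate the recovery path, here is a hypothetical client-side sketch. The sessionId field, the sessions/get method, and the status value are invented for illustration; nothing like them exists in the spec today.
```typescript
// Hypothetical illustration of session-scoped requests; method and field names are invented.
type JsonRpcResponse = { jsonrpc: "2.0"; id: number; result?: any; error?: any };

async function rpc(endpoint: string, method: string, params?: unknown): Promise<JsonRpcResponse> {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: Date.now(), method, params }),
  });
  return res.json();
}

async function callToolWithRecovery(endpoint: string, name: string, args: unknown) {
  // Initial request: the server assigns a session ID to this unit of work.
  const first = await rpc(endpoint, "tools/call", { name, arguments: args });
  if (first.result?.content) return first.result; // completed in one round trip
  const sessionId: string | undefined = first.result?.sessionId;
  if (!sessionId) throw new Error("no result and no session ID to resume from");

  // If the connection broke or the work is still running, resume/poll by session ID.
  for (;;) {
    const next = await rpc(endpoint, "sessions/get", { sessionId });
    if (next.result?.status === "complete") return next.result;
    await new Promise((r) => setTimeout(r, 1000)); // crude polling interval for the sketch
  }
}
```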
-
We have a solution: we manage a transport and a server instance per connection. This way, we can handle multiple remote SSE sessions.
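For reference, a sketch of that per-connection pattern using the TypeScript SDK's SSEServerTransport, roughly as shown in the SDK examples. Exact class names, method signatures, and the sessionId query parameter may differ across SDK versions, so treat this as an assumption-laden outline rather than canonical usage.
```typescript
import express from "express";
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";

const app = express();
// One transport and one Server instance per SSE connection, keyed by session ID.
const transports = new Map<string, SSEServerTransport>();

app.get("/sse", async (_req, res) => {
  const transport = new SSEServerTransport("/messages", res);
  const server = new Server({ name: "example", version: "0.1.0" }, { capabilities: { tools: {} } }); // tool handlers omitted
  transports.set(transport.sessionId, transport);
  res.on("close", () => transports.delete(transport.sessionId));
  await server.connect(transport); // each connection gets its own server + transport state
});

app.post("/messages", async (req, res) => {
  // The "side channel" POST must reach the same transport that holds the SSE stream open.
  const transport = transports.get(String(req.query.sessionId));
  if (!transport) return res.status(404).end("Unknown session");
  await transport.handlePostMessage(req, res);
});

app.listen(3000);
```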
-
At Shopify, we're so far mostly using what we call "MCP Lite": just regular, transactional (POST and get the result in the HTTP response) JSON-RPC, and often just implementing the MCP. We have in fact done a PoC implementation of the JSON-RPC-SSE transport when it first came out, but as others in this thread have pointed out, it's awkward to implement: in podded deployments we are forced to use an inter-process message-passing mechanism to link the JSON-RPC POST request to the SSE stream. I proposed progressive enhancement above, without pictures. Time for some pictures.
Selective notification subscription
"MCP Lite", using plain JSON-RPC. No SSE in sight. Very simple for adoption:
```mermaid
sequenceDiagram
    participant C as MCP Client
    participant S as MCP Server
    C->>+S: POST JSON-RPC tools/call
    S-->>-C: tool result
```
Important points:
Discovering notification URLs during initialization, splitting the MCP Server's JSON-RPC and notification endpoints for clarity:
```mermaid
sequenceDiagram
    participant C as MCP Client
    box MCP Server
    participant J as JSON-RPC Endpoint
    participant N as Notification Endpoint
    end
    C->>+J: POST JSON-RPC initialize
    J-->>-C: Notification URLs
    C->>N: Start streaming from a notification URL above
    N-->>C: event 1
    C->>+J: POST JSON-RPC tools/call
    J-->>-C: tool result
    N-->>C: event 2
```
Important points:
-
Sampling without streaming (borderline crazy idea)
In an "MCP Lite" world (see above), how can MCP-server-initiated sampling work? Borrowing from HTTP, where servers can emit different response codes to ask clients to take certain actions (e.g. provide credentials, redirect away and forget this URL, I'm busy so back off, etc.), the server can answer a tool call with a sampling request:
```mermaid
sequenceDiagram
    participant C as MCP Client
    participant S as MCP Server
    participant U as User
    participant L as LLM
    C->>+S: POST JSON-RPC tools/call
    S-->>-C: sampling request, continuation payload
    C->>+U: Get user approval
    U-->>-C: Go ahead
    C->>+L: Perform completion
    L-->>-C: Completion
    C->>+U: Get user approval
    U-->>-C: Go ahead
    C->>+S: POST JSON-RPC tools/continue
    S-->>-C: tool result
```
Assumption: the MCP Server will never send an unsolicited sampling request to the client; these will all come as responses to tool calls. Abstractly, this treats the tool as a finite state machine. When sampling is needed, the state of the tool is bounced back to the client, and the client can progress the tool by passing back the state + completion to transition it back to running. This "state" could simply be a tool call reference if the MCP Server is stateful and can persist the paused tool state on its side.
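A client-side sketch of that continuation loop, following the diagram above. The tools/continue method is the commenter's proposal (not the spec), and the samplingRequest / continuation field names plus the helper functions are invented for illustration.
```typescript
// Sketch of the "tool as finite state machine" continuation idea from the diagram above.
// Field names (samplingRequest, continuation) and the helpers are illustrative assumptions.
async function rpc(endpoint: string, method: string, params: unknown) {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method, params }),
  });
  return (await res.json()).result;
}

async function callToolWithSampling(endpoint: string, name: string, args: unknown) {
  let result = await rpc(endpoint, "tools/call", { name, arguments: args });

  // The tool "bounces" its paused state back to the client whenever it needs a completion.
  while (result?.samplingRequest) {
    await getUserApproval(result.samplingRequest); // human in the loop before sampling
    const completion = await runCompletion(result.samplingRequest); // client-side LLM call
    await getUserApproval(completion); // and again before sending the completion back
    result = await rpc(endpoint, "tools/continue", {
      continuation: result.continuation, // opaque paused-tool state, or a reference to it
      completion,
    });
  }
  return result;
}

// Placeholders so the sketch is self-contained.
async function getUserApproval(_: unknown) {}
async function runCompletion(_: unknown) {
  return { role: "assistant", content: "..." };
}
```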
-
Short-lived SSE as JSON-RPC response
I think others might have suggested or alluded to this already. Focusing on tool calling only: POST to the JSON-RPC endpoint, get back an SSE response. The stream only lasts for the duration of a tool call. Simple tool response over a single SSE event:
```mermaid
sequenceDiagram
    participant C as MCP Client
    participant S as MCP Server
    C->>+S: POST JSON-RPC tools/call
    S-->>-C: tool result over SSE
```
Certain implementations may support tools emitting intermediate diagnostic or progress events, usually meant for rendering in the UI:
```mermaid
sequenceDiagram
    participant C as MCP Client
    participant S as MCP Server
    C->>+S: POST JSON-RPC tools/call
    S-->>C: reticulating splines
    S-->>C: modulating frequencies
    S-->>-C: tool result over SSE
```
A tool can also emit one or more sampling requests over SSE (even at different times in its processing cycle), and the same continuation mechanism from my post above can be used to resume the tool when all sampling is completed.
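A minimal client-side sketch of consuming such a short-lived SSE response from a single POST. It assumes the server answers the tools/call POST with Content-Type: text/event-stream and that each SSE event carries one JSON-RPC message; the parsing here is deliberately loose and illustrative.
```typescript
// Sketch: POST a tools/call and read a short-lived SSE stream from the same response.
async function callToolStreaming(endpoint: string, name: string, args: unknown) {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json", Accept: "text/event-stream" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/call", params: { name, arguments: args } }),
  });

  const reader = res.body!.pipeThrough(new TextDecoderStream()).getReader();
  let buffer = "";
  let finalResult: unknown;

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;
    // Very loose SSE parsing for the sketch: events are separated by blank lines.
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? "";
    for (const event of events) {
      const data = event
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).trim())
        .join("\n");
      if (!data) continue; // ignore comments/keep-alives
      const msg = JSON.parse(data);
      if (msg.method?.startsWith("notifications/")) {
        console.log("progress:", msg.params); // intermediate diagnostic/progress event
      } else if ("result" in msg) {
        finalResult = msg.result; // the tool result; the stream ends shortly after
      }
    }
  }
  return finalResult;
}
```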
-
We've been struggling with this too. Long-lived connections are problematic for the reasons others have listed. It seems like robust tool calling needs to satisfy two constraints:
Most cloud APIs solve this by having two types of endpoints:
Good examples of this pattern are Google's AIP-151 for Long-Running Operations and Fal AI's Queue Endpoint. Fal's Queue API is a good reference implementation of long-running operations for models and tools that have streaming output. Adapting this to MCP's JSON-RPC protocol would be relatively straightforward. For simple tools:
For long-running tools:
This gives you a stable job ID that you can cancel and reconnect to regardless of connection stability. This is slightly more complicated than just upgrading to SSE on the initial call, as proposed in previous comments, but is easy to understand. I guess you could also support upgrading to SSE directly if optimizing were a priority, but conceptually there is a job. Sequence diagrams end up like:
```mermaid
sequenceDiagram
    participant Client
    participant MCP as MCP (Job Manager)
    participant Tool
    %% Simple Tool Flow
    Client->>MCP: tool/call (simple tool)
    MCP->>Tool: Execute simple tool
    Tool-->>MCP: Result
    MCP-->>Client: Immediate response
    %% Long-running Tool Flow
    Client->>MCP: tool/call (long-running tool)
    MCP->>MCP: Create job record
    MCP->>Tool: Start job execution
    Note right of MCP: MCP tracks job state
    MCP-->>Client: Return Operation reference (job_id)
    Client->>MCP: operation/stream?id=xxx
    Tool-->>MCP: Job progress updates
    MCP-->>Client: Stream updates via SSE
    %% Optional Get/Cancel Flow
    opt Get Operation State
        Client->>MCP: operation/get?id=xxx
        MCP-->>Client: Current state/result
    end
    opt Cancel Operation
        Client->>MCP: operation/cancel?id=xxx
        MCP->>Tool: Cancel job execution
        Tool-->>MCP: Execution cancelled
        MCP-->>Client: Cancellation confirmed
    end
```
If there is a need for other types of notifications beyond job progress updates, that seems like a separate Events API. I'd lean toward handling that via reliable webhook delivery rather than a single long-lived SSE connection.
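As a toy illustration of the job-manager idea, here is an in-memory sketch that maps the flows in the diagram onto JSON-RPC method names. The operation/get and operation/cancel names mirror the diagram, and the Job shape and runTool stand-in are assumptions; none of this is part of the MCP spec.
```typescript
import { randomUUID } from "node:crypto";

// Toy in-memory job manager for the long-running-operation pattern in the diagram above.
// Method names (operation/get, operation/cancel) mirror the diagram, not the MCP spec.
type Job = { id: string; status: "running" | "complete" | "cancelled"; result?: unknown };
const jobs = new Map<string, Job>();

async function handleRpc(method: string, params: any): Promise<unknown> {
  switch (method) {
    case "tools/call": {
      // Long-running tools get a job record and return an operation reference immediately.
      const job: Job = { id: randomUUID(), status: "running" };
      jobs.set(job.id, job);
      runTool(params).then((result) => {
        if (job.status === "running") {
          job.status = "complete";
          job.result = result;
        }
      });
      return { operation: { job_id: job.id, status: job.status } };
    }
    case "operation/get": {
      const job = jobs.get(params.job_id);
      return job ? { job_id: job.id, status: job.status, result: job.result } : { error: "unknown job" };
    }
    case "operation/cancel": {
      const job = jobs.get(params.job_id);
      if (job?.status === "running") job.status = "cancelled";
      return { job_id: params.job_id, status: job?.status ?? "unknown job" };
    }
    default:
      return { error: `unsupported method: ${method}` };
  }
}

async function runTool(_params: unknown) {
  // Stand-in for real tool execution; a real implementation would also check for cancellation.
  return { content: [{ type: "text", text: "done" }] };
}
```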
-
I also believe Option 1 makes sense as a way to disconnect sockets from sessions. From the discussion post, I would argue that we can leave the management of session context and state up to the server to decide. In terms of extensibility, a few additions that could be great but are not required to solve long-running sessions:
-
Coming to this thread a bit late, but speaking for Cloudflare Workers: Statefulness is just fine for us. Durable Objects are all about handling stateful protocols. The original stateful MCP protocol over a WebSocket transport should be a great fit for MCP servers built on Workers. A protocol involving session IDs would also be OK -- it's trivial for Workers to route requests with the same session ID to the same Durable Object, where its state is tracked. The main problem is lifecycle: if the MCP client disappears without explicitly ending the session, how does the MCP server decide when it can clean up? WebSockets are nice because you naturally clean up when the connection is closed. So MCP servers built on Workers would probably prefer a stateful WebSocket-based protocol, but could also live with session IDs. I am not sure how a session token that "Encodes all session state itself" would work exactly, but it sounds like complexity that wouldn't benefit Workers users.
-
we're working on solving internal operations things over at SST for our users, and letting them ship tools in a lambda is super important. it becomes a no-brainer vs something they have to think about if it has to be containerized. option 2 is obviously the simplest for us - and we actually already built this in the short term so we can get moving: a bridge mcp server that can talk to a stateless implementation of the mcp protocol hosted at some url
-
Following.
-
For client -> server - just remove the SSE transport from the spec and have everyone use stdio. Developers are free to implement any protocol they wish to connect to their web service and then expose the client as an MCP server. This is the "paving the cowpaths" way, it's what most MCP servers in the wild are already doing (e.g. Dax's comment), and it leaves developers to come up with the best solution for their needs. This also leaves the door open to future standardization on (possibly multiple) protocols more suited to client -> server.
The SSE transport could still be used - but now via a standard client. I think server -> server is a completely different problem (i.e. the host application/MCP client is a web app) - but here, tbh, I think a completely different protocol would make more sense, so you can take advantage of standard conventions like HTTP callbacks.
-
Good morning, folks! Maybe I’m too unfamiliar with this subject to offer a fully informed opinion, but I can share my experience with MCP as a developer user. From my perspective, I’d go all in with HTTP requests. It could significantly increase the number of available servers since it opens up opportunities for people to monetize them. In my experience with MCP, a single request is usually enough to get what I need—I don’t have to listen for ongoing updates. This makes synchronous communication simple to implement and straightforward to use. I suggest keeping the current SSE approach but adding this new HTTP-based option, each with its own pros and cons. The server’s developer can then decide which protocol best suits their needs. Just sharing my two cents—keep rocking!
-
From my understanding, the biggest issue with supporting standard HTTP endpoint calls is that there isn't a means for the server to do sampling. I can only think of a handful of use cases that would want to support sampling.
-
Personally, I'd go with Option 3. MCP is supposed to make it easy for AI agents to integrate with tools and resources. This is a data integration problem. The industry standard for integrating data across platforms is REST APIs. This is what 99% of companies will already have up and running. The burden of integration for MCP is largely on the server developers - and expecting them to not only create a new set of endpoints but to run their software in an entirely different way (requiring long-running servers) feels absurd to me. You could argue that it is to support additional capabilities. But the two main capabilities I am seeing above are 'sampling' and the server informing the client about updated resources/capabilities. The latter is easily solved 90% of the time by the client polling the server - and for the last 10%, the server can simply reply with a 400-level error. As far as 'sampling' goes - I believe this is an anti-pattern and should be out of scope for MCP. If servers need AI capabilities to properly respond to tool/resource requests, they should implement that behind their API. They shouldn't have to depend on unpredictable AI capabilities of an unknown client. I don't think this capability should even be something that servers should be able to do. It creates security issues where servers can covertly request sensitive data that clients may have. It also adds unnecessary risk for client developers since servers can effectively utilize the client's AI tokens. I'm not sure why a client developer would even build support for sampling given these concerns (what do they really have to gain?) - speaking of which, none of the currently documented clients have support for sampling: https://modelcontextprotocol.io/clients Any other, more complex server-client interactions should be handled by multiple separate tool/resource calls. In my opinion, a stateless version is an absolute must. Many developers are using serverless solutions, and long-running servers/connections are a non-option for them. So at a minimum, we should go with Option 2. But I would go a step further and simplify the protocol by removing features which (in my opinion) shouldn't be there in the first place.
-
Context
MCP is currently a stateful protocol, with a long-lived connection between client and server. This allows us to support behaviors like:
The connection is restartable with fairly little recovery cost (it's not catastrophic, like losing data), but the protocol is definitely not designed around repeatedly opening a connection, issuing one semantic request, then closing.
Problem
This is fairly limiting for serverless deployments, which frequently autoscale up and down, and generally aren't designed around long-lived requests (for example, typically there's a max request lifetime measured in minutes).
Deploying to a Platform-as-a-Service is really nice and convenient as a developer, so not being very compatible with this model creates an impediment to broader MCP adoption.
Possible solutions
I can imagine a few different answers here, each with their own tradeoffs:
Option 1: encapsulate state into a state or session token
Any stateful interaction over a long-lived connection could instead be modeled as independent requests (e.g., webhooks) by passing back and forth some sort of token that either:
Pros:
Cons:
Option 2: offer "stateless" and "stateful" variants of the protocol
Continue supporting all the behaviors I listed up top, but only when used in "stateful" mode. Offer a "stateless" mode that doesn't have those things.
It's possible that some transports could implement this in a fairly gradated way—e.g., HTTP could be stateful if server -> client can use SSE, but gracefully degrade to stateless by just using POSTed webhooks.
Pros:
Cons:
Option 3: make all of MCP "stateless"
Make sweeping changes to completely revamp MCP into a fully stateless protocol. Drop all features that require statefulness, like those mentioned up top.
Pros:
Cons:
Thoughts?
I'd welcome all of: