Improvements for MCP-based agents #111

jspahrsummers · 2024-12-13T11:53:57Z

jspahrsummers
Dec 13, 2024
Maintainer

jerome3o-anthropic · 2024-12-13T12:02:51Z

jerome3o-anthropic
Dec 13, 2024
Maintainer

I think these are all great topics

Structured, formatted intermediate updates from server -> client, so a deep agent graph can provide information to the user even while a top-level tool is still being run

I think this should enable the bubbling of permission requests up from sub-agents to higher up agents/top level interaction

Namespacing

Something important here imo would be enabling top-level (or any intermediate layer) to have awareness of the topology of all nested agentic activity.

1 reply

sepo-eng Feb 20, 2025

I think this should enable the bubbling of permission requests up from sub-agents to higher up agents/top level interaction

Permission requests in this context is about HITL interactions I suppose. In this case, we can consider enabling structured events as either notifications or optionally, requests so the top level interaction can respond.

Referencing this, a stateless interaction with an agent might require continuation after an indefinite pause (for example HITL). In this case status checks, and listing on agentic runs will be useful to allow for continuation.

tristanz · 2025-01-05T19:45:08Z

tristanz
Jan 5, 2025

It would be useful to understand better how you see agents fitting into MCP at a conceptual level and in terms of user experience. Right now clients like Claude seem to play the role of a unified agent, since they execute the agent loop, and servers are simply capabilities/context being offered to this singular agent/client. Tools can do whatever they want, so can be agentic but this is irrelevant to the top level agent/client.

What is meant by "agent support" in the roadmap? Is MCP working toward a vision where there is a singular client/agent that users interact and this agent is empowered by being connected to many MCPs? Or are you thinking about switching to something more akin to GPTs/Gems/Agents where users interact with many different top level agents that have a clear identity. The gaps mentioned in this issue seem to be mostly just about support multiple servers better and providing additional capabilities to tools (e.g. direct responses and elicitation) rather than supporting multiple agents explicitly.

6 replies

tristanz Jan 6, 2025

Do you want MCP servers to expose agents as first class objects to users that they can @mention, similar to prompts, or do you want agents to be largely invisible to users, similar to tools.

jspahrsummers Jan 8, 2025
Maintainer Author

It's an interesting question! Not one I've contemplated a whole bunch, to be honest. What are your thoughts?

tristanz Jan 9, 2025

I think the most flexible and open approach would be to expose them at the top level or recursively expose them up through linked MCPs. This mirrors human communication networks and seems the most open long-term vision. If agents are exposed to users, clients become a distribution channel for third-party agents, and users have more choice. The alternative is a more locked-down vision where the purpose of MCPs is only to enhance the functionality of the client, and users conceptually think of MCPs more like applications that their primary agent/client (Claude) can use. I think both approaches are valid, and the choice depends on strategic goals. But as a user and as a potential developer of third-party agents, my preference would be to allow MCP agents to be exposed directly to end users so that I can talk with all my agents from any client.

jspahrsummers Jan 9, 2025
Maintainer Author

I think, regardless, MCP (being a protocol) cannot force clients to expose users directly to such concepts. We kind of have to assume that clients will intermediate everything anyway. But some way of making information about sub-agents available at the "top level" seems useful.

Thanks for the thoughts!

Mehdi-Bl Feb 6, 2025

But MCP is mainly function calls here too.
And MCP calls are limited by model output, yes you can loop it. But the main issue I see, if you rely only on MCP to transfert fully the context, you will use more context output that is far more limited than input. Like providing the files to use or all the informations extracted from the database in previous round.
As MCP is backed by function call. And you want to loop agents, why not then have agents too using function call and tell them, read this file or fetch this record. This would be more efficient for output context.

Mehdi-Bl · 2025-02-11T15:31:30Z

Mehdi-Bl
Feb 11, 2025

Structured, formatted intermediate updates from server -> client, so a deep agent graph can provide information to the user even while a top-level tool is still being run
Perhaps these could be represented as resources

This would be costly in context. As it means the model is stopping and making another function call? Or how?
Currently MCP remain a wrapper over function calling.
So the normal function calling workflow remain. Model understand it need to do a call, output the structured output to trigger our tool and wait for output. Once it gets the full output, it will reload the full context and "continue". Model don't remember previous state so we have to reload the full context again. Models are made for async calls. So until that issue is solved I don't see how you want to implement this or the need?
If I need step by step output and adapt, I will create an MCP server that do step by step and request back and forth.

0 replies

Zane-XY · 2025-02-21T02:45:50Z

Zane-XY
Feb 21, 2025

Let MCP be simple and focus on standardizing tools and resources, etc., rather than defining standards for Agentic Workflows. The protocol should not overcomplicate matters by forcing diverse use cases to conform to a singular MCP way of organizing/orchestrating agents.

0 replies

Kalmy8 · 2025-02-27T08:13:15Z

Kalmy8
Feb 27, 2025

Hi! Here are few theses I'd like to share with you:

For me, it seems like trees of agents concept is breaking the extisting client-server architeture

For trees of agents, it seems like all the nodes can both contain llm-calling logic (act as an MCP client) and provide resources/tools for them (act as an MCP server)
It is an interesting concept, but for me it feels like a whole different thing. For complicated logic (like trees of agents) we could use complicated langgraphs or some multi-agent framework (a bunch of them are available around)

For me, MCP now is a very handy an needed instrument to decouple model's tools, prompts and datafeeds from the llm loop execution logic. Sure it is a very useful thing for production-grade services, enhancing flexibility, scalability, testability, etc.

I do really miss the Server-->Client opportunities for my application,

this caused me to look up over this issues and disscussions to see if people do run into same questions

For example, let's take a Twitter MCP server. Imagine that this server can poll twitter to recieve new mentions from users, and then use MCP client (llm) to create responds for them

Currently, this logic can be executed in 2 ways:

MCP client itself polls Twitter MCP server to fetch new mentions via Twitter API -> send them to client -> client sends a response back to MCP server -> MCP server publishes a response via API

This approach is okay, but only if you are connecting to one server (Twitter server). If your agent should act on various platforms (Discords, youtube, tiktok, telegram...) you'll have to poll all those MCP servers as well, creating a lot of traffic and breaking the single responsibility principle inside the client

Twitter MCP server could instead perform Twitter-polling itself, autonomously, fetch new events and store them as a resource. MCP client can subscribe and recieve notifications for that resource.

This approach is way better in sense of multiple MCP servers, but again, you'll have to subscribe to all of that resource and maintain some resource registry, which does not feel right to me. I would instead love an opportunity to push all updates directly to some message broker, and MCP client would consume that messages, idk

0 replies

Kvadratni · 2025-02-28T17:53:03Z

Kvadratni
Feb 28, 2025

Proposal: Enhancing MCP to Support Provider-Independent Capabilities

Hello from Block,

While considering improvements to the Agent UI for Goose, I realized that MCP might need enhancements to better support provider-independent capabilities. Let me explain in detail.

Problem Statement

Imagine an agent that allows the use of any provider, such as Goose. Now, suppose we want to enable real-time audio communication. Since we don’t want to tie this feature to a specific provider, it makes sense to introduce it at the MCP server level. However, the UI should also be able to reflect this newly added capability, for example, by displaying a microphone button when audio communication is available.

Proposed Solution

If the MCP specification allowed servers or tools to declare the types of capabilities they provide (potentially as an enum of known categories), the Agent UI could dynamically adapt its controls based on available features. This could include elements such as:

A microphone button for real-time audio communication
A webcam toggle for video streaming
Other UI elements for reactive capabilities

Additionally, this would allow for more flexible settings management. If multiple MCPs provide the same capability, users could select which MCP instance should handle a given capability. This would enable the installation of MCPs with overlapping capabilities while ensuring that the preferred provider is chosen for each feature.

Benefits

Improved UI adaptability – The Agent UI can dynamically adjust to available capabilities.
Provider independence – Features can be added without binding to a specific provider.
More flexible settings – Users can configure MCP instances to handle overlapping capabilities.

Would love to hear your thoughts on this approach!

2 replies

Mehdi-Bl Feb 28, 2025

This is more about meta data. And is not providing any change for the underlaying Function calling that we should not forget MCP wrap.
This will be more extra meta data that only the UI use.
Similar like MCP could expose auto discovery of required parameters/defaults.

Kvadratni Feb 28, 2025

absolutely. I just think this should be a bit more formalized than just raw metadata?Is there a better topic for this suggestion?

hasani114 · 2025-03-04T21:07:54Z

hasani114
Mar 4, 2025

Great discussion here. One of the UX patterns that I've been thinking about requires the LLM to initiate a long running task using a tool while it continues interaction with the user. Only to get an update when the long running task is complete so it can provide user the desired information or take further action based on the output. Think something like Deep Research where instead of the Agent being locked until the research is done it can continue the conversation. Can this be done with the current implementation (using the sampling method) or would this require changes in the protocol?

This can be further expanded to allow the Agent to do multi-tasking, where it can invoke multiple tools and synthesize and refine its output as more information is provided to it.

3 replies

Mehdi-Bl Mar 4, 2025

This is not how LLM work and function calling work
You need to understand that MCP wraps function calling. Despite we have JSON RPC. The AI is doing things in serial not parallel.
So when the model understand it need to call a tool, it generates a structured output in the expected format and the wrapper then transfert that and call the tool, get the output and then get back.
I have similar needs and wanted to run similar loads. But then I'm working on background runner that would send the task in the queue and when finished inform me another way (not thru the LLM).

hasani114 Mar 4, 2025

I understand the sequential way in which current LLM function calls are usually implemented. But since we're talking about Agentic systems I'm assuming progress in getting them to work on longer running tasks would require changes in implementation. If a human is given a task that would require them to execute n number of subtasks doesn't mean they are no longer available to talk. While I agree the current Agent loop is sufficient when current LLM capabilities are taken into context they won't make sense for more autonomous longer running agents.

You can already see some of the UX being implemented in double texting (which is available in Langgraph) which allows a user to send multiple texts before the initial loop is complete. Essentially, the first run is "interrupted" and the execution is rolled back and restarted, but it would be better to allow the Agent to have two async trajectories. Just because an Agent is waiting for a tool call doesn't mean it cannot perform other actions or interact with the user. The UX seems more natural and intuitive to me, and I'm wondering if the protocol allows for something like this to be done OOB.

Mehdi-Bl Mar 5, 2025

I'm afraid you need an orchestrator here and this looks over kill for MCP.
May be your scope is swarms of agents. But as an MCP user/builder, I find we should fill the gaps to get 1:1 working fine.
Than run into the complexity for clusters of agents. This is not simple things what you want to do and more edge cases.

hasani114 · 2025-03-04T21:31:34Z

hasani114
Mar 4, 2025

In addition, we're working with multi-agent flows and the discussion regarding whether servers should be thought of as Tool Providers vs being fully Agentic. Instead of expecting servers to act as an Agent can we not have the Agent abstraction at the top level? So a session can have multiple "Agents" each with their own servers. And this information can be accessible to individual servers in case they want to invoke an available Agent (like sampling). I think it would be beneficial to have an Agent abstraction instead of nesting Agents within tools.

In case a tool requires access to a specific Agent with certain capabilities, it can ask the client to enable/download the Agent and authenticate/authorize them to act. This would be cleaner from a privacy, transparency, security perspective since the user would have more visibility and control over how their data is being passed around behind the tool call.

2 replies

Mehdi-Bl Mar 4, 2025

What you want here is more meta data and full scale tools that work outside of function calling.
We have tools/promptes/resources, then may be create another type? But this can't be in tools.

hasani114 Mar 4, 2025

Yes, that is my thinking as well.

Improvements for MCP-based agents #111

jspahrsummers Dec 13, 2024 Maintainer

Scope

Replies: 8 comments · 14 replies

jerome3o-anthropic Dec 13, 2024 Maintainer

jspahrsummers Jan 8, 2025 Maintainer Author

jspahrsummers Jan 9, 2025 Maintainer Author

For me, it seems like trees of agents concept is breaking the extisting client-server architeture

I do really miss the Server-->Client opportunities for my application,

Proposal: Enhancing MCP to Support Provider-Independent Capabilities

Problem Statement

Proposed Solution

Benefits

jspahrsummers
Dec 13, 2024
Maintainer

Replies: 8 comments 14 replies

jerome3o-anthropic
Dec 13, 2024
Maintainer

jspahrsummers Jan 8, 2025
Maintainer Author

jspahrsummers Jan 9, 2025
Maintainer Author