Casual MCP is a Python framework for building, evaluating, and serving LLMs with tool-calling capabilities using Model Context Protocol (MCP).
It includes:
- A multi-server MCP client using FastMCP
- Provider support for OpenAI (and OpenAI-compatible APIs)
- A recursive tool-calling chat loop
- System prompt templating with Jinja2
- A basic API exposing a chat endpoint
- Plug-and-play multi-server tool orchestration
- Prompt templating with Jinja2
- Configurable via JSON
- CLI and API access
- Extensible architecture
pip install casual-mcp

Or for development:

git clone https://github.com/AlexStansfield/casual-mcp.git
cd casual-mcp
uv pip install -e .[dev]

Providers allow access to LLMs. Currently, only an OpenAI provider is supplied. However, in the model configuration, you can supply an optional endpoint allowing you to use any OpenAI-compatible API (e.g., LM Studio).
Ollama support is planned for a future version, along with support for custom pluggable providers via a standard interface.
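For example, using the typed config models described under Programmatic Usage, a model served by an OpenAI-compatible backend such as LM Studio could be declared like this (a sketch: only the fields shown in this README are used, and the model name and local URL are illustrative):

```python
# Sketch: an OpenAI-compatible model entry pointing at a local LM Studio server.
# The model name and endpoint URL are illustrative.
from casual_mcp.models import OpenAIModelConfig

lm_studio_model = OpenAIModelConfig(
    model="qwen3-8b",
    endpoint="http://localhost:1234/v1",  # any OpenAI-compatible API
)
```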
System prompts are defined as Jinja2 templates in the prompt-templates/ directory.
They are used in the config file to specify a system prompt to use per model.
This allows you to define custom prompts for each model, which is useful when using models that do not natively support tools. Templates are passed the tool list in the `tools` variable.
# prompt-templates/example_prompt.j2
Here is a list of functions in JSON format that you can invoke:
[
{% for tool in tools %}
{
"name": "{{ tool.name }}",
"description": "{{ tool.description }}",
"parameters": {
{% for param_name, param in tool.inputSchema.items() %}
"{{ param_name }}": {
"description": "{{ param.description }}",
"type": "{{ param.type }}"{% if param.default is defined %},
"default": "{{ param.default }}"{% endif %}
}{% if not loop.last %},{% endif %}
{% endfor %}
}
}{% if not loop.last %},{% endif %}
{% endfor %}
]

See the Programmatic Usage section to build configs and messages with typed models.
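To preview what a template like the one above produces, you can render it directly with Jinja2. This is a sketch for inspection only: the dummy tool dict is illustrative, while in practice the framework passes the tool list it fetched from your MCP servers.

```python
# Sketch: rendering a prompt template by hand for inspection.
# The tool dict below is illustrative; casual-mcp supplies the real tool list
# in the `tools` variable when it renders templates.
from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("prompt-templates"))
template = env.get_template("example_prompt.j2")

tools = [
    {
        "name": "get_time",
        "description": "Get the current time for a city",
        "inputSchema": {
            "city": {"description": "Name of the city", "type": "string"},
        },
    }
]

print(template.render(tools=tools))
```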
The CLI and API can be configured using a casual_mcp_config.json file that defines:
- Available models and their providers
- Available MCP tool servers
- Optional tool namespacing behavior
{
"models": {
"lm-qwen-3": {
"provider": "openai",
"endpoint": "http://localhost:1234/v1",
"model": "qwen3-8b",
"template": "lm-studio-native-tools"
},
"gpt-4.1": {
"provider": "openai",
"model": "gpt-4.1"
}
},
"servers": {
"time": {
"command": "python",
"args": ["mcp-servers/time/server.py"]
},
"weather": {
"url": "http://localhost:5050/mcp"
}
}
}

Each model has:
- `provider`: `"openai"` (more to come)
- `model`: the model name (e.g., `gpt-4.1`, `qwen3-8b`)
- `endpoint`: required for custom OpenAI-compatible backends (e.g., LM Studio)
- `template`: optional name used to apply model-specific tool-calling formatting
Servers can either be local (over stdio) or remote.

Local servers:

- `command`: the command to run the server, e.g. `python`, `npm`
- `args`: the arguments to pass to the server as a list, e.g. `["time/server.py"]`
- Optional: `env` for subprocess environments, `system_prompt` to override the server prompt

Remote servers:

- `url`: the URL of the MCP server
- Optional: `transport`: the type of transport, `http`, `sse`, or `streamable-http`. Defaults to `http`
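As a sketch, the same options can be expressed with the exported config models described under Programmatic Usage (field names mirror the options above; the exact model signatures may differ):

```python
# Sketch: building server entries with the exported config models.
# Field names mirror the options listed above; exact signatures may differ.
from casual_mcp.models import RemoteServerConfig, StdioServerConfig

time_server = StdioServerConfig(
    command="python",
    args=["mcp-servers/time/server.py"],
    env={"TZ": "UTC"},  # optional subprocess environment (illustrative value)
)

weather_server = RemoteServerConfig(
    url="http://localhost:5050/mcp",
    transport="http",  # optional; defaults to http
)
```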
There are two environment variables:

- `OPEN_AI_API_KEY`: required when using the `openai` provider; if using a local model with an OpenAI-compatible API it can be any string
- `TOOL_RESULT_FORMAT`: adjusts the format of the tool result given back to the LLM. Options are `result`, `function_result`, `function_args_result`. Defaults to `result`

You can set them using `export` or by creating a `.env` file.
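For instance, a minimal sketch that reads both variables after loading a `.env` file with python-dotenv (an optional convenience, not a documented dependency of casual-mcp):

```python
# Sketch: reading the two variables after loading a .env file with python-dotenv.
# python-dotenv is an optional convenience here, not a documented dependency.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

api_key = os.environ["OPEN_AI_API_KEY"]  # any string for local OpenAI-compatible APIs
tool_result_format = os.getenv("TOOL_RESULT_FORMAT", "result")
```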
Start the API server.
Options:
- `--host`: Host to bind (default `0.0.0.0`)
- `--port`: Port to serve on (default `8000`)
Loads the config and outputs the list of MCP servers you have configured.
$ casual-mcp servers
┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━┓
┃ Name    ┃ Type   ┃ Command / Url                 ┃ Env ┃
┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━┩
│ math    │ local  │ mcp-servers/math/server.py    │     │
│ time    │ local  │ mcp-servers/time-v2/server.py │     │
│ weather │ local  │ mcp-servers/weather/server.py │     │
│ words   │ remote │ https://localhost:3000/mcp    │     │
└─────────┴────────┴───────────────────────────────┴─────┘
Loads the config and outputs the list of models you have configured.
$ casual-mcp models
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name          ┃ Provider ┃ Model                    ┃ Endpoint              ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
│ lm-phi-4-mini │ openai   │ phi-4-mini-instruct      │ http://kovacs:1234/v1 │
│ lm-hermes-3   │ openai   │ hermes-3-llama-3.2-3b    │ http://kovacs:1234/v1 │
│ lm-groq       │ openai   │ llama-3-groq-8b-tool-use │ http://kovacs:1234/v1 │
│ gpt-4o-mini   │ openai   │ gpt-4o-mini              │                       │
│ gpt-4.1-nano  │ openai   │ gpt-4.1-nano             │                       │
│ gpt-4.1-mini  │ openai   │ gpt-4.1-mini             │                       │
│ gpt-4.1       │ openai   │ gpt-4.1                  │                       │
└───────────────┴──────────┴──────────────────────────┴───────────────────────┘
You can import and use the core framework in your own Python code.
`McpToolChat` orchestrates LLM interaction with tools using a recursive loop.
from casual_mcp import McpToolChat
from casual_mcp.tool_cache import ToolCache
from casual_mcp.models import SystemMessage, UserMessage
tool_cache = ToolCache(mcp_client)
chat = McpToolChat(mcp_client, provider, system_prompt, tool_cache=tool_cache)
# Generate method takes a user prompt
response = await chat.generate("What time is it in London?")
# Generate method with session
response = await chat.generate("What time is it in London?", "my-session-id")
# Chat method takes a list of chat messages
# note: no need to set system_prompt, as it is ignored when a system message is sent in messages
chat = McpToolChat(mcp_client, provider, tool_cache=tool_cache)
messages = [
SystemMessage(content="You are a cool dude who likes to help the user"),
UserMessage(content="What time is it in London?")
]
response = await chat.chat(messages)

`ProviderFactory` instantiates LLM providers based on the selected model config.
from casual_mcp import ProviderFactory
provider_factory = ProviderFactory(mcp_client, tool_cache=tool_cache)
provider = await provider_factory.get_provider("lm-qwen-3", model_config)

Tool catalogues are cached to avoid repeated `ListTools` calls. The cache refreshes every 30 seconds by default. Override this with the `MCP_TOOL_CACHE_TTL` environment variable (set to `0` or a negative value to cache indefinitely).
`load_config` loads your `casual_mcp_config.json` into a validated config object.
from casual_mcp import load_config
config = load_config("casual_mcp_config.json")

`load_mcp_client` creates a multi-server FastMCP client from the config object.
from casual_mcp import load_mcp_client
mcp_client = load_mcp_client(config)

Exported models:
- StdioServerConfig
- RemoteServerConfig
- OpenAIModelConfig
Use these types to build valid configs:
from casual_mcp.models import OpenAIModelConfig, StdioServerConfig
model = OpenAIModelConfig(model="llama3", endpoint="http://...")
server = StdioServerConfig(command="python", args=["time/server.py"])

Exported models:
- AssistantMessage
- SystemMessage
- ToolResultMessage
- UserMessage
Use these types to build message chains:
from casual_mcp.models import SystemMessage, UserMessage
messages = [
SystemMessage(content="You are a friendly tool calling assistant."),
UserMessage(content="What is the time?")
]

A full end-to-end example:

from casual_mcp import McpToolChat, load_config, load_mcp_client, ProviderFactory
from casual_mcp.models import SystemMessage, UserMessage
model = "gpt-4.1-nano"
messages = [
SystemMessage(content="""You are a tool calling assistant.
You have access to up-to-date information through the tools.
Respond naturally and confidently, as if you already know all the facts."""),
UserMessage(content="Will I need to take my umbrella to London today?")
]
# Load the Config from the File
config = load_config("casual_mcp_config.json")
# Setup the MCP Client
mcp_client = load_mcp_client(config)
# Get the Provider for the Model
provider_factory = ProviderFactory(mcp_client)
provider = await provider_factory.get_provider(model, config.models[model])
# Perform the Chat and Tool calling
chat = McpToolChat(mcp_client, provider)
response_messages = await chat.chat(messages)

To use the API, start the server:

casual-mcp serve --host 0.0.0.0 --port 8000

The chat endpoint accepts:

- `model`: the LLM model to use
- `messages`: a list of chat messages (system, assistant, user, etc.) that you can pass to the API, allowing you to keep your own chat session in the client calling the API
{
"model": "gpt-4.1-nano",
"messages": [
{
"role": "user",
"content": "can you explain what the word consistent means?"
}
]
}
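For example, here is a minimal sketch of sending that request from Python with `requests`, assuming the chat endpoint is exposed at `POST /chat` on the default host and port (check your running server for the actual route):

```python
# Sketch: calling the chat endpoint. The /chat path and response shape are
# assumptions; adjust them to match the routes your server actually exposes.
import requests

payload = {
    "model": "gpt-4.1-nano",
    "messages": [
        {"role": "user", "content": "can you explain what the word consistent means?"}
    ],
}

response = requests.post("http://localhost:8000/chat", json=payload, timeout=60)
response.raise_for_status()
print(response.json())
```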
The generate endpoint allows you to send a user prompt as a string.
It also supports sessions, which keep a record of all messages in the session and feed them back into the LLM for context. Sessions are stored in memory, so they are cleared when the server is restarted.

- `model`: the LLM model to use
- `prompt`: the user prompt
- `session_id`: an optional ID that stores all the messages from the session and provides them back to the LLM for context
{
"session_id": "my-session",
"model": "gpt-4o-mini",
"prompt": "can you explain what the word consistent means?"
}
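As a sketch, two `requests` calls sharing a `session_id` let the second prompt rely on context from the first; the `/generate` path here is an assumption, so adjust it to your server's actual route:

```python
# Sketch: reusing a session so the second prompt can rely on earlier context.
# The /generate path is an assumption; adjust to your server's actual route.
import requests

BASE_URL = "http://localhost:8000"

first = requests.post(f"{BASE_URL}/generate", json={
    "session_id": "my-session",
    "model": "gpt-4o-mini",
    "prompt": "can you explain what the word consistent means?",
}, timeout=60)

follow_up = requests.post(f"{BASE_URL}/generate", json={
    "session_id": "my-session",
    "model": "gpt-4o-mini",
    "prompt": "can you use it in a short sentence?",
}, timeout=60)

print(follow_up.json())
```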
Get all the messages from a session
This software is released under the MIT License.