An MCP server that executes build/test commands, routes the output to smaller LLMs for concise, actionable summaries, and stores the full output for detailed inspection when needed.
Blog post: https://gordles.io/blog/llm-friendly-test-suite-outputs-pytest-llm
The primary goal of this tool is to save context in the main coding agent thread by reducing the number of tokens processed when running builds or tests.
Instead of flooding Claude's context with thousands of lines of build output, this server:
- Executes build/test commands through supported providers - `npm test`, `pytest`, `docker build`, `unittest discover`, etc.
- Provides intelligent summaries - a smaller LLM analyzes the output and surfaces exactly what the main thread needs to act on
- Stores full outputs - Access complete logs when you need detailed analysis
- Maintains history - Track builds over time with unique IDs
- Getting Started - Setup in 5 minutes
- Available Tools - Overview of all MCP tools
- Usage Examples - Real-world scenarios
- Configuration - API keys and settings
```python
# Safe, provider-based commands only
run_build("/my-app", "npm", ["run", "test"])
run_build("/my-api", "pytest", ["--cov=src", "tests/"])
run_build("/my-container", "docker", ["build", "-t", "myapp", "."])
run_build("/my-python", "unittest", ["discover", "-s", "tests"])
```
| Provider | Description | Common Flags | Example Usage |
|---|---|---|---|
| pytest | Python testing framework | `--cov=src`, `--verbose`, `-x`, `--tb=short` | `run_build("/app", "pytest", ["--cov=src", "tests/"])` |
| unittest | Python built-in testing | `discover`, `-s`, `-p`, `-v` | `run_build("/app", "unittest", ["discover", "-s", "tests"])` |
| npm | Node.js package manager | `run`, `test`, `install`, `--coverage` | `run_build("/app", "npm", ["run", "test", "--", "--coverage"])` |
| docker | Container platform | `build`, `run`, `-t`, `--no-cache` | `run_build("/app", "docker", ["build", "-t", "myapp", "."])` |
- Smart summaries - Generated by OpenRouter LLMs (Mistral, Gemini, etc.)
- Error highlighting - Focuses on actionable failures
- Success metrics - Extracts test counts, coverage, and performance data
- Configurable models - Choose the right LLM for your analysis needs
- Automatic storage - Every build gets a unique ID
- Full text access - Retrieve complete stdout/stderr when needed
- Build history - Track builds over time
- Auto-cleanup - Manage disk space automatically
Perfect for development workflows (a typical loop is sketched after this list):
- Run builds - Get instant intelligent summaries
- Debug failures - Access full logs for detailed analysis
- Track progress - Monitor builds over time
- Share results - Build IDs make collaboration easy
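Concretely, the loop looks roughly like this from the agent's side (a sketch; the JSON field names follow the example output shown later in this README):

```python
import json

# Run the build and read only the compact summary (sketch; field names
# follow the example output further down).
result = json.loads(run_build("/my-app", "npm", ["run", "test"]))
print(result["summary"])

# Pull the full logs only when the summary isn't enough.
if result["exit_code"] != 0:
    full = json.loads(get_build_output(result["build_id"]))
    print(full["stderr"])
```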
Prerequisites:
- Python 3.9+
- OpenRouter API key
- Claude Code CLI (recommended) or another MCP-compatible agent CLI
```bash
# Clone and setup
git clone https://github.com/your-username/build-output-tools-mcp.git
cd build-output-tools-mcp

# One-command setup
./run_server.sh
```
The setup script will:
- Create a Python virtual environment
- Install dependencies
- Set up the `.env` file
- Optionally add the server to Claude Code
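If you prefer a manual setup instead of the script, the steps are roughly the following (the `requirements.txt` and `.env.example` file names are assumptions about the repo layout; adjust to whatever the repository actually contains):

```bash
# Manual setup sketch - file names are assumptions, check the repo first
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt                  # or `pip install -e .` for a pyproject-based layout
cp .env.example .env 2>/dev/null || touch .env   # create .env if no template exists
```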
Edit the `.env` file with your OpenRouter API key:

```bash
OPENROUTER_API_KEY=your_openrouter_api_key_here
DEFAULT_MODEL=mistralai/mistral-small-3.2-24b-instruct-2506:free
```
In Claude Code:
```python
# List available providers
list_providers()

# Run tests with specific provider
run_build(
    project_path="/path/to/my-app",
    provider="npm",
    flags=["run", "test"]
)
```
`run_build` - Execute build/test commands using a supported provider and get an intelligent analysis.
Parameters:
- `project_path` (string) - Directory to run the command in
- `provider` (string) - Build provider: "pytest", "unittest", "npm", or "docker"
- `flags` (optional list) - Flags/arguments to pass to the provider
- `timeout` (optional int) - Command timeout in seconds (default: 600)
- `model` (optional string) - LLM model for analysis
Returns:
- Summarized build or test results from a smaller LLM
- Build ID for retrieving full output
- Exit code and basic metrics
Example:
```python
run_build(
    project_path="/my-react-app",
    provider="npm",
    flags=["run", "test", "--", "--coverage"],
    timeout=300,
    model="openai/gpt-4o-mini"
)
```
`get_build_output` - Get the complete stdout/stderr from any previous build.
Parameters:
- `build_id` (string) - Build ID from a `run_build` result
Returns:
- Complete stdout and stderr text
- Command details and metadata
- Execution timestamps
Example:
```python
import json

# First run a build
result = run_build("/my-app", "npm", ["test"])
build_id = json.loads(result)["build_id"]

# Later, get full output for detailed analysis
full_output = get_build_output(build_id)
```
`list_build_history` - List recent builds with their IDs and summaries.
Parameters:
- `limit` (optional int) - Maximum number of builds to return (default: 10)
Returns:
- List of recent builds with metadata
- Build IDs for retrieving full outputs
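Example (a sketch; the `success` and `build_id` keys mirror the usage examples further down, but verify the exact JSON shape against a real response):

```python
import json

# Fetch the five most recent builds and flag the failures.
# The parsed structure is assumed to be a list of build records.
history = json.loads(list_build_history(limit=5))
for build in history:
    status = "ok" if build.get("success") else "FAILED"
    print(status, build.get("build_id"))
```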
`list_providers` - Show supported build/test providers and example usage.
Returns:
- List of supported providers
- Example flags for each provider
- Usage guidance
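Example (the `supported_providers` key matches the discovery snippet later in this README):

```python
import json

# Discover which providers the server will accept
providers = json.loads(list_providers())
print(providers["supported_providers"])  # ["pytest", "unittest", "npm", "docker"]
```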
Show available LLM models for analysis.
Returns:
- List of common OpenRouter models
- Default model configuration
`cleanup_old_builds` - Clean up old build outputs to save disk space.
Parameters:
- `max_age_days` (optional int) - Maximum age in days (default: 7)
```python
import json

# Run tests and get summary
result = run_build("/my-app", "npm", ["run", "test"])

# Output:
{
  "status": "failed",
  "summary": "Tests failed: 2 of 15 tests failed in UserAuth module. TypeError in login validation - expected string but received undefined.",
  "build_id": "1704123456_1234",
  "exit_code": 1
}

# Get full output for debugging
full_output = get_build_output("1704123456_1234")

# Access complete logs
stdout = json.loads(full_output)["stdout"]
stderr = json.loads(full_output)["stderr"]
```
```python
# Analyze Docker builds
run_build(
    project_path="/my-container-app",
    provider="docker",
    flags=["build", "-t", "myapp:latest", "."]
)

# Output might be:
{
  "status": "success",
  "summary": "Docker build completed successfully. Image size: 1.2GB. Build time: 3m 45s. All layers cached except final application layer.",
  "build_id": "1704123789_5678"
}
```
```python
# Run Python tests with coverage
run_build(
    project_path="/my-python-api",
    provider="pytest",
    flags=["--cov=src", "--cov-report=term-missing", "tests/"],
    model="anthropic/claude-3-haiku"
)

# Run unittest discovery
run_build(
    project_path="/my-python-api",
    provider="unittest",
    flags=["discover", "-s", "tests", "-p", "test_*.py"]
)

# Check recent builds
history = list_build_history(5)
```
Configure in the `.env` file:

```bash
# OpenRouter API Configuration (Required)
OPENROUTER_API_KEY=your_openrouter_api_key_here
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1

# Default model for analysis
DEFAULT_MODEL=mistralai/mistral-small-3.2-24b-instruct-2506:free

# Command timeout (seconds)
DEFAULT_TIMEOUT=600
```
Popular OpenRouter models for build analysis:

Fast & Free:
- `mistralai/mistral-small-3.2-24b-instruct-2506:free`
- `google/gemini-flash-1.5-8b:free`
- `deepseek/deepseek-r1-0528:free`
- `qwen/qwen3-32b:free`
- `google/gemini-2.0-flash-exp:free`
- `mistralai/mistral-nemo:free`

Balanced:
- `anthropic/claude-3-haiku`
- `openai/gpt-4o-mini`

High Quality:
- `anthropic/claude-3-sonnet`
- `openai/gpt-4o`
Build outputs are stored in the `build_outputs/` directory:
- JSON files with the full command output (an illustrative record is sketched below)
- Index file for quick lookups
- Automatic cleanup after 7 days (configurable)
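As a rough illustration, a stored record might look like this (field names are inferred from the tool outputs documented above, not a guaranteed schema):

```json
{
  "build_id": "1704123456_1234",
  "command": "npm run test",
  "exit_code": 1,
  "stdout": "...full stdout text...",
  "stderr": "...full stderr text...",
  "summary": "Tests failed: 2 of 15 tests failed in UserAuth module.",
  "timestamp": "2024-01-01T12:17:36Z"
}
```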
The `run_server.sh` script can automatically add the server to Claude Code:

```bash
./run_server.sh
# Choose 'Y' when prompted to add to Claude Code

# Or register the server manually:
claude mcp add build-output-tools -s user -- /path/to/.venv/bin/python /path/to/src/build_output_tools_mcp/server.py
```
```bash
# Stop the MCP server connection
./scripts/stop_server.sh

# Completely remove from Claude Code
./scripts/uninstall_server.sh

# Reinstall after removal
./run_server.sh
```
Add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "build-output-tools": {
      "command": "/path/to/.venv/bin/python",
      "args": ["/path/to/src/build_output_tools_mcp/server.py"]
    }
  }
}
```
MCP Server Connection Issues:

```bash
# Stop and restart the server
./scripts/stop_server.sh
./scripts/uninstall_server.sh
./run_server.sh
```

API Key Not Working:

```bash
# Check your .env file
grep OPENROUTER_API_KEY .env

# Verify your API key at https://openrouter.ai/
```

Command Timeouts:

```python
# Increase timeout for long builds
run_build("/my-app", "npm", ["run", "build"], timeout=1200)
```

Storage Full:

```python
# Clean up old builds
cleanup_old_builds(max_age_days=3)
```
```bash
# Run test suite
python -m pytest tests/ -v
```
```python
import json

# Test with sample providers
run_build("/tmp", "npm", ["--version"])
run_build("/tmp", "pytest", ["--help"])

# Discover available providers
providers = list_providers()
print(json.loads(providers)["supported_providers"])  # ["pytest", "unittest", "npm", "docker"]

# Invalid provider handling
result = run_build("/my-app", "invalid_provider", ["test"])
# Returns: {"status": "error", "error": "Unsupported provider: invalid_provider..."}

# Use specific models for different analysis needs
run_build("/my-app", "npm", ["test"], model="anthropic/claude-3-sonnet")       # Deep analysis
run_build("/my-app", "npm", ["run", "lint"], model="google/gemini-flash-1.5")  # Quick checks

# Get history and analyze patterns (the tool returns a JSON string, so parse it first)
history = json.loads(list_build_history(50))

# Use build IDs to analyze common failure patterns
failing_builds = [b for b in history if not b["success"]]
```
We welcome contributions! Some potential areas for improvement:
- Direct API integration with providers - Remove the hard dependency on OpenRouter and add API access for other model providers such as Anthropic or OpenAI
- Safety checks for running test commands - Forcing every command to start with a known provider binary (e.g. `pytest` or `docker`) before handing it to `subprocess` is a good start, but workarounds for malicious input may still be possible (one possible hardening approach is sketched after this list)
- Framework-specific parsers - Looking for contributors to add support for more testing frameworks!
- Build comparison - Diff analysis between builds
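As a starting point for the safety-checks item above, one possible hardening approach (an illustrative sketch, not the server's current implementation) is to resolve the provider binary explicitly, reject shell metacharacters in flags, and never involve a shell:

```python
import shutil
import subprocess

ALLOWED_PROVIDERS = {"pytest", "unittest", "npm", "docker"}
FORBIDDEN_CHARS = set(";|&`$<>")  # characters that only make sense to a shell

def run_provider(provider: str, flags: list[str], cwd: str, timeout: int = 600):
    """Illustrative hardening sketch - not the server's actual implementation."""
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"Unsupported provider: {provider}")
    if any(ch in FORBIDDEN_CHARS for flag in flags for ch in flag):
        raise ValueError("Flags must not contain shell metacharacters")

    if provider == "unittest":
        # unittest is a stdlib module, so it runs through the Python interpreter
        cmd = [shutil.which("python3") or "python3", "-m", "unittest", *flags]
    else:
        binary = shutil.which(provider)
        if binary is None:
            raise FileNotFoundError(f"{provider} is not installed or not on PATH")
        cmd = [binary, *flags]

    # An argument list with shell=False (the default) means flags are passed
    # verbatim to the program and never interpreted by a shell.
    return subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, timeout=timeout)
```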
MIT License - see LICENSE file for details.
- Issues: GitHub Issues
- Documentation: This README
- API Reference: See tool descriptions above