Skip to content

Latest commit

 

History

History
848 lines (685 loc) · 28.8 KB

File metadata and controls

848 lines (685 loc) · 28.8 KB

Technical Architecture Guide

Vibe Check MCP: Anti-Pattern Detection System

Version: 1.0
Date: January 2025
Purpose: Technical architecture documentation for engineering anti-pattern detection and educational coaching platform
Architecture: Independent FastMCP Server with dual-mode analysis


Executive Summary

This document outlines the technical architecture of Vibe Check MCP, a proven anti-pattern detection system that prevents systematic engineering failures through real-time pattern detection and educational coaching. The system is built as an independent FastMCP server designed for seamless integration with Claude Code workflows.

Core Architecture Principles:

  • Validation-First Development: Proven detection algorithms before infrastructure building
  • Educational Focus: Explain WHY patterns are problematic with real-world case studies
  • Dual-Mode Analysis: Fast pattern detection + Deep Claude-powered reasoning
  • MCP-Native Design: Built specifically for Model Context Protocol integration

System Architecture

High-Level Architecture

Vibe Check MCP - Independent MCP Server Architecture
┌─────────────────────────────────────────────────────────────┐
│                    MCP Protocol Layer                      │
│  • Standardized tool/resource interfaces                   │
│  • JSON-RPC communication with Claude Code                 │
│  • Client-agnostic connectivity                            │
├─────────────────────────────────────────────────────────────┤
│                    FastMCP Server Core                     │
│  ┌─────────────────┬─────────────────┬───────────────────┐  │
│  │   MCP Tools     │ Educational     │   CLI Interface  │  │
│  │                 │ Content Engine  │                   │  │
│  │  • analyze_issue│ • WHY explanations│ • Fallback mode│  │
│  │  • analyze_code │ • HOW remediation │ • Power users  │  │
│  │  • analyze_text │ • Case studies   │ • Testing      │  │
│  │  • server_status│ • Confidence     │ • Development  │  │
│  └─────────────────┴─────────────────┴───────────────────┘  │
├─────────────────────────────────────────────────────────────┤
│                    Anti-Pattern Detection Engine           │
│  • Python AST analysis for structural patterns             │
│  • Regex-based pattern matching for text analysis          │
│  • Context-aware analysis with confidence scoring          │
│  • Knowledge base integration for case studies             │
├─────────────────────────────────────────────────────────────┤
│                  Contextual Documentation System           │
│  ┌─────────────────┬─────────────────┬───────────────────┐  │
│  │ Library Detection│ Project Docs   │ Context-Aware    │  │
│  │ Engine          │ Parser         │ Analysis         │  │
│  │                 │                │                   │  │
│  │ • Tech stack    │ • README.md    │ • Library-specific│  │
│  │   scanning      │ • CONTRIBUTING │   recommendations │  │
│  │ • Dependency    │ • Architecture │ • Project-aware   │  │
│  │   analysis      │   decisions    │   personas        │  │
│  │ • Confidence    │ • Team         │ • Contextual      │  │
│  │   scoring       │   conventions  │   mentoring       │  │
│  └─────────────────┴─────────────────┴───────────────────┘  │
├─────────────────────────────────────────────────────────────┤
│                    Integration Services                     │
│  • GitHub API client for issue/PR analysis                 │
│  • External Claude CLI integration for deep reasoning      │
│  • Configuration management system                         │
│  • Error handling and graceful degradation                 │
└─────────────────────────────────────────────────────────────┘

Component Breakdown

1. MCP Protocol Layer

  • FastMCP Framework: Provides MCP protocol compliance and tool registration
  • JSON-RPC Communication: Standard protocol for Claude Code integration
  • Tool Schema Definition: Structured interfaces for all analysis tools
  • Error Handling: Graceful degradation with helpful error messages

2. Core Analysis Engine

  • PatternDetector: Main detection engine using AST analysis and regex patterns
  • EducationalContentGenerator: Multi-level educational responses with case studies
  • ConfidenceScorer: Weighted confidence calculation for pattern detection
  • KnowledgeBase: Structured storage of patterns, case studies, and educational content

3. Contextual Documentation System

  • LibraryDetectionEngine: Scans project files and dependencies to identify technology stack
  • ProjectDocumentationParser: Extracts context from project docs (README, CONTRIBUTING, etc.)
  • AnalysisContext: Unified context container with conflict resolution and library-specific guidance
  • ContextualDocumentationManager: Main orchestration class for project-aware analysis

4. Integration Services

  • GitHub Integration: API client for fetching issues, PRs, and posting comments
  • External Claude CLI: Integration for deep LLM-powered analysis
  • Configuration Management: Environment-based configuration with secure defaults

Contextual Documentation System Architecture

Overview

The Contextual Documentation System transforms vibe-check from generic pattern detection to project-aware analysis by understanding the specific technology stack, team conventions, and codebase patterns.

3-Layer Architecture

Layer 1: Library Detection Engine

class LibraryDetectionEngine:
    """
    Automatically detects technology stack from project files and dependencies.
    
    Detection Methods:
    - Dependency file analysis (requirements.txt, package.json, pyproject.toml)
    - Import pattern matching (from fastapi import, import React)
    - Code pattern detection (@app.get, useState, useEffect)
    - File extension mapping (.tsx → React, .py + FastAPI patterns)
    """
    
    def detect_libraries(self, project_root: str) -> Dict[str, float]:
        # Returns: {"fastapi": 0.95, "react": 0.92, "supabase": 0.87}

Performance Characteristics:

  • Scan Speed: ~200-400ms for 1000 files
  • Memory Usage: <50MB for large projects
  • Accuracy: 95%+ for common frameworks
  • File Limits: Configurable max files (default: 1000)

Layer 2: Project Documentation Parser

class ProjectDocumentationParser:
    """
    Extracts project-specific context from documentation files.
    
    Parsed Sources:
    - README.md → Project overview and main technologies  
    - CONTRIBUTING.md → Team conventions and coding standards
    - ARCHITECTURE.md → Architecture decisions and patterns
    - docs/TECHNICAL.md → Technical implementation details
    """
    
    def parse_project_docs(self, project_root: str) -> Dict[str, Any]:
        # Returns: {
        #   "team_conventions": [...],
        #   "architecture_decisions": [...],
        #   "technology_stack": [...]
        # }

Layer 3: Context-Aware Analysis Engine

class AnalysisContext:
    """
    Unified context container providing library-specific recommendations.
    
    Features:
    - Library-specific guidance based on detected stack
    - Project pattern exceptions and conflict resolution
    - Integration with mentor personas for contextual advice
    """
    
    def get_contextual_recommendation(self, pattern_type: str) -> str:
        # Returns library-specific advice instead of generic patterns

Configuration: .vibe-check/ Directory Structure

your-project/
├── .vibe-check/
│   ├── config.json              # Main configuration
│   │   ├── context_loading      # Library detection settings
│   │   ├── libraries           # Library-specific overrides
│   │   ├── project_patterns    # Project-specific patterns
│   │   └── exceptions          # Approved pattern exceptions
│   ├── pattern-exceptions.json  # Detailed exception reasoning
│   ├── library-context.json     # Cached library detection results
│   └── context-cache/           # Downloaded library documentation
└── src/
    └── your-code/

New MCP Tools

1. detect_project_libraries

Scans project and returns detected technology stack with confidence scores

{
  "tool": "detect_project_libraries",
  "arguments": {
    "project_root": ".",
    "languages": ["python", "javascript", "typescript"],
    "max_files": 1000,
    "timeout_seconds": 30
  }
}

Response Schema:

{
  "libraries": {
    "fastapi": 0.95,
    "react": 0.92, 
    "supabase": 0.87
  },
  "scan_duration_ms": 245,
  "files_scanned": 156,
  "detection_confidence": 0.91,
  "errors": []
}

2. load_project_context

Loads comprehensive project context for contextual analysis

{
  "tool": "load_project_context",
  "arguments": {
    "project_root": ".",
    "include_library_docs": true,
    "cache_duration_minutes": 60
  }
}

3. create_vibe_check_directory_structure

Sets up .vibe-check/ configuration directory with defaults

Integration Knowledge Base Enhancement

The system uses an enhanced integration_knowledge_base.json with:

{
  "fastapi": {
    "library_type": "backend_framework",
    "detection_patterns": {
      "imports": ["from fastapi", "FastAPI"],
      "dependencies": ["fastapi", "uvicorn"],
      "file_extensions": [".py"],
      "specific_patterns": ["@app.get", "@app.post", "@router"]
    },
    "versions": {
      "0.100+": {
        "best_practices": ["dependency-injection", "async-preferred"],
        "anti_patterns": ["synchronous-endpoints", "manual-validation"],
        "context_7_cache": "/context-cache/fastapi-latest-docs.md"
      }
    },
    "red_flags": ["custom-auth-over-oauth", "manual-cors-setup"]
  }
}

Performance Optimizations

Intelligent Caching

  • Library detection results: Cached for 60 minutes
  • Documentation content: Cached locally in .vibe-check/context-cache/
  • Project context: Loaded lazily on first analysis

Scan Limits & Timeouts

{
  "max_files_to_scan": 1000,
  "timeout_seconds": 30,
  "max_file_size_kb": 500,
  "parallel_processing": true
}

Memory Management

  • Project-aware context manager caching
  • Automatic cleanup of old cached contexts
  • Memory limits for cache size (default: 100MB)

Security Considerations

Safe File Reading

  • UTF-8 encoding with graceful fallback to latin-1
  • File size limits to prevent memory exhaustion
  • Path traversal protection
  • Error isolation - failed files don't break analysis

Privacy Protection

  • No external API calls for basic library detection
  • Library documentation cached locally
  • Project context stays on local machine
  • Optional Context7 integration for enhanced docs

Core Detection Engine

Pattern Detection Architecture

# Core detection workflow
class PatternDetector:
    def __init__(self):
        self.knowledge_base = KnowledgeBase()
        self.confidence_scorer = ConfidenceScorer()
        self.patterns = self.load_anti_patterns()
    
    def analyze(self, content: str, context: str) -> List[DetectedPattern]:
        """Main analysis pipeline"""
        detected_patterns = []
        
        for pattern_id, pattern_config in self.patterns.items():
            detection_result = self._detect_pattern(
                content=content,
                pattern_config=pattern_config,
                context=context
            )
            
            if detection_result.detected:
                detected_patterns.append(detection_result)
        
        return self._rank_by_confidence(detected_patterns)

Pattern Definition Structure

{
  "infrastructure_without_implementation": {
    "name": "Infrastructure Without Implementation",
    "description": "Building custom solutions before testing standard approaches",
    "severity": "high",
    "detection_threshold": 0.6,
    "indicators": [
      {
        "regex": "(?i)custom\\s+(http|api)\\s+client",
        "weight": 0.4,
        "description": "Custom HTTP client mentioned"
      },
      {
        "regex": "(?i)build\\s+(from\\s+)?scratch",
        "weight": 0.3,
        "description": "Building from scratch mentioned"
      }
    ],
    "educational_content": {
      "why_problematic": "Building custom infrastructure before validating standard approaches led to wasted development time in the Cognee case study.",
      "case_study": "cognee_integration_failure",
      "prevention_checklist": [
        "Research official SDK documentation",
        "Test basic integration with 10 lines of code",
        "Document why standard approach is insufficient"
      ]
    }
  }
}

Confidence Scoring Algorithm

class ConfidenceScorer:
    def calculate_confidence(self, indicators_found: List[Indicator]) -> float:
        """Calculate weighted confidence score"""
        total_weight = sum(indicator.weight for indicator in indicators_found)
        
        # Apply diminishing returns for multiple weak indicators
        if len(indicators_found) > 3:
            total_weight *= 0.9
        
        # Context-specific adjustments
        if self._has_contradictory_evidence(indicators_found):
            total_weight *= 0.7
        
        return min(total_weight, 1.0)

MCP Tools Implementation

Tool Registry Pattern

# FastMCP tool registration
from fastmcp import FastMCP

mcp = FastMCP("Vibe Check MCP")

@mcp.tool()
def analyze_github_issue(
    issue_number: int,
    repository: str = "kesslerio/vibe-check-mcp",
    analysis_mode: str = "quick",
    post_comment: bool = False,
    detail_level: str = "standard"
) -> dict:
    """
    Analyze GitHub issue for engineering anti-patterns.
    
    Supports dual-mode operation:
    - quick: Fast pattern detection without LLM calls
    - comprehensive: Deep Claude CLI analysis with educational content
    """

Dual-Mode Analysis Architecture

class DualModeAnalyzer:
    def __init__(self):
        self.fast_analyzer = FastPatternAnalyzer()
        self.deep_analyzer = ExternalClaudeAnalyzer()
    
    def analyze(self, content: str, mode: str) -> AnalysisResult:
        if mode == "quick":
            return self.fast_analyzer.analyze(content)
        elif mode == "comprehensive":
            return self.deep_analyzer.analyze(content)
        else:
            raise ValueError(f"Unknown analysis mode: {mode}")

Tool Schema Design

{
  "type": "object",
  "properties": {
    "issue_number": {
      "type": "integer",
      "description": "GitHub issue number to analyze"
    },
    "analysis_mode": {
      "type": "string",
      "enum": ["quick", "comprehensive"],
      "description": "Analysis depth - quick for fast feedback, comprehensive for deep reasoning"
    },
    "detail_level": {
      "type": "string", 
      "enum": ["brief", "standard", "comprehensive"],
      "description": "Educational content detail level"
    }
  },
  "required": ["issue_number"]
}

Educational Content System

Content Generation Pipeline

class EducationalContentGenerator:
    def generate_response(
        self, 
        patterns: List[DetectedPattern],
        detail_level: str = "standard"
    ) -> EducationalResponse:
        """Generate multi-level educational content"""
        
        if not patterns:
            return self._generate_positive_feedback()
        
        primary_pattern = max(patterns, key=lambda p: p.confidence)
        
        content = {
            "summary": self._generate_summary(patterns),
            "primary_concern": self._explain_pattern(primary_pattern),
            "why_problematic": self._get_why_explanation(primary_pattern),
            "case_study": self._get_case_study(primary_pattern),
            "prevention_checklist": self._get_prevention_steps(primary_pattern),
            "alternative_approaches": self._get_alternatives(primary_pattern)
        }
        
        return self._format_by_detail_level(content, detail_level)

Case Study Integration

class CaseStudyManager:
    def get_case_study(self, pattern_type: str) -> CaseStudy:
        """Retrieve relevant real-world case study"""
        case_studies = {
            "infrastructure_without_implementation": {
                "title": "Cognee Integration Learning Experience", 
                "timeline": "Several days of development time",
                "root_cause": "Built custom HTTP servers instead of using cognee.add() → cognee.cognify() → cognee.search()",
                "impact": "Delayed integration due to unnecessary complexity",
                "lesson": "Always test standard API approaches with minimal POC before building custom infrastructure"
            }
        }
        
        return case_studies.get(pattern_type, self._get_default_case_study())

External Claude CLI Integration

Architecture for Deep Analysis

class ExternalClaudeIntegration:
    def __init__(self):
        self.claude_cli_name = os.getenv("CLAUDE_CLI_NAME", "claude")
        self.timeout_seconds = 60
        
    def analyze_with_claude(
        self, 
        content: str, 
        task_type: str = "general"
    ) -> str:
        """Execute Claude CLI for deep reasoning"""
        
        system_prompt = self._get_system_prompt(task_type)
        
        try:
            result = subprocess.run([
                self.claude_cli_name, 
                "-p", 
                f"{system_prompt}\n\nAnalyze this content:\n{content}"
            ], 
            capture_output=True, 
            text=True, 
            timeout=self.timeout_seconds
            )
            
            if result.returncode == 0:
                return self._parse_claude_response(result.stdout)
            else:
                raise ClaudeIntegrationError(f"Claude CLI failed: {result.stderr}")
                
        except subprocess.TimeoutExpired:
            raise ClaudeIntegrationError("Claude CLI timeout - see diagnostics")

System Prompts for Specialized Analysis

def _get_system_prompt(self, task_type: str) -> str:
    """Get specialized system prompt for analysis type"""
    prompts = {
        "code_analysis": """
        You are an expert software architect focused on anti-pattern detection.
        Analyze the provided code for these specific patterns:
        1. Infrastructure Without Implementation - custom solutions vs standard APIs
        2. Complexity Escalation - unnecessary abstraction layers
        3. Symptom-Driven Development - treating symptoms vs root causes
        
        Provide educational explanations with real-world consequences.
        """,
        
        "issue_analysis": """
        You are an engineering coach specializing in preventing systematic failures.
        Review this GitHub issue for planning anti-patterns that lead to technical debt.
        Focus on the Cognee case study lessons about validating standard approaches first.
        """,
        
        "pr_review": """
        You are a senior technical reviewer with expertise in anti-pattern prevention.
        Review this PR for patterns that compound into long-term maintenance issues.
        Provide constructive coaching with specific improvement suggestions.
        """
    }
    
    return prompts.get(task_type, prompts["code_analysis"])

Performance & Scalability

Caching Strategy

class AnalysisCache:
    def __init__(self, ttl_hours: int = 1):
        self.cache_dir = Path("~/.vibe-check/cache").expanduser()
        self.ttl = timedelta(hours=ttl_hours)
    
    def get_cached_result(self, content_hash: str) -> Optional[dict]:
        """Retrieve cached analysis if still valid"""
        cache_file = self.cache_dir / f"{content_hash}.json"
        
        if not cache_file.exists():
            return None
            
        cached_data = json.loads(cache_file.read_text())
        cache_time = datetime.fromisoformat(cached_data["timestamp"])
        
        if datetime.now() - cache_time > self.ttl:
            cache_file.unlink()  # Remove expired cache
            return None
            
        return cached_data["result"]

Async Analysis Pipeline

class AsyncAnalyzer:
    async def analyze_multiple_items(
        self, 
        items: List[AnalysisItem]
    ) -> List[AnalysisResult]:
        """Analyze multiple items concurrently"""
        
        semaphore = asyncio.Semaphore(3)  # Limit concurrent analyses
        
        async def analyze_with_limit(item):
            async with semaphore:
                return await self._analyze_single_item(item)
        
        tasks = [analyze_with_limit(item) for item in items]
        return await asyncio.gather(*tasks)

Testing Strategy

Test Architecture

# Core detection testing
class TestPatternDetection:
    def test_infrastructure_pattern_detection(self):
        """Test detection of infrastructure-without-implementation"""
        detector = PatternDetector()
        
        # Known anti-pattern content
        issue_content = """
        We need to integrate with Stripe API for payments.
        I'm planning to build a custom HTTP client since their SDK 
        might be limiting for our use case.
        """
        
        patterns = detector.analyze_issue_content(
            title="Custom Stripe Integration",
            body=issue_content
        )
        
        assert len(patterns) > 0
        assert patterns[0]["type"] == "infrastructure_without_implementation"
        assert patterns[0]["confidence"] > 0.7

    def test_no_false_positives(self):
        """Ensure good practices don't trigger false positives"""
        detector = PatternDetector()
        
        # Good practice content
        good_content = """
        We need to integrate with Stripe for payments.
        I've reviewed their official Python SDK documentation
        and will use stripe.Customer.create() and stripe.PaymentIntent.confirm()
        as recommended in their quickstart guide.
        """
        
        patterns = detector.analyze_issue_content(
            title="Stripe Integration Using Official SDK", 
            body=good_content
        )
        
        # Should not detect any anti-patterns
        assert len(patterns) == 0

Integration Testing

@pytest.mark.integration
class TestMCPIntegration:
    def test_analyze_github_issue_tool(self):
        """Test the full MCP tool pipeline"""
        result = analyze_github_issue(
            issue_number=1,
            repository="kesslerio/vibe-check-mcp-test",
            analysis_mode="quick"
        )
        
        assert "analysis_mode" in result
        assert "patterns_detected" in result  
        assert "educational_content" in result
        assert isinstance(result["patterns_detected"], list)

Security & Privacy

Data Handling Principles

class PrivacyManager:
    """Ensure user data privacy and security"""
    
    @staticmethod
    def sanitize_content(content: str) -> str:
        """Remove potentially sensitive information"""
        patterns = [
            r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b',  # Email
            r'\b(?:\d[ -]*?){13,16}\b',  # Credit card numbers
            r'\b[A-Za-z0-9]{20,}\b'      # API keys/tokens (basic heuristic)
        ]
        
        sanitized = content
        for pattern in patterns:
            sanitized = re.sub(pattern, '[REDACTED]', sanitized)
        
        return sanitized
    
    @staticmethod
    def validate_github_permissions(token: str, repository: str) -> bool:
        """Validate minimal required GitHub permissions"""
        # Implementation for permission validation
        pass

Local Processing

  • No External Data Transmission: Analysis happens locally except for user-initiated GitHub API calls
  • Configurable Privacy: Users control what data is processed and cached
  • Secure Defaults: No sensitive data logging or external analytics
  • Token Management: Users provide their own GitHub tokens with minimal required permissions

Deployment Architecture

MCP Server Configuration

{
  "vibe-check": {
    "type": "stdio",
    "command": "python",
    "args": ["-m", "vibe_check.server"],
    "env": {
      "PYTHONPATH": "/path/to/vibe-check-mcp/src",
      "GITHUB_TOKEN": "${GITHUB_TOKEN}",
      "VIBE_CHECK_DEV_MODE": "false"
    }
  }
}

Environment Configuration

class Config:
    """Environment-based configuration"""
    
    # GitHub Integration
    GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")
    DEFAULT_REPOSITORY = os.getenv("VIBE_CHECK_DEFAULT_REPO", "kesslerio/vibe-check-mcp")
    
    # Claude CLI Integration  
    CLAUDE_CLI_NAME = os.getenv("CLAUDE_CLI_NAME", "claude")
    CLAUDE_CLI_TIMEOUT = int(os.getenv("CLAUDE_CLI_TIMEOUT", "60"))
    
    # Analysis Configuration
    CACHE_TTL_HOURS = int(os.getenv("VIBE_CHECK_CACHE_TTL", "1"))
    DEV_MODE = os.getenv("VIBE_CHECK_DEV_MODE", "false").lower() == "true"
    
    # Paths
    CACHE_DIR = Path(os.getenv("VIBE_CHECK_CACHE_DIR", "~/.vibe-check/cache"))
    DATA_DIR = Path(os.getenv("VIBE_CHECK_DATA_DIR", "./data"))

Monitoring & Observability

Local Analytics

class LocalMetrics:
    """Local performance monitoring without external transmission"""
    
    def log_analysis_performance(
        self, 
        tool_name: str, 
        duration: float, 
        patterns_found: int,
        success: bool
    ):
        """Log performance metrics for optimization"""
        metrics = {
            "timestamp": datetime.now().isoformat(),
            "tool": tool_name,
            "duration_seconds": duration,
            "patterns_detected": patterns_found,
            "success": success
        }
        
        self._append_to_metrics_log(metrics)
    
    def get_performance_summary(self, days: int = 7) -> dict:
        """Get performance summary for troubleshooting"""
        # Implementation for metrics aggregation
        pass

Health Checks

@mcp.tool()
def server_status() -> dict:
    """Get comprehensive server status and capabilities"""
    return {
        "status": "healthy",
        "version": get_version(),
        "capabilities": {
            "github_integration": bool(Config.GITHUB_TOKEN),
            "claude_cli_integration": check_claude_cli_available(),
            "pattern_detection": True,
            "educational_content": True
        },
        "performance": {
            "cache_size": get_cache_size(),
            "patterns_loaded": len(get_loaded_patterns()),
            "uptime_seconds": get_uptime()
        }
    }

Future Architecture Considerations

Extensibility

  • Plugin Architecture: Support for custom pattern definitions
  • Language Support: Modular language analyzers for JavaScript, Go, Rust
  • Integration Points: Webhooks for CI/CD pipeline integration
  • Community Patterns: Framework for community-contributed patterns

Scalability

  • Distributed Analysis: Support for analyzing large codebases
  • Caching Optimization: Intelligent cache invalidation and compression
  • Resource Management: Memory-efficient analysis for large files
  • Async Processing: Non-blocking analysis pipeline for responsiveness

This technical architecture provides the foundation for a robust, scalable, and maintainable anti-pattern detection system that integrates seamlessly with Claude Code workflows while maintaining strong privacy and performance characteristics.