leslieo2 · Nov 8, 2025
diff --git a/.env.template b/.env.template
@@ -1,5 +1,8 @@
 LLM_PROVIDER=deepseek
 
+# Logging
+LIEGRAPH_LOG_LEVEL=INFO
+
 # Core API keys
 OPENAI_API_KEY=your-key
 LANGSMITH_TRACING=true

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,209 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+LieGraph is an AI-powered implementation of the social deduction game "Who Is Spy" built with LangGraph. It features autonomous AI agents that use LLM reasoning to find the spy among them.
+
+- **Main Language**: Python 3.12+
+- **Core Framework**: LangGraph for workflow orchestration
+- **AI Integration**: LangChain with structured LLM outputs
+- **Frontend**: React 19.2 with LangGraph SDK
+
+## Development Commands
+
+### Initial Setup
+```bash
+# Install Python dependencies (uses uv package manager)
+uv sync
+
+# Create .env file from template
+cp .env.template .env
+# Edit .env with your API keys (OpenAI, DeepSeek, or OpenRouter)
+
+# Install frontend dependencies
+cd ui-web/frontend && npm install
+```
+
+### Running the Application
+```bash
+# Terminal 1: Start LangGraph backend (from project root)
+langgraph dev --config langgraph.json --port 8124 --allow-blocking
+
+# Terminal 2: Start React frontend (from ui-web/frontend)
+npm start
+
+# Access UI at: http://localhost:3000
+```
+
+### Testing
+```bash
+# Run all tests
+python -m pytest tests/ -v
+
+# Run specific test file
+python -m pytest tests/test_game_rules.py -v
+
+# Run specific test
+python -m pytest tests/test_game_rules.py::test_assign_roles -v
+```
+
+### Linting/Formatting
+```bash
+# Format Python code
+black src/ tests/
+```
+
+## Architecture Overview
+
+### LangGraph Workflow Flow
+```
+START → host_setup → host_stage_switch
+                        ↓
+                    speaking phase (sequential player_speech nodes)
+                        ↓
+                    voting phase (concurrent player_vote nodes)
+                        ↓
+                    check_votes_and_transition
+                        ↓
+                    host_result → (continue or END)
+```
+
+### Key Architectural Patterns
+
+1. **State Management**: TypedDict-based GameState with private state separation
+   - `GameState`: Shared public state (speech history, votes, game status)
+   - `HostState`: Private host mindset (invariant after setup)
+   - `PlayerState`: Private player mindsets (evolving beliefs about identities)
+
+2. **Concurrent Voting**: Multiple players vote in parallel using LangGraph reducers
+   - Reducer: `merge_votes` handles timestamp-based conflict resolution
+   - Each vote node is independent but writes to shared state
+
+3. **AI Strategy System** (`src/game/strategy/`):
+   - `strategy_core.py`: Main LLM coordination
+   - `builders/`: Context and prompt builders for speech/voting/inference
+   - `llm_schemas.py`: Pydantic models for structured LLM outputs
+
+4. **Agent Tools** (`src/game/agent_tools/`):
+   - `speech_tools.py`: Structured reasoning for speech generation
+   - `vote_tools.py`: Evidence-based voting decisions
+   - Uses TrustCall for reliable structured output extraction
+
+5. **Reducers**: State conflict resolution functions
+   - `merge_private_states`: Combine incremental mindset updates
+   - `merge_votes`: Handle concurrent vote submissions
+   - Use `add` for append-only collections (speeches, votes)
+
+6. **Conditional Routing**: `src/game/graph.py` uses dynamic edge routing
+   - `host_stage_switch`: Routes between speaking and voting phases
+   - Checks `current_speaker` and vote readiness
+   - Returns edge names for graph transitions
+
+7. **PyDict Structured Output**: Uses dict exports for serialization
+   - GameState uses `PyDict` export methods for LangGraph checkpoints
+   - Ensure all Pydantic models in state have proper serialization
+
+8. **Private State Updates**: Player/host nodes return private state deltas
+   - Add "_" prefix: `{player_name: PlayerState}` → `{"_" + player_name: PlayerState}`
+   - Host returns `f"_{HOST_NAME}": HostState`
+   - Graph middleware merges private states using configured reducers
+
+9. **Channel Configuration**: Define channels for each node in graph
+   - Player nodes: `"_" + player_name` channels
+   - Host node: `"_" + HOST_NAME` channel
+   - Channels must match reducer keys
+
+## Configuration
+
+**LLM Configuration** (`.env`):
+- Supports OpenAI, DeepSeek, and OpenRouter providers
+- Set provider-specific API keys and model names
+- Example models: `gpt-4o-mini`, `deepseek-chat`, `anthropic/claude-sonnet-4.5`
+
+**Game Configuration** (`config.yaml`):
+- `player_count`: Number of players (3-8)
+- `vocabulary`: Word pairs for civilian/spy assignments
+- `player_names`: Pool of available player names
+- `metrics.enabled`: Toggle metrics collection on/off
+
+## Testing Strategy
+
+**Test Coverage** (50 tests across 6 modules):
+- `test_game_rules.py`: Core game logic, role assignment, win conditions
+- `test_state.py`: State management and reducer functions
+- `test_host_nodes.py`: Host node behavior and phase transitions
+- `test_player_nodes.py`: Player speech and voting nodes
+- `test_llm_strategy.py`: AI strategy builders and prompt generation
+- `agents/test_speech_tools.py`: LLM tool behavior and structured outputs
+
+**Key Testing Patterns**:
+- Use fixtures for common GameState configurations
+- Mock LLM responses for deterministic AI behavior tests
+- Test both sequential (speaking) and concurrent (voting) nodes
+- Verify private state updates and mindset evolution
+
+## Metrics and Quality Tracking
+
+**Built-in Metrics** (`src/game/metrics.py`):
+- Win balance tracking (civilian vs spy win rates)
+- Identification accuracy (role inference quality)
+- Speech diversity (lexical variety measurement)
+- Auto-saves to `logs/metrics/{game_id}.json`
+- Overall summary at `logs/metrics/overall.json`
+
+**Quality Scoring**:
+```python
+from src.game.metrics import metrics_collector
+
+# Get quality score
+deterministic_score = metrics_collector.compute_quality_score()
+
+# Or use LLM-based evaluation
+llm_score = metrics_collector.compute_quality_score(method="llm", llm=client)
+```
+
+**Metrics History**: Track prompts and configurations in `docs/metrics-history.md`
+
+## Common Development Tasks
+
+### Adding a New Game Phase
+1. Add node function to `src/game/nodes/`
+2. Register node in graph with `graph.add_node(node_name, node_function)`
+3. Add conditional routing logic in transition nodes
+4. Update state types if adding new fields
+
+### Modifying AI Strategy
+1. Update prompt builders in `src/game/strategy/builders/`
+2. Modify Pydantic schemas in `src/game/strategy/llm_schemas.py` if changing output structure
+3. Adjust strategy coordination in `src/game/strategy/strategy_core.py`
+4. Test with `pytest tests/test_llm_strategy.py`
+
+### Debugging Game Flow
+1. Enable LangSmith tracing: `LANGSMITH_TRACING=true` in `.env`
+2. Check LangGraph Studio at `http://localhost:8124` (the port passed to `langgraph dev` above)
+3. Review game logs in `logs/metrics/`
+4. Use `print()` in nodes to debug state (visible in LangGraph Studio traces)
+
+### Adding New Metrics
+1. Add metric collection hooks in `src/game/metrics.py`
+2. Update quality scoring computation
+3. Add metric tests in `tests/test_metrics_history.py`
+4. Document in `docs/metrics-history.md`
+
+### Working with Player-Specific Hooks (callbacks) for Metrics
+When implementing player-specific behaviors that need to track metrics per player:
+- Use the `metrics_collector.on_player_speech(player_name, is_spy, round_num, speech)` hook within player speech nodes to collect speech diversity metrics.
+- Use the `metrics_collector.on_vote_cast()` hook in player vote nodes to collect voting pattern data.
+- Both hooks respect the `metrics.enabled` flag in `config.yaml` and become no-ops when metrics are disabled.
+
+## LangGraph Development Notes
+
+**Checkpointing**: State is automatically checkpointed between nodes, so you do not need to persist it manually
+
+**State Mutation**: Always return new state dicts rather than mutating existing state in nodes
+
+**Error Handling**: LangGraph nodes should handle exceptions gracefully to prevent workflow crashes
+
+**See**: [ARCHITECTURE.md](ARCHITECTURE.md) for detailed system design and [README.md](README.md) for project overview
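
Note on the "Concurrent Voting" and "Reducers" items in CLAUDE.md: the following is a minimal sketch of what a timestamp-based vote reducer could look like. It is not the repository's `merge_votes`; the `Vote` shape and its `ts` field are assumptions made for illustration. In LangGraph a reducer like this is typically attached to a state key through `Annotated[...]`, which is left out here so the block runs without extra dependencies.

```python
from functools import reduce
from typing import Dict, TypedDict


class Vote(TypedDict):
    """Hypothetical vote record; the real project may store different fields."""

    target: str
    ts: float  # assumed submission timestamp


def merge_votes(current: Dict[str, Vote], update: Dict[str, Vote]) -> Dict[str, Vote]:
    """Merge two vote mappings, keeping each voter's most recent vote.

    Returns a new dict instead of mutating `current`, in line with the
    "always return new state" guidance.
    """
    merged = dict(current)
    for voter, vote in update.items():
        existing = merged.get(voter)
        if existing is None or vote["ts"] >= existing["ts"]:
            merged[voter] = vote
    return merged


# Three vote nodes report independently; Alice's later vote replaces her earlier one.
updates = [
    {"Alice": {"target": "Bob", "ts": 1.0}},
    {"Carol": {"target": "Bob", "ts": 1.5}},
    {"Alice": {"target": "Carol", "ts": 2.0}},
]
final_votes = reduce(merge_votes, updates, {})
```

Because each update only names the voter that produced it, the outcome does not depend on the order in which the parallel nodes finish, as long as timestamps are distinct.
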
diff --git a/src/game/agent_tools/speech_tools.py b/src/game/agent_tools/speech_tools.py
@@ -13,13 +13,16 @@
 
 from langchain.tools import tool
 
+from src.game.logger import get_logger
 from src.game.state import GameState, PlayerMindset, alive_players
 from src.game.strategy.builders.prompt_builder import determine_clarity
 from src.game.strategy.serialization import normalize_mindset, to_plain_dict
 
 SelfBeliefDict = Dict[str, Any]
 SuspicionDict = Dict[str, Any]
 
+logger = get_logger(__name__)
+
 
 def speech_planning_tools(
     state: GameState,
@@ -157,12 +160,12 @@ def plan_speech() -> Dict[str, Any]:
             "top_suspicions": suspects_payload,
         }
 
-        print(
-            "🛠️ SPEECH PLAN TOOL:",
-            f"player={bound_player_id}",
-            f"round={current_round}",
-            f"clarity={clarity_code}",
-            f"goal={goal.get('label')}",
+        logger.info(
+            "Speech plan tool executed for %s round %d clarity=%s goal=%s",
+            bound_player_id,
+            current_round,
+            clarity_code,
+            goal.get("label"),
         )
 
         return plan

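Note on the `print` to `logger.info` change above: passing values as `%`-style arguments, rather than pre-formatting with an f-string, lets the standard `logging` module skip string rendering when the level is disabled. The sketch below illustrates that behaviour with a hypothetical `CountingPayload` class and a throwaway logger name; neither exists in the repository.

```python
import io
import logging


class CountingPayload:
    """Hypothetical object that counts how often it is rendered to text."""

    def __init__(self) -> None:
        self.renders = 0

    def __str__(self) -> str:
        self.renders += 1
        return "plan"


# Isolated logger writing to an in-memory stream so the demo has no side effects.
stream = io.StringIO()
demo_logger = logging.getLogger("liegraph.demo.lazy")
demo_logger.setLevel(logging.WARNING)
demo_logger.propagate = False
demo_logger.addHandler(logging.StreamHandler(stream))

payload = CountingPayload()

# INFO is below the logger's level, so the message is never formatted.
demo_logger.info("Speech plan: %s", payload)
renders_after_info = payload.renders

# WARNING is enabled, so the handler formats the message exactly once.
demo_logger.warning("Speech plan: %s", payload)
renders_after_warning = payload.renders
```

With an f-string, `str(payload)` would run before `logger.info` is even called, regardless of the configured level.
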
diff --git a/src/game/config.py b/src/game/config.py
@@ -22,6 +22,8 @@
 import yaml
 from pydantic import BaseModel, Field, ValidationError, model_validator
 
+from .logger import get_logger
+
 
 class ConfigurationError(RuntimeError):
     """Raised when configuration cannot be loaded or validated."""
@@ -256,12 +258,13 @@ def validate_config(self) -> bool:
                 raise ValueError("Name generation failed")
             return True
         except Exception as exc:
-            print(f"Configuration validation failed: {exc}")
+            logger.error("Configuration validation failed: %s", exc)
             return False
 
 
 # Global configuration instance
 _config_instance: GameConfig | None = None
+logger = get_logger(__name__)
 
 
 def get_config(config_path: str | Path | None = None) -> GameConfig:

diff --git a/src/game/graph.py b/src/game/graph.py
@@ -26,13 +26,16 @@
 from langgraph.constants import START
 from langgraph.graph import END, StateGraph
 
+from src.game.logger import get_logger
 from src.game.nodes.host import host_setup, host_stage_switch, host_result
 from src.game.nodes.player import player_speech, player_vote
 from src.game.nodes.transition import check_votes_and_transition
 from src.game.state import GameState, votes_ready, next_alive_player
 from src.tools import save_graph_image
 from src.game.config import get_config
 
+logger = get_logger(__name__)
+
 
 def route_from_stage(state: GameState) -> list[str] | str:
     """Route to appropriate nodes based on current game phase.
@@ -170,7 +173,7 @@ def build_workflow(config=None):
     # Generate player names based on configuration
     players = game_config.generate_player_names()
 
-    print(f"🎮 Building workflow with {len(players)} players: {players}")
+    logger.info("Building workflow with %d players: %s", len(players), players)
 
     return build_workflow_with_players(players)
 
@@ -183,10 +186,10 @@ def main():
     # Generate player names based on configuration
     players = config.generate_player_names()
 
-    print(f"Game Configuration:")
-    print(f"  Player count: {config.player_count}")
-    print(f"  Players: {players}")
-    print(f"  Vocabulary pairs: {len(config.vocabulary)}")
+    logger.info("Game configuration loaded")
+    logger.info("Player count: %d", config.player_count)
+    logger.debug("Players: %s", players)
+    logger.info("Vocabulary pairs: %d", len(config.vocabulary))
 
     # Build and run the workflow
     app = build_workflow_with_players(players)
@@ -206,7 +209,7 @@ async def _run_workflow():
         return await app.ainvoke(initial_state, config=langgraph_config)
 
     result = asyncio.run(_run_workflow())
-    print(result)
+    logger.info("Workflow result: %s", result)
 
 
 if __name__ == "__main__":

diff --git a/src/game/logger.py b/src/game/logger.py
@@ -0,0 +1,36 @@
+"""
+Centralized logging utilities for the LieGraph game engine.
+
+Ensures every module uses a consistent logger configuration, while
+still allowing runtime control via the ``LIEGRAPH_LOG_LEVEL`` env var.
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+from typing import Optional
+
+_IS_CONFIGURED = False
+
+
+def _configure_logging() -> None:
+    """Configure the standard logging module once."""
+    global _IS_CONFIGURED
+    if _IS_CONFIGURED:
+        return
+
+    level_name = os.getenv("LIEGRAPH_LOG_LEVEL", "INFO").upper()
+    level = getattr(logging, level_name, logging.INFO)
+
+    logging.basicConfig(
+        level=level,
+        format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
+    )
+    _IS_CONFIGURED = True
+
+
+def get_logger(name: Optional[str] = None) -> logging.Logger:
+    """Return a configured logger scoped to the provided name."""
+    _configure_logging()
+    return logging.getLogger(name or "liegraph")
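
Note on the level lookup in `_configure_logging`: `getattr(logging, level_name, logging.INFO)` returns whatever attribute matches the name, and a value such as `BASIC_FORMAT` resolves to a string rather than a numeric level. One possible hardening, sketched below, accepts only integer results. The helper name `resolve_log_level` is hypothetical and not part of this PR.

```python
import logging
import os
from typing import Optional


def resolve_log_level(raw: Optional[str], default: int = logging.INFO) -> int:
    """Translate a level name such as 'debug' into its numeric value.

    Falls back to `default` when the name is missing, unknown, or refers to
    a non-numeric attribute of the logging module.
    """
    name = (raw or "").strip().upper()
    value = getattr(logging, name, None) if name else None
    return value if isinstance(value, int) else default


# Same environment variable as in the diff above.
level = resolve_log_level(os.getenv("LIEGRAPH_LOG_LEVEL"))
```

For example, `resolve_log_level("debug")` yields `logging.DEBUG`, while `resolve_log_level("BASIC_FORMAT")` and `resolve_log_level(None)` both fall back to `logging.INFO`.
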