Merged
3 changes: 3 additions & 0 deletions .env.template
@@ -1,5 +1,8 @@
LLM_PROVIDER=deepseek

# Logging
LIEGRAPH_LOG_LEVEL=INFO

# Core API keys
OPENAI_API_KEY=your-key
LANGSMITH_TRACING=true
209 changes: 209 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,209 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

LieGraph is an AI-powered implementation of the social deduction game "Who Is Spy" built with LangGraph. It features autonomous AI agents that use LLM reasoning to find the spy among them.

- **Main Language**: Python 3.12+
- **Core Framework**: LangGraph for workflow orchestration
- **AI Integration**: LangChain with structured LLM outputs
- **Frontend**: React 19.2 with LangGraph SDK

## Development Commands

### Initial Setup
```bash
# Install Python dependencies (uses uv package manager)
uv sync

# Create .env file from template
cp .env.template .env
# Edit .env with your API keys (OpenAI, DeepSeek, or OpenRouter)

# Install frontend dependencies
cd ui-web/frontend && npm install
```

### Running the Application
```bash
# Terminal 1: Start LangGraph backend (from project root)
langgraph dev --config langgraph.json --port 8124 --allow-blocking

# Terminal 2: Start React frontend (from ui-web/frontend)
npm start

# Access UI at: http://localhost:3000
```

### Testing
```bash
# Run all tests
python -m pytest tests/ -v

# Run specific test file
python -m pytest tests/test_game_rules.py -v

# Run specific test
python -m pytest tests/test_game_rules.py::test_assign_roles -v
```

### Linting/Formatting
```bash
# Format Python code
black src/ tests/
```

## Architecture Overview

### LangGraph Workflow Flow
```
START → host_setup → host_stage_switch
speaking phase (sequential player_speech nodes)
voting phase (concurrent player_vote nodes)
check_votes_and_transition
host_result → (continue or END)
```
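
As a rough illustration of the flow above, the routing step can be pictured as a function that returns either a single node name (sequential speaking) or a list of node names (concurrent voting). This is a sketch only, not the code in `src/game/graph.py`: the state fields `game_phase`, `current_speaker`, and `alive`, and the node-name patterns, are assumptions made for the example.

```python
from __future__ import annotations

from typing import TypedDict


class SketchState(TypedDict):
    """Hypothetical, trimmed-down stand-in for GameState."""

    game_phase: str  # assumed values: "speaking", "voting", "result"
    current_speaker: str | None
    alive: list[str]


def route_from_stage_sketch(state: SketchState) -> list[str] | str:
    """Return the next node name, or a list of names to fan out to."""
    if state["game_phase"] == "speaking" and state["current_speaker"]:
        # Speaking is sequential: exactly one speech node runs next.
        return f"player_speech_{state['current_speaker']}"
    if state["game_phase"] == "voting":
        # Voting is concurrent: one vote node per living player.
        return [f"player_vote_{name}" for name in state["alive"]]
    # Anything else falls through to the host's result handling.
    return "host_result"
```

Returning a list is what lets a conditional edge start several vote nodes in the same step; returning a string keeps the speaking phase strictly one player at a time.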

### Key Architectural Patterns

1. **State Management**: TypedDict-based GameState with private state separation
- `GameState`: Shared public state (speech history, votes, game status)
- `HostState`: Private host mindset (invariant after setup)
- `PlayerState`: Private player mindsets (evolving beliefs about identities)

2. **Concurrent Voting**: Multiple players vote in parallel using LangGraph reducers
- Reducer: `merge_votes` handles timestamp-based conflict resolution
- Each vote node is independent but writes to shared state

3. **AI Strategy System** (`src/game/strategy/`):
- `strategy_core.py`: Main LLM coordination
- `builders/`: Context and prompt builders for speech/voting/inference
- `llm_schemas.py`: Pydantic models for structured LLM outputs

4. **Agent Tools** (`src/game/agent_tools/`):
- `speech_tools.py`: Structured reasoning for speech generation
- `vote_tools.py`: Evidence-based voting decisions
- Uses TrustCall for reliable structured output extraction

5. **Reducers**: State conflict resolution functions
- `merge_private_states`: Combine incremental mindset updates
- `merge_votes`: Handle concurrent vote submissions
- Use `add` for append-only collections (speeches, votes)

6. **Conditional Routing**: `src/game/graph.py` uses dynamic edge routing
- `host_stage_switch`: Routes between speaking and voting phases
- Checks `current_speaker` and vote readiness
- Returns edge names for graph transitions

7. **PyDict Structured Output**: Uses dict exports for serialization
- GameState uses `PyDict` export methods for LangGraph checkpoints
- Ensure all Pydantic models in state have proper serialization

8. **Private State Updates**: Player/host nodes return private state deltas
- Add "_" prefix: `{player_name: PlayerState}` → `{"_" + player_name: PlayerState}`
- Host returns `f"_{HOST_NAME}": HostState`
- Graph middleware merges private states using configured reducers

9. **Channel Configuration**: Define channels for each node in graph
- Player nodes: `"_" + player_name` channels
- Host node: `"_" + HOST_NAME` channel
- Channels must match reducer keys
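
To make the reducer idea from items 2 and 5 concrete, here is a minimal sketch of a timestamp-based vote merge. The real `merge_votes` in this repository may work differently; the `Vote` shape (with `target` and `ts` fields) and the "later timestamp wins" rule are assumptions chosen for illustration.

```python
from typing import Annotated, TypedDict


class Vote(TypedDict):
    target: str  # hypothetical field: the player being accused
    ts: float  # hypothetical field: submission time in seconds


def merge_votes_sketch(
    current: dict[str, Vote], update: dict[str, Vote]
) -> dict[str, Vote]:
    """Merge two voter -> Vote maps; on conflict the later timestamp wins."""
    merged = dict(current)  # copy so the existing state is never mutated
    for voter, vote in update.items():
        existing = merged.get(voter)
        if existing is None or vote["ts"] >= existing["ts"]:
            merged[voter] = vote
    return merged


class VotingState(TypedDict):
    # LangGraph reads the reducer from the Annotated metadata and uses it
    # to combine writes from nodes that ran in the same step.
    votes: Annotated[dict[str, Vote], merge_votes_sketch]
```

Because the merge compares timestamps rather than relying on arrival order, the result is the same whichever vote node happens to finish first.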

## Configuration

**LLM Configuration** (`.env`):
- Supports OpenAI, DeepSeek, and OpenRouter providers
- Set provider-specific API keys and model names
- Example models: `gpt-4o-mini`, `deepseek-chat`, `anthropic/claude-sonnet-4.5`

**Game Configuration** (`config.yaml`):
- `player_count`: Number of players (3-8)
- `vocabulary`: Word pairs for civilian/spy assignments
- `player_names`: Pool of available player names
- `metrics.enabled`: Toggle metrics collection on/off
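
The constraints listed above could be checked along these lines. This is a dependency-free sketch using a plain dataclass, whereas `src/game/config.py` uses Pydantic models; the class name, the defaults, and the rule that the name pool must cover `player_count` are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class GameConfigSketch:
    """Hypothetical mirror of the keys documented for config.yaml."""

    player_count: int = 4
    vocabulary: list[tuple[str, str]] = field(default_factory=list)
    player_names: list[str] = field(default_factory=list)
    metrics_enabled: bool = True

    def __post_init__(self) -> None:
        # The documented range for player_count is 3-8.
        if not 3 <= self.player_count <= 8:
            raise ValueError("player_count must be between 3 and 8")
        # Assumed rule: a non-empty name pool must cover every seat.
        if self.player_names and len(self.player_names) < self.player_count:
            raise ValueError("player_names pool is smaller than player_count")
```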

## Testing Strategy

**Test Coverage** (50 tests across 6 modules):
- `test_game_rules.py`: Core game logic, role assignment, win conditions
- `test_state.py`: State management and reducer functions
- `test_host_nodes.py`: Host node behavior and phase transitions
- `test_player_nodes.py`: Player speech and voting nodes
- `test_llm_strategy.py`: AI strategy builders and prompt generation
- `agents/test_speech_tools.py`: LLM tool behavior and structured outputs

**Key Testing Patterns**:
- Use fixtures for common GameState configurations
- Mock LLM responses for deterministic AI behavior tests
- Test both sequential (speaking) and concurrent (voting) nodes
- Verify private state updates and mindset evolution
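
The mocking pattern above might look like the following. `choose_vote` is a hypothetical helper written for this example, not a function from the repository, and the fake LLM is a plain `MagicMock` whose `invoke` return value is fixed.

```python
from unittest.mock import MagicMock


def choose_vote(llm, player: str, candidates: list[str]) -> str:
    """Hypothetical helper: ask the LLM for a target, fall back if invalid."""
    reply = llm.invoke(f"{player}, pick one of {candidates}")
    return reply if reply in candidates else candidates[0]


def test_choose_vote_uses_llm_answer():
    fake_llm = MagicMock()
    fake_llm.invoke.return_value = "Bob"  # deterministic stand-in for the model
    assert choose_vote(fake_llm, "Alice", ["Bob", "Carol"]) == "Bob"
    fake_llm.invoke.assert_called_once()


def test_choose_vote_falls_back_on_unknown_name():
    fake_llm = MagicMock()
    fake_llm.invoke.return_value = "Zed"  # not a valid candidate
    assert choose_vote(fake_llm, "Alice", ["Bob", "Carol"]) == "Bob"
```

Fixing the return value keeps the test independent of any provider or API key, so it can run in CI without network access.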

## Metrics and Quality Tracking

**Built-in Metrics** (`src/game/metrics.py`):
- Win balance tracking (civilian vs spy win rates)
- Identification accuracy (role inference quality)
- Speech diversity (lexical variety measurement)
- Auto-saves to `logs/metrics/{game_id}.json`
- Overall summary at `logs/metrics/overall.json`

**Quality Scoring**:
```python
from src.game.metrics import metrics_collector

# Get quality score
deterministic_score = metrics_collector.compute_quality_score()

# Or use LLM-based evaluation
llm_score = metrics_collector.compute_quality_score(method="llm", llm=client)
```

**Metrics History**: Track prompts and configurations in `docs/metrics-history.md`

## Common Development Tasks

### Adding a New Game Phase
1. Add node function to `src/game/nodes/`
2. Register node in graph with `graph.add_node(node_name, node_function)`
3. Add conditional routing logic in transition nodes
4. Update state types if adding new fields

### Modifying AI Strategy
1. Update prompt builders in `src/game/strategy/builders/`
2. Modify Pydantic schemas in `src/game/strategy/llm_schemas.py` if changing output structure
3. Adjust strategy coordination in `src/game/strategy/strategy_core.py`
4. Test with `pytest tests/test_llm_strategy.py`

### Debugging Game Flow
1. Enable LangSmith tracing: `LANGSMITH_TRACING=true` in `.env`
2. Check LangGraph Studio against the local server at `http://localhost:8124` (the port passed to `langgraph dev` above)
3. Review game logs in `logs/metrics/`
4. Use `print()` in nodes to debug state (visible in LangGraph Studio traces)

### Adding New Metrics
1. Add metric collection hooks in `src/game/metrics.py`
2. Update quality scoring computation
3. Add metric tests in `tests/test_metrics_history.py`
4. Document in `docs/metrics-history.md`

### Working with Player-Specific Hooks (callbacks) for Metrics
When implementing player-specific behaviors that need to track metrics per player:
- Use the `metrics_collector.on_player_speech(player_name, is_spy, round_num, speech)` hook within player speech nodes to collect speech diversity metrics.
- Use the `metrics_collector.on_vote_cast()` hook in player vote nodes to collect voting pattern data.
- Metrics collection respects the `metrics.enabled` flag in `config.yaml`; the hooks are no-ops when metrics are disabled.
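
The no-op behavior described above can be sketched as follows. `MetricsCollectorSketch` is a hypothetical stand-in, not the collector in `src/game/metrics.py`; only the `on_player_speech` argument order is taken from the text, and the diversity formula (unique words divided by total words) is an assumption.

```python
class MetricsCollectorSketch:
    """Hypothetical collector whose hooks do nothing when disabled."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled
        self.speeches: list[dict] = []

    def on_player_speech(
        self, player_name: str, is_spy: bool, round_num: int, speech: str
    ) -> None:
        if not self.enabled:
            return  # metrics disabled: record nothing
        self.speeches.append(
            {"player": player_name, "is_spy": is_spy, "round": round_num, "speech": speech}
        )

    def lexical_diversity(self) -> float:
        """Unique words / total words over recorded speeches (0.0 if none)."""
        words = [w.lower() for entry in self.speeches for w in entry["speech"].split()]
        return len(set(words)) / len(words) if words else 0.0
```

Putting the `enabled` check inside the hook means player nodes can call it unconditionally and never need to read the config themselves.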

## LangGraph Development Notes

**Checkpointing**: State is checkpointed automatically between nodes, so nodes do not need to persist it manually

**State Mutation**: Always return new state dicts rather than mutating existing state in nodes
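
A minimal sketch of that rule, assuming `speeches` is an append-only channel (for example one reduced with `operator.add`), so a node returns only the new item instead of appending to the list it received. The field names and the speech text are made up for the example.

```python
import operator


def player_speech_sketch(state: dict) -> dict:
    """Return only the fields this node changes, as a fresh dict."""
    new_entry = {"player": state["current_speaker"], "text": "It is something you can eat."}
    # Build a new one-item list rather than calling state["speeches"].append(...).
    return {"speeches": [new_entry]}


state = {"current_speaker": "Alice", "speeches": []}
delta = player_speech_sketch(state)
# An append reducer combines the old value with the delta into a new list.
merged = operator.add(state["speeches"], delta["speeches"])
```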

**Error Handling**: LangGraph nodes should handle exceptions gracefully to prevent workflow crashes
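
One way to apply this is a small decorator that logs the failure and returns a fallback delta. This is a sketch under assumptions: `safe_node`, the `last_error` field, and the logger name are hypothetical and not part of the repository.

```python
import functools
import logging

logger = logging.getLogger("liegraph.sketch")  # hypothetical logger name


def safe_node(fallback: dict):
    """Decorator: log any exception from a node and return a fallback delta."""

    def decorator(node):
        @functools.wraps(node)
        def wrapper(state: dict) -> dict:
            try:
                return node(state)
            except Exception:
                # logger.exception records the traceback at ERROR level.
                logger.exception("Node %s failed; using fallback", node.__name__)
                return dict(fallback)  # copy so callers cannot alter the default

        return wrapper

    return decorator


@safe_node(fallback={"last_error": "player_vote failed"})
def flaky_vote(state: dict) -> dict:
    return {"votes": {state["voter"]: state["target"]}}
```

Whether swallowing the exception is appropriate depends on the node; for setup nodes it may be better to let the error propagate so the run stops early.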

**See**: [ARCHITECTURE.md](ARCHITECTURE.md) for detailed system design and [README.md](README.md) for project overview
15 changes: 9 additions & 6 deletions src/game/agent_tools/speech_tools.py
@@ -13,13 +13,16 @@

from langchain.tools import tool

from src.game.logger import get_logger
from src.game.state import GameState, PlayerMindset, alive_players
from src.game.strategy.builders.prompt_builder import determine_clarity
from src.game.strategy.serialization import normalize_mindset, to_plain_dict

SelfBeliefDict = Dict[str, Any]
SuspicionDict = Dict[str, Any]

logger = get_logger(__name__)


def speech_planning_tools(
state: GameState,
@@ -157,12 +160,12 @@ def plan_speech() -> Dict[str, Any]:
"top_suspicions": suspects_payload,
}

-print(
-    "🛠️ SPEECH PLAN TOOL:",
-    f"player={bound_player_id}",
-    f"round={current_round}",
-    f"clarity={clarity_code}",
-    f"goal={goal.get('label')}",
+logger.info(
+    "Speech plan tool executed for %s round %d clarity=%s goal=%s",
+    bound_player_id,
+    current_round,
+    clarity_code,
+    goal.get("label"),
)

return plan
5 changes: 4 additions & 1 deletion src/game/config.py
@@ -22,6 +22,8 @@
import yaml
from pydantic import BaseModel, Field, ValidationError, model_validator

from .logger import get_logger


class ConfigurationError(RuntimeError):
"""Raised when configuration cannot be loaded or validated."""
@@ -256,12 +258,13 @@ def validate_config(self) -> bool:
raise ValueError("Name generation failed")
return True
except Exception as exc:
-print(f"Configuration validation failed: {exc}")
+logger.error("Configuration validation failed: %s", exc)
return False


# Global configuration instance
_config_instance: GameConfig | None = None
logger = get_logger(__name__)


def get_config(config_path: str | Path | None = None) -> GameConfig:
15 changes: 9 additions & 6 deletions src/game/graph.py
@@ -26,13 +26,16 @@
from langgraph.constants import START
from langgraph.graph import END, StateGraph

from src.game.logger import get_logger
from src.game.nodes.host import host_setup, host_stage_switch, host_result
from src.game.nodes.player import player_speech, player_vote
from src.game.nodes.transition import check_votes_and_transition
from src.game.state import GameState, votes_ready, next_alive_player
from src.tools import save_graph_image
from src.game.config import get_config

logger = get_logger(__name__)


def route_from_stage(state: GameState) -> list[str] | str:
"""Route to appropriate nodes based on current game phase.
@@ -170,7 +173,7 @@ def build_workflow(config=None):
# Generate player names based on configuration
players = game_config.generate_player_names()

-print(f"🎮 Building workflow with {len(players)} players: {players}")
+logger.info("Building workflow with %d players: %s", len(players), players)

return build_workflow_with_players(players)

@@ -183,10 +186,10 @@ def main():
# Generate player names based on configuration
players = config.generate_player_names()

-print(f"Game Configuration:")
-print(f" Player count: {config.player_count}")
-print(f" Players: {players}")
-print(f" Vocabulary pairs: {len(config.vocabulary)}")
+logger.info("Game configuration loaded")
+logger.info("Player count: %d", config.player_count)
+logger.debug("Players: %s", players)
+logger.info("Vocabulary pairs: %d", len(config.vocabulary))

# Build and run the workflow
app = build_workflow_with_players(players)
@@ -206,7 +209,7 @@ async def _run_workflow():
return await app.ainvoke(initial_state, config=langgraph_config)

result = asyncio.run(_run_workflow())
-print(result)
+logger.info("Workflow result: %s", result)


if __name__ == "__main__":
36 changes: 36 additions & 0 deletions src/game/logger.py
@@ -0,0 +1,36 @@
"""
Centralized logging utilities for the LieGraph game engine.

Ensures every module uses a consistent logger configuration, while
still allowing runtime control via the ``LIEGRAPH_LOG_LEVEL`` env var.
"""

from __future__ import annotations

import logging
import os
from typing import Optional

_IS_CONFIGURED = False


def _configure_logging() -> None:
"""Configure the standard logging module once."""
global _IS_CONFIGURED
if _IS_CONFIGURED:
return

level_name = os.getenv("LIEGRAPH_LOG_LEVEL", "INFO").upper()
level = getattr(logging, level_name, logging.INFO)

logging.basicConfig(
level=level,
format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
)
_IS_CONFIGURED = True


def get_logger(name: Optional[str] = None) -> logging.Logger:
"""Return a configured logger scoped to the provided name."""
_configure_logging()
return logging.getLogger(name or "liegraph")
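
One edge case in `_configure_logging` above: `getattr(logging, level_name, logging.INFO)` returns whatever attribute matches the name, which need not be a level (for example, `logging.BASIC_FORMAT` is a string). A minimal sketch of a stricter lookup follows; `resolve_level` is a hypothetical helper and not part of this change.

```python
import logging


def resolve_level(name: str, default: int = logging.INFO) -> int:
    """Map a level name to its numeric value, ignoring non-level attributes."""
    candidate = getattr(logging, name.strip().upper(), None)
    # Only integer attributes such as logging.DEBUG count as levels.
    return candidate if isinstance(candidate, int) else default
```

With a helper like this, a mistyped `LIEGRAPH_LOG_LEVEL` value would fall back to `INFO` rather than being handed to `logging.basicConfig` as a non-level value.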