A Connect Four AI learning system. This project implements a reinforcement learning agent that improves through self-play using Deep Q-Networks (DQN).
- Implement Connect Four game environment ✓
- Create reinforcement learning agent that improves through self-play ✓
- Track and analyze training progress ✓
- Observe and replay AI games ✓
- Develop web interface to visualize learning and play against the AI (planned)
- Ensure compatibility with both Raspberry Pi and desktop platforms ✓
- Python 3
- PyTorch (neural networks and training)
- Gymnasium (reinforcement learning environment)
- Flask (future web interface)
```
ai-learning-connect4/
├── README.md
├── run.py                     # Main entry point
├── setup.py                   # Installation file
├── data/                      # Data storage directory
│   ├── jobs.json              # Training jobs tracking
│   ├── models.json            # Model registry
│   ├── games/                 # Saved games for replay
│   └── logs/                  # Episode logs directory
├── connect4/                  # Core package
│   ├── __init__.py
│   ├── debug.py               # Debug and logging system
│   ├── utils.py               # Common utilities
│   ├── game/                  # Game core module
│   │   ├── __init__.py
│   │   ├── board.py           # Game board representation
│   │   └── rules.py           # Game rules and Gymnasium environment
│   ├── ai/                    # AI module
│   │   ├── __init__.py
│   │   ├── dqn.py             # DQN implementation
│   │   ├── replay_buffer.py   # Experience replay buffer
│   │   ├── training.py        # Self-play training framework
│   │   └── utils.py           # AI-specific utilities
│   ├── interfaces/            # User interfaces
│   │   ├── __init__.py
│   │   └── cli.py             # Command-line interface
│   └── data/                  # Data management module
│       ├── __init__.py
│       └── data_manager.py    # Training data management
├── models/                    # Saved model checkpoints
└── tests/                     # Unit tests
```
- Clone the repository:

```bash
git clone https://github.com/yourusername/ai-learning-connect4.git
cd ai-learning-connect4
```

- Install dependencies:

```bash
pip install -e .
```

Or install from requirements.txt:

```bash
pip install -r requirements.txt
```

The project provides a comprehensive command-line interface through run.py for interacting with the Connect Four game and AI.
Play Connect Four against a random AI:

```bash
python run.py game play
```

Play with two human players:

```bash
python run.py game play --ai none
```

Test a specific board position:

```bash
python run.py game test --position 0,0,0,0,1,1,1,0,2,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
```

Run all validation tests:

```bash
python run.py game test_all
```

Run performance tests:

```bash
python run.py game benchmark --iterations 5000
```

Initialize a new model:
```bash
python run.py ai init --hidden_size 128
```

The --hidden_size parameter determines the neural network complexity (see the sketch after this list):
- Higher values (256, 512) give more learning capacity but train slower
- Lower values (64, 128) train faster but may have less potential
- Default is 128, which works well for Connect Four
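As a rough illustration of the trade-off, the snippet below counts parameters in a fully connected head at different sizes. The layer shapes (a flattened 6x7 board with 64 convolutional channels feeding the hidden layer) are assumptions for illustration, not the exact architecture in dqn.py.

```python
import torch.nn as nn

# Illustration only: parameter count of a hypothetical fully connected head
# for different --hidden_size values, assuming 64 conv channels over a 6x7 board.
for hidden_size in (64, 128, 256, 512):
    head = nn.Sequential(
        nn.Linear(64 * 6 * 7, hidden_size),  # flattened conv features -> hidden layer
        nn.ReLU(),
        nn.Linear(hidden_size, 7),           # one Q-value per column
    )
    n_params = sum(p.numel() for p in head.parameters())
    print(f"hidden_size={hidden_size}: {n_params:,} parameters")
```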
Play against a trained model:
```bash
python run.py ai play --model models/final_model_TIMESTAMP
```

Replace TIMESTAMP with the actual timestamp from your saved model.
Train the AI through self-play:
```bash
python run.py ai train --episodes 1000 --debug_level info
```

Training options:
- --episodes: Number of games to play (higher = better learning but more time)
- --model: Continue training from an existing model
- --hidden_size: Neural network size if creating a new model
- --log_interval: How often to print updates (default: episodes/100)
- --debug_level: Control logging verbosity
Training guidance:
- 1,000-5,000 episodes: Basic learning (beginner level)
- 10,000-50,000 episodes: Intermediate strategies
- 100,000+ episodes: Advanced play (significant training time)
Continue training from an existing model:
```bash
python run.py ai train --episodes 2000 --model models/final_model_TIMESTAMP
```

View all training jobs:

```bash
python run.py ai jobs
```

View details of a specific job:

```bash
python run.py ai jobs --job_id 1
```

Purge all data (requires confirmation):

```bash
python run.py ai purge --confirm
```

List all saved games:

```bash
python run.py ai games --list
```

List games for a specific job:

```bash
python run.py ai games --list --job_id 1
```

Replay a specific game:

```bash
python run.py ai games --replay 0
```

Control playback speed:

```bash
python run.py ai games --replay 5 --delay 1.0
```

The system has several debug levels for controlling log output:
```bash
python run.py ai train --debug_level [level]
```

Available levels:
- none: No output (completely silent)
- error: Only error messages
- warning: Errors and warnings
- info: Informational messages (default)
- debug: Detailed debugging information
- trace: Most verbose output

Or use the shorthand:

```bash
python run.py ai train --debug
```

This enables full debug output (equivalent to --debug_level debug).
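For reference, one plausible way to map these level names onto Python's standard logging module is sketched below; this is an assumption about how debug.py might work, not its actual implementation (trace is not a built-in logging level).

```python
import logging

TRACE = 5                                  # custom level below DEBUG, for illustration
logging.addLevelName(TRACE, "TRACE")

LEVELS = {
    "none": logging.CRITICAL + 10,         # higher than any message -> effectively silent
    "error": logging.ERROR,
    "warning": logging.WARNING,
    "info": logging.INFO,
    "debug": logging.DEBUG,
    "trace": TRACE,
}

logging.basicConfig(level=LEVELS["info"])  # default verbosity
```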
The system tracks training data in JSON files:
- Jobs: Each training run is tracked as a job with parameters and progress
- Episode Logs: Training progress is stored in two formats:
  - Recent logs: Full details for the last 1000 episodes
  - Historical logs: Sampled data (1 in 50) for older episodes
- Game Replays: Move sequences from interesting games are saved for replay
- Model Registry: Tracks all saved model checkpoints
All data is stored in the data directory, making it portable and human-readable.
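A quick way to poke at this data from Python is sketched below. The field names (job_id, episodes, status) are hypothetical; the real schema is whatever data_manager.py writes.

```python
import json
from pathlib import Path

# Hypothetical sketch: list the training jobs recorded in data/jobs.json.
jobs_path = Path("data/jobs.json")
if jobs_path.exists():
    jobs = json.loads(jobs_path.read_text())
    for job in jobs:
        # field names are assumptions for illustration
        print(job.get("job_id"), job.get("episodes"), job.get("status"))
```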
During training, the system tracks:
- Win rates for Player 1 and Player 2
- Draw rates
- Average game length
- Exploration rate (epsilon)
- Training loss values
These metrics help evaluate when the model has reached diminishing returns (see the sketch after this list), which typically occurs when:
- Win rates stabilize over many episodes
- Draw rates increase and plateau
- Loss values flatten at a low level
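A rolling win rate like the one sketched below makes the plateau easy to spot. The log format (one record per episode with a winner field) is an assumption; adapt it to whatever the episode logs actually contain.

```python
from collections import deque

def rolling_win_rate(episodes, player=1, window=500):
    """Rolling win rate for one player over a sliding window (illustrative)."""
    recent, rates = deque(maxlen=window), []
    for ep in episodes:
        recent.append(1.0 if ep.get("winner") == player else 0.0)
        rates.append(sum(recent) / len(recent))
    return rates

# Example with synthetic data: a plateau shows up as a flat tail.
fake_logs = [{"winner": 1 if i % 2 else 2} for i in range(2000)]
print(rolling_win_rate(fake_logs)[-1])   # ~0.5 for alternating winners
```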
The system intelligently saves game replays for:
- First and last games of training
- Regular milestone games (every 50 episodes)
- Random sampling (~5% of games)
- Unusually short or long games
- Games with interesting patterns
This allows you to observe how the AI's strategy evolves throughout training without excessive storage requirements.
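A selection heuristic along these lines could look like the sketch below; the thresholds and exact criteria are illustrative guesses, not the project's actual logic.

```python
import random

def should_save_game(episode: int, total_episodes: int, n_moves: int) -> bool:
    """Illustrative replay-selection heuristic (thresholds are assumptions)."""
    if episode in (0, total_episodes - 1):
        return True                      # first and last games of training
    if episode % 50 == 0:
        return True                      # regular milestone games
    if n_moves <= 9 or n_moves >= 40:
        return True                      # unusually short or long games
    return random.random() < 0.05        # ~5% random sample
```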
Models are saved to the models directory with timestamps in their filenames:
- Periodic checkpoints during training (e.g., model_TIMESTAMP_ep100.pt)
- Final model after training completes (e.g., final_model_TIMESTAMP.pt)
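Checkpoint writing presumably looks something like the sketch below; the timestamp format and checkpoint contents are assumptions, and a simple linear layer stands in for the real network.

```python
import time
from pathlib import Path

import torch
import torch.nn as nn

model = nn.Linear(3 * 6 * 7, 7)                 # stand-in for the actual DQN
episode = 100
timestamp = time.strftime("%Y%m%d_%H%M%S")

Path("models").mkdir(exist_ok=True)
torch.save(
    {"model_state_dict": model.state_dict(), "episode": episode},
    f"models/model_{timestamp}_ep{episode}.pt",
)
```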
To start fresh with no previous models:
```bash
python run.py ai purge --confirm
python run.py ai init --hidden_size 128
python run.py ai train --episodes 1000
```

The AI uses a Deep Q-Network (DQN) approach for reinforcement learning (a minimal illustrative sketch follows the list below):
- Neural Network Architecture:
  - Input: 3-channel representation of the board state
  - Convolutional layers to capture spatial patterns
  - Fully connected hidden layer (configurable size)
  - Output: Q-values for each possible column
- Training Components:
  - Experience Replay: Stores and randomly samples past experiences
  - Target Network: Stabilizes learning with delayed updates
  - Epsilon-Greedy Strategy: Balances exploration vs. exploitation
- Self-Play Training Loop:
  - Agent plays against itself, improving with each game
  - Both sides share the same neural network
  - Learning from wins, losses, and draws
- Reward System:
  - +1.0 for winning a game
  - -1.0 for losing a game
  - +0.1 for a draw
  - -0.01 small penalty per move to encourage efficient play
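The sketch below ties these pieces together: a small convolutional Q-network, epsilon-greedy action selection, and a single replay/target-network update step. Layer sizes, hyperparameters, and function names are illustrative assumptions, not the actual contents of dqn.py, replay_buffer.py, or training.py.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class Connect4DQN(nn.Module):
    """Illustrative Q-network: 3-channel 6x7 board in, 7 column Q-values out."""
    def __init__(self, hidden_size: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 6 * 7, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, 7),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

policy_net = Connect4DQN()
target_net = Connect4DQN()
target_net.load_state_dict(policy_net.state_dict())
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-4)
replay_buffer: deque = deque(maxlen=10_000)      # (state, action, reward, next_state, done)
gamma = 0.99

def select_action(state: torch.Tensor, epsilon: float, valid_columns: list[int]) -> int:
    """Epsilon-greedy choice among columns that are not full."""
    if random.random() < epsilon:
        return random.choice(valid_columns)                      # explore
    with torch.no_grad():
        q = policy_net(state.unsqueeze(0)).squeeze(0)
    return max(valid_columns, key=lambda c: q[c].item())         # exploit

def dqn_update(batch_size: int = 64) -> None:
    """One gradient step on a random minibatch, using the target network."""
    if len(replay_buffer) < batch_size:
        return
    batch = random.sample(replay_buffer, batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)
    states, next_states = torch.stack(states), torch.stack(next_states)
    actions = torch.tensor(actions).unsqueeze(1)
    rewards = torch.tensor(rewards, dtype=torch.float32)
    dones = torch.tensor(dones, dtype=torch.float32)

    q = policy_net(states).gather(1, actions).squeeze(1)
    with torch.no_grad():
        next_q = target_net(next_states).max(1).values
    target = rewards + gamma * next_q * (1.0 - dones)            # win/loss/draw rewards enter here
    loss = nn.functional.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the actual self-play loop, both sides would push their transitions into the same buffer, and the target network would be synced from the policy network at some interval; those details are left out of this sketch.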
- Web interface with Flask for visualizing training progress
- Configurable neural network architecture
- Advanced opponent models for improved training
- Model evaluation and comparison tools
- Training visualization and analysis
- Neural network architecture exploration
Common issues:
- "ModuleNotFoundError": Make sure you've installed the package with pip install -e .
- PyTorch errors: Ensure you have PyTorch installed correctly for your platform
- "No module named 'gymnasium'": Install the Gymnasium package with pip install gymnasium
- "No module named 'filelock'": Install the filelock package with pip install filelock
- File permission errors: Ensure the data directory has write permissions
- Empty job listings: Make sure at least one training job has been run
- Model loading errors: Verify you're using the correct model path
MIT License
- The Gymnasium framework (successor to OpenAI Gym) for reinforcement learning environments
- PyTorch for neural network implementation