A small Python project for practicing scikit-learn: it extracts features from chess positions, trains a Random Forest model, and compares it against a simple baseline predictor.
- Goal: Predict the outcome of a game (white win, draw, or black win) from a given chess position.
- Workflow:
  - (Optional) Extract positions from PGN files (e.g. from Lichess.org)
  - Compute features for each position
  - Evaluate feature importance using mutual information
  - Train a Random Forest model
  - Compare performance against a baseline predictor
I chose Random Forest because it is a robust algorithm that works well out of the box; my goal was to practice the main concepts of the scikit-learn library without digging too deeply into hyperparameter tuning.
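As a rough sketch of what the training step looks like in scikit-learn (the synthetic DataFrame below is a stand-in for data/raw/dataset.csv, and the column names and label values are illustrative assumptions, not the project's actual schema):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for pd.read_csv("data/raw/dataset.csv");
# column names and the 0.0/1.0/2.0 label encoding are assumptions.
df = pd.DataFrame({
    "material_balance": [3, -2, 0, 5, -4, 1, 0, -1] * 25,
    "mobility":         [30, 12, 20, 35, 10, 25, 18, 15] * 25,
    "result":           [2.0, 0.0, 1.0, 2.0, 0.0, 2.0, 1.0, 0.0] * 25,
})

X = df.drop(columns=["result"])
y = df["result"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Default settings, no hyperparameter tuning
clf = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
clf.fit(X_train, y_train)
print(f"Test accuracy: {accuracy_score(y_test, clf.predict(X_test)):.4f}")
```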
- Clone the repository

```bash
git clone https://github.com/Layyser/chess-feature-predictor.git
cd chess-feature-predictor
```

- Create and activate a virtual environment

```bash
python3 -m venv venv
source venv/bin/activate
```

- Install dependencies

```bash
pip install -r requirements.txt
```

To extract features, download a PGN file (for example from Lichess.org) and run extract_features.py:
```bash
python src/feature_extraction/extract_features.py \
    --pgn path/to/your/pgn \
    --games 20000 \
    --workers 8 \
    --output data/processed/features.csv
```

- NOTE: PGN files are around 300 GB, so consider using the dataset.csv provided in data/raw/dataset.csv instead of extracting the features from a PGN file yourself.
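Under the hood, per-position feature computation with python-chess might look like the following sketch; the specific features shown (material balance, move count, check status) are illustrative assumptions, not necessarily the ones extract_features.py computes:

```python
import chess

# Standard piece values; the king is omitted since both sides always have one
PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9}

def extract_position_features(board: chess.Board) -> dict:
    """Compute a few simple numeric features for one position."""
    material = sum(
        value * (len(board.pieces(piece, chess.WHITE))
                 - len(board.pieces(piece, chess.BLACK)))
        for piece, value in PIECE_VALUES.items()
    )
    return {
        "material_balance": material,                    # positive favors white
        "white_to_move": int(board.turn == chess.WHITE),
        "legal_moves": board.legal_moves.count(),        # simple mobility proxy
        "in_check": int(board.is_check()),
    }

print(extract_position_features(chess.Board()))  # starting position
# {'material_balance': 0, 'white_to_move': 1, 'legal_moves': 20, 'in_check': 0}
```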
```bash
python src/feature_selection/compute_importance.py \
    --input data/raw/dataset.csv
```

Consider modifying the code to save the model if needed.
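The mutual-information scoring can be sketched with scikit-learn's mutual_info_classif; the toy data below stands in for data/raw/dataset.csv, and the column names are assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Toy stand-in for pd.read_csv("data/raw/dataset.csv"): one feature
# strongly tied to the label, one that is pure noise.
rng = np.random.default_rng(0)
y = rng.choice([0.0, 1.0, 2.0], size=500)
X = pd.DataFrame({
    "informative": y + rng.normal(scale=0.1, size=500),
    "noise": rng.normal(size=500),
})

# Higher score = feature carries more information about the label
scores = mutual_info_classif(X, y, random_state=0)
for name, score in sorted(zip(X.columns, scores), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```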
To train the Random Forest model:

```bash
python src/modeling/train_model.py \
    --input data/raw/dataset.csv
```

To compare against the baseline predictor:

```bash
python src/baseline/baseline_predictor.py
```

Test set accuracy: 0.8949
```
              precision    recall  f1-score   support

         0.0       0.89      0.90      0.89     54075
         1.0       0.97      0.79      0.87      6847
         2.0       0.89      0.91      0.90     56968

    accuracy                           0.89    117890
   macro avg       0.92      0.86      0.89    117890
weighted avg       0.90      0.89      0.89    117890
```
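The baseline's exact logic isn't detailed here (see src/baseline/baseline_predictor.py); a common choice is a majority-class predictor, sketched below with scikit-learn's DummyClassifier on synthetic data whose class balance roughly mirrors the support counts in the report above:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 4))  # stand-in features; ignored by the baseline
# Class proportions roughly matching the report's 54075/6847/56968 split
y = rng.choice([0.0, 1.0, 2.0], size=1000, p=[0.46, 0.06, 0.48])

# Always predict the most frequent class seen during fitting
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X, y)
print(f"Baseline accuracy: {accuracy_score(y, baseline.predict(X)):.4f}")
```

Any trained model worth keeping should clearly beat this floor, which is roughly the frequency of the most common class.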
Dependencies:

- pandas
- numpy
- scikit-learn
- python-chess
Future improvements:

- Test additional algorithms (e.g. XGBoost, LightGBM, neural networks)
- Perform grid search or Bayesian optimization to fine‑tune parameters (e.g. tree depth, learning rate, number of estimators)
- Use cross‑validation (k‑fold, stratified) to get more robust performance estimates
- Aggregate PGN data over several months or years instead of a single month to ensure diversity
- Filter out anomalous or low‑quality games (e.g. abandonments, ultra‑short games)
- Design new board‑state features (e.g. king safety metrics, mobility scores...)
- Encode move‑history information (e.g. repetition counts, move timestamps)
- Incorporate opening classifications or engine evaluations as features
- Experiment with resampling techniques (SMOTE, ADASYN, under‑sampling)
- Adjust class weights in models or use cost‑sensitive learning
- Implement a full sklearn pipeline that handles preprocessing, feature selection, and modeling in one workflow
- Add unit tests for feature extraction and model evaluation
- Integrate the model into my Light-Chess deployment
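Among the ideas above, the full scikit-learn pipeline could be sketched as follows; the synthetic data and the particular steps chosen here are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: labels 0.0/1.0/2.0 derived from two features
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
y = (X[:, 0] > 0).astype(float) + (X[:, 1] > 0).astype(float)

# Preprocessing, feature selection, and modeling chained in one estimator
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(mutual_info_classif, k=4)),
    ("model", RandomForestClassifier(n_estimators=100, random_state=1)),
])
pipe.fit(X, y)
print(f"Training accuracy: {pipe.score(X, y):.3f}")
```

A pipeline like this keeps all steps inside a single estimator, so cross-validation and grid search can tune the whole workflow at once without leaking test data into preprocessing.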
This project is licensed under the MIT License.