The codebase has only been tested on Linux, specifically WSL2 running Ubuntu.
- Python 3.13 (free-threaded) - For optimal performance with multithreading
- uv - Fast Python package manager
- CMake - For compiling VAL submodule
- System packages: build-essential, graphviz, graphviz-dev
- numberlink - For numberlink problem generation
- VAL - PDDL plan validator
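
Before installing anything, it can help to see which of the command-line prerequisites are already on your PATH. The following is a minimal sketch, not part of the repository. The executable names are assumptions: `dot` stands in for Graphviz and `make` for build-essential.

```python
import shutil
from typing import Callable, Iterable, Optional

# Assumed executable names: "dot" ships with Graphviz, "make" with build-essential.
REQUIRED_TOOLS = ("git", "uv", "cmake", "make", "dot")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

The check only looks for executables. It does not verify versions or confirm that development headers such as graphviz-dev are installed.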

Install uv:

    curl -LsSf https://astral.sh/uv/install.sh | sh

Clone the repository with its submodules:

    git clone --recursive https://github.com/yourusername/nl3pddl.git
    cd nl3pddl

If you already cloned without submodules:

    git submodule update --init --recursive

Install the system packages:

    sudo apt-get update
    sudo apt-get install -y cmake build-essential graphviz graphviz-dev

Create a virtual environment with Python 3.13 free-threaded (no GIL):

    uv venv --python 3.13t

Install Python dependencies:

    uv sync

Build the VAL submodule:

    cd submodules/VAL
    mkdir -p build
    cd build
    cmake ..
    make
    cd ../../../

Create a .env file from the template:

    cp .env_template .env

Edit the .env file and add your API keys for the LLM providers you want to use.
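
To see which entries in .env are still blank after editing, a small script can list them. This is a rough sketch, not part of the repository. It assumes .env uses plain `KEY=VALUE` lines and does not cover the full dotenv syntax. It prints key names only, never values.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and simple quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def empty_keys(text: str) -> list[str]:
    """Return the keys whose value is still empty."""
    return [key for key, value in parse_env(text).items() if not value]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        print("Keys without a value:", empty_keys(env_path.read_text()))
    else:
        print(".env not found; copy .env_template first.")
```

Keys for providers you do not plan to use can stay empty, so treat the output as a reminder, not as an error.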
Make sure to run the following in your shell, since uv can behave unexpectedly with environment variables set in .env:

    export PYTHON_GIL=0

Test that the GIL is disabled (should print False):

    uv run python scripts/test_gil_enabled.py

Test all problem generators:

    uv run python run.py -t

Run the main script to execute experiments and produce a results.csv file:

    uv run python run.py -r

Modify experiment parameters in experiment_config.yaml to customize:
- Models to test
- Problem domains
- Description strategies
- Feedback pipelines
- Number of trials
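
To get a feel for how these parameters affect experiment size, the sketch below counts runs under the assumption that every model, domain, description strategy, and feedback pipeline is combined, with each combination repeated for the number of trials. That assumption is not confirmed here. The key names and values are hypothetical and are not the actual schema of experiment_config.yaml.

```python
from itertools import product

# Hypothetical values; the real keys and entries live in experiment_config.yaml.
config = {
    "models": ["model-a", "model-b"],
    "domains": ["blocks", "miconic"],
    "description_strategies": ["strategy-1"],
    "feedback_pipelines": ["pipeline-1", "pipeline-2"],
    "trials": 3,
}


def enumerate_runs(cfg: dict) -> list[tuple]:
    """List every (model, domain, strategy, pipeline, trial) combination."""
    return list(
        product(
            cfg["models"],
            cfg["domains"],
            cfg["description_strategies"],
            cfg["feedback_pipelines"],
            range(cfg["trials"]),
        )
    )


if __name__ == "__main__":
    runs = enumerate_runs(config)
    print("Total runs under the full-grid assumption:", len(runs))
```

Under that assumption the run count is the product of the list lengths and the trial count, so adding one model or one domain multiplies the total.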
After experiments complete, generate visualization plots:

    uv run python run.py -p

This will use the latest results.csv file to create figures.

Test all problem generators (runs tests for sizes 1-10):

    uv run python run.py -t

Test a specific generator with default size (5):

    uv run python run.py -t blocks

Test a specific generator with custom size:

    uv run python run.py -t blocks 8

Available generators: blocks, bookseller, checkers-jumping, elevators, flow, miconic, etc.
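
The three invocation forms above can be assembled programmatically, for example to script several generator tests in a row. This is a minimal sketch. The helper `build_test_command` is hypothetical and only mirrors the command lines shown in this section.

```python
import shlex
from typing import Optional


def build_test_command(
    generator: Optional[str] = None, size: Optional[int] = None
) -> list[str]:
    """Build the argv for `uv run python run.py -t [generator] [size]`."""
    if size is not None and generator is None:
        raise ValueError("a size requires a generator name")
    argv = ["uv", "run", "python", "run.py", "-t"]
    if generator is not None:
        argv.append(generator)
    if size is not None:
        argv.append(str(size))
    return argv


if __name__ == "__main__":
    # Print the commands only; pass a list to subprocess.run(...) to execute it.
    for generator, size in [(None, None), ("blocks", None), ("blocks", 8)]:
        print(shlex.join(build_test_command(generator, size)))
```

Building the command as a list avoids shell quoting issues if it is later passed to `subprocess.run`.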