Reward machines (RMs) are an effective approach for addressing non-Markovian rewards in reinforcement learning (RL) through finite-state machines. Traditional RMs, which label edges with propositional logic formulae, inherit the limited expressivity of propositional logic. This limitation hinders the learnability and transferability of RMs, since complex tasks require numerous states and edges. To overcome these challenges, we propose First-Order Reward Machines (FORMs), which use first-order logic to label edges, resulting in more compact and transferable RMs. We introduce a novel method for learning FORMs and a multi-agent formulation for exploiting them and facilitating their transferability, in which multiple agents collaboratively learn policies for a shared FORM. Our experimental results demonstrate the scalability of FORMs relative to traditional RMs. Specifically, we show that FORMs can be effectively learnt for tasks where traditional RM learning approaches fail. We also show significant improvements in learning speed and task transferability thanks to the multi-agent learning framework and the abstraction provided by the first-order language.
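To make the edge-labelling idea concrete, below is a minimal, self-contained sketch of a reward machine whose edges are guarded by predicates over ground atoms, with one first-order-style guard ("every yellow button is pressed"). All names here (`RewardMachine`, `all_yellow_pressed`, the atom encoding) are illustrative assumptions, not the repository's actual API.

```python
# Illustrative sketch only: a reward machine whose edges are labelled by
# predicates over the set of ground atoms observed at each step.

def all_yellow_pressed(atoms):
    # First-order-style label: "forall X. yellow(X) -> pressed(X)",
    # checked over ground atoms encoded as (predicate, argument) pairs.
    yellows = {x for (p, x) in atoms if p == "yellow"}
    pressed = {x for (p, x) in atoms if p == "pressed"}
    return yellows <= pressed

class RewardMachine:
    def __init__(self, initial, transitions):
        # transitions: {state: [(label_fn, next_state, reward), ...]}
        self.state = initial
        self.transitions = transitions

    def step(self, atoms):
        # Take the first outgoing edge whose label holds; else stay put.
        for label, nxt, reward in self.transitions.get(self.state, []):
            if label(atoms):
                self.state = nxt
                return reward
        return 0.0

rm = RewardMachine("u0", {"u0": [(all_yellow_pressed, "u_acc", 1.0)]})
print(rm.step({("yellow", "b1"), ("yellow", "b2"), ("pressed", "b1")}))  # 0.0
print(rm.step({("yellow", "b1"), ("yellow", "b2"),
               ("pressed", "b1"), ("pressed", "b2")}))                   # 1.0
```

The point of the abstraction: the single quantified guard covers any number of yellow buttons, whereas a propositional RM would need a distinct proposition per button and, in general, extra states and edges.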
Requires Python 3.10.12.
```sh
# clone the repo
git clone https://github.com/leoardon/form.git
cd form

# create the virtual environment
python -m venv .venv
# activate the environment
source .venv/bin/activate

# install dependencies
python -m pip install -U pip pypatch
python -m pip install -r requirements.txt

# apply the bundled ray patch
pypatch apply `pwd`/patches/ray.patch ray
```
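As a quick sanity check that the dependencies installed and the patched `ray` imports cleanly (this check is an assumption on our part, not part of the repository's instructions; the printed version depends on `requirements.txt`):

```sh
# should print the installed ray version without errors
python -c "import ray; print(ray.__version__)"
```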
**ILASP installation.** Install the ILASP system (used to learn the reward machines) into the virtual environment:
```sh
wget https://github.com/ilaspltd/ILASP-releases/releases/download/v4.4.0/ILASP-4.4.0-ubuntu.tar.gz
mkdir ILASP
tar -xvzf ILASP-4.4.0-ubuntu.tar.gz -C ./ILASP/
mv ./ILASP/ILASP .venv/bin/
rm -rf ILASP
```
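To confirm the binary is reachable from the virtual environment (the `--version` flag is assumed to be available in ILASP 4.x; `command -v` only checks the PATH):

```sh
# with the virtual environment active
command -v ILASP   # should resolve to .venv/bin/ILASP
ILASP --version    # assumed to report 4.4.0
```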
```sh
python src/run.py [--rm] [--rm_learning] [--prop] [--shared_policy] --env ["FOLRoom-AllYellow-2"|"FOLRoom-GreenButOne-NoLava"|"FOLRoom-Blue-AllYellow-7"|"FOLRoom-AllYellow-4"|"FOLRoom-AllYellow-6"]
```
- The `--rm` flag indicates whether to use PPO with an RM or plain PPO.
- The `--rm_learning` flag is used to learn the RM along with the policies.
- The `--prop` flag is used to learn a propositional RM instead of a `FORM`.
- The `--shared_policy` flag is used to learn the RM using a CRM-like approach, with a single shared policy learnt.
- The `--env` option is used to specify the environment (see the example invocations below).
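For instance, some illustrative invocations (the flag combinations are assumed from the synopsis above, not prescribed by the repository):

```sh
# PPO with a FORM learnt on the fly
python src/run.py --rm --rm_learning --env "FOLRoom-AllYellow-2"

# same task, but learning a propositional RM baseline
python src/run.py --rm --rm_learning --prop --env "FOLRoom-AllYellow-2"

# plain PPO without a reward machine
python src/run.py --env "FOLRoom-AllYellow-2"
```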
```bibtex
@inproceedings{ardonFORM25,
  title     = {{\texttt{FORM}: Learning Expressive and Transferable First-Order Logic Reward Machines}},
  author    = {Ardon, Leo and Furelos-Blanco, Daniel and Para\'c, Roko and Russo, Alessandra},
  year      = 2025,
  booktitle = {Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems},
  location  = {Detroit, U.S.A.},
  series    = {AAMAS '25},
}
```