This project implements a multi-agent reinforcement learning (MARL) system using TorchRL to train agents to form specific geometric shapes. The agents learn cooperative behavior through PPO (Proximal Policy Optimization) to achieve formation control.
- Multiple Shape Support: Circle, Polygon, and Star formations with dynamic reconfiguration
- Assignment Strategies: Hungarian and Greedy algorithms for optimal agent-target matching
- Reward Functions: Support for both SDF-based (shape boundary) and assignment-based (target position) rewards
- Multi-Shape Scenes: Agents can be assigned to multiple different shapes simultaneously
- Visualization: Real-time rendering and GIF generation of trained policies
- Testing & CI/CD: Comprehensive test suite with automated pipelines
- Environment: Agents are placed in an arena and receive observations of their relative positions
- Target Formation: A geometric shape defines the desired formation
- Assignment: Agents are assigned to specific target positions using Hungarian or Greedy algorithms
- Training: PPO trains agents to move toward their assigned positions while respecting arena boundaries
- Evaluation: Trained models can be visualized and evaluated on formation accuracy
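The assignment step above can be sketched as follows. This is a toy illustration, not the repository's implementation: the brute-force matcher stands in for the Hungarian algorithm (it is O(n!) and only viable for a handful of agents), and the greedy variant lets each agent claim its nearest unclaimed target in turn.

```python
from itertools import permutations
from math import dist

def hungarian_assign(agents, targets):
    # Brute-force optimal assignment: try every permutation and keep
    # the one with the lowest total agent-to-target distance.
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(targets))):
        cost = sum(dist(a, targets[j]) for a, j in zip(agents, perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return list(best)

def greedy_assign(agents, targets):
    # Each agent (in order) claims its nearest still-unclaimed target.
    free = set(range(len(targets)))
    out = []
    for a in agents:
        j = min(free, key=lambda t: dist(a, targets[t]))
        free.discard(j)
        out.append(j)
    return out

agents = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
targets = [(0.1, 0.9), (0.9, 0.1), (0.0, 0.1)]
print(hungarian_assign(agents, targets))
print(greedy_assign(agents, targets))
```

Greedy is not always optimal (an early agent can steal a target a later agent needed more), which is the trade-off the two strategies explore.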
- TorchRL: Multi-agent reinforcement learning framework
- PyTorch: Deep learning backend
- Hydra: Configuration management
- Weights & Biases: Experiment tracking and visualization
- Make sure you have Python 3.11, or at least 3.10. Check by running:

```bash
python --version
```

If you have an older version, please update.

- Create a virtual environment and activate it:

```bash
python -m venv .venv && source .venv/bin/activate
```

- Upgrade pip:

```bash
python -m pip install --upgrade pip
```

- Install runtime dependencies:

```bash
pip install -r requirements.txt
```

To train agents on a formation task, use:

```bash
python main.py
```

The default configuration trains agents to form a circle. Output, including training metrics and model checkpoints, is logged to W&B.
Training behavior is controlled through YAML config files in the configs/ directory:
- `configs/base/main_setup.yaml`: Global settings (device, seed, project name)
- `configs/algo/ppo.yaml`: PPO algorithm hyperparameters (learning rate, epochs, clip epsilon)
- `configs/env/formation.yaml`: Environment settings (num_agents, arena_size, shape_type)
- `configs/experiment/default_exp.yaml`: Experiment configuration (combines all of the above)
Shapes are defined in configs/env/formation.yaml. Each shape type has specific parameters:
Circle Formation

```yaml
shape_type: circle
circle:
  center: [0.0, 0.0]  # Center coordinates [x, y]
  radius: 2.0         # Circle radius
```

Polygon Formation
```yaml
shape_type: polygon
polygon:
  vertices: [  # List of [x, y] vertices
    [-2.0, -2.0],
    [2.0, -2.0],
    [2.0, 2.0],
    [-2.0, 2.0]
  ]
```

Supports both convex and non-convex polygons. Agents are distributed evenly along the perimeter.
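Even placement along the perimeter can be sketched like this (a stand-alone illustrative snippet; the repository's actual sampling code may differ):

```python
from math import dist

def perimeter_points(vertices, n):
    # Place n points at equal arc-length intervals along the
    # closed polygon boundary.
    edges = list(zip(vertices, vertices[1:] + vertices[:1]))
    lengths = [dist(a, b) for a, b in edges]
    perim = sum(lengths)
    pts = []
    for k in range(n):
        s = k * perim / n  # arc length of the k-th point
        for (a, b), length in zip(edges, lengths):
            if s <= length:
                t = s / length  # interpolate along this edge
                pts.append((a[0] + t * (b[0] - a[0]),
                            a[1] + t * (b[1] - a[1])))
                break
            s -= length
    return pts

square = [(-2.0, -2.0), (2.0, -2.0), (2.0, 2.0), (-2.0, 2.0)]
print(perimeter_points(square, 4))
```

Because spacing is by arc length rather than by vertex, this works for non-convex polygons and for agent counts that do not match the vertex count.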
Star Formation
```yaml
shape_type: star
star:
  center: [0.0, 0.0]  # Center coordinates
  r1: 1.0             # Inner radius
  r2: 2.0             # Outer radius
  n_points: 5         # Number of star points
```

For complex scenarios with multiple shapes, use the multishape type:
```yaml
shape_type: multishape
multishape:
  shapes:
    - type: circle
      center: [-3.0, 0.0]
      radius: 1.5
      agent_count: 5  # Agents assigned to this shape
    - type: polygon
      vertices: [
        [2.0, -2.0],
        [4.0, -2.0],
        [4.0, 2.0],
        [2.0, 2.0]
      ]
      agent_count: 5  # Remaining agents assigned here

# Dynamic reconfiguration (switch formations mid-episode)
reconfig_step: 200  # Step at which the reconfiguration happens
reconfig_shape:
  shape_type: multishape  # Shape to switch to
  multishape:
    - type: polygon
      vertices: [[-4.0, 0.0], [-2.0, 0.0], [-2.0, -2.0], [-4.0, -2.0]]
      agent_count: 5
    - type: circle
      center: [3.0, 0.0]
      radius: 1.5
      agent_count: 5
```

Choose how agents are assigned to target positions:
```yaml
# Hungarian algorithm (optimal but slower)
assignment_method: "hungarian"

# Greedy algorithm (faster, near-optimal)
assignment_method: "greedy"
```

Circle with Hungarian Assignment
```yaml
shape_type: "circle"
circle:
  center: [0.0, 0.0]
  radius: 2.0
assignment_method: "hungarian"
num_agents: 10
```

Multi-Shape with Reconfiguration
```yaml
shape_type: multishape
num_agents: 20
multishape:
  shapes:
    - type: circle
      center: [-2.0, 0.0]
      radius: 1.5
      agent_count: 10
    - type: star
      center: [2.0, 0.0]
      r1: 0.8
      r2: 1.8
      n_points: 5
      agent_count: 10
reconfig_shape:
  shape_type: multishape
  multishape:
    - type: polygon
      vertices: [[0, -2], [2, 0], [0, 2], [-2, 0]]
      agent_count: 10
    - type: circle
      center: [0.0, 0.0]
      radius: 2.0
      agent_count: 10
```

After training, visualize the learned policy using:
```bash
python visualize.py
```

This script:
- Loads the most recent trained model from W&B
- Runs the policy in the environment for several episodes
- Renders real-time visualization of agents forming the target shape
- Generates a GIF of the formation process
- Displays formation accuracy and episode metrics
Example output GIF:
Three evaluation metrics are computed after training. The evaluation runs at the end of training in main.py via evaluate_with_metrics(...) (see src/rollout/evaluator.py and src/rollout/metrics.py), and the results are logged to W&B.
Implemented metrics:
- Boundary Error (SDF distance to the target boundary)
  - Mean / max boundary error over agents
  - Percent of agents considered "on boundary"
- Uniformity (nearest-neighbor distance statistics)
  - Mean / std of nearest-neighbor distances
  - Coefficient of variation (std / mean)
- Collisions (pairwise agent collisions)
  - Collision count
  - Collision rate as percent of colliding pairs
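As a rough illustration of how these metrics can be computed, here is a minimal sketch for the circle case. The shape parameters and the on-boundary tolerance are placeholders, and the actual implementation in src/rollout/metrics.py may differ:

```python
from math import hypot
from statistics import mean, pstdev

def circle_sdf(p, center=(0.0, 0.0), radius=2.0):
    # Signed distance: negative inside, zero on the boundary, positive outside.
    return hypot(p[0] - center[0], p[1] - center[1]) - radius

def boundary_metrics(positions, tol=0.1):
    # Boundary error per agent is |SDF|; "on boundary" means within tol.
    errs = [abs(circle_sdf(p)) for p in positions]
    return {
        "mean": sum(errs) / len(errs),
        "max": max(errs),
        "on_boundary_pct": 100.0 * sum(e <= tol for e in errs) / len(errs),
    }

def uniformity_metrics(positions):
    # Nearest-neighbor distance for each agent, then mean/std/CV.
    nn = []
    for i, p in enumerate(positions):
        nn.append(min(hypot(p[0] - q[0], p[1] - q[1])
                      for j, q in enumerate(positions) if j != i))
    m, s = mean(nn), pstdev(nn)
    return {"mean": m, "std": s, "cv": s / m}

# Four agents sitting exactly on the radius-2 circle, evenly spaced.
positions = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0)]
print(boundary_metrics(positions))
print(uniformity_metrics(positions))
```

A perfect formation drives boundary error to zero and the coefficient of variation of nearest-neighbor distances to zero.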
W&B keys (logged at the end of training):
`Evaluation/Boundary_Error_Mean`, `Evaluation/Boundary_Error_Max`, `Evaluation/Agents_On_Boundary_Pct`, `Evaluation/Uniformity_Mean`, `Evaluation/Uniformity_Std`, `Evaluation/Uniformity_Coefficient`, `Evaluation/Collision_Count_Mean`, `Evaluation/Collision_Rate_Pct`
Two Hydra multirun ablations are included under configs/experiment/.
- Learning rate ablation (`configs/experiment/sweep_lr.yaml`)

```bash
python main.py -m -cn experiment/sweep_lr
```

This sweep fixes seeds (`base.seed: 0,1,2`), fixes assignment to Hungarian (`env.assignment_method: hungarian`), and varies `algo.lr`.
- Assignment method ablation (`configs/experiment/sweep_assign.yaml`)

```bash
python main.py -m -cn experiment/sweep_assign
```

This sweep fixes seeds (`base.seed: 0,1,2`) and varies `env.assignment_method` (`greedy`, `hungarian`).
Hydra writes sweep outputs under:
```
multirun/YYYY-MM-DD/HH-MM-SS/<job_num>/...
```
For quick aggregation across seeds and variants, use the included analysis script. The --sweep-id should match the Hydra sweep directory (the multirun/.../HH-MM-SS folder).
Learning rate sweep (group by `algo.lr`):

```bash
python analyze_ablations.py --group algo.lr --sweep-id "multirun/YYYY-MM-DD/HH-MM-SS"
```

Assignment method sweep (group by `env.assignment_method`):

```bash
python analyze_ablations.py --group env.assignment_method --sweep-id "multirun/YYYY-MM-DD/HH-MM-SS"
```

This prints mean/std for the tracked metrics and writes a `runs.csv` at the repository root.
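The aggregation the script performs is essentially a group-by over the CSV. A minimal stand-alone sketch of the idea (the row dicts and column names here are hypothetical examples, not necessarily the columns in the real `runs.csv`):

```python
from collections import defaultdict
from statistics import mean, pstdev

def aggregate(rows, group_key, metric_key):
    # Group rows by the chosen config column, then summarize one metric
    # across seeds/variants as (mean, std).
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(float(row[metric_key]))
    return {g: (mean(v), pstdev(v)) for g, v in groups.items()}

# Hypothetical rows resembling what a runs.csv might contain.
rows = [
    {"algo.lr": "0.001", "Evaluation/Boundary_Error_Mean": "0.20"},
    {"algo.lr": "0.001", "Evaluation/Boundary_Error_Mean": "0.24"},
    {"algo.lr": "0.0003", "Evaluation/Boundary_Error_Mean": "0.31"},
    {"algo.lr": "0.0003", "Evaluation/Boundary_Error_Mean": "0.29"},
]
print(aggregate(rows, "algo.lr", "Evaluation/Boundary_Error_Mean"))
```

Averaging over fixed seeds like this is what makes the two ablation variants comparable.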
To create a chart from the latest `runs.csv`, use:

```bash
python scripts/plot_runs_csv.py --csv runs.csv --group algo.lr
```

You can also pass `--title` to override the figure title, and `--exclude-reward` to plot only evaluation metrics.
This writes a PNG under docs/charts/ (default: docs/charts/ablation_algo.lr.png).
Example ablation plot:
Notes:
- The arrows next to subplot titles indicate the direction of better: ↓ lower is better, ↑ higher is better.
Hydra configs (experiments + sweeps)
- `configs/experiment/default_exp.yaml`
- `configs/experiment/sweep_lr.yaml`
- `configs/experiment/sweep_assign.yaml`
- `configs/env/formation.yaml`
- `configs/algo/ppo.yaml`
- `configs/base/main_setup.yaml`
Dockerfile (build + run)
Build:

```bash
docker build -f docker/Dockerfile -t formation-task .
```

Run training in the container:

```bash
docker run --rm formation-task
```

If you want to avoid W&B login inside Docker, run in offline mode:

```bash
docker run --rm -e WANDB_MODE=offline formation-task
```

Unit / smoke tests
Tests cover environment initialization/reset/step, SDF sanity check, assignment behavior, and PPO actor output.
- Run with pytest:

```bash
pytest -q
```

- Or run unittest discovery:

```bash
python -m unittest discover -s test
```

- SDF interface, including support for three shapes (circle, polygon, star)
- Redesign of the observations and rewards with SDF terms
- Render support for the new shapes
- Fixing the visualizer script and GIF generation (visualizing target shapes and positions, loading trained model)
- Implementation of the Hungarian and Greedy assignment strategies
- Support for multi-shape scenes
- Support for dynamic reconfiguration mid-episode (even with multi-shape scenes)
- Fixing the tests and CI/CD pipelines
- Implementation of evaluation metrics (Boundary Error / Collisions / Uniformity) + unit tests
- Hydra multirun support for sweeps and reproducibility
- Sweep configurations for learning-rate and assignment-method ablations
- Ablation analysis producing aggregated metrics + saving results to CSV
- Plotting script to visualize differences between ablations
- Docker packaging fixes (build context / run command / large-folder issues)
- Windows-specific test fixes
- Task 1: Core Functionality (20 pts)
  - SDF interface and three shape families (including one non-convex) (10 pts)
  - Observation and reward redesign with SDF terms (distance, normal/tangent) (5 pts)
  - Success criterion and renderer support for new shapes (5 pts)
- Task 2: Assignment Strategies (10 pts)
  - Implement and compare two strategies (Hungarian periodic vs. distributed greedy) (10 pts)
- Task 3: Scenarios (10 pts)
  - One multi-shape scene with required allocation (5 pts)
  - One dynamic reconfiguration scenario mid-episode (5 pts)
- Task 4: Metrics and Evaluation (10 pts)
  - Implement and report three chosen metrics from the following: Boundary Error, Uniformity, Time-to-Form, Collisions, Generalization, Reconfiguration Time (10 pts)
- Task 5: Ablations (10 pts)
  - Run at least two ablations (e.g., geometry features, assignment, reward shaping, curriculum) with fixed seeds and analysis (5+5 pts)
- Task 6: Reproducibility Pack (6 pts)
  - Hydra configs for experiments and sweeps (2 pts)
  - Dockerfile builds and runs training/evaluation (2 pts)
  - Two unit/smoke tests (SDF, assignment, env reset/step) (2 pts)
- Task 7: Reporting Quality (4 pts)
  - README with quick start, experiment matrix, plots/tables, and failure analysis (4 pts)
- Bonus (up to +10 pts)
  - Robustness tests with sensor noise or actuation delay (+5 pts)
  - Additional research feature (e.g., communication, curriculum, new non-convex shape family) (+5 pts)

