elte-collective-intelligence/student-formation

Formation: Multi-Agent Reinforcement Learning (TorchRL)

CI Docker codecov License: CC BY-NC-ND 4.0

About the project

This project implements a multi-agent reinforcement learning (MARL) system using TorchRL to train agents to form specific geometric shapes. The agents learn cooperative behavior through PPO (Proximal Policy Optimization) to achieve formation control.

Key Features

  • Multiple Shape Support: Circle, Polygon, and Star formations with dynamic reconfiguration
  • Assignment Strategies: Hungarian (optimal) and Greedy (fast, near-optimal) strategies for agent-target matching
  • Reward Functions: Support for both SDF-based (shape boundary) and assignment-based (target position) rewards
  • Multi-Shape Scenes: Agents can be assigned to multiple different shapes simultaneously
  • Visualization: Real-time rendering and GIF generation of trained policies
  • Testing & CI/CD: Comprehensive test suite with automated pipelines

How It Works

  1. Environment: Agents are placed in an arena and receive observations of their relative positions
  2. Target Formation: A geometric shape defines the desired formation
  3. Assignment: Agents are assigned to specific target positions using Hungarian or Greedy algorithms
  4. Training: PPO trains agents to move toward their assigned positions while respecting arena boundaries
  5. Evaluation: Trained models can be visualized and evaluated on formation accuracy
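
The matching in step 3 can be sketched as follows. This is an illustrative, self-contained example, not the project's code: for tiny agent counts, the minimum-total-distance assignment can be found by brute force, which is what the Hungarian algorithm computes efficiently (O(n³)) for larger teams.

```python
import itertools
import math

def optimal_assignment(agents, targets):
    """Brute-force minimum-total-distance matching (fine only for tiny n).

    Returns a tuple perm where agent i is assigned to targets[perm[i]].
    The Hungarian algorithm finds the same matching in O(n^3).
    """
    n = len(agents)
    best, best_cost = None, math.inf
    for perm in itertools.permutations(range(n)):
        cost = sum(math.dist(agents[i], targets[perm[i]]) for i in range(n))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best

agents = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
targets = [(0.0, 1.1), (0.1, 0.0), (1.0, 0.1)]
print(optimal_assignment(agents, targets))  # -> (1, 2, 0)
```

Each agent ends up matched to its nearest target here, for a total distance of 0.3.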

Technologies

  • TorchRL: Multi-agent reinforcement learning framework
  • PyTorch: Deep learning backend
  • Hydra: Configuration management
  • Weights & Biases: Experiment tracking and visualization

Setup

  1. Make sure you have Python 3.10 or newer (3.11 recommended).
     Check by running python --version. If your version is older, please update.

  2. Create a virtual environment and activate it:

python -m venv .venv && source .venv/bin/activate

  3. Upgrade pip:

python -m pip install --upgrade pip

  4. Install runtime dependencies:

pip install -r requirements.txt

Usage

Training

To train agents on a formation task, use:

python main.py

The default configuration trains agents to form a circle. Outputs, including training metrics and model checkpoints, are logged to W&B.

Configuration

Training behavior is controlled through YAML config files in the configs/ directory:

  • configs/base/main_setup.yaml: Global settings (device, seed, project name)
  • configs/algo/ppo.yaml: PPO algorithm hyperparameters (learning rate, epochs, clip epsilon)
  • configs/env/formation.yaml: Environment settings (num_agents, arena_size, shape_type)
  • configs/experiment/default_exp.yaml: Experiment configuration (combines all above)
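
As an illustration of how these files compose, a Hydra experiment config typically pulls in the other groups via a defaults list and then overrides individual fields. The fragment below is a hypothetical sketch, not a copy of default_exp.yaml; the actual group paths and keys in the repo may differ.

```yaml
# Hypothetical experiment config -- paths and field names are illustrative.
defaults:
  - /base/main_setup
  - /algo/ppo
  - /env/formation
  - _self_

env:
  num_agents: 12        # override the environment's agent count
  shape_type: circle
algo:
  lr: 3.0e-4            # override the PPO learning rate
```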

Defining Shapes

Shapes are defined in configs/env/formation.yaml. Each shape type has specific parameters:

Circle Formation

shape_type: circle
circle:
  center: [0.0, 0.0]    # Center coordinates [x, y]
  radius: 2.0           # Circle radius
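
For a circle, target positions can be generated by spacing agents evenly by angle. A minimal sketch (the function name is illustrative, not the project's API):

```python
import math

def circle_targets(center, radius, n):
    """Place n targets evenly spaced around the circle."""
    cx, cy = center
    return [
        (cx + radius * math.cos(2 * math.pi * k / n),
         cy + radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]

# Four agents on a radius-2 circle land on the axis points
print(circle_targets((0.0, 0.0), 2.0, 4))
```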

Polygon Formation

shape_type: polygon
polygon:
  vertices: [           # List of [x, y] vertices
    [-2.0, -2.0],
    [2.0, -2.0],
    [2.0, 2.0],
    [-2.0, 2.0]
  ]

Supports both convex and non-convex polygons. Agents are distributed evenly along the perimeter.
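
Even distribution along the perimeter can be sketched by walking the polygon's edges by arc length. This is an illustrative stand-in for the project's placement code, with hypothetical names:

```python
import math

def polygon_targets(vertices, n):
    """Distribute n targets evenly along the closed polygon perimeter."""
    edges = [(vertices[i], vertices[(i + 1) % len(vertices)])
             for i in range(len(vertices))]
    lengths = [math.dist(a, b) for a, b in edges]
    perimeter = sum(lengths)
    targets, step = [], perimeter / n
    for k in range(n):
        s = k * step  # arc-length position along the perimeter
        for (a, b), L in zip(edges, lengths):
            if s <= L:
                t = s / L  # linear interpolation along this edge
                targets.append((a[0] + t * (b[0] - a[0]),
                                a[1] + t * (b[1] - a[1])))
                break
            s -= L
    return targets

square = [(-2.0, -2.0), (2.0, -2.0), (2.0, 2.0), (-2.0, 2.0)]
print(polygon_targets(square, 4))  # one target per corner
```

Because the walk is by arc length, this works for non-convex polygons too.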

Star Formation

shape_type: star
star:
  center: [0.0, 0.0]    # Center coordinates
  r1: 1.0               # Inner radius
  r2: 2.0               # Outer radius
  n_points: 5           # Number of star points
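
A star's vertices alternate between the outer radius r2 (tips) and the inner radius r1 (notches). A hedged sketch of the geometry, not the repo's implementation:

```python
import math

def star_targets(center, r1, r2, n_points):
    """Alternate outer (r2) and inner (r1) vertices around the center."""
    cx, cy = center
    pts = []
    for k in range(2 * n_points):
        r = r2 if k % 2 == 0 else r1
        theta = math.pi / 2 + k * math.pi / n_points  # first tip at the top
        pts.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return pts

pts = star_targets((0.0, 0.0), 1.0, 2.0, 5)
print(len(pts))  # 10 vertices: 5 outer tips, 5 inner notches
```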

Multi-Shape Scenes with Reconfiguration

For complex scenarios with multiple shapes, use the multishape type:

shape_type: multishape

multishape:
  shapes:
    - type: circle
      center: [-3.0, 0.0]
      radius: 1.5
      agent_count: 5     # Agents assigned to this shape
    
    - type: polygon
      vertices: [
          [2.0, -2.0],
          [4.0, -2.0],
          [4.0, 2.0],
          [2.0, 2.0]
        ]
      agent_count: 5     # Remaining agents assigned here

# Dynamic reconfiguration (switch formations mid-episode)
reconfig_step: 200      # Environment step at which the reconfiguration happens
reconfig_shape:
  shape_type: multishape    # Shape to switch to
  multishape:
    - type: polygon
      vertices: [[-4.0, 0.0], [-2.0, 0.0], [-2.0, -2.0], [-4.0, -2.0]] 
      agent_count: 5
    - type: circle
      center: [3.0, 0.0]
      radius: 1.5
      agent_count: 5

Assignment Strategies

Choose how agents are assigned to target positions:

# Hungarian algorithm (optimal but slower)
assignment_method: "hungarian"

# Greedy algorithm (faster, near-optimal)
assignment_method: "greedy"
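
The greedy strategy can be sketched as repeatedly matching the globally closest unmatched (agent, target) pair. This is an illustrative example, not the repo's implementation:

```python
import math

def greedy_assignment(agents, targets):
    """Match the globally closest unmatched (agent, target) pair, repeatedly.

    Fast and near-optimal in practice, but an early match can block a
    better global matching that the Hungarian algorithm would find.
    """
    pairs = sorted((math.dist(a, t), i, j)
                   for i, a in enumerate(agents)
                   for j, t in enumerate(targets))
    assignment, used_a, used_t = {}, set(), set()
    for _, i, j in pairs:
        if i not in used_a and j not in used_t:
            assignment[i] = j
            used_a.add(i)
            used_t.add(j)
    return assignment  # maps agent index -> target index

agents = [(0.0, 0.0), (1.0, 0.0)]
targets = [(0.9, 0.0), (3.0, 0.0)]
print(greedy_assignment(agents, targets))  # -> {1: 0, 0: 1}
```

On this example greedy pays total distance 3.1 (agent 1 grabs the near target first), while the Hungarian matching {0: 0, 1: 1} pays 2.9.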

Example Configurations

Circle with Hungarian Assignment

shape_type: "circle"
circle:
  center: [0.0, 0.0]
  radius: 2.0
assignment_method: "hungarian"
num_agents: 10

Multi-Shape with Reconfiguration

shape_type: multishape
num_agents: 20

multishape:
  shapes:
    - type: circle
      center: [-2.0, 0.0]
      radius: 1.5
      agent_count: 10
    - type: star
      center: [2.0, 0.0]
      r1: 0.8
      r2: 1.8
      n_points: 5
      agent_count: 10

reconfig_shape:
  shape_type: multishape
  multishape:
    - type: polygon
      vertices: [[0, -2], [2, 0], [0, 2], [-2, 0]]
      agent_count: 10
    - type: circle
      center: [0.0, 0.0]
      radius: 2.0
      agent_count: 10

Visualization

After training, visualize the learned policy using:

python visualize.py

This script:

  • Loads the most recent trained model from W&B
  • Runs the policy in the environment for several episodes
  • Renders real-time visualization of agents forming the target shape
  • Generates a GIF of the formation process
  • Displays formation accuracy and episode metrics

Example output GIF:

Example formation rollout

Metrics & Evaluation

Three evaluation metrics are computed after training. Evaluation runs at the end of training in main.py via evaluate_with_metrics(...) (see src/rollout/evaluator.py and src/rollout/metrics.py), and the results are logged to W&B.

Implemented metrics:

  • Boundary Error (SDF distance to the target boundary)
    • Mean / max boundary error over agents
    • Percent of agents considered "on boundary"
  • Uniformity (nearest-neighbor distance statistics)
    • Mean / std of nearest-neighbor distances
    • Coefficient of variation (std / mean)
  • Collisions (inter-agent collision statistics)
    • Collision count
    • Collision rate as percent of colliding pairs
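
The first two metrics can be sketched for a circular target as follows. This is an illustrative example of the metric definitions above (SDF boundary error and nearest-neighbor uniformity), not the code in src/rollout/metrics.py; function names and the tolerance are hypothetical.

```python
import math
import statistics

def circle_sdf(p, center=(0.0, 0.0), radius=2.0):
    """Signed distance to a circle boundary: 0 on the boundary,
    negative inside, positive outside."""
    return math.dist(p, center) - radius

def boundary_error(positions, tol=0.1):
    """Mean/max |SDF| and percent of agents within tol of the boundary."""
    errs = [abs(circle_sdf(p)) for p in positions]
    on_boundary_pct = 100.0 * sum(e <= tol for e in errs) / len(errs)
    return statistics.mean(errs), max(errs), on_boundary_pct

def uniformity(positions):
    """Nearest-neighbor distance statistics; CV = std / mean."""
    nn = [min(math.dist(p, q) for q in positions if q is not p)
          for p in positions]
    mean, std = statistics.mean(nn), statistics.pstdev(nn)
    return mean, std, std / mean

pts = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0)]
print(boundary_error(pts))  # all points lie exactly on the radius-2 circle
print(uniformity(pts))      # equal spacing -> coefficient of variation 0
```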

W&B keys (logged at the end of training):

  • Evaluation/Boundary_Error_Mean, Evaluation/Boundary_Error_Max, Evaluation/Agents_On_Boundary_Pct
  • Evaluation/Uniformity_Mean, Evaluation/Uniformity_Std, Evaluation/Uniformity_Coefficient
  • Evaluation/Collision_Count_Mean, Evaluation/Collision_Rate_Pct

Ablations

Two Hydra multirun ablations are included under configs/experiment/.

  1. Learning rate ablation (configs/experiment/sweep_lr.yaml)
python main.py -m -cn experiment/sweep_lr

This sweep runs three fixed seeds (base.seed: 0,1,2), pins the assignment method to Hungarian (env.assignment_method: hungarian), and varies algo.lr.

  2. Assignment method ablation (configs/experiment/sweep_assign.yaml)
python main.py -m -cn experiment/sweep_assign

This sweep runs the same three fixed seeds (base.seed: 0,1,2) and varies env.assignment_method (greedy, hungarian).

Hydra writes sweep outputs under:

  • multirun/YYYY-MM-DD/HH-MM-SS/<job_num>/...

Ablation analysis

For quick aggregation across seeds and variants, use the included analysis script. The --sweep-id should match the Hydra sweep directory (the multirun/.../HH-MM-SS folder).

Learning rate sweep (group by algo.lr):

python analyze_ablations.py --group algo.lr --sweep-id "multirun/YYYY-MM-DD/HH-MM-SS"

Assignment method sweep (group by env.assignment_method):

python analyze_ablations.py --group env.assignment_method --sweep-id "multirun/YYYY-MM-DD/HH-MM-SS"

This prints mean/std for the tracked metrics and writes a runs.csv at the repository root.
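
The per-group aggregation can be sketched with the standard library. The CSV columns below are hypothetical; the real runs.csv written by analyze_ablations.py may use different headers.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical CSV contents -- the real runs.csv columns may differ.
RUNS_CSV = """group,boundary_error_mean
3e-4,0.12
3e-4,0.10
1e-4,0.20
1e-4,0.24
"""

def aggregate(csv_text, group_col, metric_col):
    """Mean/std of metric_col per value of group_col."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[group_col]].append(float(row[metric_col]))
    return {g: (statistics.mean(v), statistics.pstdev(v))
            for g, v in groups.items()}

# Per-group (mean, std) of the tracked metric across seeds
print(aggregate(RUNS_CSV, "group", "boundary_error_mean"))
```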

Charts

To create a chart from the latest runs.csv use:

python scripts/plot_runs_csv.py --csv runs.csv --group algo.lr

You can also pass --title to override the figure title, and --exclude-reward to plot only evaluation metrics.

This writes a PNG under docs/charts/ (default: docs/charts/ablation_algo.lr.png).

Example ablation plot:

Learning-rate ablation example

Notes:

  • The arrows next to the subplot titles indicate which direction is better for each metric: ↓ means lower is better, ↑ means higher is better.

Reproducibility pack

Hydra configs (experiments + sweeps)

  • configs/experiment/default_exp.yaml
  • configs/experiment/sweep_lr.yaml
  • configs/experiment/sweep_assign.yaml
  • configs/env/formation.yaml
  • configs/algo/ppo.yaml
  • configs/base/main_setup.yaml

Dockerfile (build + run)

Build:

docker build -f docker/Dockerfile -t formation-task .

Run training in the container:

docker run --rm formation-task

If you want to avoid W&B login inside Docker, run in offline mode:

docker run --rm -e WANDB_MODE=offline formation-task

Unit / smoke tests

Tests cover environment initialization/reset/step, SDF sanity check, assignment behavior, and PPO actor output.

  • Run with pytest:
pytest -q
  • Or run unittest discovery:
python -m unittest discover -s test


Work distribution

Márk Baricz

  • SDF interface, including support for three shapes (circle, polygon, star)
  • Redesign of the observations and rewards with SDF terms
  • Render support for the new shapes
  • Fixing the visualizer script and GIF generation (visualizing target shapes and positions, loading trained model)
  • Implementation of the Hungarian and Greedy assignment strategies
  • Support for multi-shape scenes
  • Support for dynamic reconfiguration mid-episode (even with multi-shape scenes)
  • Fixing the tests and CI/CD pipelines

Sipos Richard

  • Implementation of evaluation metrics (Boundary Error / Collisions / Uniformity) + unit tests
  • Hydra multirun support for sweeps and reproducibility
  • Sweep configurations for learning-rate and assignment-method ablations
  • Ablation analysis producing aggregated metrics + saving results to CSV
  • Plotting script to visualize differences between ablations
  • Docker packaging fixes (build context / run command / large-folder issues)
  • Windows-specific test fixes

Assignment checklist

  • Task 1: Core Functionality 20 pts

    • SDF interface and three shape families (including one non-convex) 10 pts
    • Observation and reward redesign with SDF terms (distance, normal/tangent) 5 pts
    • Success criterion and renderer support for new shapes 5 pts
  • Task 2: Assignment Strategies 10 pts

    • Implement and compare two strategies (Hungarian periodic vs. distributed greedy) 10 pts
  • Task 3: Scenarios 10 pts

    • One multi-shape scene with required allocation 5 pts
    • One dynamic reconfiguration scenario mid-episode 5 pts
  • Task 4: Metrics and Evaluation 10 pts

    • Implement and report three chosen metrics from the following: Boundary Error, Uniformity, Time-to-Form, Collisions, Generalization, Reconfiguration Time 10 pts
  • Task 5: Ablations 10 pts

    • Run at least two ablations (e.g., geometry features, assignment, reward shaping, curriculum) with fixed seeds and analysis 5+5 pts
  • Task 6: Reproducibility Pack 6 pts

    • Hydra configs for experiments and sweeps 2 pts
    • Dockerfile builds and runs training/evaluation 2 pts
    • Two unit/smoke tests (SDF, assignment, env reset/step) 2 pts
  • Task 7: Reporting Quality 4 pts

    • README with quick start, experiment matrix, plots/tables, and failure analysis 4 pts
  • Bonus up to +10 pts

    • Robustness tests with sensor noise or actuation delay +5 pts
    • Additional research feature (e.g., communication, curriculum, new non-convex shape family) +5 pts

