This project implements a multi-agent reinforcement learning (MARL) system using TorchRL to train agents to form specific geometric shapes. The agents learn cooperative behavior through PPO (Proximal Policy Optimization) to achieve formation control.
- Multiple Shape Support: Circle, Polygon, and Star formations with dynamic reconfiguration
- Assignment Strategies: Hungarian and Greedy algorithms for optimal agent-target matching
- Reward Functions: Support for both SDF-based (shape boundary) and assignment-based (target position) rewards
- Multi-Shape Scenes: Agents can be assigned to multiple different shapes simultaneously
- Visualization: Real-time rendering and GIF generation of trained policies
- Testing & CI/CD: Comprehensive test suite with automated pipelines
- Environment: Agents are placed in an arena and receive observations of their relative positions
- Target Formation: A geometric shape defines the desired formation
- Assignment: Agents are assigned to specific target positions using Hungarian or Greedy algorithms
- Training: PPO trains agents to move toward their assigned positions while respecting arena boundaries
- Evaluation: Trained models can be visualized and evaluated on formation accuracy
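The assignment step above can be sketched as follows. This is a toy illustration, not the repository's implementation: the brute-force matcher stands in for the Hungarian algorithm (it is O(n!) and only viable for a handful of agents), and the greedy variant lets each agent claim its nearest unclaimed target in turn.

```python
from itertools import permutations
from math import dist

def hungarian_assign(agents, targets):
    # Brute-force optimal assignment: try every permutation and keep
    # the one with the lowest total agent-to-target distance.
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(targets))):
        cost = sum(dist(a, targets[j]) for a, j in zip(agents, perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return list(best)

def greedy_assign(agents, targets):
    # Each agent (in order) claims its nearest still-unclaimed target.
    free = set(range(len(targets)))
    out = []
    for a in agents:
        j = min(free, key=lambda t: dist(a, targets[t]))
        free.discard(j)
        out.append(j)
    return out

agents = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
targets = [(0.1, 0.9), (0.9, 0.1), (0.0, 0.1)]
print(hungarian_assign(agents, targets))
print(greedy_assign(agents, targets))
```

Greedy is not always optimal (an early agent can steal a target a later agent needed more), which is the trade-off the two strategies explore.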
- TorchRL: Multi-agent reinforcement learning framework
- PyTorch: Deep learning backend
- Hydra: Configuration management
- Weights & Biases: Experiment tracking and visualization
- Make sure you have Python 3.11, or at least 3.10. Check by running:

```bash
python --version
```

If you have an older version, please update.

- Create a virtual environment and activate it:

```bash
python -m venv .venv && source .venv/bin/activate
```

- Upgrade pip:

```bash
python -m pip install --upgrade pip
```

- Install runtime dependencies:

```bash
pip install -r requirements.txt
```

To train agents on a formation task, use:

```bash
python main.py
```

The default configuration trains agents to form a circle. Output, including training metrics and model checkpoints, is logged to W&B.
Training behavior is controlled through YAML config files in the configs/ directory:
- `configs/base/main_setup.yaml`: Global settings (device, seed, project name)
- `configs/algo/ppo.yaml`: PPO algorithm hyperparameters (learning rate, epochs, clip epsilon)
- `configs/env/formation.yaml`: Environment settings (num_agents, arena_size, shape_type)
- `configs/experiment/default_exp.yaml`: Experiment configuration (combines all of the above)
Shapes are defined in configs/env/formation.yaml. Each shape type has specific parameters:
Circle Formation

```yaml
shape_type: circle
circle:
  center: [0.0, 0.0]  # Center coordinates [x, y]
  radius: 2.0         # Circle radius
```

Polygon Formation
```yaml
shape_type: polygon
polygon:
  vertices: [  # List of [x, y] vertices
    [-2.0, -2.0],
    [2.0, -2.0],
    [2.0, 2.0],
    [-2.0, 2.0]
  ]
```

Supports both convex and non-convex polygons. Agents are distributed evenly along the perimeter.
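Even placement along the perimeter can be sketched like this (a stand-alone illustrative snippet; the repository's actual sampling code may differ):

```python
from math import dist

def perimeter_points(vertices, n):
    # Place n points at equal arc-length intervals along the
    # closed polygon boundary.
    edges = list(zip(vertices, vertices[1:] + vertices[:1]))
    lengths = [dist(a, b) for a, b in edges]
    perim = sum(lengths)
    pts = []
    for k in range(n):
        s = k * perim / n  # arc length of the k-th point
        for (a, b), length in zip(edges, lengths):
            if s <= length:
                t = s / length  # interpolate along this edge
                pts.append((a[0] + t * (b[0] - a[0]),
                            a[1] + t * (b[1] - a[1])))
                break
            s -= length
    return pts

square = [(-2.0, -2.0), (2.0, -2.0), (2.0, 2.0), (-2.0, 2.0)]
print(perimeter_points(square, 4))
```

Because spacing is by arc length rather than by vertex, this works for non-convex polygons and for agent counts that do not match the vertex count.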
Star Formation
```yaml
shape_type: star
star:
  center: [0.0, 0.0]  # Center coordinates
  r1: 1.0             # Inner radius
  r2: 2.0             # Outer radius
  n_points: 5         # Number of star points
```

For complex scenarios with multiple shapes, use the multishape type:
```yaml
shape_type: multishape
multishape:
  shapes:
    - type: circle
      center: [-3.0, 0.0]
      radius: 1.5
      agent_count: 5  # Agents assigned to this shape
    - type: polygon
      vertices: [
        [2.0, -2.0],
        [4.0, -2.0],
        [4.0, 2.0],
        [2.0, 2.0]
      ]
      agent_count: 5  # Remaining agents assigned here

# Dynamic reconfiguration (switch formations mid-episode)
reconfig_step: 200  # Step at which the reconfiguration happens
reconfig_shape:
  shape_type: multishape  # Shape to switch to
  multishape:
    - type: polygon
      vertices: [[-4.0, 0.0], [-2.0, 0.0], [-2.0, -2.0], [-4.0, -2.0]]
      agent_count: 5
    - type: circle
      center: [3.0, 0.0]
      radius: 1.5
      agent_count: 5
```

Choose how agents are assigned to target positions:
```yaml
# Hungarian algorithm (optimal but slower)
assignment_method: "hungarian"

# Greedy algorithm (faster, near-optimal)
assignment_method: "greedy"
```

Circle with Hungarian Assignment
```yaml
shape_type: "circle"
circle:
  center: [0.0, 0.0]
  radius: 2.0
assignment_method: "hungarian"
num_agents: 10
```

Multi-Shape with Reconfiguration
```yaml
shape_type: multishape
num_agents: 20
multishape:
  shapes:
    - type: circle
      center: [-2.0, 0.0]
      radius: 1.5
      agent_count: 10
    - type: star
      center: [2.0, 0.0]
      r1: 0.8
      r2: 1.8
      n_points: 5
      agent_count: 10
reconfig_shape:
  shape_type: multishape
  multishape:
    - type: polygon
      vertices: [[0, -2], [2, 0], [0, 2], [-2, 0]]
      agent_count: 10
    - type: circle
      center: [0.0, 0.0]
      radius: 2.0
      agent_count: 10
```

After training, visualize the learned policy using:
```bash
python visualize.py
```

This script:
- Loads the most recent trained model from W&B
- Runs the policy in the environment for several episodes
- Renders real-time visualization of agents forming the target shape
- Generates a GIF of the formation process
- Displays formation accuracy and episode metrics
Example output GIF:
Three evaluation metrics are computed after training. The evaluation runs at the end of training in main.py via evaluate_with_metrics(...) (see src/rollout/evaluator.py and src/rollout/metrics.py), and the results are logged to W&B.
Implemented metrics:
- Boundary Error (SDF distance to the target boundary)
  - Mean / max boundary error over agents
  - Percent of agents considered "on boundary"
- Uniformity (nearest-neighbor distance statistics)
  - Mean / std of nearest-neighbor distances
  - Coefficient of variation (std / mean)
- Collisions (pairwise agent collisions)
  - Collision count
  - Collision rate as percent of colliding pairs
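As a rough illustration of how these metrics can be computed, here is a minimal sketch for the circle case. The shape parameters and the on-boundary tolerance are placeholders, and the actual implementation in src/rollout/metrics.py may differ:

```python
from math import hypot
from statistics import mean, pstdev

def circle_sdf(p, center=(0.0, 0.0), radius=2.0):
    # Signed distance: negative inside, zero on the boundary, positive outside.
    return hypot(p[0] - center[0], p[1] - center[1]) - radius

def boundary_metrics(positions, tol=0.1):
    # Boundary error per agent is |SDF|; "on boundary" means within tol.
    errs = [abs(circle_sdf(p)) for p in positions]
    return {
        "mean": sum(errs) / len(errs),
        "max": max(errs),
        "on_boundary_pct": 100.0 * sum(e <= tol for e in errs) / len(errs),
    }

def uniformity_metrics(positions):
    # Nearest-neighbor distance for each agent, then mean/std/CV.
    nn = []
    for i, p in enumerate(positions):
        nn.append(min(hypot(p[0] - q[0], p[1] - q[1])
                      for j, q in enumerate(positions) if j != i))
    m, s = mean(nn), pstdev(nn)
    return {"mean": m, "std": s, "cv": s / m}

# Four agents sitting exactly on the radius-2 circle, evenly spaced.
positions = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0)]
print(boundary_metrics(positions))
print(uniformity_metrics(positions))
```

A perfect formation drives boundary error to zero and the coefficient of variation of nearest-neighbor distances to zero.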
W&B keys (logged at the end of training):
`Evaluation/Boundary_Error_Mean`, `Evaluation/Boundary_Error_Max`, `Evaluation/Agents_On_Boundary_Pct`, `Evaluation/Uniformity_Mean`, `Evaluation/Uniformity_Std`, `Evaluation/Uniformity_Coefficient`, `Evaluation/Collision_Count_Mean`, `Evaluation/Collision_Rate_Pct`
Two Hydra multirun ablations are included under configs/experiment/.
- Learning rate ablation (`configs/experiment/sweep_lr.yaml`)

```bash
python main.py -m -cn experiment/sweep_lr
```

This sweep fixes seeds (`base.seed: 0,1,2`), fixes assignment to Hungarian (`env.assignment_method: hungarian`), and varies `algo.lr`.
- Assignment method ablation (`configs/experiment/sweep_assign.yaml`)

```bash
python main.py -m -cn experiment/sweep_assign
```

This sweep fixes seeds (`base.seed: 0,1,2`) and varies `env.assignment_method` (`greedy`, `hungarian`).
Hydra writes sweep outputs under:
```
multirun/YYYY-MM-DD/HH-MM-SS/<job_num>/...
```
For quick aggregation across seeds and variants, use the included analysis script. The --sweep-id should match the Hydra sweep directory (the multirun/.../HH-MM-SS folder).
Learning rate sweep (group by `algo.lr`):

```bash
python analyze_ablations.py --group algo.lr --sweep-id "multirun/YYYY-MM-DD/HH-MM-SS"
```

Assignment method sweep (group by `env.assignment_method`):

```bash
python analyze_ablations.py --group env.assignment_method --sweep-id "multirun/YYYY-MM-DD/HH-MM-SS"
```

This prints mean/std for the tracked metrics and writes a `runs.csv` at the repository root.
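The aggregation the script performs is essentially a group-by over the CSV. A minimal stand-alone sketch of the idea (the row dicts and column names here are hypothetical examples, not necessarily the columns in the real `runs.csv`):

```python
from collections import defaultdict
from statistics import mean, pstdev

def aggregate(rows, group_key, metric_key):
    # Group rows by the chosen config column, then summarize one metric
    # across seeds/variants as (mean, std).
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(float(row[metric_key]))
    return {g: (mean(v), pstdev(v)) for g, v in groups.items()}

# Hypothetical rows resembling what a runs.csv might contain.
rows = [
    {"algo.lr": "0.001", "Evaluation/Boundary_Error_Mean": "0.20"},
    {"algo.lr": "0.001", "Evaluation/Boundary_Error_Mean": "0.24"},
    {"algo.lr": "0.0003", "Evaluation/Boundary_Error_Mean": "0.31"},
    {"algo.lr": "0.0003", "Evaluation/Boundary_Error_Mean": "0.29"},
]
print(aggregate(rows, "algo.lr", "Evaluation/Boundary_Error_Mean"))
```

Averaging over fixed seeds like this is what makes the two ablation variants comparable.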
To create a chart from the latest `runs.csv`, use:

```bash
python scripts/plot_runs_csv.py --csv runs.csv --group algo.lr
```

You can also pass `--title` to override the figure title, and `--exclude-reward` to plot only evaluation metrics.
This writes a PNG under docs/charts/ (default: docs/charts/ablation_algo.lr.png).
Example ablation plot:
Notes:
- The arrows next to subplot titles indicate the direction of better: ↓ lower is better, ↑ higher is better.
Hydra configs (experiments + sweeps)
- `configs/experiment/default_exp.yaml`
- `configs/experiment/sweep_lr.yaml`
- `configs/experiment/sweep_assign.yaml`
- `configs/env/formation.yaml`
- `configs/algo/ppo.yaml`
- `configs/base/main_setup.yaml`
Dockerfile (build + run)
Build:

```bash
docker build -f docker/Dockerfile -t formation-task .
```

Run training in the container:

```bash
docker run --rm formation-task
```

If you want to avoid W&B login inside Docker, run in offline mode:

```bash
docker run --rm -e WANDB_MODE=offline formation-task
```

Unit / smoke tests
Tests cover environment initialization/reset/step, SDF sanity check, assignment behavior, and PPO actor output.
- Run with pytest:

```bash
pytest -q
```

- Or run unittest discovery:

```bash
python -m unittest discover -s test
```

- SDF interface, including support for three shapes (circle, polygon, star)
- Redesign of the observations and rewards with SDF terms
- Render support for the new shapes
- Fixing the visualizer script and GIF generation (visualizing target shapes and positions, loading trained model)
- Implementation of the Hungarian and Greedy assignment strategies
- Support for multi-shape scenes
- Support for dynamic reconfiguration mid-episode (even with multi-shape scenes)
- Fixing the tests and CI/CD pipelines
- Implementation of evaluation metrics (Boundary Error / Collisions / Uniformity) + unit tests
- Hydra multirun support for sweeps and reproducibility
- Sweep configurations for learning-rate and assignment-method ablations
- Ablation analysis producing aggregated metrics + saving results to CSV
- Plotting script to visualize differences between ablations
- Docker packaging fixes (build context / run command / large-folder issues)
- Windows-specific test fixes
- Task 1: Core Functionality (20 pts)
  - SDF interface and three shape families (including one non-convex) (10 pts)
  - Observation and reward redesign with SDF terms (distance, normal/tangent) (5 pts)
  - Success criterion and renderer support for new shapes (5 pts)
- Task 2: Assignment Strategies (10 pts)
  - Implement and compare two strategies (Hungarian periodic vs. distributed greedy) (10 pts)
- Task 3: Scenarios (10 pts)
  - One multi-shape scene with required allocation (5 pts)
  - One dynamic reconfiguration scenario mid-episode (5 pts)
- Task 4: Metrics and Evaluation (10 pts)
  - Implement and report three chosen metrics from the following: Boundary Error, Uniformity, Time-to-Form, Collisions, Generalization, Reconfiguration Time (10 pts)
- Task 5: Ablations (10 pts)
  - Run at least two ablations (e.g., geometry features, assignment, reward shaping, curriculum) with fixed seeds and analysis (5+5 pts)
- Task 6: Reproducibility Pack (6 pts)
  - Hydra configs for experiments and sweeps (2 pts)
  - Dockerfile builds and runs training/evaluation (2 pts)
  - Two unit/smoke tests (SDF, assignment, env reset/step) (2 pts)
- Task 7: Reporting Quality (4 pts)
  - README with quick start, experiment matrix, plots/tables, and failure analysis (4 pts)
- Bonus (up to +10 pts)
  - Robustness tests with sensor noise or actuation delay (+5 pts)
  - Additional research feature (e.g., communication, curriculum, new non-convex shape family) (+5 pts)

