MARBLE: Multi-Agent CooRdination Backbone with LLM Engine
MultiAgentBench is a modular and extensible framework for developing, testing, and evaluating multi-agent systems built on Large Language Models (LLMs). It provides a structured setting in which agents interact across various simulated environments, using cognitive abilities and communication mechanisms to perform tasks collaboratively or competitively.
- Modular Design: Easily extend or replace components like agents, environments, and LLM integrations.
- Multi-Agent Support: Model complex interactions between multiple agents with hierarchical or cooperative execution modes.
- LLM Integration: Interface with various LLM providers (OpenAI, etc.) through a unified API.
- Shared Memory: Implement shared memory mechanisms for agent communication and collaboration.
- Flexible Environments: Support for different simulated environments like web-based tasks.
- Metrics and Evaluation: Built-in evaluation metrics to assess agent performance on tasks.
- Industrial Coding Standards: High-quality, well-documented code adhering to industry best practices.
- Docker Support: Containerized setup for consistent deployment and easy experimentation (see the sketch after this list).
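For the Docker-based setup, a containerized workflow might look roughly like the sketch below. The root-level Dockerfile and the `marble` image tag are assumptions made for illustration; check the repository for the exact build instructions.

```bash
# Illustrative sketch only: assumes a Dockerfile in the repository root.
# The image tag "marble" is a placeholder, not a published image name.
docker build -t marble .
docker run --rm --env-file .env marble
```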
Use a virtual environment, e.g. with anaconda3:

```bash
conda create -n marble python=3.10
conda activate marble
```

Then install Poetry and make sure it is on your PATH:

```bash
curl -sSL https://install.python-poetry.org | python3
export PATH="$HOME/.local/bin:$PATH"
```
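Before continuing, you can confirm the toolchain is set up correctly:

```bash
conda activate marble   # if the environment is not already active
python --version        # should report Python 3.10.x
poetry --version        # confirms Poetry is on your PATH
```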
Environment variables such as `OPENAI_API_KEY` and `Together_API_KEY` are required to run the code. The recommended way to set all the required variables is:
- Copy the `.env.template` file into the project root under the name `.env`:

  ```bash
  cp .env.template .env
  ```

- Fill in the required environment variables in the `.env` file (an example is shown below).
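For reference, the entries for the variables mentioned above look like this (placeholder values; the authoritative list of variables is in `.env.template`):

```bash
# .env (placeholder values -- replace with your real keys)
OPENAI_API_KEY=your-openai-api-key
Together_API_KEY=your-together-api-key
```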
To run the examples provided in the repository (e.g. the werewolf simulation under `scripts/werewolf`):

```bash
poetry install
cd scripts
cd werewolf
bash run_simulation.sh
```
Create a new branch with `git checkout -b feature/feature-name` and open a PR to the `main` branch.
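A typical contribution flow looks like this (the branch name is a placeholder):

```bash
git checkout -b feature/feature-name
# ...make your changes...
git add -A
git commit -m "Describe the change"
git push -u origin feature/feature-name
# then open a pull request against the main branch on GitHub
```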
Run `poetry run pytest` to make sure all tests pass (this also exercises the runtime type checks provided by beartype) and `poetry run mypy --config-file pyproject.toml .` to check static typing. (You can also run `pre-commit run --all-files` to run all checks.)
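Put together, a local pre-PR check sequence is:

```bash
poetry run pytest                                 # tests, including beartype runtime type checks
poetry run mypy --config-file pyproject.toml .    # static type checking
pre-commit run --all-files                        # all configured pre-commit hooks
```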
Check the GitHub Actions results to make sure all checks pass. If not, fix the errors and push again.
Please cite the following paper if you find Marble helpful!
```bibtex
@misc{zhu2025multiagentbenchevaluatingcollaborationcompetition,
  title={MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents},
  author={Kunlun Zhu and Hongyi Du and Zhaochen Hong and Xiaocheng Yang and Shuyi Guo and Zhe Wang and Zhenhailong Wang and Cheng Qian and Xiangru Tang and Heng Ji and Jiaxuan You},
  year={2025},
  eprint={2503.01935},
  archivePrefix={arXiv},
  primaryClass={cs.MA},
  url={https://arxiv.org/abs/2503.01935},
}
```