Awesome AI Agents for Machine Learning

A curated collection of 50+ open-source projects that use AI agents for machine learning research, training, and experimentation.

LLM-powered agents are changing how ML research and engineering get done, from autonomous scientific discovery to automated Kaggle competitions. This list tracks notable open-source projects in this fast-moving space.

                         AI Agents for ML Landscape
    ┌─────────────────────────────────────────────────────────────────┐
    │                                                                 │
    │   Research          Training           Data Science             │
    │   ┌──────────┐     ┌──────────┐       ┌──────────┐             │
    │   │AI-Scien- │     │  AIDE    │       │  Deep-   │             │
    │   │tist, auto│     │  AutoML  │       │  Analyze │             │
    │   │research  │     │  Agent   │       │  DS-Agent│             │
    │   └────┬─────┘     └────┬─────┘       └────┬─────┘             │
    │        │                │                   │                   │
    │        └────────────────┼───────────────────┘                   │
    │                         │                                       │
    │              ┌──────────┴──────────┐                            │
    │              │   Agent Frameworks  │                            │
    │              │  AutoGen / CrewAI   │                            │
    │              │  MetaGPT / DSPy     │                            │
    │              └──────────┬──────────┘                            │
    │                         │                                       │
    │        ┌────────────────┼────────────────┐                      │
    │        │                │                │                      │
    │   ┌────┴─────┐    ┌────┴─────┐    ┌─────┴────┐                 │
    │   │  MLOps   │    │Benchmarks│    │ RL Agent │                 │
    │   │  MLflow  │    │ MLE-bench│    │ Training │                 │
    │   │  ZenML   │    │ ML-Bench │    │ rllm,R1  │                 │
    │   └──────────┘    └──────────┘    └──────────┘                 │
    │                                                                 │
    └─────────────────────────────────────────────────────────────────┘

Automated ML Research Agents — autonomous scientific discovery & paper generation
ML Training & Engineering Agents — model training, AutoML, Kaggle automation
Data Science Agents — data analysis, feature engineering, EDA
Agent Frameworks for ML — general frameworks powering ML workflows
RL for Training LLM Agents — reinforcement learning to train better agents
ML Agent Benchmarks & Evaluation — evaluating agent ML capabilities
MLOps & Platform Agents — deployment, monitoring, pipelines
Research Assistants & Paper Agents — paper search, reading, literature review
AI for Science — scientific discovery beyond ML
Software Engineering Agents — code agents applicable to ML
Key Trends (2024-2026)

Automated ML Research Agents

Agents that autonomously conduct ML research — from generating hypotheses to running experiments and writing papers.

Project	Stars	Description	Highlights
AI-Scientist	12.8k	First fully autonomous system for end-to-end scientific discovery. Published in Nature.	Idea generation -> experiment -> paper writing -> automated peer review
AI-Scientist-v2	2.7k	Enhanced version using agentic tree search. First AI-generated paper accepted at ICLR 2025 workshop.	Template-free, open-ended exploration across ML domains
autoresearch	58k	Karpathy's 630-line tool for autonomous ML experiments. Ran ~700 experiments, found ~20 genuine improvements cutting GPT-2 training time by 11%.	Single GPU, markdown-defined research, git-tracked results
AI-Researcher	5k	Autonomously identifies research gaps and executes full research pipeline. NeurIPS 2025 Spotlight.	Writer Agent for hierarchical paper generation, web GUI
AgentLaboratory	5.4k	End-to-end autonomous research workflow with specialized LLM agents. Introduced AgentRxiv preprint server.	Literature review -> experimentation -> report writing
Auto-Research	13	Framework for fully automated research agents across the entire scientific lifecycle.	Dual-layer memory, Docker/SSH sandbox, session persistence

What makes a good ML research agent?

The best ML research agents share these traits:

End-to-end automation: from idea to validated result
Tree search over hypotheses: exploring multiple directions, not just one linear path
Self-evaluation: automated review/critique of generated results
Reproducibility: git-tracked experiments with clear provenance
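
As a rough illustration of the tree-search and self-evaluation traits above, the sketch below runs a best-first search over a toy hypothesis space. The names `tree_search`, `expand`, and `score` are hypothetical and are not taken from any listed project. In a real agent, `expand` would propose follow-up experiments and `score` would run and evaluate them; here both are stand-ins so the example stays self-contained.

```python
import heapq
from typing import Callable, List, Tuple


def tree_search(
    root: str,
    expand: Callable[[str], List[str]],
    score: Callable[[str], float],
    budget: int = 10,
) -> Tuple[float, str]:
    """Best-first search: always expand the highest-scoring hypothesis so far.

    `budget` caps the number of hypotheses that get scored (the costly step).
    """
    best = (score(root), root)
    # heapq is a min-heap, so scores are negated to pop the best node first.
    frontier = [(-best[0], root)]
    evaluated = 1
    while frontier and evaluated < budget:
        _, node = heapq.heappop(frontier)
        for child in expand(node):
            if evaluated >= budget:
                break
            s = score(child)  # self-evaluation of the candidate
            evaluated += 1
            if s > best[0]:
                best = (s, child)
            heapq.heappush(frontier, (-s, child))
    return best


# Toy stand-ins (assumptions): a hypothesis is a string, each step appends
# "a" or "b", and the score simply counts the "a" characters.
def expand(node: str) -> List[str]:
    return [node + "a", node + "b"] if len(node) < 3 else []


def score(node: str) -> float:
    return float(node.count("a"))


best = tree_search("", expand, score, budget=10)
print(best)
```

A linear pipeline would follow a single chain of children; keeping a frontier lets the search return to an earlier branch when the current one stops improving.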

ML Training & Engineering Agents

Agents that automate model training, hyperparameter tuning, ML code generation, and experiment management.

Project	Stars	Description	Highlights
aideml (AIDE)	1.2k	ML engineering agent using tree-structured search over solution space. SOTA on Kaggle/MLE-Bench. ICLR 2025.	Surpasses 50% of Kaggle participants across 60+ competitions
ML-Agent	58	First LLM agent trained via online RL for autonomous ML engineering. 7B model outperforms DeepSeek-R1 (671B).	RL-based training, cross-task generalization
automl-agent	112	Multi-agent LLM framework for full-pipeline AutoML. ICML 2025.	Data retrieval -> preprocessing -> NAS -> deployment
autogluon-assistant	263	Multi-agent system (MLZero) for end-to-end multimodal ML automation. NeurIPS 2025.	6 gold medals on MLE-Bench Lite, works with 8B LLM
AutoKaggle	287	Multi-agent system with 5 specialized agents for automating Kaggle competitions.	Reader -> Planner -> Developer -> Reviewer -> Summarizer
FLAML	4.3k	Microsoft's fast library for AutoML and tuning. Economical automation for ML workflows.	Low cost, MLflow integration, foundation model tuning
DATAGEN	1.7k	AI-driven multi-agent assistant automating hypothesis generation, data analysis, and report writing.	LangChain + LangGraph, specialized agents, visualization

Data Science Agents

Agents for data analysis, feature engineering, data preprocessing, and end-to-end data science workflows.

Project	Stars	Description	Highlights
ai-data-science-team	5.1k	Library of specialized agents for data science workflows + AI Pipeline Studio.	Loading/cleaning/EDA/SQL/feature engineering agents
DeepAnalyze	1.9k	First end-to-end agentic LLM (8B) for autonomous data science. Analyst-grade reports.	Full DS pipeline, multi-format support (CSV/Excel/JSON/XML)
DS-Agent	231	Automated data science via LLMs with case-based reasoning. ICML 2024.	Case-based reasoning for pipeline construction
DataMind	73	Scalable agent training for generalist data-analytic agents. 14B model outperforms GPT-5. ICLR/AAAI 2026.	DataMind-12K trajectories, open-source 7B/14B models
LAMBDA	—	Large Model Based Data Agent. Published in Journal of the American Statistical Association (2025).	Statistical analysis powered by LLM

Agent Frameworks for ML

General-purpose agent frameworks widely used for building ML workflows and multi-agent ML systems.

Project	Stars	Description	Highlights
OpenHands	70k	AI-driven development platform. 72% on SWE-Bench Verified. ICLR 2025.	Agent SDK v1.0, sandboxed execution, MCP integration
MetaGPT	66k	Multi-agent framework with Data Interpreter achieving SOTA on ML tasks.	AFlow for automated workflow generation (ICLR 2025 oral)
autogen	56k	Microsoft's framework for agentic AI with multi-agent conversations.	Code execution, tool use, no-code Studio, .NET support
crewAI	47k	Framework for orchestrating role-playing autonomous AI agents.	Role-based design, A2A support, fast setup
dspy	33k	Framework for programming — not prompting — language models. Stanford.	Automatic prompt optimization, composable modules
langgraph	28k	Build resilient language agents as graphs with durable execution.	Stateful workflows, checkpointing, human-in-the-loop
camel	16.5k	First multi-agent framework. Role-playing collaboration. NeurIPS 2023.	OWL multi-agent, OASIS million-agent simulation
AutoAgent	8.7k	Fully-automated zero-code LLM agent framework.	Self-developing agent systems, auto orchestration

RL for Training LLM Agents

Using reinforcement learning to train better LLM agents for ML tasks and beyond.

Project	Stars	Description	Highlights
rllm	5.3k	Democratizing RL for LLMs. Agents beat models 50x their size.	GRPO/REINFORCE/RLOO, multi-GPU + single-machine
RLinf	2.9k	RL infrastructure for embodied and agentic AI.	PPO/GRPO/SAC, scalable to large GPU clusters
Agent-R1	1.3k	Training powerful LLM agents with end-to-end RL.	Multi-turn tool calling, process rewards per tool call
AgentGym-RL	650	Training LLM agents for long-horizon decision making via multi-turn RL.	Long-horizon task training, multi-turn RL
MARTI	467	Multi-agent reinforced training and inference. Tsinghua.	Tree search-augmented RL, multi-agent collaboration
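
Several entries above list GRPO among their supported algorithms. As it is commonly described, GRPO scores each rollout relative to the other rollouts sampled for the same prompt instead of using a learned value baseline. The sketch below shows only that normalization step; the function name, the use of the population standard deviation, and the `eps` value are assumptions for illustration and may differ from what any listed project does.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of rollouts for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for one task: two succeeded, two failed.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Successful rollouts receive positive advantages and failed ones negative. When every rollout in a group gets the same reward, all advantages are zero and the group contributes no learning signal.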

ML Agent Benchmarks & Evaluation

Benchmarks and tools for evaluating how well AI agents perform ML tasks.

Project	Stars	Description	Highlights
mle-bench	1.4k	OpenAI's benchmark of 75 Kaggle competitions for ML engineering agents.	AIDE/MLAB/OpenHands scaffolds, pass@k evaluation
MLAgentBench	335	Stanford. 13 end-to-end ML experimentation tasks (CIFAR-10, BabyLM, etc.).	LangChain/AutoGPT agents, multi-LLM support
ML-Bench	316	Yale. Evaluating LLMs and agents on repository-level ML code.	Real-world ML codebase evaluation
mlrbench	24	201 tasks from ICLR/ICML/NeurIPS workshops for open-ended ML research evaluation.	MLR-Agent scaffold, MLR-Judge automated review
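
The mle-bench entry mentions pass@k evaluation. One common way to estimate pass@k from n attempts, of which c succeeded, is the unbiased estimator popularized by code-generation benchmarks: 1 - C(n-c, k) / C(n, k). The sketch below implements that formula. Whether a given benchmark uses exactly this estimator, and what counts as a "success" (for example, earning a medal), are assumptions to check against the benchmark's own documentation.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k sampled attempts succeeds.

    n: total attempts made, c: attempts that succeeded, k: sample size.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k attempts include a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical run: 4 attempts on one task, 1 of them successful.
print(pass_at_k(n=4, c=1, k=1))  # 0.25
print(pass_at_k(n=4, c=1, k=2))  # 0.5
```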

MLOps & Platform Agents

Agents and platforms for ML deployment, monitoring, and pipeline management.

Project	Stars	Description	Highlights
mlflow	25k	Open-source AI engineering platform. 30M+ monthly downloads.	Experiment tracking, agent tracing/evaluation/monitoring
opik	18.5k	Debug, evaluate, and monitor LLM apps and agentic workflows. Agent Optimizer SDK.	Comprehensive tracing, automated evaluations, self-hostable
metaflow	10k	Netflix's framework for data science and ML pipelines. Agentic support since 2025.	Recursive/conditional steps for agents, Kubernetes
zenml	5.3k	"One AI Platform from Pipelines to Agents." Run on any infrastructure.	Infrastructure-agnostic, MLflow/W&B integration
weave	—	W&B toolkit for developing, evaluating, and monitoring AI apps and agents.	LLM-as-judge, execution metrics, GenAI observability

Research Assistants & Paper Agents

Agents that help read, search, summarize, and manage ML research papers.

Project	Stars	Description	Highlights
gpt-researcher	26k	Autonomous deep research agent producing factual reports with citations.	Parallelized work, MCP server, multi-LLM support
open_deep_research	11k	LangChain's open-source deep research solution with Open Agent Platform UI.	Any LLM via init_chat_model, customizable MCP tools
pasa	1.4k	ByteDance's paper search agent. Surpasses Google Scholar by 37.78% in recall@20.	Autonomous search/read/reference selection, RL-optimized
openpaper	243	Research library workbench with AI assistant for literature review.	Annotation, AI-powered paper understanding
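
The pasa entry reports a gain in recall@20. For readers unfamiliar with the metric, the sketch below computes recall@k for a single query: the fraction of relevant papers that appear in the top k results. The function name and the paper IDs are hypothetical, and this is not pasa's evaluation code.

```python
from typing import Iterable, Sequence


def recall_at_k(ranked: Sequence[str], relevant: Iterable[str], k: int = 20) -> float:
    """Fraction of the relevant items that appear in the top-k ranked results."""
    relevant_set = set(relevant)
    if not relevant_set:
        raise ValueError("relevant must contain at least one item")
    top_k = set(ranked[:k])
    return len(top_k & relevant_set) / len(relevant_set)


# Hypothetical query: 3 relevant papers, 1 of them retrieved in the top 3.
ranked_ids = ["p1", "p2", "p3", "p4"]
relevant_ids = {"p2", "p4", "p9"}
print(recall_at_k(ranked_ids, relevant_ids, k=3))
```

A benchmark score would typically average this value over all queries.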

AI for Science

Agents designed for scientific research and discovery powered by ML.

Project	Stars	Description	Highlights
virtual-lab	652	Stanford's virtual lab of LLM agents for science. Nature (2025) — SARS-CoV-2 nanobody design.	PI agent + specialist team, AlphaFold/Rosetta integration
chemcrow-public	888	LLM agent with 18 chemistry tools for synthesis and drug discovery. Nature Machine Intelligence (2024).	RDKit/PubChem tools, autonomous synthesis planning
SciToolAgent	399	Agent framework integrating scientific tools via knowledge graph. Nature Computational Science (2025).	Planner/Executor/Summarizer, SciToolKG

Software Engineering Agents (ML-applicable)

Agents originally built for software engineering that are increasingly used on ML codebases and research.

Project	Stars	Description	Highlights
SWE-agent	19k	Takes a GitHub issue and automatically fixes it. NeurIPS 2024.	Custom agent-computer interface, multi-LLM
mini-swe-agent	3.5k	100-line AI agent scoring >74% on SWE-bench Verified.	Minimal, hackable, high-performance
SWE-Gym	651	First environment for training real-world SWE agents. ICML 2025.	Training data generation for SWE agents
SWE-smith	606	Toolkit for scaling training data for SWE-agents. NeurIPS 2025 Spotlight.	Automated training data generation
Archon	190	Stanford. Architecture search for inference-time techniques. Outperforms GPT-4o by 11-15%.	Generators/fusers/critics/rankers/verifiers

Key Trends (2024-2026)

Trend	Signal	Representative Projects
Autonomous Research is Real	AI-generated papers pass peer review; agents find genuine ML improvements	AI-Scientist, autoresearch
Tree Search > Linear Pipelines	Tree-structured exploration outperforms sequential approaches	AI-Scientist-v2, AIDE
RL-Trained Agents Scale Down	Small RL-trained agents outperform 50-100x larger models	ML-Agent (7B > 671B), rllm
Multi-Agent = ML Teams	Specialized agent roles mirror real research team dynamics	MetaGPT, AutoGen, AutoML-Agent
Benchmarks Maturing	Standardized evaluation from Kaggle to open-ended research	MLE-bench, MLR-Bench
Code Agents + ML Converge	SWE agents increasingly applied to ML research & debugging	OpenHands, SWE-agent

Star History

If you find this collection useful, please consider giving it a star!

Contributing

Contributions are welcome! Please read the contributing guidelines before submitting a PR.

To add a project:

Ensure it is open-source and related to AI agents for ML
Add it to the appropriate category in README.md
Include: project link, stars, description, and highlights
Submit a pull request

License

MIT
