OpenJobsAI/awesome-ai-agents-for-ml

Awesome AI Agents for Machine Learning

A curated collection of 50+ open-source projects that use AI agents for machine learning research, training, and experimentation.

LLM-powered agents are changing how ML research and engineering get done, from autonomous scientific discovery to automated Kaggle competitions. This list tracks notable open-source projects in this fast-moving space.

                         AI Agents for ML Landscape
    ┌─────────────────────────────────────────────────────────────────┐
    │                                                                 │
    │   Research          Training           Data Science             │
    │   ┌──────────┐     ┌──────────┐       ┌──────────┐             │
    │   │AI-Scien- │     │  AIDE    │       │  Deep-   │             │
    │   │tist, auto│     │  AutoML  │       │  Analyze │             │
    │   │research  │     │  Agent   │       │  DS-Agent│             │
    │   └────┬─────┘     └────┬─────┘       └────┬─────┘             │
    │        │                │                   │                   │
    │        └────────────────┼───────────────────┘                   │
    │                         │                                       │
    │              ┌──────────┴──────────┐                            │
    │              │   Agent Frameworks  │                            │
    │              │  AutoGen / CrewAI   │                            │
    │              │  MetaGPT / DSPy     │                            │
    │              └──────────┬──────────┘                            │
    │                         │                                       │
    │        ┌────────────────┼────────────────┐                      │
    │        │                │                │                      │
    │   ┌────┴─────┐    ┌────┴─────┐    ┌─────┴────┐                 │
    │   │  MLOps   │    │Benchmarks│    │ RL Agent │                 │
    │   │  MLflow  │    │ MLE-bench│    │ Training │                 │
    │   │  ZenML   │    │ ML-Bench │    │ rllm,R1  │                 │
    │   └──────────┘    └──────────┘    └──────────┘                 │
    │                                                                 │
    └─────────────────────────────────────────────────────────────────┘

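Contents

  • Automated ML Research Agents
  • ML Training & Engineering Agents
  • Data Science Agents
  • Agent Frameworks for ML
  • RL for Training LLM Agents
  • ML Agent Benchmarks & Evaluation
  • MLOps & Platform Agents
  • Research Assistants & Paper Agents
  • AI for Science
  • Software Engineering Agents (ML-applicable)
  • Key Trends (2024-2026)
  • Contributing
  • License
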
Automated ML Research Agents

Agents that autonomously conduct ML research — from generating hypotheses to running experiments and writing papers.

| Project | Stars | Description | Highlights |
| --- | --- | --- | --- |
| AI-Scientist | 12.8k | First fully autonomous system for end-to-end scientific discovery. Published in Nature. | Idea generation -> experiment -> paper writing -> automated peer review |
| AI-Scientist-v2 | 2.7k | Enhanced version using agentic tree search. First AI-generated paper accepted at ICLR 2025 workshop. | Template-free, open-ended exploration across ML domains |
| autoresearch | 58k | Karpathy's 630-line tool for autonomous ML experiments. Ran ~700 experiments, found ~20 genuine improvements cutting GPT-2 training time by 11%. | Single GPU, markdown-defined research, git-tracked results |
| AI-Researcher | 5k | Autonomously identifies research gaps and executes full research pipeline. NeurIPS 2025 Spotlight. | Writer Agent for hierarchical paper generation, web GUI |
| AgentLaboratory | 5.4k | End-to-end autonomous research workflow with specialized LLM agents. Introduced AgentRxiv preprint server. | Literature review -> experimentation -> report writing |
| Auto-Research | 13 | Framework for fully automated research agents across the entire scientific lifecycle. | Dual-layer memory, Docker/SSH sandbox, session persistence |

What makes a good ML research agent?

The best ML research agents share these traits:

  • End-to-end automation: from idea to validated result
  • Tree search over hypotheses: exploring multiple directions, not just one linear path
  • Self-evaluation: automated review/critique of generated results
  • Reproducibility: git-tracked experiments with clear provenance
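
The "tree search over hypotheses" and "self-evaluation" traits can be illustrated with a minimal best-first search. This is a sketch under stated assumptions, not the method used by any project listed here: `propose` and `evaluate` are hypothetical callbacks standing in for an LLM that refines a hypothesis and a critic that scores it.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(order=True)
class Node:
    # heapq is a min-heap, so the score is stored negated to pop the best node first.
    neg_score: float
    hypothesis: str = field(compare=False)
    depth: int = field(compare=False, default=0)


def tree_search(
    root: str,
    propose: Callable[[str], List[str]],  # hypothetical: returns refinements of a hypothesis
    evaluate: Callable[[str], float],     # hypothetical: self-evaluation score, higher is better
    budget: int = 20,
    max_depth: int = 3,
) -> Node:
    """Best-first search: always expand the highest-scoring hypothesis seen so far."""
    best = Node(-evaluate(root), root, 0)
    frontier = [best]
    evaluations = 1
    while frontier and evaluations < budget:
        node = heapq.heappop(frontier)
        if node.depth >= max_depth:
            continue  # leaf: keep it as a candidate result but do not expand it
        for child_text in propose(node.hypothesis):
            if evaluations >= budget:
                break
            child = Node(-evaluate(child_text), child_text, node.depth + 1)
            evaluations += 1
            heapq.heappush(frontier, child)
            if child.neg_score < best.neg_score:
                best = child
    return best
```

In a toy run where `propose` appends `"a"` or `"b"` to a string and `evaluate` counts the `"b"` characters, the search starting from the empty string returns the hypothesis `"bbb"`. Unlike a linear pipeline, lower-scoring branches stay in the frontier and can still be expanded if the budget allows.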

ML Training & Engineering Agents

Agents that automate model training, hyperparameter tuning, ML code generation, and experiment management.

| Project | Stars | Description | Highlights |
| --- | --- | --- | --- |
| aideml (AIDE) | 1.2k | ML engineering agent using tree-structured search over solution space. SOTA on Kaggle/MLE-Bench. ICLR 2025. | Surpasses 50% of Kaggle participants across 60+ competitions |
| ML-Agent | 58 | First LLM agent trained via online RL for autonomous ML engineering. 7B model outperforms DeepSeek-R1 (671B). | RL-based training, cross-task generalization |
| automl-agent | 112 | Multi-agent LLM framework for full-pipeline AutoML. ICML 2025. | Data retrieval -> preprocessing -> NAS -> deployment |
| autogluon-assistant | 263 | Multi-agent system (MLZero) for end-to-end multimodal ML automation. NeurIPS 2025. | 6 gold medals on MLE-Bench Lite, works with 8B LLM |
| AutoKaggle | 287 | Multi-agent system with 5 specialized agents for automating Kaggle competitions. | Reader -> Planner -> Developer -> Reviewer -> Summarizer |
| FLAML | 4.3k | Microsoft's fast library for AutoML and tuning. Economical automation for ML workflows. | Low cost, MLflow integration, foundation model tuning |
| DATAGEN | 1.7k | AI-driven multi-agent assistant automating hypothesis generation, data analysis, and report writing. | LangChain + LangGraph, specialized agents, visualization |

Data Science Agents

Agents for data analysis, feature engineering, data preprocessing, and end-to-end data science workflows.

| Project | Stars | Description | Highlights |
| --- | --- | --- | --- |
| ai-data-science-team | 5.1k | Library of specialized agents for data science workflows + AI Pipeline Studio. | Loading/cleaning/EDA/SQL/feature engineering agents |
| DeepAnalyze | 1.9k | First end-to-end agentic LLM (8B) for autonomous data science. Analyst-grade reports. | Full DS pipeline, multi-format support (CSV/Excel/JSON/XML) |
| DS-Agent | 231 | Automated data science via LLMs with case-based reasoning. ICML 2024. | Case-based reasoning for pipeline construction |
| DataMind | 73 | Scalable agent training for generalist data-analytic agents. 14B model outperforms GPT-5. ICLR/AAAI 2026. | DataMind-12K trajectories, open-source 7B/14B models |
| LAMBDA | | Large Model Based Data Agent. Published in Journal of the American Statistical Association (2025). | Statistical analysis powered by LLM |

Agent Frameworks for ML

General-purpose agent frameworks widely used for building ML workflows and multi-agent ML systems.

| Project | Stars | Description | Highlights |
| --- | --- | --- | --- |
| OpenHands | 70k | AI-driven development platform. 72% on SWE-Bench Verified. ICLR 2025. | Agent SDK v1.0, sandboxed execution, MCP integration |
| MetaGPT | 66k | Multi-agent framework with Data Interpreter achieving SOTA on ML tasks. | AFlow for automated workflow generation (ICLR 2025 oral) |
| autogen | 56k | Microsoft's framework for agentic AI with multi-agent conversations. | Code execution, tool use, no-code Studio, .NET support |
| crewAI | 47k | Framework for orchestrating role-playing autonomous AI agents. | Role-based design, A2A support, fast setup |
| dspy | 33k | Framework for programming, not prompting, language models. Stanford. | Automatic prompt optimization, composable modules |
| langgraph | 28k | Build resilient language agents as graphs with durable execution. | Stateful workflows, checkpointing, human-in-the-loop |
| camel | 16.5k | First multi-agent framework. Role-playing collaboration. NeurIPS 2023. | OWL multi-agent, OASIS million-agent simulation |
| AutoAgent | 8.7k | Fully-automated zero-code LLM agent framework. | Self-developing agent systems, auto orchestration |

RL for Training LLM Agents

Using reinforcement learning to train better LLM agents for ML tasks and beyond.

| Project | Stars | Description | Highlights |
| --- | --- | --- | --- |
| rllm | 5.3k | Democratizing RL for LLMs. Agents beat models 50x their size. | GRPO/REINFORCE/RLOO, multi-GPU + single-machine |
| RLinf | 2.9k | RL infrastructure for embodied and agentic AI. | PPO/GRPO/SAC, scalable to large GPU clusters |
| Agent-R1 | 1.3k | Training powerful LLM agents with end-to-end RL. | Multi-turn tool calling, process rewards per tool call |
| AgentGym-RL | 650 | Training LLM agents for long-horizon decision making via multi-turn RL. | Long-horizon task training, multi-turn RL |
| MARTI | 467 | Multi-agent reinforced training and inference. Tsinghua. | Tree search-augmented RL, multi-agent collaboration |
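
Several projects above list GRPO among their supported algorithms. As a rough illustration of its core idea, the sketch below computes group-relative advantages: each sampled response is scored against the mean and standard deviation of the other responses drawn for the same prompt, so no learned value function is needed. It is not code from rllm or RLinf; the function name, the `eps` term, and the use of the population standard deviation are assumptions made for this example.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and std of its own sampling group."""
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations may differ
    # eps (assumption) avoids division by zero when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For a group of four responses with rewards `[1.0, 0.0, 0.0, 1.0]`, the two successful responses get an advantage close to +1 and the two failures close to -1. When every response in a group earns the same reward, all advantages are zero and that group contributes no gradient signal.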

ML Agent Benchmarks & Evaluation

Benchmarks and tools for evaluating how well AI agents perform ML tasks.

| Project | Stars | Description | Highlights |
| --- | --- | --- | --- |
| mle-bench | 1.4k | OpenAI's benchmark of 75 Kaggle competitions for ML engineering agents. | AIDE/MLAB/OpenHands scaffolds, pass@k evaluation |
| MLAgentBench | 335 | Stanford. 13 end-to-end ML experimentation tasks (CIFAR-10, BabyLM, etc.). | LangChain/AutoGPT agents, multi-LLM support |
| ML-Bench | 316 | Yale. Evaluating LLMs and agents on repository-level ML code. | Real-world ML codebase evaluation |
| mlrbench | 24 | 201 tasks from ICLR/ICML/NeurIPS workshops for open-ended ML research evaluation. | MLR-Agent scaffold, MLR-Judge automated review |
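
For readers unfamiliar with pass@k, the sketch below shows the commonly used unbiased estimator: given `n` attempts on a task of which `c` succeeded, it estimates the probability that at least one of `k` randomly chosen attempts succeeds. This list does not say how MLE-bench defines a successful attempt or aggregates across competitions, so treat this as a general illustration rather than that benchmark's scoring code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n attempts with c successes."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k attempts include a success.
        return 1.0
    # 1 minus the probability that all k sampled attempts are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With 10 attempts and 5 successes, `pass_at_k(10, 5, 1)` is 0.5, which matches the raw success rate, and the estimate rises toward 1.0 as `k` grows.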

MLOps & Platform Agents

Agents and platforms for ML deployment, monitoring, and pipeline management.

| Project | Stars | Description | Highlights |
| --- | --- | --- | --- |
| mlflow | 25k | Open-source AI engineering platform. 30M+ monthly downloads. | Experiment tracking, agent tracing/evaluation/monitoring |
| opik | 18.5k | Debug, evaluate, and monitor LLM apps and agentic workflows. Agent Optimizer SDK. | Comprehensive tracing, automated evaluations, self-hostable |
| metaflow | 10k | Netflix's framework for data science and ML pipelines. Agentic support since 2025. | Recursive/conditional steps for agents, Kubernetes |
| zenml | 5.3k | "One AI Platform from Pipelines to Agents." Run on any infrastructure. | Infrastructure-agnostic, MLflow/W&B integration |
| weave | | W&B toolkit for developing, evaluating, and monitoring AI apps and agents. | LLM-as-judge, execution metrics, GenAI observability |

Research Assistants & Paper Agents

Agents that help read, search, summarize, and manage ML research papers.

| Project | Stars | Description | Highlights |
| --- | --- | --- | --- |
| gpt-researcher | 26k | Autonomous deep research agent producing factual reports with citations. | Parallelized work, MCP server, multi-LLM support |
| open_deep_research | 11k | LangChain's open-source deep research solution with Open Agent Platform UI. | Any LLM via init_chat_model, customizable MCP tools |
| pasa | 1.4k | ByteDance's paper search agent. Surpasses Google Scholar by 37.78% in recall@20. | Autonomous search/read/reference selection, RL-optimized |
| openpaper | 243 | Research library workbench with AI assistant for literature review. | Annotation, AI-powered paper understanding |
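
The recall@20 figure quoted for pasa refers to a standard retrieval metric: the fraction of relevant papers that appear in the top 20 results. The sketch below shows the usual definition. The function name and the paper IDs are hypothetical, and the list does not describe how pasa's evaluation set or baseline comparison was built.

```python
from typing import Iterable, Sequence


def recall_at_k(ranked: Sequence[str], relevant: Iterable[str], k: int = 20) -> float:
    """Fraction of the relevant items that appear among the top-k ranked results."""
    relevant_set = set(relevant)
    if not relevant_set:
        raise ValueError("relevant must contain at least one item")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = relevant_set.intersection(ranked[:k])
    return len(hits) / len(relevant_set)


# Hypothetical paper IDs, for illustration only.
ranked_results = ["p1", "p2", "p3", "p4"]
relevant_papers = {"p2", "p4", "p9"}
top2_recall = recall_at_k(ranked_results, relevant_papers, k=2)  # only p2 is in the top 2
```

Because the denominator is the full set of relevant papers, a search agent is penalized for relevant papers it never surfaces, not just for ranking them poorly.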

AI for Science

Agents designed for scientific research and discovery powered by ML.

| Project | Stars | Description | Highlights |
| --- | --- | --- | --- |
| virtual-lab | 652 | Stanford's virtual lab of LLM agents for science. Nature (2025): SARS-CoV-2 nanobody design. | PI agent + specialist team, AlphaFold/Rosetta integration |
| chemcrow-public | 888 | LLM agent with 18 chemistry tools for synthesis and drug discovery. Nature Machine Intelligence (2024). | RDKit/PubChem tools, autonomous synthesis planning |
| SciToolAgent | 399 | Agent framework integrating scientific tools via knowledge graph. Nature Computational Science (2025). | Planner/Executor/Summarizer, SciToolKG |

Software Engineering Agents (ML-applicable)

Originally built for software engineering, increasingly used for ML codebases and research.

| Project | Stars | Description | Highlights |
| --- | --- | --- | --- |
| SWE-agent | 19k | Takes a GitHub issue and automatically fixes it. NeurIPS 2024. | Custom agent-computer interface, multi-LLM |
| mini-swe-agent | 3.5k | 100-line AI agent scoring >74% on SWE-bench Verified. | Minimal, hackable, high-performance |
| SWE-Gym | 651 | First environment for training real-world SWE agents. ICML 2025. | Training data generation for SWE agents |
| SWE-smith | 606 | Toolkit for scaling training data for SWE-agents. NeurIPS 2025 Spotlight. | Automated training data generation |
| Archon | 190 | Stanford. Architecture search for inference-time techniques. Outperforms GPT-4o by 11-15%. | Generators/fusers/critics/rankers/verifiers |

Key Trends (2024-2026)

| Trend | Signal | Representative Projects |
| --- | --- | --- |
| Autonomous Research is Real | AI-generated papers pass peer review; agents find genuine ML improvements | AI-Scientist, autoresearch |
| Tree Search > Linear Pipelines | Tree-structured exploration outperforms sequential approaches | AI-Scientist-v2, AIDE |
| RL-Trained Agents Scale Down | Small RL-trained agents outperform 50-100x larger models | ML-Agent (7B > 671B), rllm |
| Multi-Agent = ML Teams | Specialized agent roles mirror real research team dynamics | MetaGPT, AutoGen, AutoML-Agent |
| Benchmarks Maturing | Standardized evaluation from Kaggle to open-ended research | MLE-bench, MLR-Bench |
| Code Agents + ML Converge | SWE agents increasingly applied to ML research & debugging | OpenHands, SWE-agent |

Star History

If you find this collection useful, please consider giving it a star!

Contributing

Contributions are welcome! Please read the contributing guidelines before submitting a PR.

To add a project:

  1. Ensure it is open-source and related to AI agents for ML
  2. Add it to the appropriate category in README.md
  3. Include: project link, stars, description, and highlights
  4. Submit a pull request

License

MIT
