From a866e9e89114122840bc09b7a0b6564c267deffc Mon Sep 17 00:00:00 2001 From: asdasd2323wxs Date: Thu, 14 May 2026 02:05:10 +0800 Subject: [PATCH 1/7] feat: v0.1.87 --- ROADMAP_v0.1.87.md => ROADMAP_v0.1.88.md | 0 massgen/__init__.py | 2 +- 2 files changed, 1 insertion(+), 1 deletion(-) rename ROADMAP_v0.1.87.md => ROADMAP_v0.1.88.md (100%) diff --git a/ROADMAP_v0.1.87.md b/ROADMAP_v0.1.88.md similarity index 100% rename from ROADMAP_v0.1.87.md rename to ROADMAP_v0.1.88.md diff --git a/massgen/__init__.py b/massgen/__init__.py index 15681da24..71f581d1a 100644 --- a/massgen/__init__.py +++ b/massgen/__init__.py @@ -86,7 +86,7 @@ from .message_templates import MessageTemplates, get_templates from .orchestrator import Orchestrator, create_orchestrator -__version__ = "0.1.86" +__version__ = "0.1.87" __author__ = "MassGen Contributors" From 27d6d97c22bbc10b7cd6dd0d74f576e63108273f Mon Sep 17 00:00:00 2001 From: ncrispino Date: Fri, 15 May 2026 07:58:34 -0700 Subject: [PATCH 2/7] docs: add CrewAI/LangGraph/AutoGen comparisons and llms.txt index Closes #1083: three new comparison pages under reference/comparisons/ covering MassGen vs CrewAI, LangGraph, and AutoGen/AG2. Updates the comparisons hub to drop the "coming soon" note and add a toctree. Closes #1082: publishes llms.txt (curated, llmstxt.org spec) and llms-full.txt (concatenated docs corpus) at the docs site root via html_extra_path and a Sphinx build-finished hook in conf.py. README and index.rst gain one-line pointers for AI agents and crawlers. 
Co-Authored-By: Claude Opus 4.7 --- README.md | 2 + README_PYPI.md | 2 + docs/source/_extra/llms.txt | 56 +++++++ docs/source/conf.py | 69 ++++++++ docs/source/index.rst | 4 + docs/source/reference/comparisons.rst | 20 ++- docs/source/reference/comparisons/autogen.rst | 147 ++++++++++++++++++ docs/source/reference/comparisons/crewai.rst | 132 ++++++++++++++++ .../reference/comparisons/langgraph.rst | 128 +++++++++++++++ 9 files changed, 554 insertions(+), 6 deletions(-) create mode 100644 docs/source/_extra/llms.txt create mode 100644 docs/source/reference/comparisons/autogen.rst create mode 100644 docs/source/reference/comparisons/crewai.rst create mode 100644 docs/source/reference/comparisons/langgraph.rst diff --git a/README.md b/README.md index 697855742..fbd50625d 100644 --- a/README.md +++ b/README.md @@ -218,6 +218,8 @@ This collaborative approach ensures that the final output leverages collective i --- > 📖 **Complete Documentation:** For comprehensive guides, API reference, and detailed examples, visit **[MassGen Official Documentation](https://docs.massgen.ai/)** +> +> 🤖 **For AI agents:** A curated [`llms.txt`](https://docs.massgen.ai/en/latest/llms.txt) index and full [`llms-full.txt`](https://docs.massgen.ai/en/latest/llms-full.txt) dump are published with the docs ([llmstxt.org spec](https://llmstxt.org)). --- diff --git a/README_PYPI.md b/README_PYPI.md index f22188334..a1e1839ae 100644 --- a/README_PYPI.md +++ b/README_PYPI.md @@ -217,6 +217,8 @@ This collaborative approach ensures that the final output leverages collective i --- > 📖 **Complete Documentation:** For comprehensive guides, API reference, and detailed examples, visit **[MassGen Official Documentation](https://docs.massgen.ai/)** +> +> 🤖 **For AI agents:** A curated [`llms.txt`](https://docs.massgen.ai/en/latest/llms.txt) index and full [`llms-full.txt`](https://docs.massgen.ai/en/latest/llms-full.txt) dump are published with the docs ([llmstxt.org spec](https://llmstxt.org)). 
--- diff --git a/docs/source/_extra/llms.txt b/docs/source/_extra/llms.txt new file mode 100644 index 000000000..1ed93a4ba --- /dev/null +++ b/docs/source/_extra/llms.txt @@ -0,0 +1,56 @@ +# MassGen + +> MassGen is a multi-agent framework that coordinates AI agents through redundancy and iterative refinement. Multiple agents tackle the same task in parallel, observe each other's progress, and vote to converge on the best answer. Agents can use tools (MCP), execute code, and read/write files in your project. + +This file follows the [llms.txt convention](https://llmstxt.org). It is a curated index of canonical MassGen documentation for AI agents and crawlers. A larger, concatenated dump of the same documentation lives at [`/llms-full.txt`](llms-full.txt). + +## Project + +- [Project README](https://github.com/Leezekun/MassGen): What MassGen is, screenshots, quickstart, the "How It Works" section. +- [AI_USAGE.md](https://github.com/Leezekun/MassGen/blob/main/AI_USAGE.md): How LLM agents should invoke MassGen via the CLI (always with `--automation`). +- [Changelog](changelog.html): Release notes for every MassGen version. + +## Quickstart + +- [Installation](quickstart/installation.html): Install MassGen via uv / pip. +- [Running MassGen](quickstart/running-massgen.html): First run, CLI, WebUI, automation mode. +- [Configuration](quickstart/configuration.html): YAML config structure, agents, backends, orchestrator settings. + +## User guide + +- [Core concepts](user_guide/concepts.html): Agents, orchestrator, voting, consensus, refinement. +- [Backends](user_guide/backends.html): Supported model providers and how to configure each (Claude, Gemini, GPT, Grok, Azure, LM Studio, OpenRouter, Codex, Claude Code SDK). +- [Skills](user_guide/skills.html): MassGen-as-a-skill for AI coding agents (Claude Code, Cursor, Copilot, …). +- [WebUI](user_guide/webui.html): Browser UI for side-by-side agent visualization and voting. 
+- [Filesystem & project integration](user_guide/files/index.html): Context paths, permissions, workspaces, snapshots. +- [Tools (MCP, code execution, custom tools)](user_guide/tools/index.html): How agents call tools and how to add your own. +- [Multimodal](user_guide/multimodal.html): Image and audio inputs. +- [Sessions and multi-turn](user_guide/sessions/index.html): Multi-turn conversations, memory, restart, cancellation. +- [Integration](user_guide/integration/index.html): Python API, LiteLLM custom provider, HTTP server, automation mode. +- [Validating configs](user_guide/validating_configs.html): Config schema validation. +- [Logging](user_guide/logging.html): How to read MassGen run logs. + +## Reference + +- [CLI reference](reference/cli.html): All `massgen` command flags. +- [Python API reference](reference/python_api.html): Async API for embedding MassGen. +- [YAML schema](reference/yaml_schema.html): Full configuration schema. +- [Configuration examples](reference/configuration_examples.html): Worked examples per use case. +- [MCP server registry](reference/mcp_server_registry.html): MCP servers known to MassGen. +- [Supported models](reference/supported_models.html): Model registry. +- [Timeouts](reference/timeouts.html): Timeout configuration. +- [Status file](reference/status_file.html): The `status.json` file emitted in automation mode. +- [Comparisons](reference/comparisons.html): MassGen vs LLM Council, CrewAI, LangGraph, AutoGen. + +## Examples + +- [Available configs](examples/available_configs.html): Index of YAML configs shipped with MassGen. +- [Basic examples](examples/basic_examples.html): Single-agent and small multi-agent setups. +- [Advanced patterns](examples/advanced_patterns.html): Complex coordination patterns. +- [Case studies](examples/case_studies.html): Side-by-side visual comparisons of MassGen vs single-agent solutions. 
+ +## Optional + +- [Development & contributing](development/contributing.html): How to contribute to MassGen. +- [Architecture](development/architecture.html): Internal architecture notes. +- [Roadmap](development/roadmap.html): What's planned. diff --git a/docs/source/conf.py b/docs/source/conf.py index 40efa61c9..aad44cbd0 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -107,6 +107,10 @@ templates_path = ["_templates"] exclude_patterns = ["case_studies"] # Exclude standalone HTML from Sphinx processing +# Files in _extra/ are copied verbatim to the build output root (e.g. llms.txt). +# Use this for files that need to live at the site root, not under /_static/. +html_extra_path = ["_extra"] + # Autodoc settings autodoc_default_options = { "members": True, @@ -214,3 +218,68 @@ } hoverxref_tooltip_maxwidth = 600 hoverxref_tooltip_theme = "tooltipster-shadow" + + +# -- llms-full.txt generation (https://llmstxt.org) -------------------------- +# At build-finish, walk the curated documentation roots and concatenate their +# source files into a single llms-full.txt at the build output root. The +# hand-curated index lives at _extra/llms.txt and is copied via html_extra_path. + +_LLMS_FULL_ROOTS = ("quickstart", "user_guide", "reference") +_LLMS_FULL_EXTS = (".rst", ".md") + + +def _generate_llms_full_txt(app, exception): + if exception is not None: + return + if app.builder.name != "html": + return + + source_root = os.path.abspath(app.srcdir) + out_path = os.path.join(app.outdir, "llms-full.txt") + + header = ( + "# MassGen — full documentation dump\n\n" + "> Concatenated source of MassGen's quickstart, user guide, and reference\n" + "> documentation. For a curated index see /llms.txt. 
Generated at\n" + "> Sphinx build time from docs/source/{quickstart,user_guide,reference}.\n\n" + ) + + sections = [] + for root in _LLMS_FULL_ROOTS: + root_path = os.path.join(source_root, root) + if not os.path.isdir(root_path): + continue + for dirpath, _dirnames, filenames in os.walk(root_path): + for name in sorted(filenames): + if not name.endswith(_LLMS_FULL_EXTS): + continue + file_path = os.path.join(dirpath, name) + rel_path = os.path.relpath(file_path, source_root) + try: + with open(file_path, encoding="utf-8") as fh: + body = fh.read() + except (OSError, UnicodeDecodeError): + continue + sections.append((rel_path, body)) + + sections.sort(key=lambda item: item[0]) + + try: + with open(out_path, "w", encoding="utf-8") as fh: + fh.write(header) + for rel_path, body in sections: + fh.write(f"\n\n---\n\n## {rel_path}\n\n") + fh.write(body) + if not body.endswith("\n"): + fh.write("\n") + except OSError as exc: + print(f"Warning: failed to write llms-full.txt: {exc}") + return + + print(f"Wrote llms-full.txt ({len(sections)} files) -> {out_path}") + + +def setup(app): + app.connect("build-finished", _generate_llms_full_txt) + return {"parallel_read_safe": True, "parallel_write_safe": True} diff --git a/docs/source/index.rst b/docs/source/index.rst index 844b35f2e..4d29b62ad 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -80,6 +80,10 @@ See visual comparisons between MassGen and single-agent solutions, highlighting Use MassGen from Claude Code, Codex, Copilot, Cursor, and other AI coding agents. +.. note:: + + **For AI agents and crawlers:** This site publishes a curated `llms.txt `_ index following the `llmstxt.org spec `_, plus a concatenated `llms-full.txt `_ dump of the user guide and reference docs. + How Does MassGen Compare? 
------------------------- diff --git a/docs/source/reference/comparisons.rst b/docs/source/reference/comparisons.rst index 9cf134ce9..338ceffae 100644 --- a/docs/source/reference/comparisons.rst +++ b/docs/source/reference/comparisons.rst @@ -145,11 +145,19 @@ Technical Architecture Differences The key difference: LLM Council uses a fixed 3-stage pipeline with a designated chairman, while MassGen uses dynamic coordination where agents naturally converge on the best solution through voting. -.. note:: +More Comparisons +---------------- - More comparisons coming soon: +Dedicated comparison pages for the most common "MassGen vs …" questions: - - MassGen vs Claude Code - - MassGen vs AG2 (and AutoGen) - - MassGen vs LangGraph - - MassGen vs CrewAI +- :doc:`comparisons/crewai` — role-based decomposition with a hosted control plane +- :doc:`comparisons/langgraph` — low-level graph orchestration with the LangChain stack +- :doc:`comparisons/autogen` — multi-agent conversations (Microsoft AutoGen and the community AG2 continuation) + +.. toctree:: + :hidden: + :maxdepth: 1 + + comparisons/crewai + comparisons/langgraph + comparisons/autogen diff --git a/docs/source/reference/comparisons/autogen.rst b/docs/source/reference/comparisons/autogen.rst new file mode 100644 index 000000000..954527e79 --- /dev/null +++ b/docs/source/reference/comparisons/autogen.rst @@ -0,0 +1,147 @@ +========================== +MassGen vs AutoGen / AG2 +========================== + +`AutoGen `_ is Microsoft's multi-agent conversation framework (CC-BY-4.0 docs / MIT code, ~58K GitHub stars as of May 2026). It pioneered the "agents that chat with each other and tools" pattern that much of the field now builds on. `AG2 `_ is the community-governed continuation of AutoGen (Apache 2.0 with original MIT components, ~4.6K stars, hosted under the new ``ag2ai`` organization). Both descend from the same codebase but have diverged in stewardship and roadmap. + +.. 
note:: + + **Maintenance status — read this first.** + + - **Microsoft AutoGen** is in maintenance mode. Microsoft has positioned `microsoft/agent-framework `_ as the enterprise successor, with documented migration paths from both AutoGen and Semantic Kernel, supporting Python + .NET with graph-based orchestration. AutoGen continues to receive bug fixes but no new features are planned. + - **AG2** is actively developed and serves as the community continuation of the AutoGen lineage. It is the project MassGen's own README cites as a direct predecessor — the "multi-agent conversation" idea in AG2 is part of what MassGen builds on. + + If you are choosing today: AG2 for the AutoGen-style API with active development, Microsoft Agent Framework for the new Microsoft-stack story, and AutoGen itself only for existing codebases pinned to it. + +This page compares MassGen with the AutoGen / AG2 lineage. Where AutoGen and AG2 differ, the differences are called out. + +.. contents:: On This Page + :local: + :depth: 2 + +Overview +-------- + +.. list-table:: + :header-rows: 1 + :widths: 18 41 41 + + * - Aspect + - MassGen + - AutoGen / AG2 + * - **Primary Goal** + - Parallel coordination of agents on the same task with voting and consensus + - Multi-agent conversation: agents and tools exchange messages to solve a task + * - **Architecture** + - All agents tackle the full task in parallel and converge through voting + - ``ConversableAgent`` base + group chat / swarm / nested chats / society-of-mind patterns + * - **Maintenance** + - Actively developed with regular releases + - **AutoGen:** maintenance only (successor: Microsoft Agent Framework). **AG2:** actively developed. + +Architecture & Coordination Model +--------------------------------- + +Both **AutoGen** and **AG2** model multi-agent work as a *conversation*. The shared lineage gives them a common shape: + +- ``ConversableAgent`` is the base abstraction — agents send and receive messages. 
+- Group chat coordinates multiple agents through a *speaker selection* policy (round-robin, manager-chosen, etc.). +- Higher-level patterns (swarms, nested chats, society-of-mind) compose conversations into richer flows. +- Tools are registered as Python functions and exposed to agents; MCP servers are supported via extensions. +- Termination is rule-based (max turns, sentinel message, predicate) — there is no native voting / consensus primitive. + +AutoGen layers this as Core / AgentChat / Extensions APIs and also ships AutoGen Studio (a no-code GUI). AG2 keeps the same conceptual model but emphasizes open governance ("AgentOS" branding) and is iterating on the API independently of Microsoft. + +**MassGen** runs all agents in parallel on the *same* task. Coordination is voting-based: at each step every agent decides between submitting a new answer or voting for an existing one. The orchestrator detects consensus automatically and the winner presents. + +In one line: AutoGen / AG2 model multi-agent work as a *conversation* where turn-taking is the control primitive. MassGen models it as *parallel attempts with collective validation* where voting is the control primitive. Both are valid; they optimize for different shapes of problem. + +Feature Comparison +------------------ + +.. list-table:: + :header-rows: 1 + :widths: 22 22 22 34 + + * - Feature + - MassGen + - AutoGen / AG2 + - Notes + * - **License** + - Apache 2.0 + - AutoGen: MIT (code) / CC-BY-4.0 (docs). AG2: Apache 2.0 with original MIT components. + - Both lineages fully open source for self-hosted use + * - **Languages** + - Python + - AutoGen: Python, .NET / C#. AG2: Python. + - AutoGen's .NET track is one reason to prefer it on the Microsoft stack + * - **CLI** + - ✅ ``massgen``, ``massgen --automation``, ``massgen --web`` + - AutoGen: ``autogenstudio ui``. AG2: Python-first; CLI present but less emphasized. 
+ - Different focuses + * - **Python API** + - ✅ Async API + - ✅ Core, AgentChat, Extensions (AutoGen); ``ConversableAgent`` + orchestration patterns (AG2) + - Both layered; pick the level you want + * - **WebUI / Studio** + - ✅ Side-by-side agent panels with live streaming and vote/consensus view + - AutoGen Studio (no-code GUI; docs note it is not production-ready without extra hardening) + - Different roles + * - **MCP tools** + - ✅ First-class on every backend (Claude, Codex, Gemini, OpenAI-compatible, Grok, Claude Code SDK) + - ✅ MCP server support via extensions in both AutoGen and AG2 + - Both work + * - **Model providers** + - 10+ direct backends with per-agent heterogeneity (Claude, Gemini, GPT, Grok, Azure, LM Studio, OpenRouter, Codex, Claude Code SDK) + - OpenAI primary; other providers via extension clients / generic ``LLMConfig`` + - MassGen's backend matrix is broader and first-class + * - **Voting / consensus** + - ✅ Core mechanism; agents vote, winner presents + - ❌ Not built in (group chat uses speaker selection + termination, not voting) + - This is the central design difference + * - **Maintenance** + - Active development + - AutoGen: maintenance only. AG2: active. + - Affects long-term roadmap, not current functionality + * - **Successor / continuation** + - n/a + - AutoGen → `microsoft/agent-framework `_ (Python + .NET, graph-based; migration paths from both AutoGen and Semantic Kernel). AG2 is the community continuation. + - For new work, evaluate AG2 (Python-first) or Microsoft Agent Framework (Python + .NET) + +Voting and Consensus (the MassGen Differentiator) +------------------------------------------------- + +AutoGen and AG2 group chats pick the *next speaker*; MassGen's protocol picks the *winner*. The two are not the same: + +- Speaker selection is a *turn-taking* mechanism — useful when one agent's output is the input to the next. 
+- MassGen's voting is a *selection* mechanism — useful when you want N agents to attempt the same thing and the system to identify the strongest answer. + +If your task is genuinely conversational (an agent asks another agent to do something, they trade messages, the chat terminates on a condition), AutoGen / AG2 is well-shaped for it. If your task benefits from many parallel attempts converging on the best answer, MassGen is purpose-built for it. + +When to Use Each +---------------- + +**Choose AG2 when you need:** + +- An *AutoGen-style API* (``ConversableAgent``, group chats, swarms, nested chats) with active community-led development. +- An open governance model independent of any single corporate steward. +- Compatibility with the broader AutoGen ecosystem of notebooks and patterns. + +**Choose Microsoft AutoGen when you need:** + +- Compatibility with an existing AutoGen codebase you cannot migrate. +- The .NET / C# code path alongside Python on the Microsoft stack. (Note: for new Microsoft-stack work, Microsoft Agent Framework is the recommended forward path.) + +**Choose MassGen when you need:** + +- *Parallel attempts + voting* as a first-class control flow. +- Side-by-side live visualization of every agent's reasoning and answer. +- Heterogeneous backends per agent (Claude + Gemini + GPT + Grok all on the same task). +- An actively developed open-source project with regular releases and a broad backend matrix. 
+ +Related +------- + +- :doc:`crewai` — role-based decomposition framework +- :doc:`langgraph` — graph-based orchestration substrate +- :doc:`../comparisons` — back to comparisons hub diff --git a/docs/source/reference/comparisons/crewai.rst b/docs/source/reference/comparisons/crewai.rst new file mode 100644 index 000000000..8e98a29a8 --- /dev/null +++ b/docs/source/reference/comparisons/crewai.rst @@ -0,0 +1,132 @@ +================== +MassGen vs CrewAI +================== + +`CrewAI `_ is a popular open-source framework (MIT, ~51K GitHub stars as of May 2026) for orchestrating role-playing AI agents. It is independent of LangChain and ships with both a Python SDK and the commercial *CrewAI AMP* (Agent Management Platform) for hosted execution and observability. + +This page compares CrewAI with MassGen. The intent is fair-handed: both projects are healthy, the right choice depends on what you are trying to build. + +.. contents:: On This Page + :local: + :depth: 2 + +Overview +-------- + +.. list-table:: + :header-rows: 1 + :widths: 20 40 40 + + * - Aspect + - MassGen + - CrewAI + * - **Primary Goal** + - Parallel multi-agent coordination through voting and consensus on the *same* task + - Sequential / hierarchical role-based agent teams ("crews") that *decompose* a task across roles + * - **Architecture** + - All agents tackle the full task in parallel, observe each other, then vote on a winning answer + - "Crews" of role-played agents execute task graphs; "Flows" add event-driven control over multiple crews + * - **Hosted product** + - Open source only; runs locally, in CI, or in your infra + - Open source SDK + hosted *Crew Control Plane* / AMP for managed deployment and observability + +Architecture & Coordination Model +--------------------------------- + +**CrewAI** treats a multi-agent task as a *workflow*. 
The unit of work is a ``Task``, the unit of work-doing is an ``Agent`` with a role/goal/backstory, and a ``Crew`` is the team plus the process (sequential or hierarchical) that runs the tasks. ``Flow`` adds event-driven orchestration so multiple crews can be triggered and composed deterministically. The mental model is closer to a structured pipeline than a debate: each task is owned by one agent, and the framework's job is to dispatch and chain them. + +**MassGen** treats a multi-agent task as a *redundant parallel attempt*. All agents receive the same task and produce candidate answers in parallel. At each step every agent sees other agents' most recent answers and can either submit a new answer or vote for an existing one. Coordination ends when consensus is reached, and the winning answer is the one with the most votes. See :doc:`../../user_guide/concepts` for the full coordination model. + +In one line: CrewAI is built for *decomposition* (different roles do different sub-tasks). MassGen is built for *refinement* (many agents attack the same task and converge). + +Feature Comparison +------------------ + +.. 
list-table:: + :header-rows: 1 + :widths: 25 20 20 35 + + * - Feature + - MassGen + - CrewAI + - Notes + * - **License** + - Apache 2.0 + - MIT + - Both fully open source for self-hosted use + * - **CLI** + - ✅ ``massgen``, ``massgen --automation``, ``massgen --web`` + - ✅ ``crewai`` (project scaffolding, run, install) + - Different focuses: MassGen CLI is the primary interactive entry point; CrewAI CLI is mostly project bootstrap + * - **Python API** + - ✅ Async API, LiteLLM custom provider + - ✅ Synchronous API, role-based abstractions + - CrewAI's API centers on ``Agent``/``Task``/``Crew``; MassGen's centers on parallel runs and votes + * - **WebUI** + - ✅ Side-by-side agent panels, live streaming, vote/consensus view + - ✅ CrewAI AMP for hosted deployment, traces, and observability + - Different roles: MassGen's WebUI visualizes the *coordination*; CrewAI AMP is more of a *deployment dashboard* + * - **MCP tools** + - ✅ First-class on every backend (Claude, Codex, Gemini, OpenAI-compatible, Grok, Claude Code SDK) + - ✅ First-class via ``mcps`` field on Agent and ``MCPServerAdapter`` + - Both support stdio, SSE, and streamable HTTP transports + * - **Code execution / filesystem tools** + - ✅ Sandboxed Python/Bash, filesystem with permissioned context paths + - ✅ Tool ecosystem (web search, code, files) via ``crewai-tools`` + - Different defaults: MassGen ships filesystem permissions and workspace snapshots; CrewAI relies on its tool library + * - **Backend / model providers** + - 10+ direct backends (Claude, Gemini, OpenAI, Grok, Azure, LM Studio, OpenRouter, …) + Claude Code SDK + Codex + - OpenAI default; Ollama, Anthropic, Gemini, and others via configuration + - MassGen's backend abstraction is heterogeneous-by-design (each agent can use a different provider) + * - **Voting / consensus** + - ✅ Core mechanism; agents vote, winner presents + - ❌ Not built in (the framework is task-decomposition oriented) + - This is the central design 
difference + * - **Live streaming** + - ✅ Token-level streaming to TUI and WebUI + - ✅ Event/step streaming + - Both stream; MassGen also streams per-agent in parallel side by side + * - **Hosted control plane** + - ❌ + - ✅ CrewAI AMP (hosted + self-hosted offerings) + - Use CrewAI if you specifically want a managed deployment surface + +Voting and Consensus (the MassGen Differentiator) +------------------------------------------------- + +CrewAI does not have a native voting mechanism. A "consensus" pattern in CrewAI is something you build yourself by orchestrating multiple agents and writing a reducer task. + +In MassGen voting is *the* coordination protocol, not an optional pattern: + +- Every agent sees the most recent answer from every other agent at each step. +- Every agent at each step picks one of: submit a new answer, or vote for an existing answer. +- The orchestrator detects consensus automatically and the winner presents. +- Combined with checklist-gated evaluation criteria (see :doc:`../../user_guide/concepts`), this enforces refinement until quality is genuinely achieved rather than declared. + +If your task benefits from diverse parallel attempts with collective validation — e.g. writing, design, math, code synthesis with verifier feedback — voting is what MassGen adds that role-based frameworks don't. + +When to Use Each +---------------- + +**Choose CrewAI when you need:** + +- A *role-based decomposition* of a task — clear sub-tasks owned by clearly-named agents. +- A managed control plane (CrewAI AMP) for deployment, tracing, and team ergonomics. +- A large existing community / ecosystem of role recipes and tools. + +**Choose MassGen when you need:** + +- *Parallel refinement* of one task with multiple agents converging on a best answer. +- Side-by-side live visualization of every agent's reasoning and answer. +- Heterogeneous backends per agent (Claude + Gemini + GPT + Grok all on the same task). 
+- Voting / consensus as a first-class control flow, not a pattern to re-implement. +- A local-first / Apache 2.0 stack with no managed control plane dependency. + +Choosing CrewAI does not exclude MassGen and vice versa — they solve adjacent problems. A common pattern is to use MassGen at decision points where multiple strong attempts and voting genuinely add quality, and CrewAI (or similar) where the work cleanly decomposes into roles. + +Related +------- + +- :doc:`langgraph` — graph-based orchestration (more low-level than CrewAI) +- :doc:`autogen` — multi-agent conversations (in maintenance mode; see successor) +- :doc:`../comparisons` — back to comparisons hub diff --git a/docs/source/reference/comparisons/langgraph.rst b/docs/source/reference/comparisons/langgraph.rst new file mode 100644 index 000000000..4fa878142 --- /dev/null +++ b/docs/source/reference/comparisons/langgraph.rst @@ -0,0 +1,128 @@ +===================== +MassGen vs LangGraph +===================== + +`LangGraph `_ is LangChain's low-level orchestration framework for stateful, graph-based agent workflows (MIT, ~32K GitHub stars as of May 2026). It powers production agents built on the LangChain stack and is paired with the commercial *LangSmith Studio* / *LangGraph Platform* for visual prototyping, deployment, and observability. + +This page compares LangGraph with MassGen. The two operate at very different levels of abstraction — LangGraph is a graph runtime, MassGen is a coordination protocol. They are often complementary rather than substitutes. + +.. contents:: On This Page + :local: + :depth: 2 + +Overview +-------- + +.. 
list-table:: + :header-rows: 1 + :widths: 20 40 40 + + * - Aspect + - MassGen + - LangGraph + * - **Primary Goal** + - Parallel multi-agent coordination through voting and consensus on the same task + - Low-level orchestration of stateful graphs of nodes (agents, tools, branches, retries) + * - **Architecture** + - All agents tackle the full task in parallel and converge through voting + - Explicit ``StateGraph`` of nodes and edges with durable execution and persistent state + * - **Hosted product** + - Open source only + - Open source SDK + *LangGraph Platform* / *LangSmith Studio* for deployment and visual debugging + +Architecture & Coordination Model +--------------------------------- + +**LangGraph** is a graph runtime. You define a typed state, a set of nodes (functions / agents / tools), and edges (conditional branches, parallel fan-outs, loops). The runtime executes the graph, persists state, supports human-in-the-loop interrupts, and can resume from failures. Coordination patterns — supervisor, swarm, plan-and-execute, debate — are *encodings* in the graph, not first-class primitives. + +**MassGen** is a coordination *protocol*. Agents run in parallel on the same task, observe each other's most recent answers, and choose between "answer" and "vote." The protocol guarantees the orchestrator can detect consensus and pick a winner deterministically. Refinement is bounded by the protocol, not by a graph the user has to author. + +In one line: LangGraph gives you the substrate to build any agent topology. MassGen gives you one specific topology — parallel attempts plus voting — implemented end-to-end with a TUI, WebUI, and backend matrix. + +Feature Comparison +------------------ + +.. 
list-table:: + :header-rows: 1 + :widths: 25 20 20 35 + + * - Feature + - MassGen + - LangGraph + - Notes + * - **License** + - Apache 2.0 + - MIT + - Both fully open source for self-hosted use + * - **Abstraction level** + - High — pre-built coordination protocol + - Low — author your own graph + - Different products; LangGraph is closer to a workflow runtime than an agent framework + * - **CLI** + - ✅ ``massgen``, ``massgen --automation``, ``massgen --web`` + - ✅ ``langgraph`` CLI for the LangGraph Platform / Studio + - Different focuses + * - **Python API** + - ✅ Async API + - ✅ Python and JS/TS APIs + - LangGraph's API is broader by virtue of being multi-language + * - **WebUI** + - ✅ Side-by-side agent panels, live streaming, vote/consensus view + - ✅ LangSmith Studio for graph visualization, traces, debugging + - Studio focuses on *graph* execution; MassGen WebUI focuses on *parallel agents* + voting + * - **MCP tools** + - ✅ First-class on every backend + - ✅ Via the ``langchain-mcp-adapters`` bridge (converts MCP tools to LangChain ``BaseTool``) + - Both work; LangGraph's path goes through LangChain's tool abstraction + * - **Model providers** + - 10+ direct backends including Claude Code SDK + Codex; per-agent heterogeneity + - Whatever LangChain integrates (extensive) + - LangChain's integration surface is the largest in the ecosystem + * - **Voting / consensus** + - ✅ Core mechanism + - ❌ Not built in (you can implement it as a node) + - This is the central design difference + * - **Durable execution** + - Workspace snapshots, status files, checkpoint MCP for save/restore + - ✅ Durable state, checkpoints, resume-after-failure as first-class features + - LangGraph is the more general purpose runtime here + * - **Hosted platform** + - ❌ + - ✅ LangGraph Platform / LangSmith Studio + - Use LangGraph if you want a managed deployment + observability stack + +Voting and Consensus (the MassGen Differentiator) 
+-------------------------------------------------
+
+LangGraph can *express* a voting topology: define N parallel agent nodes, fan out, then add a reducer node that picks a winner. It does not *provide* one. That means:
+
+- You decide when to stop iterating (loop condition vs. quality criteria).
+- You write the reducer logic (majority? weighted? based on a verifier?).
+- You wire the visualization to surface "this is what each agent said and who won" yourself.
+
+MassGen ships all of the above as a single product: streaming side-by-side panels, vote arrows in the WebUI consensus map, checklist-gated criteria, and a TUI consensus visualization. If parallel + voting is the *primary* thing you want, MassGen is purpose-built for it. If voting is one of many topologies your system needs alongside ETL, branching, and tool-heavy flows, LangGraph is the better substrate.
+
+When to Use Each
+----------------
+
+**Choose LangGraph when you need:**
+
+- *Arbitrary agent topologies* you author yourself (supervisor, swarm, plan-execute, custom).
+- Durable, resumable execution as a first-class concern (long-running flows, human approvals).
+- Tight LangChain ecosystem integration (vector stores, retrievers, evaluators, deployment via LangGraph Platform).
+
+**Choose MassGen when you need:**
+
+- A pre-built *parallel + voting* coordination protocol you don't have to reimplement.
+- Heterogeneous backends per agent on the same task (Claude + Gemini + GPT + Grok, etc.).
+- A polished TUI / WebUI showing all agents working simultaneously and their consensus path.
+- A local-first stack without a managed deployment platform dependency.
+
+LangGraph and MassGen are at different levels and can be combined: MassGen can be invoked as a tool / subgraph from a larger LangGraph workflow when a particular step benefits from parallel attempts and voting.
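Editorial sketch (not part of the patch): the "reducer logic" that the voting section says a LangGraph user would have to write can be as small as a majority count. The function name `pick_winner`, the vote mapping shape, and the alphabetical tie-break are assumptions made for illustration; this is neither MassGen's nor LangGraph's actual implementation.

```python
from collections import Counter
from typing import Dict, Optional


def pick_winner(votes: Dict[str, str]) -> Optional[str]:
    """Return the agent id whose answer received the most votes.

    `votes` maps each voter's agent id to the agent id it voted for
    (hypothetical shape). Ties are broken alphabetically so the result
    is deterministic. Returns None when nobody has voted.
    """
    if not votes:
        return None
    tally = Counter(votes.values())
    top = max(tally.values())
    # Keep every candidate that reached the top count, then pick one stably.
    return sorted(agent for agent, count in tally.items() if count == top)[0]


if __name__ == "__main__":
    example = {"agent_a": "agent_b", "agent_b": "agent_b", "agent_c": "agent_a"}
    print(pick_winner(example))
```

A weighted or verifier-based reducer would replace the `Counter` tally; the stop condition and the visualization remain separate concerns, as the bullets above note.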
+
+Related
+-------
+
+- :doc:`crewai`: role-based decomposition framework
+- :doc:`autogen`: multi-agent conversations (in maintenance mode; see successor)
+- :doc:`../comparisons`: back to comparisons hub

From 821bbff5ab5c0b841966dcaff6f47763315032b3 Mon Sep 17 00:00:00 2001
From: ncrispino
Date: Fri, 15 May 2026 07:59:43 -0700
Subject: [PATCH 3/7] Small changes

---
 massgen/orchestrator.py                  | 9 +++++++++
 massgen/tests/test_bootstrap_criteria.py | 6 ++++++
 2 files changed, 15 insertions(+)

diff --git a/massgen/orchestrator.py b/massgen/orchestrator.py
index d10d6df6d..e5e0e9e2f 100644
--- a/massgen/orchestrator.py
+++ b/massgen/orchestrator.py
@@ -1298,10 +1298,19 @@ async def _run_bootstrap_discriminator_step(self) -> int:
             except Exception as _exc:
                 logger.debug("[bootstrap_criteria] notify_runtime_subagent_started failed: %s", _exc)
 
+            # refine=False is the canonical single-shot knob: SubagentManager
+            # sets max_new_answers_per_agent=1, skip_voting=True, and
+            # skip_final_presentation=True at the orchestrator level (where
+            # they actually win; the coordination-dict overrides we also set
+            # above are belt-and-suspenders, but the orchestrator-level
+            # `max_new_answers_per_agent: 3` default would otherwise shadow
+            # them, as observed live in log_20260513_095921_816676's
+            # subagent_config_bootstrap_discriminator_1.yaml).
             result = await manager.spawn_subagent(
                 task=prompt,
                 subagent_id=subagent_id,
                 timeout_seconds=180,
+                refine=False,
             )
         except Exception as exc:
             logger.warning("[bootstrap_criteria] discriminator spawn failed: %s", exc, exc_info=True)
diff --git a/massgen/tests/test_bootstrap_criteria.py b/massgen/tests/test_bootstrap_criteria.py
index 9bf28bf38..bd298ed63 100644
--- a/massgen/tests/test_bootstrap_criteria.py
+++ b/massgen/tests/test_bootstrap_criteria.py
@@ -899,6 +899,12 @@ def capture_config(*args, **kwargs):
         # the run, and max_new_answers_global=1 is the hard cap.
         assert coord.get("voting_threshold") == 1, f"discriminator must lower voting_threshold so single-agent self-vote ends the run, got {coord}"
         assert coord.get("max_new_answers_global") == 1, f"discriminator must set max_new_answers_global=1 as a hard cap, got {coord}"
+        # And the canonical knob: refine=False on spawn_subagent. This is the
+        # one SubagentManager actually respects at the orchestrator level (the
+        # coordination-dict overrides get shadowed by the orchestrator-level
+        # max_new_answers_per_agent=3 default without it).
+        spawn_kwargs = mock_manager.spawn_subagent.call_args.kwargs
+        assert spawn_kwargs.get("refine") is False, f"discriminator must pass refine=False to spawn_subagent for single-shot, got {spawn_kwargs}"
 
     def test_variant_b_discriminator_picks_up_criteria_json_artifact(self, tmp_path):
         """When the subagent writes criteria.json to its workspace, the

From 4d9a722922e706c1bf58c264a2ff124d46b478b6 Mon Sep 17 00:00:00 2001
From: ncrispino
Date: Fri, 15 May 2026 08:10:22 -0700
Subject: [PATCH 4/7] Small compare changes

---
 docs/source/reference/comparisons/autogen.rst   | 2 +-
 docs/source/reference/comparisons/langgraph.rst | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/reference/comparisons/autogen.rst b/docs/source/reference/comparisons/autogen.rst
index 954527e79..e7271b719 100644
--- a/docs/source/reference/comparisons/autogen.rst
+++ b/docs/source/reference/comparisons/autogen.rst
@@ -134,7 +134,7 @@ When to Use Each
 
 **Choose MassGen when you need:**
 
-- *Parallel attempts + voting* as a first-class control flow.
+- *Parallel attempts + voting* as a first-class control flow with iterative refinement.
 - Side-by-side live visualization of every agent's reasoning and answer.
 - Heterogeneous backends per agent (Claude + Gemini + GPT + Grok all on the same task).
 - An actively developed open-source project with regular releases and a broad backend matrix.
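Editorial sketch (not part of the patch): the `refine=False` assertion added to `test_bootstrap_criteria.py` in PATCH 3/7 relies on `unittest.mock` recording the keyword arguments of an awaited call. The stand-in `run_discriminator` below is hypothetical and only mirrors the keyword arguments visible in the orchestrator hunk; the subagent id and prompt are illustrative values, and the real `SubagentManager` is replaced by an `AsyncMock`.

```python
import asyncio
from unittest.mock import AsyncMock


async def run_discriminator(manager, prompt: str):
    # Hypothetical stand-in for the orchestrator step: it forwards the same
    # keyword arguments that appear in the patched spawn_subagent call.
    return await manager.spawn_subagent(
        task=prompt,
        subagent_id="bootstrap_discriminator_1",  # illustrative id
        timeout_seconds=180,
        refine=False,
    )


def check_single_shot() -> dict:
    # AsyncMock makes every attribute an awaitable mock that records calls.
    manager = AsyncMock()
    asyncio.run(run_discriminator(manager, "critique the current answers"))
    spawn_kwargs = manager.spawn_subagent.call_args.kwargs
    assert spawn_kwargs.get("refine") is False
    return spawn_kwargs


if __name__ == "__main__":
    print(check_single_shot())
```

Asserting on `call_args.kwargs` checks what was passed to the manager, not what the manager does with it, which is why the patch keeps the coordination-dict assertions alongside it.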
diff --git a/docs/source/reference/comparisons/langgraph.rst b/docs/source/reference/comparisons/langgraph.rst
index 4fa878142..3432e8ba2 100644
--- a/docs/source/reference/comparisons/langgraph.rst
+++ b/docs/source/reference/comparisons/langgraph.rst
@@ -113,7 +113,7 @@ When to Use Each
 
 **Choose MassGen when you need:**
 
-- A pre-built *parallel + voting* coordination protocol you don't have to reimplement.
+- A pre-built *parallel + voting* coordination protocol focused on iterative refinement that you don't have to reimplement.
 - Heterogeneous backends per agent on the same task (Claude + Gemini + GPT + Grok, etc.).
 - A polished TUI / WebUI showing all agents working simultaneously and their consensus path.
 - A local-first stack without a managed deployment platform dependency.

From b85c629a4d41f303ebe24994bce5f48e9d428e9d Mon Sep 17 00:00:00 2001
From: ncrispino
Date: Fri, 15 May 2026 08:32:06 -0700
Subject: [PATCH 5/7] docs: fix llms.txt relative link and list all comparisons on landing page

Two follow-ups discovered when previewing the docs locally:

- `href="/llms.txt"` resolved to `file:///llms.txt` on local builds (and would 404 on RTD without a root redirect). Switched to relative `href="llms.txt"` which works in both contexts.
- The "How Does MassGen Compare?" section only mentioned LLM Council. Expanded it to list all four comparison pages (LLM Council, CrewAI, LangGraph, AutoGen/AG2) with their core differentiators.

Co-Authored-By: Claude Opus 4.7

---
 docs/source/index.rst | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/docs/source/index.rst b/docs/source/index.rst
index 4d29b62ad..b3e55f956 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -82,13 +82,18 @@ Use MassGen from Claude Code, Codex, Copilot, Cursor, and other AI coding agents
 
 .. 
note::
 
-   **For AI agents and crawlers:** This site publishes a curated `llms.txt `_ index following the `llmstxt.org spec `_, plus a concatenated `llms-full.txt `_ dump of the user guide and reference docs.
+   **For AI agents and crawlers:** This site publishes a curated `llms.txt `_ index following the `llmstxt.org spec `_, plus a concatenated `llms-full.txt `_ dump of the user guide and reference docs.
 
 
 How Does MassGen Compare?
 -------------------------
 
-**MassGen vs LLM Council:** While LLM Council follows a fixed 3-stage pipeline, MassGen agents autonomously decide to contribute new answers or vote for others, reaching consensus organically. Plus, MassGen agents can use tools, execute code, and read/write files in your codebase, backed by active development with regular releases. :doc:`See full comparison → `
+MassGen sits in a different design space than typical multi-agent frameworks. The core differentiator across the board is *parallel attempts with voting and consensus*: agents tackle the same task in parallel, observe each other, and converge on a winner, backed by tools, code execution, filesystem integration, and active development.
+
+- :doc:`MassGen vs LLM Council `: dynamic voting / consensus vs a fixed 3-stage pipeline (responses → ranking → chairman synthesis).
+- :doc:`MassGen vs CrewAI `: parallel refinement on one task vs role-based decomposition into sub-tasks.
+- :doc:`MassGen vs LangGraph `: a pre-built parallel + voting protocol vs a low-level graph runtime you author yourself.
+- :doc:`MassGen vs AutoGen / AG2 `: parallel attempts with collective validation vs conversation-based multi-agent message passing.
 
 
Quick Start From 5fb01f3d546d31ee09fbdc719fa0020191e7fcee Mon Sep 17 00:00:00 2001 From: asdasd2323wxs Date: Sat, 16 May 2026 00:01:01 +0800 Subject: [PATCH 6/7] docs: docs for v0.1.87 --- CHANGELOG.md | 33 +++++++++- CONTRIBUTING.md | 8 +-- README.md | 51 ++++++++------- README_PYPI.md | 51 ++++++++------- ROADMAP.md | 27 ++++++-- ROADMAP_v0.1.88.md | 9 +-- docs/announcements/archive/v0.1.86.md | 68 ++++++++++++++++++++ docs/announcements/current-release.md | 46 ++++++------- docs/announcements/github-release-v0.1.86.md | 32 --------- docs/announcements/github-release-v0.1.87.md | 25 +++++++ docs/source/index.rst | 12 ++-- massgen/configs/README.md | 24 +++++-- 12 files changed, 251 insertions(+), 135 deletions(-) create mode 100644 docs/announcements/archive/v0.1.86.md delete mode 100644 docs/announcements/github-release-v0.1.86.md create mode 100644 docs/announcements/github-release-v0.1.87.md diff --git a/CHANGELOG.md b/CHANGELOG.md index b2a514677..aaab6bad2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## Recent Releases +**v0.1.87 (May 15, 2026)** - Documentation: Framework Comparisons & `llms.txt` +Documentation release adding three "MassGen vs ..." comparison pages (CrewAI, LangGraph, AutoGen/AG2), a curated `llms.txt` index plus full-corpus `llms-full.txt` dump (per [llmstxt.org](https://llmstxt.org) spec), and small README/landing-page pointers so AI agents and crawlers can discover the docs. Also ships a one-line `refine=False` fix for the `bootstrap_subagent` discriminator that was being shadowed by the orchestrator's default `max_new_answers_per_agent`. 
+ **v0.1.86 (May 13, 2026)** - `bootstrap_subagent` Discriminator + Codex MCP Approval Fix Variant B (`criteria_mode: bootstrap_subagent`) is now functional: the orchestrator runs an in-process critic between rounds, merges critic-proposed criteria into the accumulator, and augments the next round's checklist. This release also fixes Codex MCP tool calls under `codex exec` by writing the approval bypasses needed for non-interactive runs. @@ -18,8 +21,34 @@ New `orchestrator.coordination.criteria_mode` option lets evaluation criteria em **v0.1.84 (May 8, 2026)** - TUI Consensus Map A compact visual map below the agent status ribbon during multi-agent runs. Shows agent nodes with latest answer labels, vote arrows, current vote leader, winner state, and waiting/working indicators β€” driven by existing coordination events without backend schema changes. Hidden on welcome and single-agent runs. -**v0.1.83 (May 1, 2026)** - In-Session Standalone Checkpoint MCP Integration -The standalone checkpoint MCP server can now be exposed *inside* a normal MassGen run via a new `coordination.standalone_checkpoint` config block, giving single-agent sessions access to the richer `init` + `checkpoint` tools backed by their own reviewer team. Enhanced checkpoint tool card visualization separates primary operations from system tasks. +--- + +## [0.1.87] - 2026-05-15 + +### Added +- **Framework Comparison Pages** ([#1094](https://github.com/massgen/MassGen/pull/1094)): Three new "MassGen vs ..." pages under `docs/source/reference/comparisons/` β€” `crewai.rst`, `langgraph.rst`, `autogen.rst`. 
Each page positions MassGen's parallel-refinement-with-voting model against the target framework's coordination shape and lists when to reach for one versus the other +- **`llms.txt` Index** ([#1094](https://github.com/massgen/MassGen/pull/1094)): Curated [llmstxt.org](https://llmstxt.org)-spec index published at the docs site root via Sphinx `html_extra_path` (`docs/source/_extra/llms.txt`) β€” gives AI agents a small, hand-picked map of the docs +- **`llms-full.txt` Corpus** ([#1094](https://github.com/massgen/MassGen/pull/1094)): Concatenated full-docs dump (~1 MB across 59 files), generated by a Sphinx `build-finished` hook in `docs/source/conf.py` and shipped alongside `llms.txt` for crawlers that want the complete corpus +- **Docs Landing Page Update** ([#1094](https://github.com/massgen/MassGen/pull/1094)): "How Does MassGen Compare?" section on `docs/source/index.rst` now lists all four comparisons (LLM Council + the three new ones), with the parent `docs/source/reference/comparisons.rst` losing its "coming soon" note and gaining a toctree +- **README Pointers** ([#1094](https://github.com/massgen/MassGen/pull/1094)): One-line pointers in `README.md` (and synced `README_PYPI.md`) directing AI agents to `llms.txt` / `llms-full.txt` + +### Fixed +- **`bootstrap_subagent` Discriminator Single-Shot** ([#1094](https://github.com/massgen/MassGen/pull/1094)): `Orchestrator._run_bootstrap_discriminator_step` now passes `refine=False` to `SubagentManager.spawn_subagent`. This is the canonical single-shot knob that `SubagentManager` actually respects at the orchestrator level β€” without it, the orchestrator's `max_new_answers_per_agent: 3` default shadowed the coordination-dict overrides, letting the discriminator refine instead of single-shot. 
Found via live log inspection (`log_20260513_095921_816676`) + - `massgen/orchestrator.py:1298` β€” `refine=False` added to `spawn_subagent` call + - `massgen/tests/test_bootstrap_criteria.py` β€” new assertion that `discriminator must pass refine=False to spawn_subagent for single-shot` + +### Documentations, Configurations and Resources +- **Comparison pages**: `docs/source/reference/comparisons/{crewai,langgraph,autogen}.rst` +- **Sphinx `build-finished` hook**: `docs/source/conf.py` β€” generates `llms-full.txt` from the source tree at build time +- **README pointers**: `README.md`, `README_PYPI.md` β€” AI agents are directed to `llms.txt` / `llms-full.txt` + +### Notes +- Originally-planned Image/Video Edit Capabilities ([#959](https://github.com/massgen/MassGen/issues/959)) and Discriminative Criteria Refinements deferred to v0.1.88. + +### Technical Details +- **Major Focus**: Make MassGen discoverable to AI agents and crawlers, and give human readers structured "MassGen vs ..." comparisons against the three frameworks most often asked about +- **PRs Merged**: [#1094](https://github.com/massgen/MassGen/pull/1094) +- **Contributors**: @ncrispino, @HenryQi and the MassGen team --- diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index d81c95dfb..6e56f093b 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -359,7 +359,7 @@ Create a `.env` file in the `massgen` directory as described in [README](README. ## πŸ”§ Development Workflow -> **Important**: Our next version is v0.1.87. If you want to contribute, please contribute to the `dev/v0.1.87` branch (or `main` if dev/v0.1.87 doesn't exist yet). +> **Important**: Our next version is v0.1.88. If you want to contribute, please contribute to the `dev/v0.1.88` branch (or `main` if dev/v0.1.88 doesn't exist yet). ### 1. Create Feature Branch @@ -368,7 +368,7 @@ Create a `.env` file in the `massgen` directory as described in [README](README. 
git fetch upstream # Create feature branch from dev/v0.1.60 (or main if dev branch doesn't exist yet) -git checkout -b feature/your-feature-name upstream/dev/v0.1.87 +git checkout -b feature/your-feature-name upstream/dev/v0.1.88 ``` ### 2. Make Your Changes @@ -507,7 +507,7 @@ git push origin feature/your-feature-name ``` Then create a pull request on GitHub: -- Base branch: `dev/v0.1.87` (or `main` if dev branch doesn't exist yet) +- Base branch: `dev/v0.1.88` (or `main` if dev branch doesn't exist yet) - Compare branch: `feature/your-feature-name` - Add clear description of changes - Link any related issues @@ -617,7 +617,7 @@ Have a significant feature idea not covered by existing tracks? - [ ] Tests pass locally - [ ] Documentation is updated if needed - [ ] Commit messages follow convention -- [ ] PR targets `dev/v0.1.87` branch (or `main` if dev branch doesn't exist yet) +- [ ] PR targets `dev/v0.1.88` branch (or `main` if dev branch doesn't exist yet) ### PR Description Should Include diff --git a/README.md b/README.md index fbd50625d..9a4d0f7e3 100644 --- a/README.md +++ b/README.md @@ -122,15 +122,15 @@ This project started with the "threads of thought" and "iterative refinement" id

πŸ—ΊοΈ Roadmap

-- [Recent Achievements (v0.1.86)](#recent-achievements-v0186) -- [Previous Achievements (v0.0.3 - v0.1.85)](#previous-achievements-v003---v0185) +- [Recent Achievements (v0.1.87)](#recent-achievements-v0187) +- [Previous Achievements (v0.0.3 - v0.1.86)](#previous-achievements-v003---v0186) - [Key Future Enhancements](#key-future-enhancements) - Bug Fixes & Backend Improvements - Advanced Agent Collaboration - Expanded Model, Tool & Agent Integrations - Improved Performance & Scalability - Enhanced Developer Experience -- [v0.1.87 Roadmap](#v0187-roadmap) +- [v0.1.88 Roadmap](#v0188-roadmap)
@@ -155,19 +155,20 @@ This project started with the "threads of thought" and "iterative refinement" id --- -## πŸ†• Latest Features (v0.1.86) +## πŸ†• Latest Features (v0.1.87) -**πŸŽ‰ Released: May 13, 2026** +**πŸŽ‰ Released: May 15, 2026** -**What's New in v0.1.86:** -- **🧠 `bootstrap_subagent` Discriminator** - `orchestrator.coordination.criteria_mode: bootstrap_subagent` now runs a dedicated between-rounds LLM critic that proposes criteria from the current answers, merges them into the accumulator, and augments the next round's checklist automatically. -- **🧹 Session-End Criteria Drain** - Late stdio JSONL criteria emissions are drained before final presentation so they are not stranded after the last checklist resolution pass. -- **πŸ› οΈ Codex MCP Approval Fix** - Codex workspaces now include the non-interactive approval bypasses needed for external MCP tools such as `submit_checklist`, `create_task_plan`, `new_answer`, and `read_media`. +**What's New in v0.1.87:** +- **πŸ“š Framework Comparison Pages** - Three new "MassGen vs ..." pages β€” CrewAI, LangGraph, AutoGen/AG2 β€” under `docs/source/reference/comparisons/`, positioning MassGen's parallel-refinement-with-voting model against each framework's coordination shape. +- **πŸ€– `llms.txt` for AI Agents** - A curated [`llms.txt`](https://docs.massgen.ai/en/latest/llms.txt) index plus a full-corpus [`llms-full.txt`](https://docs.massgen.ai/en/latest/llms-full.txt) dump (per [llmstxt.org spec](https://llmstxt.org)), so AI agents and crawlers can discover MassGen's docs cleanly. +- **πŸ”§ `bootstrap_subagent` Single-Shot Fix** - `Orchestrator._run_bootstrap_discriminator_step` now passes `refine=False` to `spawn_subagent` β€” the canonical knob `SubagentManager` actually respects at the orchestrator level. 
-**Try v0.1.86 Features:** +**Try v0.1.87 Features:** ```bash -pip install massgen==0.1.86 -uv run massgen --config massgen/configs/coordination/bootstrap_subagent_criteria.yaml "Create an SVG of an AI agent coding." +pip install massgen==0.1.87 +# Read the framework comparisons: +# https://docs.massgen.ai/en/latest/reference/comparisons.html ``` β†’ [See full release history and examples](massgen/configs/README.md#release-history--examples) @@ -1241,19 +1242,21 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch ⚠️ **Early Stage Notice:** As MassGen is in active development, please expect upcoming breaking architecture changes as we continue to refine and improve the system. -### Recent Achievements (v0.1.86) +### Recent Achievements (v0.1.87) -**πŸŽ‰ Released: May 13, 2026** +**πŸŽ‰ Released: May 15, 2026** -#### `bootstrap_subagent` Discriminator + Codex MCP Approval Fix -- **`bootstrap_subagent` Variant (fully functional)**: A dedicated between-rounds LLM critic now reads the task and each agent's latest answer, emits `proposed_criteria` as JSON, and merges them into `bootstrap_criteria_accumulator.json` for the next round's checklist -- **Answer-Snapshot Gate**: The discriminator runs once per unique answer snapshot, avoiding repeated critiques when the answer set has not changed -- **Session-End Drain**: Late stdio criteria emissions are captured before final presentation -- **Codex MCP Approval Fix**: Non-interactive Codex workspaces now write both `approval_policy = "never"` and per-MCP-server `default_tools_approval_mode = "approve"`, preventing external MCP tools from being cancelled immediately under `codex exec` -- **Example Configs**: `massgen/configs/coordination/bootstrap_subagent_criteria.yaml` for the critic-driven path and `bootstrap_inline_criteria.yaml` for agent-proposed criteria -- **Tests**: Bootstrap criteria coverage expanded to 35 tests, plus Codex workspace approval policy coverage across approval modes 
+#### Documentation: Framework Comparisons & `llms.txt` +- **Framework Comparison Pages** ([#1094](https://github.com/massgen/MassGen/pull/1094)): Three new "MassGen vs ..." pages β€” `crewai.rst`, `langgraph.rst`, `autogen.rst` β€” under `docs/source/reference/comparisons/`, positioning MassGen against each framework's coordination shape and listing when to reach for one versus the other +- **`llms.txt` Index** ([#1094](https://github.com/massgen/MassGen/pull/1094)): Curated [llmstxt.org](https://llmstxt.org)-spec index published at the docs root via Sphinx `html_extra_path` +- **`llms-full.txt` Corpus** ([#1094](https://github.com/massgen/MassGen/pull/1094)): Concatenated full-docs dump (~1 MB, 59 files), generated by a Sphinx `build-finished` hook in `conf.py` +- **Landing Page Update** ([#1094](https://github.com/massgen/MassGen/pull/1094)): "How Does MassGen Compare?" now lists all four comparisons; parent `comparisons.rst` drops "coming soon" and gains a toctree +- **README Pointer**: One-line pointer to `llms.txt` / `llms-full.txt` for AI agents/crawlers +- **`bootstrap_subagent` Single-Shot Fix** ([#1094](https://github.com/massgen/MassGen/pull/1094)): `_run_bootstrap_discriminator_step` now passes `refine=False` to `spawn_subagent` β€” without it the orchestrator's `max_new_answers_per_agent: 3` default shadowed the coordination-dict overrides and let the discriminator refine instead of running single-shot -### Previous Achievements (v0.0.3 - v0.1.85) +### Previous Achievements (v0.0.3 - v0.1.86) + +βœ… **`bootstrap_subagent` Discriminator + Codex MCP Approval Fix (v0.1.86)**: Variant B is now functional β€” the orchestrator runs an in-process LLM critic between rounds, merges critic-proposed criteria into the accumulator, and augments the next round's checklist. Codex MCP tool calls under `codex exec` now write both approval bypasses needed for non-interactive runs. 
βœ… **Discriminative Criteria Emergence (`criteria_mode`) (v0.1.85)**: New `orchestrator.coordination.criteria_mode` lets evaluation criteria emerge from observed gaps across rounds. `bootstrap_inline` is fully functional on all backends with checklist tool support, with `proposed_criteria` persisted, deduped, capped, and merged into the next round's effective checklist. @@ -1570,9 +1573,9 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch We welcome community contributions to achieve these goals. -### v0.1.87 Roadmap +### v0.1.88 Roadmap -Version 0.1.87 picks up the multimodal work deferred from v0.1.86 and continues refinement of the discriminative criteria pipeline: +Version 0.1.88 picks up the multimodal work deferred from v0.1.86/v0.1.87 and continues refinement of the discriminative criteria pipeline: #### Planned Features - **Image/Video Edit Capabilities** ([#959](https://github.com/massgen/MassGen/issues/959)): Image and video editing across providers with multi-turn editing workflows via continuation IDs diff --git a/README_PYPI.md b/README_PYPI.md index a1e1839ae..9bb711c0d 100644 --- a/README_PYPI.md +++ b/README_PYPI.md @@ -121,15 +121,15 @@ This project started with the "threads of thought" and "iterative refinement" id

πŸ—ΊοΈ Roadmap

-- [Recent Achievements (v0.1.86)](#recent-achievements-v0186) -- [Previous Achievements (v0.0.3 - v0.1.85)](#previous-achievements-v003---v0185) +- [Recent Achievements (v0.1.87)](#recent-achievements-v0187) +- [Previous Achievements (v0.0.3 - v0.1.86)](#previous-achievements-v003---v0186) - [Key Future Enhancements](#key-future-enhancements) - Bug Fixes & Backend Improvements - Advanced Agent Collaboration - Expanded Model, Tool & Agent Integrations - Improved Performance & Scalability - Enhanced Developer Experience -- [v0.1.87 Roadmap](#v0187-roadmap) +- [v0.1.88 Roadmap](#v0188-roadmap)
@@ -154,19 +154,20 @@ This project started with the "threads of thought" and "iterative refinement" id --- -## πŸ†• Latest Features (v0.1.86) +## πŸ†• Latest Features (v0.1.87) -**πŸŽ‰ Released: May 13, 2026** +**πŸŽ‰ Released: May 15, 2026** -**What's New in v0.1.86:** -- **🧠 `bootstrap_subagent` Discriminator** - `orchestrator.coordination.criteria_mode: bootstrap_subagent` now runs a dedicated between-rounds LLM critic that proposes criteria from the current answers, merges them into the accumulator, and augments the next round's checklist automatically. -- **🧹 Session-End Criteria Drain** - Late stdio JSONL criteria emissions are drained before final presentation so they are not stranded after the last checklist resolution pass. -- **πŸ› οΈ Codex MCP Approval Fix** - Codex workspaces now include the non-interactive approval bypasses needed for external MCP tools such as `submit_checklist`, `create_task_plan`, `new_answer`, and `read_media`. +**What's New in v0.1.87:** +- **πŸ“š Framework Comparison Pages** - Three new "MassGen vs ..." pages β€” CrewAI, LangGraph, AutoGen/AG2 β€” under `docs/source/reference/comparisons/`, positioning MassGen's parallel-refinement-with-voting model against each framework's coordination shape. +- **πŸ€– `llms.txt` for AI Agents** - A curated [`llms.txt`](https://docs.massgen.ai/en/latest/llms.txt) index plus a full-corpus [`llms-full.txt`](https://docs.massgen.ai/en/latest/llms-full.txt) dump (per [llmstxt.org spec](https://llmstxt.org)), so AI agents and crawlers can discover MassGen's docs cleanly. +- **πŸ”§ `bootstrap_subagent` Single-Shot Fix** - `Orchestrator._run_bootstrap_discriminator_step` now passes `refine=False` to `spawn_subagent` β€” the canonical knob `SubagentManager` actually respects at the orchestrator level. 
-**Try v0.1.86 Features:** +**Try v0.1.87 Features:** ```bash -pip install massgen==0.1.86 -uv run massgen --config massgen/configs/coordination/bootstrap_subagent_criteria.yaml "Create an SVG of an AI agent coding." +pip install massgen==0.1.87 +# Read the framework comparisons: +# https://docs.massgen.ai/en/latest/reference/comparisons.html ``` β†’ [See full release history and examples](massgen/configs/README.md#release-history--examples) @@ -1240,19 +1241,21 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch ⚠️ **Early Stage Notice:** As MassGen is in active development, please expect upcoming breaking architecture changes as we continue to refine and improve the system. -### Recent Achievements (v0.1.86) +### Recent Achievements (v0.1.87) -**πŸŽ‰ Released: May 13, 2026** +**πŸŽ‰ Released: May 15, 2026** -#### `bootstrap_subagent` Discriminator + Codex MCP Approval Fix -- **`bootstrap_subagent` Variant (fully functional)**: A dedicated between-rounds LLM critic now reads the task and each agent's latest answer, emits `proposed_criteria` as JSON, and merges them into `bootstrap_criteria_accumulator.json` for the next round's checklist -- **Answer-Snapshot Gate**: The discriminator runs once per unique answer snapshot, avoiding repeated critiques when the answer set has not changed -- **Session-End Drain**: Late stdio criteria emissions are captured before final presentation -- **Codex MCP Approval Fix**: Non-interactive Codex workspaces now write both `approval_policy = "never"` and per-MCP-server `default_tools_approval_mode = "approve"`, preventing external MCP tools from being cancelled immediately under `codex exec` -- **Example Configs**: `massgen/configs/coordination/bootstrap_subagent_criteria.yaml` for the critic-driven path and `bootstrap_inline_criteria.yaml` for agent-proposed criteria -- **Tests**: Bootstrap criteria coverage expanded to 35 tests, plus Codex workspace approval policy coverage across approval modes 
+#### Documentation: Framework Comparisons & `llms.txt` +- **Framework Comparison Pages** ([#1094](https://github.com/massgen/MassGen/pull/1094)): Three new "MassGen vs ..." pages β€” `crewai.rst`, `langgraph.rst`, `autogen.rst` β€” under `docs/source/reference/comparisons/`, positioning MassGen against each framework's coordination shape and listing when to reach for one versus the other +- **`llms.txt` Index** ([#1094](https://github.com/massgen/MassGen/pull/1094)): Curated [llmstxt.org](https://llmstxt.org)-spec index published at the docs root via Sphinx `html_extra_path` +- **`llms-full.txt` Corpus** ([#1094](https://github.com/massgen/MassGen/pull/1094)): Concatenated full-docs dump (~1 MB, 59 files), generated by a Sphinx `build-finished` hook in `conf.py` +- **Landing Page Update** ([#1094](https://github.com/massgen/MassGen/pull/1094)): "How Does MassGen Compare?" now lists all four comparisons; parent `comparisons.rst` drops "coming soon" and gains a toctree +- **README Pointer**: One-line pointer to `llms.txt` / `llms-full.txt` for AI agents/crawlers +- **`bootstrap_subagent` Single-Shot Fix** ([#1094](https://github.com/massgen/MassGen/pull/1094)): `_run_bootstrap_discriminator_step` now passes `refine=False` to `spawn_subagent` β€” without it the orchestrator's `max_new_answers_per_agent: 3` default shadowed the coordination-dict overrides and let the discriminator refine instead of running single-shot -### Previous Achievements (v0.0.3 - v0.1.85) +### Previous Achievements (v0.0.3 - v0.1.86) + +βœ… **`bootstrap_subagent` Discriminator + Codex MCP Approval Fix (v0.1.86)**: Variant B is now functional β€” the orchestrator runs an in-process LLM critic between rounds, merges critic-proposed criteria into the accumulator, and augments the next round's checklist. Codex MCP tool calls under `codex exec` now write both approval bypasses needed for non-interactive runs. 
 ✅ **Discriminative Criteria Emergence (`criteria_mode`) (v0.1.85)**: New `orchestrator.coordination.criteria_mode` lets evaluation criteria emerge from observed gaps across rounds. `bootstrap_inline` is fully functional on all backends with checklist tool support, with `proposed_criteria` persisted, deduped, capped, and merged into the next round's effective checklist.
@@ -1569,9 +1572,9 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch

 We welcome community contributions to achieve these goals.

-### v0.1.87 Roadmap
+### v0.1.88 Roadmap

-Version 0.1.87 picks up the multimodal work deferred from v0.1.86 and continues refinement of the discriminative criteria pipeline:
+Version 0.1.88 picks up the multimodal work deferred from v0.1.86/v0.1.87 and continues refinement of the discriminative criteria pipeline:

 #### Planned Features
 - **Image/Video Edit Capabilities** ([#959](https://github.com/massgen/MassGen/issues/959)): Image and video editing across providers with multi-turn editing workflows via continuation IDs
diff --git a/ROADMAP.md b/ROADMAP.md
index 2e88cd1c2..de36aa680 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -1,10 +1,10 @@
 # MassGen Roadmap

-**Current Version:** v0.1.86
+**Current Version:** v0.1.87

 **Release Schedule:** Mondays, Wednesdays, Fridays @ 9am PT

-**Last Updated:** May 13, 2026
+**Last Updated:** May 15, 2026

 This roadmap outlines MassGen's development priorities for upcoming releases. Each release focuses on specific capabilities with real-world use cases.

@@ -42,13 +42,30 @@ Want to contribute or collaborate on a specific track? Reach out to the track ow

 | Release | Target | Feature | Owner | Use Case |
 |---------|--------|---------|-------|----------|
-| **v0.1.87** | 05/15/26 | Image/Video Edit Capabilities | @ncrispino | Check and support img/video editing capabilities — deferred from v0.1.86 ([#959](https://github.com/massgen/MassGen/issues/959)) |
+| **v0.1.88** | 05/18/26 | Image/Video Edit Capabilities | @ncrispino | Check and support img/video editing capabilities — deferred from v0.1.86/v0.1.87 ([#959](https://github.com/massgen/MassGen/issues/959)) |
 | | | Discriminative Criteria Refinements | @ncrispino | Selection, ranking, and retirement of stale criteria for long-running refinement loops |

 *All releases ship on MWF @ 9am PT when ready*

 ---

+## ✅ v0.1.87 - Documentation: Framework Comparisons & `llms.txt` (Completed)
+
+**Released:** May 15, 2026 | PRs: [#1094](https://github.com/massgen/MassGen/pull/1094)
+
+### Features
+- **Framework Comparison Pages**: Three new "MassGen vs ..." pages — `crewai.rst`, `langgraph.rst`, `autogen.rst` — under `docs/source/reference/comparisons/`, positioning MassGen against each framework's coordination shape
+- **`llms.txt` Index**: Curated [llmstxt.org](https://llmstxt.org)-spec index published at the docs site root via Sphinx `html_extra_path`
+- **`llms-full.txt` Corpus**: Concatenated full-docs dump (~1 MB, 59 files), generated by a Sphinx `build-finished` hook in `conf.py`
+- **Docs Landing Page Update**: "How Does MassGen Compare?" now lists all four comparisons (LLM Council + the three new ones); parent `comparisons.rst` drops "coming soon" and gains a toctree
+- **README Pointers**: One-line pointer in `README.md` / `README_PYPI.md` directing AI agents to `llms.txt` / `llms-full.txt`
+- **`bootstrap_subagent` Single-Shot Fix**: `Orchestrator._run_bootstrap_discriminator_step` passes `refine=False` to `spawn_subagent` — the canonical knob `SubagentManager` respects at the orchestrator level (the orchestrator's `max_new_answers_per_agent: 3` default was shadowing coordination-dict overrides)
+
+### Notes
+- Image/Video Edit Capabilities ([#959](https://github.com/massgen/MassGen/issues/959)) and Discriminative Criteria Refinements deferred to v0.1.88.
+
+---
+
 ## ✅ v0.1.86 - `bootstrap_subagent` Discriminator + Codex MCP Approval Fix (Completed)

 **Released:** May 13, 2026 | PRs: [#1090](https://github.com/massgen/MassGen/pull/1090)
@@ -220,11 +237,11 @@ Want to contribute or collaborate on a specific track? Reach out to the track ow

 ---

-## 📋 v0.1.87 - Image/Video Edit & Criteria Refinements
+## 📋 v0.1.88 - Image/Video Edit & Criteria Refinements (Deferred from v0.1.86/v0.1.87)

 ### Features

-**1. Image/Video Edit Capabilities (Deferred from v0.1.86)** (@ncrispino)
+**1. Image/Video Edit Capabilities** (@ncrispino)
 - Issue: [#959](https://github.com/massgen/MassGen/issues/959)
 - Investigate and support image and video editing capabilities across providers
 - Multi-turn editing workflows with continuation IDs
diff --git a/ROADMAP_v0.1.88.md b/ROADMAP_v0.1.88.md
index d7ac1c66f..8ecfa07f7 100644
--- a/ROADMAP_v0.1.88.md
+++ b/ROADMAP_v0.1.88.md
@@ -1,14 +1,14 @@
-# MassGen v0.1.87 Roadmap
+# MassGen v0.1.88 Roadmap

-**Target Release:** May 15, 2026
+**Target Release:** May 18, 2026

 ## Overview

-Version 0.1.87 picks up the image/video edit work deferred from v0.1.86 and continues refinement of the discriminative criteria pipeline after `bootstrap_inline` and `bootstrap_subagent` became functional.
+Version 0.1.88 picks up the image/video edit work deferred from v0.1.86/v0.1.87 and continues refinement of the discriminative criteria pipeline.

 ---

-## Feature: Image/Video Edit Capabilities (Deferred from v0.1.86)
+## Feature: Image/Video Edit Capabilities (Deferred from v0.1.86/v0.1.87)

 **Issue:** [#959](https://github.com/massgen/MassGen/issues/959)
 **Owner:** @ncrispino
@@ -49,6 +49,7 @@ Version 0.1.87 picks up the image/video edit work deferred from v0.1.86 and cont

 ## Related Tracks

+- **v0.1.87**: Documentation — framework comparison pages (CrewAI, LangGraph, AutoGen) and `llms.txt` index ([#1094](https://github.com/massgen/MassGen/pull/1094)); plus a one-line `refine=False` fix for the `bootstrap_subagent` discriminator
 - **v0.1.86**: Functional `bootstrap_subagent` discriminator and Codex MCP approval fix
 - **v0.1.85**: Discriminative Criteria Emergence (`criteria_mode`) — `bootstrap_inline` and accumulator infrastructure

diff --git a/docs/announcements/archive/v0.1.86.md b/docs/announcements/archive/v0.1.86.md
new file mode 100644
index 000000000..809e6254a
--- /dev/null
+++ b/docs/announcements/archive/v0.1.86.md
@@ -0,0 +1,68 @@
+# MassGen v0.1.86 Release Announcement
+
+
+
+## Release Summary
+
+We're excited to release MassGen v0.1.86 — `bootstrap_subagent` Discriminator + Codex MCP Approval Fix! 🚀 The critic-driven criteria path is now functional: MassGen can run an in-process LLM discriminator between rounds, propose stronger evaluation criteria from the current answers, merge them into the accumulator, and augment the next round's checklist automatically.
+
+This release also fixes Codex MCP tool calls under `codex exec` so checklist/workflow tools no longer fail immediately with "user cancelled MCP tool call" in non-interactive runs.
+
+## Install
+
+```bash
+pip install massgen==0.1.86
+```
+
+## Links
+
+- **Release notes:** https://github.com/massgen/MassGen/releases/tag/v0.1.86
+- **X post:** [TO BE ADDED AFTER POSTING]
+- **LinkedIn post:** [TO BE ADDED AFTER POSTING]
+
+---
+
+## Full Announcement (for LinkedIn)
+
+Copy everything below this line, then append content from `feature-highlights.md`:
+
+---
+
+We're excited to release MassGen v0.1.86 — `bootstrap_subagent` Discriminator + Codex MCP Approval Fix! 🚀 The critic-driven criteria path is now functional: MassGen can run an in-process LLM discriminator between rounds, propose stronger evaluation criteria from the current answers, merge them into the accumulator, and augment the next round's checklist automatically.
+
+**Key Improvements:**
+
+🧠 **`bootstrap_subagent` is now functional** — Dedicated critic-driven criteria emergence:
+- `criteria_mode: bootstrap_subagent` runs a between-rounds LLM critic via `SubagentManager`
+- The critic reads the task and each agent's latest answer, then emits `proposed_criteria` as JSON
+- The orchestrator merges those criteria into `bootstrap_criteria_accumulator.json`
+- The next round's checklist is augmented without asking answering agents to propose criteria themselves
+- The discriminator runs once per unique answer snapshot, avoiding repeated critiques of unchanged rounds
+
+🧹 **Session-end drain** — Late stdio emissions are captured before final presentation, so criteria proposed near the end of a run are not stranded after the final checklist resolution pass.
+
+🛠️ **Codex MCP approval fix** — `codex exec` workspaces now get both approval bypasses needed for non-interactive external MCP calls:
+- Top-level `approval_policy = "never"`
+- Per-MCP-server `default_tools_approval_mode = "approve"`
+
+🧪 **Tests**:
+- Expanded bootstrap criteria coverage to 35 tests
+- Added Codex workspace approval policy coverage for all approval modes
+
+**Getting Started:**
+
+```bash
+pip install massgen==0.1.86
+uv run massgen --config massgen/configs/coordination/bootstrap_subagent_criteria.yaml "Create an SVG of an AI agent coding."
+```
+
+Inspect the emerging criteria at `.massgen/massgen_logs//bootstrap_criteria_accumulator.json`.
+
+Release notes: https://github.com/massgen/MassGen/releases/tag/v0.1.86
+
+Feature highlights:
+
+
diff --git a/docs/announcements/current-release.md b/docs/announcements/current-release.md
index 809e6254a..bd8d34e5f 100644
--- a/docs/announcements/current-release.md
+++ b/docs/announcements/current-release.md
@@ -1,4 +1,4 @@
-# MassGen v0.1.86 Release Announcement
+# MassGen v0.1.87 Release Announcement