Skip to content

feat: v0.1.92#1107

Merged
Henry-811 merged 44 commits into
mainfrom
dev/v0.1.92
Jun 1, 2026
Merged

feat: v0.1.92#1107
Henry-811 merged 44 commits into
mainfrom
dev/v0.1.92

Conversation

@Henry-811
Copy link
Copy Markdown
Collaborator

@Henry-811 Henry-811 commented May 28, 2026

PR Title Format

Your PR title must follow the format: <type>: <brief description>

Valid types:

  • fix: - Bug fixes
  • feat: - New features
  • breaking: - Breaking changes
  • docs: - Documentation updates
  • refactor: - Code refactoring
  • test: - Test additions/modifications
  • chore: - Maintenance tasks
  • perf: - Performance improvements
  • style: - Code style changes
  • ci: - CI/CD configuration changes

Examples:

  • fix: resolve memory leak in data processing
  • feat: add export to CSV functionality
  • breaking: change API response format
  • docs: update installation guide

Description

Brief description of the changes in this PR

Type of change

  • Bug fix (fix:) - Non-breaking change which fixes an issue
  • New feature (feat:) - Non-breaking change which adds functionality
  • Breaking change (breaking:) - Fix or feature that would cause existing functionality to not work as expected
  • Documentation (docs:) - Documentation updates
  • Code refactoring (refactor:) - Code changes that neither fix a bug nor add a feature
  • Tests (test:) - Adding missing tests or correcting existing tests
  • Chore (chore:) - Maintenance tasks, dependency updates, etc.
  • Performance improvement (perf:) - Code changes that improve performance
  • Code style (style:) - Changes that do not affect the meaning of the code (formatting, missing semi-colons, etc.)
  • CI/CD (ci:) - Changes to CI/CD configuration files and scripts

Checklist

  • I have run pre-commit on my changed files and all checks pass
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Pre-commit status

# Paste the output of running pre-commit on your changed files:
# uv run pre-commit install
# git diff --name-only HEAD~1 | xargs uv run pre-commit run --files # for last commit
# git diff --name-only origin/<base branch>...HEAD | xargs uv run pre-commit run --files # for all commits in PR
# git add <your file> # if any fixes were applied
# git commit -m "chore: apply pre-commit fixes"
# git push origin <branch-name>

How to Test

Add test method for this PR.

Test CLI Command

Write down the test bash command. If there is pre-requests, please emphasize.

Expected Results

Description/screenshots of expected results.

Additional context

Add any other context about the PR here.

Summary by CodeRabbit

  • New Features

    • Improved terminal UI: adaptive refresh rates, emoji/icon fallbacks, provider/model normalization for the welcome screen, and a background stall-watchdog for debugging.
    • Enhanced final-presentation & workspace modal flows with safer snapshot/save behaviors.
    • New example config and MCP server entry for a Parallel Search connector.
  • Documentation

    • Added a detailed refactor roadmap and updated PR draft describing collaborator extraction and migration plan.
  • Tests

    • Added characterization, terminal-display, and path-normalization tests to lock observable behavior.
  • Chores

    • Version bumped to 0.1.92

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

Review Change Stack

Note

Reviews paused

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Bumps package version to 0.1.92, adds PR draft and roadmap, extracts Textual display helpers, moves many orchestrator responsibilities into a new collaborators package, and updates/adds tests to follow the new collaborator wiring.

Changes

Orchestrator Refactor & Frontend Extraction

Layer / File(s) Summary
Version constant update
massgen/__init__.py
Module-level __version__ incremented to 0.1.92.
PR draft and roadmap
PR_DRAFT.md, docs/dev_notes/orchestrator_refactor_roadmap.md
Replaced draft with collaborator-extraction overview and added stepwise refactor roadmap.
Textual display helpers
massgen/frontend/displays/_textual_provider_model.py, _textual_terminal_capabilities.py, _textual_widget_debug.py, textual_terminal_display.py
Extracted provider/model normalization, terminal capability probing, and widget-debug helpers; TextualTerminalDisplay delegates to these modules.
Collaborator package init
massgen/orchestrator_collaborators/__init__.py
Re-exports new collaborator classes and defines package public API.
Orchestrator collaborators (many)
massgen/orchestrator_collaborators/*
Adds ~49 collaborator modules (e.g., SnapshotManager, ChecklistGateManager, RoundEvaluatorRunner, SubagentLifecycleCoordinator, FinalPresentationRunner, MetricsReporter, MidStreamInjectionHookInstaller, PlanningToolInjector, etc.) extracting discrete orchestration responsibilities from massgen.orchestrator.
Frontend & TUI tests
massgen/tests/frontend/test_textual_terminal_display_characterization.py
New characterization tests pin TUI public surface and capability probes.
Orchestrator tests and wiring updates
massgen/tests/*
Updated and added many unit/integration tests to instantiate or monkeypatch the new collaborators and to verify preserved public contracts and path-normalization behavior.
Configs & MCP registry
massgen/configs/tools/web-search/parallel_search_example.yaml, massgen/mcp_tools/server_registry.py
Added example Parallel Search config and registered a new parallel_search MCP server.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • a5507203
  • ncrispino
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dev/v0.1.92

NormallyGaussian and others added 28 commits May 28, 2026 16:43
…onfig

Adds Parallel Web Systems' hosted Search MCP server alongside the existing
Context7, Brave, and Exa entries.

- massgen/mcp_tools/server_registry.py: new `parallel_search` entry using
  `type: streamable-http` and `url: https://search.parallel.ai/mcp`. The
  server accepts unauthenticated requests by default, so the entry is
  registered with `requires_api_key: False` and an
  `optional_api_key_env_var: PARALLEL_API_KEY` for users who want higher
  rate limits via the `Authorization: Bearer ...` header.
- massgen/configs/tools/web-search/parallel_search_example.yaml: full
  example config modelled on `exa_search_example.yaml`, with a system
  message that explains Parallel's `objective + search_queries` pattern
  and `web_search`/`web_fetch` tools, plus a commented-out `headers`
  block for opting into the API key.
- Module docstring updated to list `parallel_search`.

Docs:
- https://docs.parallel.ai/integrations/mcp/search-mcp
- https://docs.parallel.ai/api-reference/search/search
- https://docs.parallel.ai/search/best-practices

Tested with claude-sonnet-4-6 backend via the new example config.
Addresses CodeRabbit findings (Major) on PR #1108:

- server_registry.py: registry description no longer encodes tool
  invocation shape (web_search(...) / web_fetch(URL ->...)). Reworded as
  a natural-language description of what the server provides; the actual
  call schemas come from the MCP server itself.
- parallel_search_example.yaml: system message dropped the
  web_search(objective, search_queries[, ...]) and web_fetch(urls[,
  objective]) headers, and now leans on the MCP-supplied schemas for
  argument names. Usage guidance (objective + 2-3 keyword queries, mode
  preset, citation discipline) is preserved in plain prose.

Tools remain referenced by their conceptual function ("web search",
"URL extraction") rather than by hardcoded call signatures, so the
prompt stays accurate if the underlying tool schema evolves.
…p servers

Codex review identified that the new parallel_search entry advertises
PARALLEL_API_KEY support but get_server_config() only special-cased
context7's --api-key CLI flag. As written, parallel_search would always
connect anonymously even when PARALLEL_API_KEY was set, defeating the
documented higher-rate-limit path.

Generalize get_server_config() so that when a registry entry declares
optional_api_key_env_var and the env var is set:
- stdio context7 keeps its existing --api-key flag injection (unchanged
  behavior; explicit branch on server_name)
- streamable-http entries get a default Authorization: Bearer ${key}
  header (using setdefault so a registry-declared header takes
  precedence)
- everything else is unchanged

Manually verified:
- PARALLEL_API_KEY unset -> no headers added to parallel_search config
- PARALLEL_API_KEY set   -> headers.Authorization = "Bearer <key>"
- apply_api_key_logic=False -> no injection
- context7 still injects --api-key when CONTEXT7_API_KEY is set
Codex review on the prior commit flagged a P1 behavioral regression:
marking parallel_search as `requires_api_key: False` put it in the
always-on auto-discovery bucket alongside context7, so every config
with `auto_discover_custom_tools: true` (e.g. subagent_checklist.yaml,
log_analysis.yaml, many others) silently gained an outbound web-search
tool — breaking deterministic/offline assumptions of existing workflows.

Realign with the Brave/Exa convention:
- `requires_api_key: True`, `api_key_env_var: "PARALLEL_API_KEY"`
- Add a baked `headers: {"Authorization": "Bearer ${PARALLEL_API_KEY}"}`
  block; the MCP transport's existing env-var substitution unpacks it at
  connection time (mirrors how stdio servers consume `env:` like
  `BRAVE_API_KEY: "${BRAVE_API_KEY}"`).
- Revert the temporary streamable-http bearer-injection branch added
  to `get_server_config()` last commit — no longer needed and made the
  registry entry's `optional_api_key_env_var` look like it was opt-in
  when it actually gated nothing.
- Description/notes reworded: anonymous use is still possible by adding
  the server manually to mcp_servers without the headers block (the
  example YAML continues to show that path).

Manually verified:
- PARALLEL_API_KEY unset -> auto-discovery is just [context7]
  (no regression vs. main)
- PARALLEL_API_KEY set   -> auto-discovery is [context7, parallel_search]
  with the headers template ready for MCP substitution
- Missing-key reporting now lists parallel_search alongside
  brave_search and exa_search when their env vars are unset
… compat

Codex review caught that the example was wired to backend.type: 'claude_code'
but used the streamable-http 'headers:' field. ClaudeCodeBackend forwards
MCP configs to ClaudeAgentOptions, where the equivalent auth field is a
top-level 'authorization:' string (per
massgen/backend/docs/MCP_IMPLEMENTATION_CLAUDE_BACKEND.md), not a nested
'headers' map — so users uncommenting the API-key block would have silently
kept anonymous mode.

Switch the example to backend.type: 'claude' (generic massgen MCP
transport), matching the existing streamable_http_test configs whose
'headers:' shape lines up with the registry entry. Added an inline note
pointing claude_code users at the 'authorization:' alternative.
…display

Splits two god-class files into composable collaborators with zero breaking
changes, gated by characterization tests written first.

- massgen/orchestrator.py: 21,599 → 10,741 lines (−10,858, 50% reduction)
- 35 collaborator modules in massgen/orchestrator_collaborators/
- massgen/frontend/displays/textual_terminal_display.py: 14,580 → 14,287
- 3 sibling display modules (_textual_*.py)

The Orchestrator class keeps thin delegator methods so every existing
internal/external call site works unchanged. All public contracts
preserved (Orchestrator, AgentState, WORKFLOW_TOOL_NAMES, create_orchestrator,
and all display public names). orchestrator.py stays a single .py module
(not a package) to preserve Path(__file__)-based skill resolution.

Two characterization test files pin the public contract + extraction
seams. Six tracked test files updated with seam-rewiring (mock-stub or
monkeypatch repointing) — no assertion weakening.

Full plan + lessons learned in docs/dev_notes/orchestrator_refactor_roadmap.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…nalyzer

Two more cohesive clusters extracted from orchestrator.py:
- EvaluatorResultExtractor: 4 PUBLIC @staticmethod methods
  (extract_all_evaluator_answers, extract_evaluator_workspace_paths,
   format_multi_evaluator_result_block, build_task_plan_from_evaluator_verdict)
- QuestionIrreversibilityAnalyzer: 1 async method (224 lines)

orchestrator.py: 10,381 → 9,966 lines (37 total collaborators, 54% reduction
from the original 21,599-line god class). Both collaborators preserve all
delegator signatures (including @staticmethod for unbound-call tests in
test_round_evaluator_verdict.py).

Tests: test_round_evaluator_verdict.py (58/58), test_intelligent_planning_mode.py
(14/14), test_orchestrator_characterization.py (37/37) all green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
3 methods: _generate_and_inject_evaluation_criteria (async 101 lines),
_save_evaluation_criteria_to_log, get_generated_evaluation_criteria
(PUBLIC, called by cli.py).

orchestrator.py: 9,966 → 9,860 lines (38 collaborators, 54% cumulative
reduction from 21,599). Class named EvaluationCriteriaGeneratorCollaborator
to avoid clash with the existing massgen.evaluation_criteria_generator
module that it wraps.

State mutations route through orchestrator back-ref so ChecklistGateManager
and CriteriaEvolutionRunner (both already extracted) see live state.
get_log_session_dir resolved lazily via massgen.orchestrator namespace
for test patchability.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2 cohesive methods: _load_rate_limits_from_config (init-time),
_apply_agent_startup_rate_limit (async, per-agent cooldown enforcement).

orchestrator.py: 9,860 → 9,698 lines (39 collaborators, 55% cumulative
reduction from 21,599). _rate_limits + _agent_startup_times state remains
on Orchestrator and is read/mutated via back-ref so monkeypatching still
works.

Tests: characterization (37/37), test_rate_limiter (all green).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
3 methods: _is_changedoc_enabled (heavily monkeypatched by 10+ tests via
instance-attr assignment — kept as thin delegator), _gather_agent_changedocs,
_sync_applied_context_files_into_final_artifacts (110-line post-presentation
file-overlay routine).

orchestrator.py: 9,698 → 9,599 lines (40 collaborators, 56% cumulative
reduction from 21,599). gather_agent_changedocs routes through
self._orchestrator._is_changedoc_enabled() so the test monkeypatch idiom
(orch._is_changedoc_enabled = lambda: ...) keeps shadowing at the instance
level. get_log_session_dir resolved lazily via massgen.orchestrator namespace.

Tests: characterization + answer_path_normalization + bootstrap_criteria +
discriminative_pruning + checklist_label_refresh_ordering (105 total green).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wraps the pre-collab prompt-improvement subagent flow
(_improve_and_inject_prompt, async, 77 lines).

orchestrator.py: 9,599 → 9,531 lines (41 collaborators, 56% cumulative
reduction from 21,599). Class name -Collaborator suffixed to avoid clash
with massgen.prompt_improver.PromptImprover that it wraps. _prompt_improved
and current_task mutations route through orchestrator back-ref.

Tests: characterization + test_prompt_improver + test_orchestrator_timeout_selection (53 total).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
_merge_agent_memories_to_winner (95 lines, post-coordination memory consolidation
into the winning agent's workspace) folded into the existing
WorkspaceLifecycleManager collaborator since it naturally fits there alongside
clear_agent_workspaces / archive_agent_memories / namespace_verification_memory_files.

orchestrator.py: 9,531 → 9,440 lines (still 41 collaborators, 56% cumulative
reduction from 21,599). Method stays on Orchestrator as a thin delegator.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
5 small cohesive helpers used by 6 existing collaborators
(EvaluationCriteriaGenerator, PersonaInjector, PromptImproverCollaborator,
CriteriaEvolutionRunner, RoundEvaluatorRunner, BootstrapCriteriaEngine):

- _build_parent_agent_configs
- _get_parent_workspace
- _get_log_directory (@staticmethod)
- _make_precollab_started_callback
- _notify_precollab_completed (monkeypatched in test_evolving_criteria.py)

orchestrator.py: 9,440 → 9,385 lines (42 collaborators, 57% cumulative
reduction from 21,599). All 5 methods kept as thin delegators on
Orchestrator preserving signatures including @staticmethod. Lazy
namespace lookup via `from massgen import orchestrator as _orch_mod`
preserves test patches at the massgen.orchestrator namespace.

Tests: characterization (37) all green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…alyzer

_format_planning_mode_ui (81 lines, pure formatter — no self references)
folded into the existing QuestionIrreversibilityAnalyzer collaborator as a
@staticmethod since QIA was already its only caller. Method stays on
Orchestrator as a thin delegator.

orchestrator.py: 9,385 → 9,316 lines (still 42 collaborators, 57% cumulative).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reflects current state after all follow-up extractions:
PostEvaluationRunner, FinalPresentationRunner, PeerAnswerVisibilityTracker,
ChecklistGateManager, MidStreamInjectionHookInstaller (partial),
TraceAnalyzerRunner, CriteriaEvolutionRunner, SnapshotManager,
RoundEvaluatorRunner, MetricsReporter, PersonaInjector, PreviousLogRestorer,
EvaluatorResultExtractor, QuestionIrreversibilityAnalyzer,
EvaluationCriteriaGeneratorCollaborator, RateLimitController,
ChangedocCoordinator, PromptImproverCollaborator, PreCollabHelpers, and
folding memory-merger + format_planning_mode_ui into existing collaborators.

orchestrator.py: 21,599 → 9,319 lines (57% reduction).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…plan)

Promoted the nested _setup_agent_orchestration function inside __init__ to
a regular method on the new AgentOrchestrationSetup collaborator. Closure
vars (skills_directory, massgen_skills, load_previous_session_skills) become
explicit method args. The two call sites in __init__ (parallel
ThreadPoolExecutor + sequential fallback) call through a tiny local wrapper
that forwards to self._agent_orchestration_setup.setup_agent_orchestration().

Also folded ensure_workspace_symlinks (PUBLIC, called externally) into the
same collaborator. Both kept as thin delegators on Orchestrator.

orchestrator.py: 9,319 → 9,228 lines (43 collaborators, 57% cumulative
reduction from 21,599). This was the last paused step from the original
29-step plan; only the MidStreamInjectionHookInstaller hook-install
methods (12 remaining, behavior-changing) and the streaming/coordination
cores remain unrefactored.

Tests: characterization + test_orchestrator_skills_injection all green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…jector

_rewrite_subagent_mcp_config_files (56 lines, round-2+ counterpart to
_create_subagent_mcp_config which is already on SubagentToolInjector) folded
into the existing collaborator. Method kept on Orchestrator as a thin
delegator.

orchestrator.py: 9,228 → 9,180 lines (still 43 collaborators).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…nRunner

53-line @staticmethod (criteria-evolution result parser, prefer workspace
file then fall back to fenced JSON in answer text) folded into the existing
CriteriaEvolutionRunner collaborator that was already its only caller.
Method stays on Orchestrator as a thin @staticmethod delegator.

orchestrator.py: 9,180 → 9,132 lines.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Any is already imported at module top; the redundant local import
from the rewrite_subagent_mcp_config_files fold-in tripped ruff F401.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds AgentOrchestrationSetup to the collaborator list (the previously-paused
inline-in-__init__ extraction) and reflects post-fold-in line count.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2 stateless Docker-failure diagnostic helpers (~100 lines):
- save_docker_logs_on_mcp_failure (capture container state + logs to session
  log dir when a Docker MCP server disconnects)
- get_docker_health (read container health metrics for reliability tracing)

Both tolerant of non-Docker mode (no-op when filesystem_manager.docker_manager
is absent). Lazy get_log_session_dir lookup via massgen.orchestrator namespace
for test patchability.

orchestrator.py: 9,133 → 9,057 lines (44 collaborators, 58% cumulative
reduction from 21,599).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
4 cohesive step-mode methods (~160 lines):
- finalize_step_mode (PUBLIC, called by cli.py + tested unbound)
- _continuous_status_updates (async)
- _resolve_step_mode_workspace (heavily tested directly)
- _resolve_step_mode_stale_paths

orchestrator.py: 9,057 → 8,923 lines (45 collaborators, 58.7% cumulative
reduction from 21,599). finalize_step_mode is @staticmethod on the
collaborator; Orchestrator delegator invokes as
StepModeHandler.finalize_step_mode(self, log_dir) to preserve the
Orchestrator.finalize_step_mode(MagicMock(), log_dir) unbound-call test
pattern. shutil routed via massgen.orchestrator namespace for test
patchability (test_answer_path_normalization patches massgen.orchestrator.shutil).

Tests: characterization + test_step_mode + test_answer_path_normalization green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2 chat-flow helpers used by the public chat() entry point:
- handle_followup (async generator, follow-up response with planning-mode
  switching)
- build_conversation_context (@staticmethod, message-list → context dict)

orchestrator.py: 8,923 → 8,847 lines (46 collaborators, 59% cumulative
reduction from 21,599). Lazy lookups via massgen.orchestrator namespace
for log_orchestrator_activity / log_stream_chunk preserve test patches.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds StepModeHandler, DockerDiagnostics, ChatFollowupHandler to the
collaborator list.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
3 cohesive tool-message helpers for the coordination/streaming loop (~116 lines):
- is_tool_related_content (@staticmethod, content-classification guard)
- create_tool_error_messages (@staticmethod, per-tool-call error builder)
- split_disallowed_workflow_tool_calls (workflow-tool partitioner)

orchestrator.py: 8,847 → 8,759 lines (47 collaborators, 59.4% cumulative
reduction from 21,599). split_disallowed lazy-imports WORKFLOW_TOOL_NAMES
via massgen.orchestrator namespace so tests patching it still fire.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
_seed_plan_execution_workspaces (59 lines, plan/spec artifact seeding) and
_copy_workspace_contents (33 lines, @staticmethod, top-level contents copy)
folded into the existing WorkspaceLifecycleManager. Both stay on Orchestrator
as thin delegators.

orchestrator.py: 8,759 → 8,682 lines (still 47 collaborators, 59.8%
cumulative reduction from 21,599).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@staticmethod helper (36 lines) that partitions a combined spawn result
into evaluator vs trace halves — folded into TraceAnalyzerRunner since
the splitting is specifically about extracting the trace half. Stays on
Orchestrator as @staticmethod delegator so Orchestrator._split_combined_spawn_result(...)
unbound test calls in test_orchestrator_unit.py keep working.

orchestrator.py: 8,682 → 8,655 lines.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
_load_essential_files_manifests (35 lines, reads per-agent
essential_files_manifest.json from snapshot storage) → new
EssentialFilesHelper collaborator. Method kept on Orchestrator as a thin
delegator. Test stubs in test_essential_files_manifest.py updated to wire
a real EssentialFilesHelper into the MagicMock orchestrator so the bound
method routes to real logic (same pattern as the WorkspaceLifecycleManager
fix earlier in this session).

orchestrator.py: 8,655 → 8,633 lines (48 collaborators, 60% cumulative
reduction from 21,599).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ncrispino and others added 9 commits June 1, 2026 06:21
A prior fold-in (split_combined_spawn_result extraction) inadvertently
ate the @staticmethod decorator from the immediately-following
_format_trace_analyzer_for_memory_static method. The docstring still
claimed @staticmethod but the decorator was missing, so unbound
Orchestrator._format_trace_analyzer_for_memory_static(arg1, arg2) calls
in test_auto_trace_analysis.py failed with "takes 2 positional arguments
but 3 were given". Restored the decorator.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds ToolMessageHelpers + EssentialFilesHelper to the collaborator list.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
3 methods only called by CriteriaEvolutionRunner via orch back-ref folded
into the collaborator they belong to:
- _collect_evolution_input_data (22 lines, trace + history aggregation)
- _format_score_history_table (19 lines, @staticmethod)
- _format_criteria_for_prompt (15 lines, @staticmethod)

All 3 stay on Orchestrator as thin delegators (@staticmethod preserved for
the latter two so any unbound-call tests keep working).

orchestrator.py: 8,634 → 8,594 lines (still 48 collaborators, 60.2%
cumulative reduction from 21,599).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2 streaming-buffer helpers for the enforcement-retry path:
- get_buffer_content (@staticmethod, capture preview + total chars)
- truncate_enforcement_buffer_content (bound retry-buffer length)

orchestrator.py: 8,596 → 8,574 lines (49 collaborators, 60.3% cumulative
reduction from 21,599).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds EnforcementBufferHelper.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AgentOrchestrationSetup was actually extracted in this PR; only the
MidStreamInjectionHookInstaller hook-install methods and the streaming/
coordination cores remain out of scope.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
17-line gate (auto_trace_analysis config + round-2+ + no-in-flight-task) folded
into the existing TraceAnalyzerRunner. Stays on Orchestrator as thin delegator.

orchestrator.py: 8,574 → 8,561 lines (still 49 collaborators).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
With `from __future__ import annotations` the dict[str, Path | None]
annotation doesn't need a runtime import.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
massgen/orchestrator_collaborators/final_result_reporter.py (1)

37-417: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use Google-style docstrings for the new result/reporting methods.

This module adds several externally consumed helpers, but the new methods still use summary-only docstrings instead of the required Args/Returns sections.

As per coding guidelines, "**/*.py: For new or changed functions, include Google-style docstrings"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/final_result_reporter.py` around lines 37
- 417, Update the summary-only docstrings in the new reporting methods to
Google-style docstrings including Args and Returns (and Raises where
applicable); specifically edit resolve_final_workspace_path, get_vote_results,
determine_final_agent_from_states, get_final_result, get_partial_result,
ensure_final_directory_on_shutdown, get_all_agent_workspaces,
get_coordination_result, and get_status to include parameter descriptions (types
and meaning), what the function returns (type and structure), and any exceptions
or side effects (e.g., file operations or logging) so they comply with the
repository's Google-style docstring requirement.
🟠 Major comments (19)
massgen/orchestrator_collaborators/orchestrator_timeout_calculator.py-37-40 (1)

37-40: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle unknown agent_id before indexing agent_states.

This path currently raises KeyError for a stale or already-cleaned-up agent, which turns a timeout check into a round-breaking exception. get_agent_timeout_state() already uses a safe lookup; this helper should do the same.

Suggested fix
         orchestrator = self._orchestrator
         timeout_config = orchestrator.config.timeout_config
-        round_start = orchestrator.agent_states[agent_id].round_start_time
+        state = orchestrator.agent_states.get(agent_id)
+        if state is None:
+            return False
+        round_start = state.round_start_time
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/orchestrator_timeout_calculator.py` around
lines 37 - 40, The code directly indexes orchestrator.agent_states[agent_id]
which can raise KeyError for stale/cleaned agents; change the lookup to a safe
retrieval (mirroring get_agent_timeout_state()) by using
orchestrator.agent_states.get(agent_id) (or the existing safe helper) and bail
out or return a default timeout when missing before accessing round_start, so
the timeout check does not raise an exception.
massgen/orchestrator_collaborators/bootstrap_criteria_engine.py-137-147 (1)

137-147: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't cache the discriminator signature before the run outcome is known.

seen.add(content_signature) happens before run_bootstrap_discriminator_step() returns, but that callee returns 0 for both legitimate “nothing new to add” runs and outright failures/import errors/spawn failures. A transient failure therefore marks the answer set as completed and suppresses every future retry for the same content.

Please make the callee return an explicit completion status and only record the signature after a real completed discriminator pass.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/bootstrap_criteria_engine.py` around lines
137 - 147, The code currently adds content_signature to
orch._bootstrap_discriminator_completed_signatures before knowing the outcome of
run_bootstrap_discriminator_step(), causing transient failures to be treated as
completed; change the flow so run_bootstrap_discriminator_step() returns an
explicit completion status (e.g., boolean or enum) and only call
seen.add(content_signature) when that status indicates a successful/real
completed discriminator pass; update the check here (using content_signature,
orch._bootstrap_discriminator_completed_signatures, and
run_bootstrap_discriminator_step) to add the signature on success and leave it
unadded on failure, and adjust any other callers of
run_bootstrap_discriminator_step to handle the new explicit return value.
massgen/orchestrator_collaborators/round_evaluator_runner.py-644-654 (1)

644-654: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Emit the display completion status after salvage/degraded handling, not before.

This notification is finalized off the raw success bit, but the method can still salvage a failed spawn from recovered artifacts or convert a timeout into the degraded-success path. In those cases the UI is told failed here and never corrected, even though orchestration continues with a usable evaluator outcome.

Move the completion notification to the final exit paths so it reflects the derived status (completed, degraded, or terminal failure) exactly once.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/round_evaluator_runner.py` around lines
644 - 654, The current code calls
orch.coordination_ui.display.notify_runtime_subagent_completed early using the
raw success flag (variables: display, orch.coordination_ui,
notify_runtime_subagent_completed, parent_agent_id, upcoming_round,
tool_call_id, success), which can misreport status if later salvage/degraded
logic converts failures into usable outcomes; move this notification out of the
early path and invoke it exactly once at the final exit points after all
salvage/degraded handling finishes, passing the derived status string
("completed", "degraded", or "failed") so the UI reflects the final outcome.
massgen/orchestrator_collaborators/pre_collab_helpers.py-38-40 (1)

38-40: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle non-dict backend configs before calling .items().

This helper now assumes agent.backend.config is always a mapping, but other extracted code in this PR still treats backend config as potentially object-shaped. On those agents, .items() raises and breaks every pre-collab flow that calls _build_parent_agent_configs().

Suggested fix
+from collections.abc import Mapping
 import tempfile
 from typing import TYPE_CHECKING, Any
@@
         for agent_id, agent in self._orchestrator.agents.items():
             agent_cfg: dict[str, Any] = {"id": agent_id}
-            if hasattr(agent, "backend") and hasattr(agent.backend, "config"):
-                backend_cfg = {k: v for k, v in agent.backend.config.items() if k not in ("mcp_servers", "_config_path")}
+            cfg = getattr(getattr(agent, "backend", None), "config", None)
+            if isinstance(cfg, Mapping):
+                backend_cfg = {
+                    k: v
+                    for k, v in cfg.items()
+                    if k not in ("mcp_servers", "_config_path")
+                }
+                agent_cfg["backend"] = backend_cfg
+            elif cfg is not None:
+                backend_cfg = {
+                    k: v
+                    for k, v in vars(cfg).items()
+                    if k not in ("mcp_servers", "_config_path")
+                }
                 agent_cfg["backend"] = backend_cfg
             configs.append(agent_cfg)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/pre_collab_helpers.py` around lines 38 -
40, The code in _build_parent_agent_configs assumes agent.backend.config is a
mapping and calls .items(), which breaks for object-shaped configs; update the
branch that builds backend_cfg to first test the type of agent.backend.config
(e.g., isinstance(agent.backend.config, Mapping) from collections.abc) and only
call .items() for mappings, otherwise fallback to a safe conversion such as
vars(agent.backend.config) or {k: getattr(agent.backend.config, k) for k in
dir(agent.backend.config) if not k.startswith("_")} to produce a dict; ensure
this logic is applied where agent_cfg["backend"] is set so non-dict
backend.configs no longer raise.
massgen/orchestrator_collaborators/active_coordination_cleanup.py-49-60 (1)

49-60: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Snapshot _active_tasks before awaiting cancelled tasks.

This loop iterates the live _active_tasks dict and then awaits inside the iteration. Cancellation callbacks can mutate _active_tasks while control is yielded, which can raise RuntimeError: dictionary changed size during iteration and abort the cleanup path.

Suggested fix
-            for agent_id, task in orch._active_tasks.items():
+            for agent_id, task in list(orch._active_tasks.items()):
                 if not task.done():
                     if not orch.is_orchestrator_timeout:
                         orch.coordination_tracker.track_agent_action(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/active_coordination_cleanup.py` around
lines 49 - 60, The loop mutates orch._active_tasks while awaiting cancelled
tasks, causing a RuntimeError; fix by snapshotting the items first (e.g., items
= list(orch._active_tasks.items()) or collect tasks to cancel in a separate
list) and then iterate that snapshot to call
orch.coordination_tracker.track_agent_action (using ActionType.CANCELLED when
appropriate), task.cancel(), and await the task from the snapshot so the
original orch._active_tasks dict cannot change during iteration.
massgen/orchestrator_collaborators/checkpoint_coordinator.py-190-295 (1)

190-295: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reset checkpoint state in a finally path.

_checkpoint_active, _checkpoint_task, and _checkpoint_participants are set before the subprocess work starts, but any exception from spawn(), _copy_subprocess_logs(), event emission, or cleanup() will exit this method without clearing that state. That can leave the orchestrator stuck in checkpoint mode for the rest of the session.

The early "no workspace available" return also happens after _checkpoint_participants is populated, so stale participant state can leak even without an exception.

Suggested direction
-        orch._checkpoint_active = True
-        orch._checkpoint_number += 1
-        orch._checkpoint_task = signal.get("task", "")
+        orch._checkpoint_active = True
+        orch._checkpoint_number += 1
+        orch._checkpoint_task = signal.get("task", "")
+        orch._checkpoint_participants = {}
+
+        try:
+            ...
+            if not parent_workspace:
+                logger.error("[Checkpoint] No parent workspace for subprocess")
+                return "Checkpoint failed: no workspace available"
+
+            result = await manager.spawn(
+                signal=signal,
+                on_event=_relay_event,
+            )
+            ...
+            return consensus
+        finally:
+            orch._checkpoint_active = False
+            orch._checkpoint_participants = {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/checkpoint_coordinator.py` around lines
190 - 295, Wrap the subprocess work and all subsequent post-processing (the call
to CheckpointSubprocessManager(...), await manager.spawn(...),
manager._copy_subprocess_logs(), event emissions, and manager.cleanup()) in a
try/finally so orch._checkpoint_active, orch._checkpoint_task and
orch._checkpoint_participants are always reset in the finally block; also ensure
the early "no workspace available" return path clears those same fields before
returning. Use the existing symbols (CheckpointSubprocessManager, manager.spawn,
manager._copy_subprocess_logs, manager.cleanup, get_event_emitter) to locate
where to add the try/finally and the cleanup assignments.
massgen/orchestrator_collaborators/subagent_tool_injector.py-97-111 (1)

97-111: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make list-backed MCP injection idempotent.

When mcp_servers is a dict you replace by key, but in the list branch you always append. If this injector runs more than once for the same agent, the same subagent server gets registered repeatedly and the backend will see duplicate tool definitions. Filter the list by self.subagent_server_name(agent_id) before appending the new config.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/subagent_tool_injector.py` around lines 97
- 111, The list-backed MCP injection is not idempotent: in the else branch you
always append subagent_mcp_config to mcp_servers causing duplicates; modify the
logic that handles list mcp_servers (variable mcp_servers, method
subagent_server_name(agent_id), and value subagent_mcp_config) to first filter
out any existing entries whose server name equals
self.subagent_server_name(agent_id) (or whose "name" field matches that value)
and then append the new subagent_mcp_config only once, finally writing back to
agent.backend.config["mcp_servers"] so repeated runs replace the previous entry
instead of duplicating it.
massgen/orchestrator_collaborators/subagent_tool_injector.py-542-568 (1)

542-568: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep the round-2 rewrite path in sync with the initial config writer.

rewrite_subagent_mcp_config_files() is not reproducing the payload from create_subagent_mcp_config(): it drops the backend-model fallback and never rewrites *_context_paths.json. After a workspace clear, later subagent launches can therefore run with a different config than round 1 even though the docstring says this is the counterpart path. Please share the serialization logic or rewrite the same files/fields here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/subagent_tool_injector.py` around lines
542 - 568, rewrite_subagent_mcp_config_files() currently builds agent_configs
but omits the backend-model fallback and never emits the *_context_paths.json
(so round-2 payload diverges from create_subagent_mcp_config()); update
rewrite_subagent_mcp_config_files() to mirror create_subagent_mcp_config()
exactly by: when serializing each agent (agent_cfg in the loop over orch.agents)
include the backend model fallback field(s) you add/expect from backend.config
(e.g., preserve any "model" or fallback keys removed by the filter), serialize
runtime_agent_config.context_paths (or the same attribute
create_subagent_mcp_config reads) and write it to mcp_temp_dir /
f"{_token}_context_paths.json", and ensure the same three files are produced
(*_agent_configs.json, *_coordination_config.json, *_orchestrator_config.json
plus *_context_paths.json) using the same to_dict/json serialization logic used
in create_subagent_mcp_config() so the round‑2 payload matches round‑1.
massgen/orchestrator_collaborators/previous_log_restorer.py-39-42 (1)

39-42: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use Google-style docstrings on the new methods.

These new APIs still use summary-only docstrings, so the extracted collaborator is missing the required Args/Returns structure where callers now need it most.

As per coding guidelines, "**/*.py: For new or changed functions, include Google-style docstrings"

Also applies to: 42-52, 215-222, 281-282

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/previous_log_restorer.py` around lines 39
- 42, The new methods currently have summary-only docstrings; update each
changed/new function (e.g., the class __init__ and restore_from_previous_log in
PreviousLogRestorer, plus the other modified functions referenced around the
file such as the methods at the other changed ranges) to use Google-style
docstrings including at minimum an "Args:" section describing parameters and
types and a "Returns:" section (or "Raises:" if applicable) so callers can rely
on structured doc metadata; ensure the docstring formatting follows Google style
(triple-quoted string immediately under the def) and include types and brief
descriptions for each parameter and the return value or None.
massgen/orchestrator_collaborators/post_evaluation_runner.py-39-44 (1)

39-44: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use Google-style docstrings on these new methods.

Both new entry points are missing the required Args/Returns sections, which makes the restart and post-eval flow harder to follow now that it lives outside Orchestrator.

As per coding guidelines, "**/*.py: For new or changed functions, include Google-style docstrings"

Also applies to: 436-443

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/post_evaluation_runner.py` around lines 39
- 44, The two new async entry methods (post_evaluate_answer and the other new
method at lines ~436-443) are missing Google-style docstrings; update each
function (post_evaluate_answer and the other new entrypoint method name as seen
in the diff) to include a proper Google-style docstring with an overall summary
line plus an Args section that documents selected_agent_id: str and
final_answer: str (or the appropriate parameters for the second method) and a
Returns section that describes the AsyncGenerator[StreamChunk] return type and
what each yielded StreamChunk represents; ensure parameter types and return
behavior (async generator yielding StreamChunk events) are clearly described so
the restart and post-eval flows are discoverable.
massgen/orchestrator_collaborators/post_evaluation_runner.py-436-489 (1)

436-489: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear the consumed restart flag in handle_restart().

post_evaluate_answer() raises orch.restart_pending, but handle_restart() never clears it. That stale flag can make the next attempt look like it is still awaiting a restart decision and trigger another restart even after a later approval.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/post_evaluation_runner.py` around lines
436 - 489, post_evaluate_answer() sets orch.restart_pending but handle_restart()
never clears it, leaving a stale restart flag that can trigger another restart;
inside handle_restart (after obtaining orch or before finishing the method)
explicitly reset the flag (e.g., set orch.restart_pending = False or None) so
the orchestrator no longer appears to be awaiting a restart decision; update any
places that check the flag to expect this cleared state and ensure you reference
handle_restart and orch.restart_pending when making the change.
massgen/orchestrator_collaborators/final_presentation_runner.py-449-458 (1)

449-458: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reset _presentation_started in the finally path.

Line 458 flips the guard to True, but nothing clears it afterward. After the first presentation attempt, every later call returns "Presentation already in progress", which breaks retries, restarts, and later-turn reuse of this runner.

Also applies to: 1127-1237

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/final_presentation_runner.py` around lines
449 - 458, The guard orch._presentation_started is set to True but never
cleared, causing all subsequent calls to short-circuit; update the presentation
runner functions (the block setting orch._presentation_started = True and the
duplicate-call path that logs via _orch_mod.logger.warning and yields
StreamChunk) to ensure orch._presentation_started is reset to False in a finally
path (or via context manager) so it is cleared after success or any exception;
apply the same fix to the other occurrence around the 1127-1237 region to
restore retries/restarts correctly.
massgen/orchestrator_collaborators/workspace_lifecycle_manager.py-23-317 (1)

23-317: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use Google-style docstrings for the new workspace lifecycle methods.

This module adds several new entry points, but the docstrings still don't follow the required Args/Returns structure.

As per coding guidelines, "**/*.py: For new or changed functions, include Google-style docstrings"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/workspace_lifecycle_manager.py` around
lines 23 - 317, Several new public methods (clear_agent_workspaces,
archive_agent_memories, namespace_verification_memory_files,
merge_agent_memories_to_winner, seed_plan_execution_workspaces,
copy_workspace_contents) lack Google-style docstrings; update each docstring to
follow the project's Google style (one-line summary, blank line, Args: with
parameter names and types, Returns: with type and description where applicable,
and Raises: when exceptions are expected) and ensure types for
orch/session_id/agent_id/etc. are specified and brief example lines removed so
linters accept them. Locate and replace the existing triple-quoted summaries in
those functions with the new formatted docstrings and keep the current brief
descriptions as the one-line summary. Ensure seed_plan_execution_workspaces and
copy_workspace_contents include Returns or None as appropriate and
archive_agent_memories/docstrings mention side effects (file I/O) in Raises if
relevant.
massgen/orchestrator_collaborators/isolated_change_reviewer.py-33-112 (1)

33-112: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Convert the new helper and entry-point docstrings to Google style.

The extracted review/apply flow is adding new API surface, but the methods still omit the required Args/Returns sections.

As per coding guidelines, "**/*.py: For new or changed functions, include Google-style docstrings"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/isolated_change_reviewer.py` around lines
33 - 112, The new helper functions (_count_diff_files,
_normalize_approved_files_by_context, _normalize_approved_hunks_by_context,
_is_in_context_prefix, _format_file_list) and the entry-point method review need
Google-style docstrings including Args and Returns (and Raises when applicable);
update each function/method to keep the existing short description and add an
"Args:" section listing parameters with types and a brief description and a
"Returns:" section describing the return value and type (for review, also
include that it yields AsyncGenerator[StreamChunk] and any contextual behavior),
and add "Raises:" only if the function can intentionally raise exceptions—ensure
formatting matches other Google-style docstrings in the codebase.
massgen/orchestrator_collaborators/previous_log_restorer.py-83-85 (1)

83-85: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Sort restored snapshots by round/timestamp, not by label string.

Line 84 uses lexicographic label ordering, so revisions like agent1.10 restore before agent1.2. That can leave answers[-1] and the restored agent state pointing at an older revision after longer runs. Sort the mappings by numeric round and a monotonic field like timestamp instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/previous_log_restorer.py` around lines 83
- 85, The current loop sorts answer_snapshots by label string which causes
lexicographic order issues (e.g., "agent1.10" before "agent1.2"); change the
sort to order mappings by their numeric round and a monotonic timestamp instead:
iterate over answer_snapshots.items() sorted by a key that uses mapping['round']
(or parses the numeric round from label if round is not present) and then
mapping['timestamp'] (or another monotonic field) so restored mappings and
answers[-1] reflect true chronological/round order; update the loop that
references answer_snapshots, label, and mapping to use that sorted order.
massgen/orchestrator_collaborators/peer_answer_visibility_tracker.py-39-291 (1)

39-291: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Convert the new method docstrings to Google style.

This extracted collaborator adds a lot of new surface area, but the methods still use summary-only docstrings instead of the required Args/Returns format.

As per coding guidelines, "**/*.py: For new or changed functions, include Google-style docstrings"

massgen/orchestrator_collaborators/final_result_reporter.py-342-353 (1)

342-353: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Populate per-snapshot content from the snapshot itself.

These entries are keyed by snapshot label, but content is pulled from orch.agent_states[agent_id].answer, i.e. the agent's latest answer. Once an agent has multiple snapshots, older labels will all report the newest content, so the API no longer matches the snapshot mapping.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/final_result_reporter.py` around lines 342
- 353, The code currently sets content = orch.agent_states[agent_id].answer
which returns the agent's latest answer and causes older snapshot labels to show
the newest content; update the content assignment to read the per-snapshot
content from the snapshot mapping (e.g., content = mapping.get("content") or
mapping.get("snapshot_answer")) and only fall back to
orch.agent_states[agent_id].answer if the mapping has no snapshot content;
change the assignment near where mapping, agent_id, and answers are used (the
block that builds the answers list with label, agent_id, answer_path) so each
snapshot label reflects its own stored content.
massgen/orchestrator_collaborators/workspace_lifecycle_manager.py-60-71 (1)

60-71: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Skip symlinks when repopulating workspaces.

These copy loops restore top-level entries from previous/cancelled workspaces without an item.is_symlink() guard. That can reintroduce symlinks pointing outside the live workspace, even though copy_workspace_contents() explicitly avoids that later in this file.

Also applies to: 83-94

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/workspace_lifecycle_manager.py` around
lines 60 - 71, The loop that restores top-level entries from previous workspaces
(iterating source.iterdir() and calling shutil.copy2 / shutil.copytree) lacks an
item.is_symlink() guard and can reintroduce external symlinks; update both copy
loops (the one shown and the similar block at lines ~83-94) to skip items where
item.is_symlink() is True (e.g., if item.is_symlink(): continue) before
performing shutil.copy2 or shutil.copytree, keeping consistency with
copy_workspace_contents() which deliberately avoids symlinks; optionally add a
brief debug/log statement when skipping a symlink for traceability.
massgen/orchestrator_collaborators/final_presentation_runner.py-38-444 (1)

38-444: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use Google-style docstrings on the new final-presentation methods.

This extracted runner now owns several core orchestration entry points, but the new methods still omit the required Args/Returns sections.

As per coding guidelines, "**/*.py: For new or changed functions, include Google-style docstrings"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/final_presentation_runner.py` around lines
38 - 444, The new final-presentation methods
(yield_existing_answer_finalization, present_final_answer,
handle_orchestrator_timeout, determine_final_agent_from_votes,
get_final_presentation) lack Google-style docstrings with Args/Returns; update
each function/class docstring to follow Google style by adding an Args section
describing parameters (types and meaning) and a Returns section describing the
AsyncGenerator or return value (including yielded StreamChunk types), and
include any raised exceptions or side-effects (e.g., workspace snapshot, event
emissions) where relevant so the docstrings for these methods conform to the
repository guideline.
🟡 Minor comments (8)
massgen/orchestrator_collaborators/skills_config_validator.py-65-68 (1)

65-68: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard the built-in skills probe with is_dir() before calling iterdir().

If massgen/skills exists but is a file or broken symlink, this path raises from iterdir() and skips the clearer RuntimeError below. Mirror the external-skills check here so broken installs fail deterministically.

Suggested fix
         builtin_skills_dir = _builtin_skills_dir()
-        has_builtin_skills = builtin_skills_dir.exists() and any(
+        has_builtin_skills = (
+            builtin_skills_dir.exists()
+            and builtin_skills_dir.is_dir()
+            and any(
             builtin_skills_dir.iterdir(),
-        )
+            )
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/skills_config_validator.py` around lines
65 - 68, The probe for built-in skills should check that builtin_skills_dir is a
directory before calling iterdir(): update the logic around builtin_skills_dir
(returned by _builtin_skills_dir()) and the has_builtin_skills assignment to
mirror the external-skills check by adding builtin_skills_dir.is_dir() as a
prerequisite to calling builtin_skills_dir.iterdir(); this ensures broken
symlinks or files named massgen/skills don't cause an exception and will instead
fall through to the existing RuntimeError handling.
massgen/orchestrator_collaborators/snapshot_manager.py-201-220 (1)

201-220: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix type hints and return type consistency.

  1. Lines 204-205 use implicit Optional which violates PEP 484. Use explicit | None syntax.
  2. Line 220 returns None but the method signature declares -> str. This should be -> str | None.
🔧 Proposed fix
     async def save_agent_snapshot(
         self,
         agent_id: str,
-        answer_content: str = None,
-        vote_data: dict[str, Any] = None,
+        answer_content: str | None = None,
+        vote_data: dict[str, Any] | None = None,
         is_final: bool = False,
         context_data: Any = None,
-    ) -> str:
+    ) -> str | None:
         """See Orchestrator._save_agent_snapshot."""
         orch = self._orchestrator
         logger.info(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/snapshot_manager.py` around lines 201 -
220, The signature of save_agent_snapshot uses implicit Optional types and
declares a non-optional return; update the type hints to use explicit
union-with-None and make the return type nullable: change the function signature
in save_agent_snapshot to return "str | None" and change parameters to
"answer_content: str | None = None", "vote_data: dict[str, Any] | None = None",
and "context_data: Any | None = None" so annotations are PEP 484-compliant and
match the code path that returns None when agent not found.
massgen/orchestrator_collaborators/subagent_lifecycle_coordinator.py-84-89 (1)

84-89: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Background task exceptions may be silently lost.

The task created by loop.create_task() is not stored, so any unhandled exception in _maybe_interrupt_background_wait_for_agent will be silently dropped. Consider adding an error handler or storing the task reference.

🛡️ Proposed fix with error callback
-        loop.create_task(
+        task = loop.create_task(
             orch._maybe_interrupt_background_wait_for_agent(
                 agent_id,
                 trigger=trigger,
             ),
         )
+        task.add_done_callback(
+            lambda t: logger.debug(
+                f"[SubagentLifecycle] Background interrupt task for {agent_id} "
+                f"{'failed: ' + str(t.exception()) if t.exception() else 'completed'}"
+            ) if t.exception() else None
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/subagent_lifecycle_coordinator.py` around
lines 84 - 89, The created background task for
orch._maybe_interrupt_background_wait_for_agent is not stored, so exceptions can
be lost; modify the code that calls loop.create_task(...) to capture the Task
object (e.g., task = loop.create_task(...)) and attach a done callback that
checks task.exception() and logs or forwards the error (or re-raises) via the
orchestrator's logger; alternatively register the task in an in-flight tasks
collection on the orch instance so it can be awaited/cancelled on shutdown and
its exceptions handled centrally.
massgen/orchestrator_collaborators/workspace_modal_presenter.py-61-66 (1)

61-66: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Log workspace-resolution failures instead of silently dropping the workspace tab.

If get_current_workspace() or the directory probe throws here, the code quietly falls back to the answer-only modal. That changes the visible result but leaves no clue why the workspace tab disappeared. Please log the exception before falling back so regressions in filesystem-manager integration are diagnosable.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/workspace_modal_presenter.py` around lines
61 - 66, The try/except that calls fm.get_current_workspace() in
workspace_modal_presenter.py currently suppresses all errors; change the except
to "except Exception as e:" and log the failure before falling back (e.g., call
logger.exception or logger.error with the exception) so the message includes
context like "Failed to resolve current workspace" and the exception
information; keep the existing fallback behavior that leaves workspace_path
unset. Reference the existing call sites get_current_workspace(), Path.is_dir(),
Path.iterdir(), and the workspace_path variable when adding the log.
massgen/frontend/displays/textual_terminal_display.py-741-749 (1)

741-749: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add Google-style docstrings for the touched delegating methods.

The capability-probe change touches TextualTerminalDisplay.__init__, and the new wrapper methods here are also changed, but they still use missing or non-Google-style docstrings. Please bring these updated methods in line with the repo’s Python docstring requirement.

As per coding guidelines, **/*.py: For new or changed functions, include Google-style docstrings.

Also applies to: 832-871, 3957-4000, 4366-4384

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/frontend/displays/textual_terminal_display.py` around lines 741 -
749, The modified TextualTerminalDisplay class methods lack Google-style
docstrings; update TextualTerminalDisplay.__init__ and any delegating/wrapper
methods you changed to include proper Google-style docstrings (one-line summary,
blank line, Args: and Returns: sections as applicable) so they follow the repo
convention—ensure each docstring documents parameters like
env/on_error/refresh_rate or the wrapper's input/return values and side effects,
and add them to the other touched methods referenced in the review (the
delegating/wrapper methods within the same class).
massgen/orchestrator_collaborators/criteria_evolution_runner.py-434-434 (1)

434-434: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

zip() without strict=True may silently truncate if list lengths differ.

If parse_evolution_response returns an evolved list with a different length than old_criteria, the zip() will silently drop items, causing an incorrect evolved_count. Consider using strict=True (Python 3.10+) or adding an explicit length check.

Suggested fix
-        evolved_count = sum(1 for old_c, new_c in zip(evolution_data["current_criteria"], evolved) if old_c.text != new_c.text)
+        evolved_count = sum(1 for old_c, new_c in zip(evolution_data["current_criteria"], evolved, strict=True) if old_c.text != new_c.text)

Or add a defensive length check before the zip if strict mode causes issues with the expected behavior.

Also applies to: 465-465

massgen/orchestrator_collaborators/final_result_reporter.py-402-414 (1)

402-414: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Build the agent status payload by aid, not by zipping two dicts.

This assumes orch.agents.values() and orch.agent_states.values() always stay in identical order. If they ever diverge, the status response can pair one agent's backend status with another agent's coordination state.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/final_result_reporter.py` around lines 402
- 414, The current payload builds "agents" by zipping orch.agents.keys() with
zip(orch.agents.values(), orch.agent_states.values()), which can mis-pair agent
objects and their states if dict ordering diverges; instead iterate over
orch.agents.items() (for aid, agent in orch.agents.items()), look up the
corresponding state via orch.agent_states[aid] (or .get(aid) with a safe
fallback), and construct the map using agent.get_status(), state.answer, and
state.has_voted so each aid is guaranteed to be paired with its correct
coordination state.
massgen/orchestrator_collaborators/persona_injector.py-195-196 (1)

195-196: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard against None return from get_log_session_dir().

If get_log_session_dir() returns None, the path operation on line 196 will raise a TypeError. Add a guard or early return.

🛡️ Suggested fix
             log_dir = _orch_mod.get_log_session_dir()
+            if log_dir is None:
+                logger.debug("[Orchestrator] No log session dir; skipping persona save")
+                return
             personas_file = log_dir / "generated_personas.yaml"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/orchestrator_collaborators/persona_injector.py` around lines 195 -
196, Guard against get_log_session_dir() returning None before using it: check
the value of log_dir (returned by _orch_mod.get_log_session_dir()) and if it's
None either return early or raise a clear error, otherwise proceed to construct
personas_file = log_dir / "generated_personas.yaml"; ensure any downstream code
in PersonaInjector (or persona_injector.py) handles the early-return or error
case consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 55433d91-58a3-4af6-8475-1bce8f5be183

📥 Commits

Reviewing files that changed from the base of the PR and between f9227ea and 0b23bc0.

📒 Files selected for processing (66)
  • PR_DRAFT.md
  • docs/dev_notes/orchestrator_refactor_roadmap.md
  • massgen/frontend/displays/_textual_provider_model.py
  • massgen/frontend/displays/_textual_terminal_capabilities.py
  • massgen/frontend/displays/_textual_widget_debug.py
  • massgen/frontend/displays/textual_terminal_display.py
  • massgen/orchestrator.py
  • massgen/orchestrator_collaborators/__init__.py
  • massgen/orchestrator_collaborators/active_coordination_cleanup.py
  • massgen/orchestrator_collaborators/agent_orchestration_setup.py
  • massgen/orchestrator_collaborators/answer_limit_gate.py
  • massgen/orchestrator_collaborators/answer_text_normalizer.py
  • massgen/orchestrator_collaborators/bootstrap_criteria_engine.py
  • massgen/orchestrator_collaborators/broadcast_tool_initializer.py
  • massgen/orchestrator_collaborators/changedoc_coordinator.py
  • massgen/orchestrator_collaborators/chat_followup_handler.py
  • massgen/orchestrator_collaborators/checklist_gate_manager.py
  • massgen/orchestrator_collaborators/checkpoint_coordinator.py
  • massgen/orchestrator_collaborators/context_path_write_tracker.py
  • massgen/orchestrator_collaborators/criteria_evolution_runner.py
  • massgen/orchestrator_collaborators/docker_diagnostics.py
  • massgen/orchestrator_collaborators/dspy_paraphrase_coordinator.py
  • massgen/orchestrator_collaborators/enforcement_buffer_helper.py
  • massgen/orchestrator_collaborators/essential_files_helper.py
  • massgen/orchestrator_collaborators/evaluation_criteria_generator.py
  • massgen/orchestrator_collaborators/evaluator_result_extractor.py
  • massgen/orchestrator_collaborators/fairness_gate.py
  • massgen/orchestrator_collaborators/final_presentation_runner.py
  • massgen/orchestrator_collaborators/final_result_reporter.py
  • massgen/orchestrator_collaborators/isolated_change_reviewer.py
  • massgen/orchestrator_collaborators/metrics_reporter.py
  • massgen/orchestrator_collaborators/midstream_injection_hook_installer.py
  • massgen/orchestrator_collaborators/nlip_routing_initializer.py
  • massgen/orchestrator_collaborators/orchestrator_timeout_calculator.py
  • massgen/orchestrator_collaborators/peer_answer_visibility_tracker.py
  • massgen/orchestrator_collaborators/persona_injector.py
  • massgen/orchestrator_collaborators/planning_tool_injector.py
  • massgen/orchestrator_collaborators/post_evaluation_runner.py
  • massgen/orchestrator_collaborators/pre_collab_helpers.py
  • massgen/orchestrator_collaborators/previous_log_restorer.py
  • massgen/orchestrator_collaborators/prompt_improver_collaborator.py
  • massgen/orchestrator_collaborators/question_irreversibility_analyzer.py
  • massgen/orchestrator_collaborators/rate_limit_controller.py
  • massgen/orchestrator_collaborators/round_evaluator_gate_config.py
  • massgen/orchestrator_collaborators/round_evaluator_runner.py
  • massgen/orchestrator_collaborators/round_start_context_queue.py
  • massgen/orchestrator_collaborators/run_mode_strategy_resolver.py
  • massgen/orchestrator_collaborators/runtime_input_delivery.py
  • massgen/orchestrator_collaborators/skills_config_validator.py
  • massgen/orchestrator_collaborators/snapshot_manager.py
  • massgen/orchestrator_collaborators/step_mode_handler.py
  • massgen/orchestrator_collaborators/subagent_lifecycle_coordinator.py
  • massgen/orchestrator_collaborators/subagent_tool_injector.py
  • massgen/orchestrator_collaborators/tool_message_helpers.py
  • massgen/orchestrator_collaborators/trace_analyzer_runner.py
  • massgen/orchestrator_collaborators/workspace_lifecycle_manager.py
  • massgen/orchestrator_collaborators/workspace_modal_presenter.py
  • massgen/tests/frontend/test_textual_terminal_display_characterization.py
  • massgen/tests/integration/test_orchestrator_hooks_broadcast_subagents.py
  • massgen/tests/integration/test_orchestrator_restart_and_external_tools.py
  • massgen/tests/test_answer_path_normalization.py
  • massgen/tests/test_auto_trace_analysis.py
  • massgen/tests/test_essential_files_manifest.py
  • massgen/tests/test_evaluator_personas.py
  • massgen/tests/test_orchestrator_characterization.py
  • massgen/tests/unit/test_orchestrator_unit.py
✅ Files skipped from review due to trivial changes (1)
  • PR_DRAFT.md

Comment thread massgen/orchestrator_collaborators/final_presentation_runner.py
Comment thread massgen/orchestrator_collaborators/runtime_input_delivery.py
feat(mcp): add Parallel Web Search to MCP server registry + example config
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
massgen/configs/tools/web-search/parallel_search_example.yaml (1)

28-30: ⚡ Quick win

Use a cost-effective default model in this example config.

Please switch this example to a cost-effective default (or add a short note explaining why a premium model is required here).

As per coding guidelines, massgen/configs/**/*.yaml: “Prefer cost-effective models (gpt-5-nano, gpt-5-mini, gemini-2.5-flash)”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/configs/tools/web-search/parallel_search_example.yaml` around lines
28 - 30, The example uses a premium Claude model ("claude-sonnet-4-20250514");
update the example to use a cost-effective default per guidelines by replacing
the model value with a recommended cheaper model (e.g., "gpt-5-nano" or
"gpt-5-mini" or "gemini-2.5-flash") or add a one-line comment next to the model
key explaining why a premium Claude model is required; target the YAML keys
"type", "model", and the surrounding example block (where "mcp_servers" is
defined) when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@massgen/mcp_tools/server_registry.py`:
- Around line 86-97: The registry entry for "parallel_search" currently sets
requires_api_key=True and hardcodes api_key_env_var/Authorization behavior,
preventing anonymous use; change the base "parallel_search" entry to not force
an API key (set requires_api_key=False or remove the header behavior) and remove
any mandatory Authorization header from the static entry, then update the MCP
config assembly code that builds mcp_servers to inject the Authorization header
only when os.environ.get("PARALLEL_API_KEY") is present (use api_key_env_var to
detect the env var), mirroring the existing optional-key pattern so anonymous
endpoints remain discoverable without headers.

---

Nitpick comments:
In `@massgen/configs/tools/web-search/parallel_search_example.yaml`:
- Around line 28-30: The example uses a premium Claude model
("claude-sonnet-4-20250514"); update the example to use a cost-effective default
per guidelines by replacing the model value with a recommended cheaper model
(e.g., "gpt-5-nano" or "gpt-5-mini" or "gemini-2.5-flash") or add a one-line
comment next to the model key explaining why a premium Claude model is required;
target the YAML keys "type", "model", and the surrounding example block (where
"mcp_servers" is defined) when making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b59d69e2-58a9-42c8-9c84-4d0773e2f237

📥 Commits

Reviewing files that changed from the base of the PR and between 0b23bc0 and b155a34.

📒 Files selected for processing (2)
  • massgen/configs/tools/web-search/parallel_search_example.yaml
  • massgen/mcp_tools/server_registry.py

Comment thread massgen/mcp_tools/server_registry.py
@ncrispino
Copy link
Copy Markdown
Collaborator

@coderabbitai pause

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

✅ Actions performed

Reviews paused.

@Henry-811 Henry-811 merged commit 583b3aa into main Jun 1, 2026
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants