Skip to content

Release v0.3.1: MIRA rebrand, bundled engine, mira research CLI, uv runtime, provider hardening#88

Merged
ldxFAIRYTAIL merged 93 commits into
mainfrom
release
May 30, 2026
Merged

Release v0.3.1: MIRA rebrand, bundled engine, mira research CLI, uv runtime, provider hardening#88
ldxFAIRYTAIL merged 93 commits into
mainfrom
release

Conversation

@ChenglongWang
Copy link
Copy Markdown
Contributor

Summary

Roll-up of all release work since the previous v0.3.0 merge (PR #69) into main. 92 commits, ~709 files touched. Tip of release is tagged as v0.3.1.

Highlights by Theme

🏷️ Rebrand: MedPilot → MIRA (#47)

  • pip package renamed medpilotmira-engine
  • CLI entry points renamed: medpilotmira, medpilot-agentmira-engine
  • Workspace migration: legacy ~/.medpilot/ auto-migrates to ~/.mira/ on first command; MEDPILOT_* env vars are mapped to MIRA_*
  • GPL licensing + CLA governance added across the deploy flow

📦 Bundled engine + release packaging (#49, #51, #52, #58, #62, #84)

  • Checked-in PyInstaller spec, embedded uv binary in the bundle, runtime config + release asset support
  • Windows: bundled engine startup without SCM services, bundled gateway subprocess extraction, asyncio warning silencing
  • macOS / Windows release matrix unblocked for v0.3.0rc → v0.3.1

🤖 Agent loop split + mira research CLI (#55)

🔬 Research loop quality (#41, #56, #70, #71, #80, #85)

🐍 uv runtime + per-project venv isolation (#59, #60, #61, #63, #64, #65, #66, #67, #82)

  • Opt-in tools.exec.python schema, per-project venv auto-bootstrap on first python command
  • First-launch uv python install, agent prompt taught venv conventions
  • mira runtime cache-prune + mira runtime project-gc CLIs
  • Opt-in pip installuv pip install rewrite

🔌 Provider + UI runtime config (#43, #45, #73, #74, #76, #77, #83, #86, plus ee22fde4)

🛠️ Infra / CI / governance

CLA Acknowledgement

  • All inbound PRs (referenced above by #NN) have already been individually CLA-checked at merge into release / dev. This is a fast-forward / merge roll-up, no new external contribution introduced at this step.

Test Evidence

CI green on release head (47fc2ca). Each constituent PR carried its own test evidence at merge time; this PR is a roll-up, not a fresh change.

git rev-list --count origin/main..origin/release   # → 92
git diff --stat origin/main...origin/release | tail -1
# 709 files changed, 82404 insertions(+), 141623 deletions(-)
git tag --points-at origin/release
# v0.3.1

Rollback Notes

  • Rollback steps: git revert -m 1 <merge-commit-sha> on main, or git reset --hard a649c64 if no follow-up work has landed on main yet.
  • Data migration impact: workspace auto-migration ~/.medpilot/~/.mira/ is one-way; rollback won't reverse it, but legacy ~/.medpilot/ is left intact on first migration so users can re-point manually if needed.
  • Safe fallback version: pip install mira-engine==0.3.0 (or the prior medpilot==0.2.x for full pre-rename rollback).

ChenglongWang and others added 30 commits April 3, 2026 23:18
* Enable web auto-run continuation and mode-aware runtime context.

Carry run mode from UI metadata into the agent loop, continue pending experiments server-side in auto mode, and tighten stop heuristics so ordinary experiment analysis does not halt execution.

Made-with: Cursor

* Apply session-level run mode control to web auto loops.

Handle set_mode control messages in the gateway and agent loop so manual/auto switches take effect during active auto execution, and include the latest UI submodule pointer and refreshed template assets.

Made-with: Cursor

* Update submodule.

Advance the UI submodule pointer to include the manual/auto toggle switch styling update.

Made-with: Cursor
* Enable profile-based AGENTS template selection for web sessions.

Parse and persist the UI-selected agent profile per session, i.e., engineer/research/default, during prompt construction, and cover the new selection paths with context and web channel tests.

* Update system prompts.

* Add logging functionality.

* Log skill invoke.
Standardize issue linking, test evidence, and rollback notes for per-ticket deploy PRs.

Made-with: Cursor
Define a release compatibility mapping with schema-style checks and enforce it in CI so UI and agent versions stay aligned.
Expose machine-readable /health and /version contracts (plus /api aliases) so desktop bootstrap and release compatibility checks can rely on a stable runtime handshake.
Introduce a dedicated local engine management CLI with install/start/stop/status/logs/doctor commands and test coverage so deployment workflows no longer depend on tmux sessions.
… stack (#32)

* Add medpilot-agent service lifecycle CLI skeleton.

Introduce a dedicated local engine management CLI with install/start/stop/status/logs/doctor commands and test coverage so deployment workflows no longer depend on tmux sessions.

Made-with: Cursor

* Add macOS launchd support for local engine service.

Implement launchd-backed install/start/stop/status/doctor behavior and plist generation so desktop local mode can run as a managed user service instead of tmux.

Made-with: Cursor

* Add rollback-safe manual local engine upgrade flow (#25)

* Add manual upgrade command with rollback safeguards.

Provide a medpilot-agent upgrade flow that stops service, upgrades package, verifies health, and rolls back on failures, with an operator runbook for manual recovery.

Made-with: Cursor

* Add local engine structured logs and diagnostics export (#26)

* Add structured logging and diagnostics export for local engine.

Emit JSONL service lifecycle logs, rotate log files, and support doctor --export diagnostics bundles to speed up support triage.

Made-with: Cursor

* Add tag-driven agent release pipeline (#27)

* Add agent tag-release pipeline for package and executables.

Automate cross-platform build/test, PyPI publish, standalone medpilot-agent executable packaging, and checksum generation on version tags.

Made-with: Cursor

* Add release-train smoke orchestration workflow (#28)

* Add release-train orchestration and smoke workflow.

Introduce a manual workflow that validates agent/ui tag pairs, runs gateway smoke checks, and publishes release-train summary artifacts for coordinated releases.

Made-with: Cursor

* Add Linux systemd user-service manager for medpilot-agent (#29)

* Add Linux systemd --user service support for medpilot-agent.

Introduce systemd unit generation and lifecycle commands with status checks so Linux users can run local engine as a managed per-user service.

Made-with: Cursor

* Add Windows service manager for medpilot-agent (#30)

* Add Windows Service support for medpilot-agent lifecycle.

Implement Windows service create/start/stop/status/delete flows and tests so local engine management has parity with macOS/Linux service models.

Made-with: Cursor

* Add optional self-hosted Docker templates and operator guide. (#31)

Provide compose and env examples plus upgrade/rollback instructions while keeping Docker explicitly positioned as an advanced deployment path.

Made-with: Cursor
Update submodule version.
Switch runtime/docs/tests to the new package name, enable hatch-vcs tag-based versioning, and improve release workflow observability with full tag checkout plus verbose PyPI uploads.

Made-with: Cursor
Update compatibility.json with new release train and agent/ui versions.
Introduce project GPL metadata and CLA policy docs, require CLA acknowledgement in PR templates, update release-train cross-repo tag checks for private repos, and bump the UI submodule pointer to the latest governance updates.

Made-with: Cursor
Add high-value coverage for agent loop paths, channel/web handlers, config matching, and tool execution edge cases, plus a reusable scoped coverage command to track core coverage targets consistently.

Made-with: Cursor
Add high-value coverage for agent loop paths, channel/web handlers, config matching, and tool execution edge cases, plus a reusable scoped coverage command to track core coverage targets consistently.
* Add gateway-side data path validation API for UI project setup.

This keeps workspace restrictions enforced while allowing the UI to check server path visibility before project creation, and updates the UI submodule to the unified data-source entry flow.

* remove codecov threshold.
Keep pull_request checks for all PRs while restricting push-triggered test runs to main/dev/release to avoid duplicate CI runs on feature branches.

Made-with: Cursor
* Add gateway-side data path validation API for UI project setup.

This keeps workspace restrictions enforced while allowing the UI to check server path visibility before project creation, and updates the UI submodule to the unified data-source entry flow.

* Add task_plan guardrails to prevent experiment structure drift.

This introduces shared task_plan lint/reconcile logic, auto-fix and lint APIs in the web channel, and auto-mode gating so malformed or drifting experiment structures are corrected (or blocked) before they can break downstream UI rendering.

* remove codecov threshold.

* Harden web session persistence and add experiment snapshots.

This makes session history append-only, returns recoverable experiment snapshots for completed entries, and updates UI integration/tests to preserve stable experiment detail views during task_plan drift.

* Deduplicate repeated workspace-root update logs.

Only emit audit/log entries when /api/config actually changes projects_root, reducing reconnect noise while preserving behavior and test coverage.

* Add profile-aware task-plan guardrails with repair and versioning.

Enforce required evidence fields by profile, add one-shot auto repair for blocked auto runs, and make strictness configurable via project contract version with updated web APIs/tests/docs.

* Auto-fix duplicate experiment IDs in task plans.

Reconcile duplicate experiment ids to fresh ExpNNN values so malformed task_plan updates cannot persist ambiguous experiment records.

* Surface task-plan ID remaps to agent context.

When guardrails auto-reassign duplicate experiment IDs, inject a canonical remap notice into the web message metadata so the LLM reasons over corrected IDs instead of stale references.

* Expose profile contract metadata for task-plan UI alignment.

Derive required field rules from guardrails in a new read-only plan contract endpoint, persist rich experiment evidence fields in snapshots, and wire tests to keep profile-specific requirements synchronized with dashboard rendering.
* feat: sync all features from nanobot v0.15

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(onboard): add guided provider and model setup

- integrate OAuth providers into onboard flow (no separate provider login command)

- show provider endpoint connectivity with dim URL display

- add model examples and validation; allow bare model input after provider selection

- improve non-wizard onboarding prompts and related docs/tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(docker): stabilize cli oauth and config paths

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(bug): fix the problem of skill discovery and routing in nested workspace skill directories?

* feat(cli): auto-prepend provider prefix to model names during onboard

* fix(core): resolve cron reload loop and enhance gateway with fail-safe protection

- Fix infinite reload loop in CronService by updating mtime after loading.
- Add PID file locking and port collision detection to prevent duplicate gateway instances.
- Introduce '--host' option to gateway and service install commands for LAN access.
- Improve channel configuration handling for Pydantic models and key normalization.
- Bump version to 0.3.0 and add comprehensive fail-safe unit tests.

* fix(ci): add psutil dependency and include missing fail-safe tests

* fix(test): resolve CI failures by skipping failsafe in tests and fixing version mismatch

- Add MEDPILOT_SKIP_GATEWAY_FAILSAVE to globally skip port/PID checks during testing.
- Revert pyproject.toml version to 0.0.0 to match __init__.py and test expectations.
- Update gateway CLI tests to match new output format (0.0.0.0:port).
- Ensure psutil is in dependencies and failsafe tests are included.

* fix(test): extract gateway failsafe to function and global mock in tests

- Extract PID/port collision check into _gateway_failsafe_check for granular control.
- Add global mock in tests/conftest.py to bypass failsafe in all existing CLI tests.
- Update failsafe unit tests to test the check function directly and bypass global mock.
- Ensure all tests pass without SystemExit(1) due to port collisions in CI environment.

---------

Co-authored-by: Chenglong Wang <ryuu.j.ching@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Initialize web projects root from runtime workspace and write /api/config changes back to the currently selected config path so UI workspace edits remain permanent across restarts.
* Add gateway-side data path validation API for UI project setup.

This keeps workspace restrictions enforced while allowing the UI to check server path visibility before project creation, and updates the UI submodule to the unified data-source entry flow.

* Add task_plan guardrails to prevent experiment structure drift.

This introduces shared task_plan lint/reconcile logic, auto-fix and lint APIs in the web channel, and auto-mode gating so malformed or drifting experiment structures are corrected (or blocked) before they can break downstream UI rendering.

* remove codecov threshold.

* Harden web session persistence and add experiment snapshots.

This makes session history append-only, returns recoverable experiment snapshots for completed entries, and updates UI integration/tests to preserve stable experiment detail views during task_plan drift.

* Deduplicate repeated workspace-root update logs.

Only emit audit/log entries when /api/config actually changes projects_root, reducing reconnect noise while preserving behavior and test coverage.

* Add profile-aware task-plan guardrails with repair and versioning.

Enforce required evidence fields by profile, add one-shot auto repair for blocked auto runs, and make strictness configurable via project contract version with updated web APIs/tests/docs.

* Auto-fix duplicate experiment IDs in task plans.

Reconcile duplicate experiment ids to fresh ExpNNN values so malformed task_plan updates cannot persist ambiguous experiment records.

* Surface task-plan ID remaps to agent context.

When guardrails auto-reassign duplicate experiment IDs, inject a canonical remap notice into the web message metadata so the LLM reasons over corrected IDs instead of stale references.

* Expose profile contract metadata for task-plan UI alignment.

Derive required field rules from guardrails in a new read-only plan contract endpoint, persist rich experiment evidence fields in snapshots, and wire tests to keep profile-specific requirements synchronized with dashboard rendering.

* Add references ingestion and policy-driven auto-stop controls for web projects.

This introduces references-target uploads with safe zip extraction, persists automation policies from project setup, and switches auto-mode continuation to stop on goals, experiment limits, or token budgets for clearer project control.

* Enforce auto-run task_plan checkpoint updates between experiments.

Add a checkpoint barrier that detects unchanged running experiments across auto rounds and forces a task_plan sync repair round before continuing, so progress state is persisted incrementally instead of only at the end.

* Update UI submodule for manual export-driven results flow.

Track the latest MedPilotUI commits that remove output-goal setup, move automation policy defaults, and add user-triggered export actions in the result stage.

* Require explicit user request for result deliverables.

Remove the natural-conclusion trigger from web agent instructions so final deliverables are generated only when the user explicitly asks for export.
* Add gateway-side data path validation API for UI project setup.

This keeps workspace restrictions enforced while allowing the UI to check server path visibility before project creation, and updates the UI submodule to the unified data-source entry flow.

* Add task_plan guardrails to prevent experiment structure drift.

This introduces shared task_plan lint/reconcile logic, auto-fix and lint APIs in the web channel, and auto-mode gating so malformed or drifting experiment structures are corrected (or blocked) before they can break downstream UI rendering.


* remove codecov threshold.

* Harden web session persistence and add experiment snapshots.

This makes session history append-only, returns recoverable experiment snapshots for completed entries, and updates UI integration/tests to preserve stable experiment detail views during task_plan drift.

* Deduplicate repeated workspace-root update logs.

Only emit audit/log entries when /api/config actually changes projects_root, reducing reconnect noise while preserving behavior and test coverage.

* Add profile-aware task-plan guardrails with repair and versioning.

Enforce required evidence fields by profile, add one-shot auto repair for blocked auto runs, and make strictness configurable via project contract version with updated web APIs/tests/docs.

* Auto-fix duplicate experiment IDs in task plans.

Reconcile duplicate experiment ids to fresh ExpNNN values so malformed task_plan updates cannot persist ambiguous experiment records.

* Surface task-plan ID remaps to agent context.

When guardrails auto-reassign duplicate experiment IDs, inject a canonical remap notice into the web message metadata so the LLM reasons over corrected IDs instead of stale references.

* Expose profile contract metadata for task-plan UI alignment.

Derive required field rules from guardrails in a new read-only plan contract endpoint, persist rich experiment evidence fields in snapshots, and wire tests to keep profile-specific requirements synchronized with dashboard rendering.

* Add references ingestion and policy-driven auto-stop controls for web projects.

This introduces references-target uploads with safe zip extraction, persists automation policies from project setup, and switches auto-mode continuation to stop on goals, experiment limits, or token budgets for clearer project control.

* Enforce auto-run task_plan checkpoint updates between experiments.

Add a checkpoint barrier that detects unchanged running experiments across auto rounds and forces a task_plan sync repair round before continuing, so progress state is persisted incrementally instead of only at the end.

* Update UI submodule for manual export-driven results flow.

Track the latest MedPilotUI commits that remove output-goal setup, move automation policy defaults, and add user-triggered export actions in the result stage.

* Require explicit user request for result deliverables.

Remove the natural-conclusion trigger from web agent instructions so final deliverables are generated only when the user explicitly asks for export.

* Harden auto-run guardrails and persist runtime contract settings.

Apply code-level task-plan contract normalization after experiment transitions, ensure websocket messages can persist contract version metadata, and update tests while advancing the UI submodule for synced new-project runtime preferences.

* Enforce strict contract completion flow without placeholder bypass.

Fix the auto-run guardrail crash by aligning _guard_task_plan_structure with auto_fix calls, and require strict-mode experiments to request model补全 instead of auto-filling missing contract fields.

* Update UI submodule pointer for kickoff language policy.

Record the latest UI commit so runtime-profile-contract-sync includes the new language-aware project kickoff prompt behavior.
* Fix: Require explicit apiBase for custom provider

- Add validation in make_provider() to raise clear error when custom provider lacks apiBase
- Enhance onboard.py to prompt for apiBase when configuring custom provider
- Add tests for custom provider apiBase validation
- Error message guides users to configure via config.json or onboard wizard

* Fix: Prompt for apiBase in non-wizard onboard for custom provider

- Add api_base prompt logic for custom provider in non-wizard onboarding flow
- When user selects custom provider, prompt for API base URL (required if not set)
- If api_base already exists, offer options to update/keep/clear
- Consistent with wizard onboarding behavior

* fix: add missing prompt for apiBase in custom config

* fix: add missing prompt for apiBase in custom config (resolved conflict with dev)

- Re-apply custom provider api_base prompt logic on top of latest dev branch
- Maintains compatibility with dev branch changes to onboard flow
- Prompts for API Base URL when configuring custom provider in non-wizard mode

* merge: resolve conflicts with origin/dev (loop.py, web.py, guardrails.py, tests)

Agent-Logs-Url: https://github.com/Project-MedPilot/MedPilot/sessions/6a11157f-7b33-47e6-91fe-0d26235cdc68

Co-authored-by: ldxFAIRYTAIL <82999767+ldxFAIRYTAIL@users.noreply.github.com>

---------

Co-authored-by: LoveMachine <yqyi@example.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ldxFAIRYTAIL <82999767+ldxFAIRYTAIL@users.noreply.github.com>
* config: unify web bind host/port under gateway

Use gateway.host/port as the single bind source for the Web channel and stop duplicating bind settings in channels.web. Add config migration plus regression tests so legacy channels.web host/port values are safely promoted without breaking existing configs.

* chore: normalize config-unification files after origin/dev rebase

Preserve existing CRLF conventions in files touched during conflict resolution so the branch keeps a minimal, reviewable diff against origin/dev without changing behavior.

* Remove test from other branch.
ChenglongWang and others added 27 commits May 2, 2026 18:26
The Agent Release workflow on tag ``v0.3.0rc2`` failed across the
matrix with three independent issues. Fixing them in one commit so the
RC build can re-run cleanly.

1. macOS — ``Fetch bundled uv binary`` step hit GitHub API HTTP 403
   ``rate limit exceeded`` while resolving the latest ``uv`` release
   tag. Unauthenticated GitHub API requests are capped at 60/h per IP
   and Actions runners on macOS share that quota across many jobs.
   - ``scripts/fetch_uv.py``: ``resolve_release_tag`` now sends
     ``Authorization: Bearer $GITHUB_TOKEN`` (or ``$GH_TOKEN``) when
     present, lifting the quota to 5,000 req/h per repo. Also pins
     the recommended ``Accept`` and ``X-GitHub-Api-Version`` headers.
   - ``.github/workflows/agent-release.yml``: the ``Fetch bundled uv
     binary`` step now exposes ``GITHUB_TOKEN`` as an env var so the
     script can pick it up. ``GITHUB_TOKEN`` is the read-only,
     auto-issued workflow token; no new secret is needed.

2. Windows — ``test_project_gc_lists_venvs`` failed because Rich
   wrapped the long absolute project path mid-word inside the
   ``Project`` table cell, splitting the literal ``proj`` substring
   across a newline (``...test_pro\nject_gc_lists_ven...``). Collapse
   line breaks before searching for the substring; keep the
   ``"active"`` assertion as-is since the status column never wraps.

3. Windows — ``test_info_when_enabled`` asserted the literal POSIX
   string ``"/usr/local/bin/uv"`` against output that ``Path.__str__``
   had rendered with Windows separators
   (``"\\usr\\local\\bin\\uv"``). Derive the expected substring from
   the same ``Path`` the CLI will render so the assertion is
   OS-agnostic.

Verified by running ``pytest tests/`` locally (1934 passed, 1 skipped).

Co-authored-by: Cursor <cursoragent@cursor.com>
…#73)

The UI needs one authoritative runtime contract so newer providers do not get blocked behind stale frontend heuristics. This publishes the full provider catalog plus structured setup status for localized and actionable connection feedback.
* Remove mira-ui submodule.

* fix: isolate UI project bindings by workspace
* chore: relocate compatibility tracking to mira-ui repo

The compatibility.json file conceptually belongs in the consumer repo,
not the producer one — mira-ui is what depends on mira (calls its API),
so the UI side should declare which agent versions it's compatible with.
Living in mira coupled the agent's release cadence to the UI for no
runtime benefit: nothing in mira reads compatibility.json at runtime,
the actual UI ↔ agent wire-format handshake goes through GET /version
(served from _API_CONTRACT_VERSION in mira_engine/channels/ui.py).

This commit deletes the file and its supporting tooling from mira.
A companion PR in mira-ui will create compatibility.json there with
the same schema, port the validator to Node, and add a tag-time guard
to desktop-release.yml.

Removed:
- compatibility.json
- scripts/validate_compatibility.py
- tests/test_compatibility_validation.py

Updated docs to reflect the new location:
- README.md: "Release Compatibility Mapping" section now points to
  mira-ui and clarifies that mira's only contribution is the
  api_contract field on /version (sourced from _API_CONTRACT_VERSION).
- RELEASE_DAY_CHECKLIST.md: pre-release check now runs the Node
  validator from mira-ui; added an explicit reminder to bump
  _API_CONTRACT_VERSION + mira-ui's api_contract together when wire
  format changes.
- DEPLOYMENT_RELEASE_BLUEPRINT.md: blueprint already anticipated this
  split ("可放在 UI repo"); narrative updated to make it definitive.

No runtime change — /version still reports api_contract: "v1" as
before, and mira-ui still does its compatibility check off the
/version response, not off any file in this repo.

Co-authored-by: Cursor <cursoragent@cursor.com>

* ci: drop validate_compatibility step from tests.yml

The previous commit deleted scripts/validate_compatibility.py but
missed this caller in tests.yml. CI was failing on:

  python: can't open file '.../scripts/validate_compatibility.py':
  [Errno 2] No such file or directory

Compatibility validation now lives in the mira-ui repo
(scripts/validate-compatibility.mjs there), so this step has no
business in mira's PR-time test workflow anymore.
* Add release-train workflow to main branch.

Sync the manual release-train GitHub Actions workflow from deploy so it is visible and runnable from the default branch Actions page.

* Sync release-train workflow updates from deploy to main.

* Fix release-train CI config for writable model field.

* Update .github/workflows/release-train.yml

* Update .github/workflows/release-train.yml

* Replace local skills with submodule (mira-skills repo)

* Remove useless files.

* fix(agent): improve skill routing precision with metadata + session memory

- Parse scenarios/aliases from SKILL.md frontmatter for matching
- Recent skills always get scoring boost (+15), not just on follow-up
- Raise minimum score threshold from >0 to >=4 to filter weak matches
- Remove follow-up short-circuit, treat it as +10 extra score instead

* submodule: update mira-skills submodule reference

* fix(agent): improve skill routing precision with metadata + session memory

* ci: enable submodules in Tests workflow and drop stale skill test

- tests.yml now checks out submodules recursively so mira-skills
  (newly added as a submodule) is available during pytest.
- .gitmodules: switch mira-ui to HTTPS so CI runners without SSH
  keys can clone it.
- Remove tests/agent/test_skill_creator_scripts.py: a nanobot v0.15
  leftover that hard-coded the legacy skills/skill-creator layout
  and expected an init_skill module that does not exist in the new
  mira-skills submodule.

* fix(skills): align medical-image skill references with mira-skills rename

The mira-skills submodule renamed the deep-learning medical imaging skill
from `medical-image-dl-pipeline` to `medical-image-analysis`, but several
references in this repo still pointed at the old name:

- mira_engine/agent/skills.py: hardcoded alias key for routing
  ("去伪影", "monai", "mri", etc.) lived under the old name, so the
  router could never reach this skill when it lived in the submodule.
- tests/test_agent_loop_core.py: three fixtures and the active-skills
  injection assertion referenced the old name.
- tests/agent/test_skills_loader.py: routing tests referenced the old
  name (and would have broken once the alias key was renamed).
- README.md: docs still listed the old skill name.

Rename every reference to `medical-image-analysis` so routing actually
selects the skill that exists in the submodule.
agent-release.yml runs the full pytest suite before building wheels.
Now that mira-skills lives in a submodule (PR #80), the runner must
clone it or skill-dependent tests fail (skill_names is None, builtin
skill list missing 'builtin-skills').
agent-release.yml runs the full pytest suite before building wheels.
Now that mira-skills lives in a submodule (PR #80), the runner must
clone it or skill-dependent tests fail (skill_names is None, builtin
skill list missing 'builtin-skills').
- Delete .github/workflows/ci.yml (nanobot v0.15 leftover that duplicated pytest and only triggered for main/nightly).
- Move its `ruff check --select F401,F841` step into tests.yml.
- Exclude mira-skills submodule from ruff via pyproject `extend-exclude`.
- Fix 8 F401/F841 violations in mira_engine/* that the legacy workflow had surfaced.
* fix: handle conda as .bat file on Windows

On Windows, conda is a .bat file which cannot be executed directly by
subprocess.run without shell=True. Add shell=True on Windows and catch
FileNotFoundError as a fallback to prevent crashes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor: remove redundant auto_activate_env

The auto_activate_env function attempted to inject the conda mira
environment into PATH at runtime, but this is unnecessary — subprocesses
inherit the parent process's environment automatically. When started
inside the conda mira env, child processes already have the correct
Python and packages available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: improve LLM error display in CLI with provider/model context

When the LLM provider returns a 500 or other error, the CLI now shows:
- The provider name and model that failed
- The raw API error message (without "Error:" prefix)
- A hint suggesting retry/checking network

Before: just "Error: Internal Server Error" which was cryptic
After: structured error output with actionable context

Also added _is_llm_error() detection to distinguish provider errors from
normal agent responses.

* fix: restore _print_agent_response accidentally removed in LLM error fix
* Update versions.

* Expand core-module tests to harden release quality.

Add high-value coverage for agent loop paths, channel/web handlers, config matching, and tool execution edge cases, plus a reusable scoped coverage command to track core coverage targets consistently.


* fix: harden rc6 tests against environment differences

Use interpreter-path based shell commands and deterministic gateway/DDGS mocks so release candidate test runs pass reliably across local setups.

* fix: stabilize rc6 Windows build test compatibility

Normalize backup and file URI path handling for Windows and make command/search tests deterministic across shell quoting and file timestamp differences.

* ci(release): checkout submodules so pytest and skills loading work

agent-release.yml runs the full pytest suite before building wheels.
Now that mira-skills lives in a submodule (PR #80), the runner must
clone it or skill-dependent tests fail (skill_names is None, builtin
skill list missing 'builtin-skills').

* Add native Windows service manager

* Support Windows ARM64 uv bundling

* Mark bundle placeholder runtime as unconfigured

* Fix Windows release CI test failures

* Fix Windows service manager test home path

* Improve macOS LaunchAgent bundle service

* Support per-engine runtime workspace

* Hot reload runtime config from UI saves

* Fix bundled engine service identity

* Stop auto replanning without policy
* Update versions.

* Expand core-module tests to harden release quality.

Add high-value coverage for agent loop paths, channel/web handlers, config matching, and tool execution edge cases, plus a reusable scoped coverage command to track core coverage targets consistently.

Made-with: Cursor

* fix: harden rc6 tests against environment differences

Use interpreter-path based shell commands and deterministic gateway/DDGS mocks so release candidate test runs pass reliably across local setups.

Made-with: Cursor

* fix: stabilize rc6 Windows build test compatibility

Normalize backup and file URI path handling for Windows and make command/search tests deterministic across shell quoting and file timestamp differences.

Made-with: Cursor

* ci(release): checkout submodules so pytest and skills loading work

agent-release.yml runs the full pytest suite before building wheels.
Now that mira-skills lives in a submodule (PR #80), the runner must
clone it or skill-dependent tests fail (skill_names is None, builtin
skill list missing 'builtin-skills').

* Fix Windows release path tests

* Route normal UI chats through base loop

* Generalize default USER.md template

The shipped template included personal name, timezone, and research-domain
details that are specific to a single contributor. Replace with a neutral,
general-purpose profile so new installs do not silently inherit someone
else's identity and research preferences.

* Harden macOS launchd bootstrap against in-place bundle upgrades

After a DMG re-install the user-installed LaunchAgent often refused to
update with "Bootstrap failed: 5: Input/output error" and the desktop UI
kept talking to the previous engine. Three independent issues conspired
to produce this:

* `launchctl bootout` returns before the previous job has actually
  exited (especially when it has active aiohttp/WebSocket clients to
  drain), so the immediate `bootstrap` raced against a half-torn-down
  label. Poll `launchctl print` until the label leaves the domain
  before bootstrapping, with a bounded timeout so a stuck job cannot
  hang the install forever.

* `install_service` was not transactional: a failed bootstrap could
  leave the new plist on disk and the persisted state file pointing at
  an executable that was never actually loaded. Snapshot the previous
  plist, restore it on bootstrap failure (delete it if there was none),
  and roll the launchd job back if the base-class state write fails
  after bootstrap succeeded.

* `_handle_version` re-read the on-disk manifest on every call, which
  meant the still-running old engine reported the *new* SHA after the
  DMG overwrote the manifest file in place — making the desktop UI
  believe the live engine already matched the bundle and skip the
  reinstall. Snapshot the engine identity at `UiChannel.__init__` and
  expose a new `engine_sha256_at_boot` marker field so the UI can
  distinguish trustworthy boot snapshots from legacy engines.

Also tighten cleanup: `uninstall_service` now mirrors install by
calling `launchctl remove` in addition to `bootout` so the label does
not linger in launchd's cache and trip a subsequent reinstall.

Covered by new pytest cases:
  * teardown waits for launchd to release the label
  * teardown wait is bounded by its timeout
  * failed installs restore the previous plist (or remove a fresh one)
  * failed installs do not update the engine identity state
  * uninstall removes the label from the cache
  * /version exposes engine_sha256_at_boot
  * /version snapshots identity at boot (survives manifest swap)

* fix: preserve DeepSeek reasoning content

* fix: require Windows service install

* fix: refine task plan contract guardrails

* feat: keep UI alive during agent work

* fix: preserve DeepSeek reasoning content

* fix: suppress activity pings for plain progress callbacks
* fix: route DeepSeek through native OpenAI-compatible provider

LiteLLM strips reasoning_content from assistant messages on the way
back to DeepSeek (litellm#26395), so any multi-turn thinking-mode or
tool-call conversation 400s after the first response. Route DeepSeek
directly through OpenAICompatProvider — same approach nanobot took
(commit 3dfdab7) — to preserve reasoning_content end-to-end.

- registry: give the deepseek spec api.deepseek.com/v1 as default base
  and enable strip_model_prefix so "deepseek/deepseek-chat" reaches the
  API as "deepseek-chat".
- factory: route provider_name == "deepseek" (or deepseek/* models) to
  OpenAICompatProvider before falling back to LiteLLM.
- openai_compat: add DeepSeek to the thinking-mode extra_body branch
  ({"thinking": {"type": "enabled"|"disabled"}}) and backfill empty
  reasoning_content on legacy assistant tool-call turns so DeepSeek's
  validator stops rejecting follow-ups on resumed sessions.
- litellm_provider: drop the now-dead DeepSeek error-retry/backfill
  path; the native route handles it proactively. Keep the generalized
  provider_specific_fields reasoning extraction, which still helps any
  remaining LiteLLM-routed model that hides reasoning under nested keys.

* fix: bring OpenAICompatProvider HTTP timeouts to LiteLLM parity

Routing DeepSeek through the native OpenAI SDK shortened the effective
HTTP timeout from LiteLLM's 6000s to the SDK's 600s read / 5s connect,
which is too tight for DeepSeek V4-Pro thinking mode (often quiet for
minutes before streaming) and for slow / proxied networks in CN. Users
report `APITimeoutError: Request timed out.` on the first turn even
though LiteLLM never tripped on the same setup.

Pass an explicit `httpx.Timeout(connect=30, read=6000, ...)` into the
AsyncOpenAI client and let operators tune both legs via
MIRA_LLM_CONNECT_TIMEOUT_S / MIRA_LLM_READ_TIMEOUT_S. Garbage env values
fall back to the generous defaults instead of crashing provider init.

* fix: relax per-round experiment guardrail and surface provider error inline

The 'auto-run guard warning: multiple experiments advanced in one round'
emission was pure noise — it never actually stopped the loop, but the
matching prompt rule (AT MOST ONE terminal transition per turn) was
making the model timid and the warning made the UI imply a hard stop.
Drop the warning, soften the prompt to a preference, and rely on the
existing `running_count <= 1` invariant in task_plan guardrails for the
real concurrency limit.

Also fix the confusing stop-reason ordering: when an LLM call fails the
error text lives in `final_content` and is rendered as the assistant
reply (below the progress events), so users saw `auto-run stop reason:
provider error` followed by the error, making it look like the stop
caused the error. Surface a truncated snippet of the error inline with
the stop reason so the cause is visible next to the effect.
Resolves a single conflict in .github/workflows/release-train.yml:
both sides edited the file independently. main carried 8 pre-rename
fixes (writable model field, token check, etc.); release later
absorbed all of those concerns implicitly during the MedPilot → MIRA
rename (#47) and the gateway config unification (#45).

Resolution: keep release's version verbatim. Discarding main's
version is safe because:

- main still imports `medpilot.config.schema` → would ImportError
  after the package rename.
- main still calls `medpilot gateway` → CLI no longer exists.
- main still sets `cfg.channels.web.host` → field removed by #45
  (unify web bind host/port under gateway) and would be rejected
  by the Config schema.
- Token-check verbosity differences are cosmetic only.

This commit unblocks PR #88 (release → main) by making the branch
fast-forward-able / cleanly mergeable.
@ldxFAIRYTAIL ldxFAIRYTAIL merged commit 6d577e6 into main May 30, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants