medsim is the first foundation for a virtual surgical simulation and synthetic data platform for medical robotics research. The starting task is a narrow minimally invasive surgical primitive: laparoscopic needle passing over a deformable tissue patch.
The current implementation is deliberately small. It provides typed configuration, deterministic scenario perturbations, a runnable placeholder simulation backend, structured episode recording, action traces, replay validation, JSON/JSONL export, run manifests, and evaluation scaffolding. It does not claim physical accuracy.
- Not a physical surgical robot.
- Not an autonomous surgical system.
- Not a clinical product.
- Not a claim that the placeholder backend models real soft tissue biomechanics.
- Not a GUI, robotics hardware integration, ML training stack, or rendering pipeline.
The first runnable environment validates the infrastructure shape:
- load a base scene and scenario from YAML;
- reset with a deterministic seed;
- run a scripted needle passing episode through a backend interface;
- apply first-class perturbations such as camera occlusion, grasp instability, tissue shift, and collision risk;
- record per-step state, canonical events, and the exact normalized action applied;
- export episode summaries, run manifests, replay validation results, and aggregate metrics.
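The deterministic-seed and perturbation steps above can be illustrated with a minimal sketch. It is not medsim's implementation: `PerturbationEffect`, `derive_effects`, and the field names are hypothetical, and the only point shown is that a private seeded RNG plus a stable iteration order yields identical derived effects for the same seed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PerturbationEffect:
    """Hypothetical record of one derived perturbation effect."""
    name: str
    start_step: int
    magnitude: float


def derive_effects(perturbations: list[str], seed: int, max_steps: int = 50) -> list[PerturbationEffect]:
    """Derive effects from a seed using a private RNG, never the global one."""
    rng = random.Random(seed)
    effects = []
    # Sort the names so the caller's ordering cannot change the draw sequence.
    for name in sorted(perturbations):
        start_step = rng.randrange(max_steps)
        magnitude = round(rng.uniform(0.1, 1.0), 6)
        effects.append(PerturbationEffect(name, start_step, magnitude))
    return effects


first = derive_effects(["camera_occlusion", "tissue_shift"], seed=1)
second = derive_effects(["tissue_shift", "camera_occlusion"], seed=1)
print(first == second)
```

Keeping the RNG local to the function is the design choice that matters here: unrelated code that touches the global `random` state cannot change the effects derived for a given seed.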
The placeholder backend is intentionally a simple kinematic state machine. Its purpose is to exercise configuration, replay, data export, and evaluation contracts before a real deformable backend is integrated.
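To make "simple kinematic state machine" concrete, here is a toy sketch under stated assumptions. The phase names, the `NEXT_PHASE` table, and `run_episode` are invented for illustration and are not taken from the placeholder backend; the needle position is advanced by a fixed increment with no forces or deformation involved.

```python
from enum import Enum, auto


class Phase(Enum):
    """Hypothetical phases of a scripted needle passing episode."""
    APPROACH = auto()
    GRASP = auto()
    PASS_THROUGH = auto()
    RELEASE = auto()
    DONE = auto()


# Each phase advances to exactly one successor.
NEXT_PHASE = {
    Phase.APPROACH: Phase.GRASP,
    Phase.GRASP: Phase.PASS_THROUGH,
    Phase.PASS_THROUGH: Phase.RELEASE,
    Phase.RELEASE: Phase.DONE,
}


def run_episode(target_x: float, step_size: float, max_steps: int) -> tuple[Phase, int]:
    """Move the needle toward target_x kinematically, then walk the remaining phases."""
    needle_x, phase = 0.0, Phase.APPROACH
    for step_index in range(1, max_steps + 1):
        if phase is Phase.APPROACH:
            needle_x = min(target_x, needle_x + step_size)
            if needle_x >= target_x:
                phase = NEXT_PHASE[phase]
        else:
            phase = NEXT_PHASE[phase]
        if phase is Phase.DONE:
            return phase, step_index
    # Step budget exhausted before reaching DONE.
    return phase, max_steps


print(run_episode(target_x=1.0, step_size=0.5, max_steps=10))
print(run_episode(target_x=1.0, step_size=0.5, max_steps=3))
```

The second call runs out of steps mid-episode, which is the kind of terminal condition a "max steps exceeded" failure label would describe.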
medsim separates the parts that should remain stable as the simulator becomes more realistic:
- medsim.config: typed scene and scenario models plus YAML loading.
- medsim.sim: backend lifecycle contract, environment wrapper, runtime state, canonical events/failures, actions, and the placeholder backend.
- medsim.tasks: task-specific rules for laparoscopic needle passing.
- medsim.scenarios: deterministic perturbation effect generation.
- medsim.data: structured episode records and JSON/JSONL exporters.
- medsim.eval: summary metrics, replay validation, and text replay scaffolding.
Future SOFA work should implement the existing backend contract in medsim.sim.sofa_backend without forcing callers to change the environment, recording, or evaluation layers.
Backend compliance now makes that boundary enforceable. Each backend declares capabilities through backend_info(), and the compliance harness checks lifecycle behavior, determinism claims, taxonomy use, terminal state shape, recorder/export compatibility, and semantic reference cases.
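One way to picture a capability declaration paired with a determinism check is the sketch below. Only the method name `backend_info()` comes from this README; the `BackendInfo` fields, the `Backend` protocol, `ToyBackend`, and `check_determinism` are assumptions made for the example and do not describe the real compliance harness.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class BackendInfo:
    """Hypothetical capability declaration."""
    name: str
    deterministic: bool
    supports_step: bool


class Backend(Protocol):
    def backend_info(self) -> BackendInfo: ...
    def reset(self, seed: int) -> dict: ...
    def step(self, action: dict) -> dict: ...


class ToyBackend:
    """Stand-in backend: the needle advances by the commanded amount."""

    def __init__(self) -> None:
        self._state = {"step_index": 0, "needle_x": 0.0}

    def backend_info(self) -> BackendInfo:
        return BackendInfo(name="toy", deterministic=True, supports_step=True)

    def reset(self, seed: int) -> dict:
        self._state = {"step_index": 0, "needle_x": float(seed % 3)}
        return dict(self._state)

    def step(self, action: dict) -> dict:
        self._state["step_index"] += 1
        self._state["needle_x"] += action["dx"]
        return dict(self._state)


def check_determinism(backend: Backend, seed: int, actions: list[dict]) -> list[str]:
    """Return a list of violations; an empty list means the check passed."""
    info = backend.backend_info()
    if not info.deterministic:
        return []  # nothing claimed, so nothing to verify

    def rollout() -> list[dict]:
        states = [backend.reset(seed)]
        states.extend(backend.step(action) for action in actions)
        return states

    if rollout() != rollout():
        return [f"{info.name}: claims determinism but two rollouts differ"]
    return []


violations = check_determinism(ToyBackend(), seed=7, actions=[{"dx": 0.5}, {"dx": 0.25}])
print(violations)
```

The check only verifies what the backend itself claims, so a backend that declares `deterministic=False` passes trivially instead of being penalized for a property it never promised.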
The SOFA path is now represented by a dependency-gated no-physics runtime foothold. SofaBackend imports safely and exposes metadata even when SOFA is not installed. Scene-plan mapping, minimal root-scene ownership, typed runtime metadata authority, reset-time runtime-state extraction, a runtime-owned lifecycle step index, and initial state/observation access are implemented; SOFA step(), physics, rendering, contacts, recorder/export compatibility, and replay remain intentionally unimplemented.
From the repository root:
python -m venv .venv
source .venv/bin/activate
python -m pip install -e ".[dev]"

Run one deterministic episode:

python scripts/run_scene.py --scene configs/base_scene.yaml --scenario configs/scenarios/normal.yaml --seed 1

Generate a small multi-scenario dataset:

python scripts/generate_dataset.py --scene configs/base_scene.yaml --scenarios configs/scenarios/*.yaml --episodes 10 --seed 100

Evaluate an artifacts run directory:

python scripts/run_eval.py --artifacts artifacts/runs/<run_id>

Validate deterministic replay for a run:

python scripts/replay_validate.py --artifacts artifacts/runs/<run_id>

Validate one episode:

python scripts/replay_validate.py --summary artifacts/runs/<run_id>/episodes/episode_0001_summary.json

Check the backend contract:

python scripts/check_backend_contract.py --backend placeholder --scene configs/base_scene.yaml --scenarios configs/scenarios/normal.yaml configs/scenarios/camera_occlusion.yaml

Inspect the honest SOFA stub compliance report:

python scripts/check_backend_contract.py --backend sofa --scene configs/base_scene.yaml --scenarios configs/scenarios/normal.yaml

Run tests:

pytest

Generated runs are written under:
artifacts/
  runs/
    <run_id>/
      run_manifest.json
      config_snapshot.json
      aggregate_metrics.json
      replay_validation.json
      episodes/
        episode_0001.jsonl
        episode_0001_summary.json
        episode_0001_replay_validation.json
  compliance/
    placeholder_<timestamp>/
      backend_contract_report.json
run_manifest.json records provenance such as medsim version, Python version, backend, task, scene/scenario paths, seeds, recorder settings, and best-effort git commit. config_snapshot.json stores the fully resolved scene and scenarios used for replay. JSONL step logs contain action records, observations, state-derived fields, camera condition, and canonical event streams. Summary JSON files contain episode metadata, active perturbations, derived effects, terminal state, outcome labels, and metrics. Replay validation files compare a re-executed action trace against the original episode.
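The idea of comparing a re-executed action trace against a recorded episode can be sketched with the standard library alone. The record layout (`action` and `state` keys), the toy transition `step_fn`, and the result fields are assumptions for illustration and may differ from the files medsim actually writes.

```python
import io
import json


def step_fn(state: dict, action: dict) -> dict:
    """Toy deterministic transition standing in for a backend step."""
    return {
        "step_index": state["step_index"] + 1,
        "needle_x": state["needle_x"] + action["dx"],
    }


def record_episode(initial: dict, actions: list[dict], out) -> None:
    """Write one JSON object per line: the action applied and the resulting state."""
    state = initial
    for action in actions:
        state = step_fn(state, action)
        out.write(json.dumps({"action": action, "state": state}, sort_keys=True) + "\n")


def validate_replay(initial: dict, jsonl_text: str) -> dict:
    """Re-execute the recorded actions and report the first diverging step, if any."""
    state = initial
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        record = json.loads(line)
        state = step_fn(state, record["action"])
        if state != record["state"]:
            return {"matched": False, "first_divergence_step": line_no}
    return {"matched": True, "first_divergence_step": None}


buffer = io.StringIO()
initial_state = {"step_index": 0, "needle_x": 0.0}
record_episode(initial_state, [{"dx": 0.5}, {"dx": 0.25}], buffer)
result = validate_replay(initial_state, buffer.getvalue())
print(result)
```

Replaying from the recorded actions, not from the policy that produced them, is what makes the log itself the source of truth: any divergence points at the backend or the recorder, not at policy changes.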
The placeholder backend emits canonical event labels from medsim.sim.taxonomy, including episode start/termination, camera occlusion, visibility degradation, target shift, needle grasp/release, unstable grasp, tool collision, target passage, success, and failure. Failure labels are also canonical: max steps exceeded, collision limit exceeded, workspace violation, invalid state transition, unstable needle control, excessive occlusion, and missed target. These labels are intentionally stable because dataset generation and evaluation depend on their semantics.
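A stable label set of this kind is often modeled as a string-valued enum so that exported JSON stays readable while unknown labels are rejected at parse time. The sketch below covers the seven failure labels listed above; the class name, member names, and snake_case string values are assumptions and may not match medsim.sim.taxonomy.

```python
from enum import Enum


class FailureLabel(str, Enum):
    """Hypothetical canonical failure labels; the string values are assumed."""
    MAX_STEPS_EXCEEDED = "max_steps_exceeded"
    COLLISION_LIMIT_EXCEEDED = "collision_limit_exceeded"
    WORKSPACE_VIOLATION = "workspace_violation"
    INVALID_STATE_TRANSITION = "invalid_state_transition"
    UNSTABLE_NEEDLE_CONTROL = "unstable_needle_control"
    EXCESSIVE_OCCLUSION = "excessive_occlusion"
    MISSED_TARGET = "missed_target"


def parse_failure(raw: str) -> FailureLabel:
    """Convert an exported string back into a canonical label, rejecting unknown values."""
    try:
        return FailureLabel(raw)
    except ValueError as exc:
        raise ValueError(f"non-canonical failure label: {raw!r}") from exc


print(parse_failure("missed_target").name)
```

Rejecting unknown strings at the boundary keeps a typo in one backend from silently creating a new failure class in downstream metrics.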
SOFA is optional and is not listed as a package dependency. The current SOFA adapter provides:
- safe dependency detection through medsim.sim.sofa.availability;
- SofaBackend.backend_info() metadata without importing SOFA at package import time;
- typed scene-plan mapping through build_sofa_scene_plan(scene_config, scenario);
- minimal runtime ownership through SofaRuntimeHandle when SOFA is installed;
- typed static runtime metadata through SofaRuntimeMetadata;
- reset-time medsim-compatible state and observation access through extract_runtime_state(runtime);
- a no-physics runtime-owned step index that extractor-backed state/observation can read.
The adapter does not yet run a simulation step, export step-time SOFA runtime state as recorder artifacts, render images, compute contacts, record SOFA episodes, or claim deterministic replay. Runtime metadata is still static and metadata-derived, and the runtime index is lifecycle bookkeeping only; neither means the surgery, tools, needle, or tissue dynamically changed. Calling SofaBackend.initialize() without SOFA installed raises a clear SofaDependencyError. Running backend compliance against sofa should complete and write a failing report rather than crash; that failure is currently the honest result.
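Dependency gating of this kind is commonly built on `importlib.util.find_spec`, which locates a module without importing it. The sketch below is an assumption about how such a gate could look, not the contents of medsim.sim.sofa.availability: `module_available` and `require_module` are hypothetical helpers, the error class is redefined locally, and `Sofa` is assumed to be the top-level module name provided by the SOFA Python bindings.

```python
import importlib.util


class SofaDependencyError(RuntimeError):
    """Local stand-in for the error raised when SOFA is required but missing."""


def module_available(module_name: str) -> bool:
    """Check for a top-level module without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_module(module_name: str) -> None:
    """Raise a clear error instead of an opaque ImportError deep in a call stack."""
    if not module_available(module_name):
        raise SofaDependencyError(
            f"{module_name!r} is not installed; this code path needs the optional SOFA dependency"
        )


# "Sofa" is an assumed module name; the result depends on the local environment.
print(module_available("Sofa"))
```

Because nothing is imported during the check, metadata queries stay cheap and safe on machines without SOFA, and only the code paths that truly need the runtime call `require_module`.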
The repository now ships a small local developer workbench that makes the
placeholder simulation visible and demoable. It is a truthful projection of
the existing backend, not a separate fake simulator. See
docs/workbench.md for the full design.
The workbench consists of:
- a FastAPI backend under src/medsim/api that exposes prompt compile, scene preview, and stepwise placeholder run orchestration;
- a workbench layer under src/medsim/workbench (prompt compiler, run session manager, viewer payload builder);
- a React + Vite + react-three-fiber frontend under frontend/.
Install the optional dependencies:
python -m pip install -e ".[dev,workbench]"

Run the API (default http://127.0.0.1:8000):

python -m medsim.api --scene configs/base_scene.yaml --scenario-dir configs/scenarios

In a second terminal, run the UI (default http://127.0.0.1:5173, proxies
/api/* to the FastAPI server):

cd frontend
npm install
npm run dev

Open http://127.0.0.1:5173, type a prompt like
"Run a laparoscopic needle passing scenario with camera occlusion", click
Compile, then Run. The 3D viewer, event timeline, compiled scenario
JSON, and artifact paths are all driven by the same placeholder backend that
the CLI uses.
Truthfulness notes:
- The placeholder backend is the only backend that runs live in the workbench.
- The SOFA adapter is exposed only as a scene-plan / metadata preview.
- No fake physics, deformation, or collision simulation is introduced by the workbench layer.
Development guidance and next steps:
- Use replay validation as the regression gate for deterministic infrastructure changes.
- Use backend compliance as the gate for any backend integration work.
- Extend the SOFA runtime foothold selectively: replace one static metadata field with a truly runtime-read value where SOFA exposes it, before attempting step().
- Add richer task parameterization and scripted policy variants only when they preserve action trace replay.
- Add multimodal export hooks once rendering and sensor streams are real.
- Expand evaluation into regression suites for rare surgical edge cases.