If Claude Code crashes mid-run, or the shell dies, or the machine reboots while evo run <id> is in flight, the experiment is left with status: "running" in graph.json forever. There's no command to detect this or recover.
What you have to do today:
- Notice via evo scratchpad or evo status that something is stuck on running
- Manually figure out whether the underlying subprocess is actually still alive
- If not, run evo discard <id> to mark it dead and let the orchestrator move on
- Lose any partial trace data because the run never completed
That's all manual and easy to forget. For autonomous setups, it can leave the optimization loop spinning on phantom in-flight experiments.
What I'd want:
- An evo recover command that walks all running nodes, checks whether their subprocess is alive (PID-based or timeout-based), and either (a) marks them failed with a "process disappeared" reason, or (b) prompts to retry.
- evo status to surface stale running nodes prominently rather than burying them in the dump.
- Optionally: a heartbeat file the runner touches periodically, so detection doesn't depend on tracking PIDs across crashes.
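To make the request concrete, here is a rough sketch of what the detection step could look like. It is not based on evo's actual internals: the node fields pid, heartbeat and reason, the 120-second threshold, and all function names are assumptions made for illustration. Only status: "running" and the "process disappeared" reason come from the description above. The PID check is POSIX-only (on Windows, os.kill with signal 0 does not behave as a harmless probe), and PIDs can be reused after a reboot, which is part of why the heartbeat variant is preferred whenever a heartbeat path is present.

```python
import os
import time

STALE_AFTER_SECONDS = 120  # hypothetical threshold, not an evo setting


def pid_alive(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    if not isinstance(pid, int) or pid <= 0:
        return False  # pid 0 or negative would address process groups
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def heartbeat_fresh(path, now=None, max_age=STALE_AFTER_SECONDS):
    """Return True if the heartbeat file was modified within max_age seconds."""
    now = time.time() if now is None else now
    try:
        return now - os.path.getmtime(path) <= max_age
    except OSError:
        return False  # a missing or unreadable heartbeat counts as stale


def find_stale(nodes, now=None):
    """Return ids of nodes marked running whose runner appears to be gone.

    `nodes` is assumed to map node id -> dict with "status" and optionally
    "pid" and "heartbeat" (a file path). This schema is hypothetical.
    """
    stale = []
    for node_id, node in nodes.items():
        if node.get("status") != "running":
            continue
        heartbeat = node.get("heartbeat")
        if heartbeat is not None:
            alive = heartbeat_fresh(heartbeat, now)
        else:
            alive = pid_alive(node.get("pid"))
        if not alive:
            stale.append(node_id)
    return stale


def mark_failed(nodes, stale_ids, reason="process disappeared"):
    """Option (a): flip stale nodes to failed and record why."""
    for node_id in stale_ids:
        nodes[node_id]["status"] = "failed"
        nodes[node_id]["reason"] = reason
```

An evo recover command could then load graph.json, call find_stale, and either call mark_failed or prompt for a retry. evo status could reuse find_stale to put stale nodes at the top of its output.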
Related but separate: concurrent Claude Code sessions on the same workspace can race on next_id allocation in evo new even with advisory locking. Edge case but real for power users.