feat(runtime): lazy-install matching ABI runtime tree for cross-Python roar run #108
christophergeyer wants to merge 7 commits into
…n roar run

Solves the cross-Python first-run experience: `uv tool install roar-cli` installs roar under one CPython (typically uv's default), but the user runs `roar run python3 …` against a different one. Without a matching ABI tree, backend dispatch loads roar's bundled (wrong-ABI) compiled deps and crashes inside pydantic_core.

The previous PR (#107) gates that crash gracefully. This one closes the loop: probe the target Python's ABI before launching, install a matching runtime tree at `~/.cache/roar/runtime/<tag>/` on the fly (or use the existing cache), and prepend it to ROAR_RUNTIME_PYTHONPATH so the traced process picks up ABI-correct compiled deps.

What's added

- `roar/execution/runtime/abi_probe.py`: `probe_python_abi(executable)` runs the target Python in a one-shot subprocess to read its `sys.implementation.cache_tag`. Bails fast for non-Python targets (bash/make/etc.) so `roar run cmd` doesn't pay a probe cost for things it couldn't lazy-install for anyway.
- `roar/execution/runtime/lazy_install.py`: XDG-respecting cache (`$XDG_CACHE_HOME/roar/runtime/<tag>/` or `~/.cache/...`), atomic install via `uv pip install --target --python` (with plain pip fallback), roar-version-stamped cache invalidation, and the `ensure_runtime(...)` orchestrator. Failures return None; the sitecustomize gate handles fallback.
- `runtime.install` config key: `auto` (default, lazy install on mismatch) or `skip` (use bundled only; backend dispatch off on mismatch). Env `ROAR_RUNTIME_INSTALL` overrides. The `skip` mode covers restricted-network containers where lazy-install would fail anyway.
- `TracerService._lazy_install_runtime_entries(command, roar_dir)`: hooks the probe + ensure_runtime into `execute()` right before `ROAR_RUNTIME_PYTHONPATH` is set.

Gate refactor (touches #107 territory)

The original ABI-tag check (`bundled_abi_tag` + `abi_minor_version`) parsed roar's bundled `.so` filenames. That works for the bundled-only case but is blind to a lazy-installed runtime tree on the same path. Replaced with `matching_compiled_pydantic_core(sys.path, expected_soabi)`, which walks sys.path for a pydantic_core SO whose filename matches the running interpreter's SOABI. Composes naturally with the lazy-install path (matching SO anywhere on sys.path satisfies the gate). The old helpers are kept (still tested) for future use and to avoid churning the API surface of `support.py`.

User-visible

- Lazy-install path emits a single 🦖 line on cache miss:
  🦖 installing roar runtime for cp312 ...
- Cache hits are silent.
- Skip mode + ABI mismatch falls through to the gate's actionable message, which now also recommends `pip install roar-cli` for single-Python container environments.

Test coverage

- `test_abi_probe.py`: success / non-Python target / subprocess failure / blank stdout / versioned python names.
- `test_lazy_install.py`: cache layout, stamp invalidation, atomic install (mocked subprocess, real tempdir/rename), failure paths, mode resolution (env / config / default / case normalization), and the `ensure_runtime` decision tree (match / skip / cache-hit / cache-miss-install / install-fail).
- `test_inject_support.py`: `matching_compiled_pydantic_core` finds in bundled, finds in lazy runtime, returns False on mismatch / missing pkg / wrong extension / blank entries.

968 unit tests passing, 1 pre-existing skipped.

Stacked on #107 (the gate). When #107 merges, this branch rebases cleanly onto main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
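The cache layout and mode resolution described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in `lazy_install.py`: the helper names `runtime_cache_dir` and `resolve_install_mode` are hypothetical, and falling back to `auto` on an unrecognized value is an assumption the commit message does not confirm.

```python
import os
from pathlib import Path

# The two modes named in the commit message.
VALID_MODES = ("auto", "skip")


def runtime_cache_dir(tag, environ=None):
    """Return <cache base>/roar/runtime/<tag>, honoring XDG_CACHE_HOME.

    Hypothetical helper: the real function name and signature may differ.
    """
    env = os.environ if environ is None else environ
    xdg = env.get("XDG_CACHE_HOME", "").strip()
    base = Path(xdg) if xdg else Path.home() / ".cache"
    return base / "roar" / "runtime" / tag


def resolve_install_mode(config_value=None, environ=None):
    """Env var beats the config key; the config key beats the default.

    Values are case-normalized. Unknown values falling through to "auto"
    is an assumption made for this sketch.
    """
    env = os.environ if environ is None else environ
    for candidate in (env.get("ROAR_RUNTIME_INSTALL"), config_value):
        if candidate:
            mode = candidate.strip().lower()
            if mode in VALID_MODES:
                return mode
    return "auto"
```

Passing `environ` explicitly keeps both helpers testable without patching the process environment.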
Friction-journal feedback on #108: after the cross-Python fixes, `roar show @N` reports `Environment: ... Python 3.13.x` for jobs whose command was system `python3` (3.12). The Python identity in job metadata was being captured from `platform.python_version()` in roar's own host process (`runtime_collector.collect`), not from the traced child. That is exactly the seam where roar sells reproducibility.

Captures `python_version` + `python_implementation` from `platform.{python_version,python_implementation}()` *inside* the traced process in `tracker.write_log`, threads them through the `PythonInjectData` model, and has `runtime_collector` prefer the traced values (falling back to host values when the inject log is missing the fields, e.g. older logs).

Also: stderr remediation in `sitecustomize.py` now suggests `uv tool install --python pythonX.Y roar-cli --reinstall` instead of `--force`. Same effect, accurate user-facing semantics, less alarming.

Tests:
- `test_runtime_tracker_writes_expected_log_payload` asserts `python_version` (with at least 2 dots, e.g. "3.12.3") and a non-empty `python_implementation`.
- 968 unit tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both review points addressed in the latest push (74958a1): 1.
…system
Friction-journal reported two bugs blocking the lazy-install path:
1. `from roar import __version__` failed because `roar/__init__.py` was
empty. The tracer launcher's `try/except Exception` swallowed it as
a debug log, so the entire `_lazy_install_runtime_entries` path
returned [] silently — no 🦖 line, fall through to the gate. Dead
code on every install.
Adds canonical `__version__` to `roar/__init__.py` via
`importlib.metadata.version("roar-cli")` with a sentinel fallback
when metadata is absent. Narrows the launcher's except to
ImportError and bumps the log level to warning — an ImportError on
these internal modules is a contract violation, not a user
environment issue, and shouldn't hide.
2. `sitecustomize._append_roar_runtime_pythonpath` *appended*
ROAR_RUNTIME_PYTHONPATH entries to sys.path, so the lazy-installed
cache landed at the END. System site-packages (with stale
typing_extensions / pydantic) won. pydantic_core loaded from the
cache but its `from typing_extensions import Sentinel` resolved to
the system's older copy and crashed exactly like before.
Renamed to `_prepend_roar_runtime_pythonpath`. Uses
`sys.path[:0] = new_paths` to prepend the whole list in declared
order, so the cache dir lands at `sys.path[0]`. The original "let
the workload's own venv keep precedence" intent — preserved by
appending — only mattered when the user already had matching deps.
In the lazy-install scenario the gate didn't trip, which means the
user's env didn't have matching deps; roar's ABI-matched copy is
the better answer.
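The ordering change in item 2 comes down to a single slice assignment. A minimal sketch, assuming the variable holds `os.pathsep`-separated entries and that blank or already-present entries are skipped (the function name and that filtering are illustrative, not taken from `sitecustomize.py`):

```python
import os


def prepend_runtime_paths(path_list, raw_value):
    """Insert the entries of raw_value at the front of path_list, in declared order.

    Hypothetical stand-in for `_prepend_roar_runtime_pythonpath`; skipping
    blank and already-present entries is an assumption of this sketch.
    """
    new_paths = [p for p in raw_value.split(os.pathsep) if p and p not in path_list]
    # Slice assignment at index 0 prepends the whole list without reversing it,
    # unlike repeated path_list.insert(0, p) calls.
    path_list[:0] = new_paths
    return new_paths
```

With this shape the first declared entry (the cache directory) ends up at index 0, ahead of system site-packages.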
Also addresses the user's diagnostic comment: `probe_python_abi`'s
narrow `except (OSError, SubprocessError)` let a test-mocked
subprocess.Popen leak a ValueError out into the launcher. Broadened
to `except Exception` with a comment — the probe's contract is "tag
or None", weird mock returns degrade to None.
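The "tag or None" contract can be illustrated with a short sketch. It is not the code in `abi_probe.py`: the name `probe_cache_tag`, the basename heuristic for spotting Python targets, and the timeout value are all assumptions made for illustration.

```python
import os
import subprocess

_PROBE = "import sys; print(sys.implementation.cache_tag)"


def looks_like_python(executable):
    """Cheap basename check so non-Python targets skip the subprocess (assumed heuristic)."""
    name = os.path.basename(executable).lower()
    return name.startswith(("python", "pypy"))


def probe_cache_tag(executable, timeout=10.0):
    """Return the target interpreter's cache tag, or None on any failure."""
    if not looks_like_python(executable):
        return None
    try:
        result = subprocess.run(
            [executable, "-c", _PROBE],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except Exception:
        # Contract is "tag or None": a missing binary, a non-zero exit,
        # a timeout, or an odd mocked return all degrade to None.
        return None
    return result.stdout.strip() or None
```

Catching `Exception` here is deliberately broad, matching the reasoning in the commit message: the caller only needs to know whether a tag was obtained.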
🦖 message gains a one-time/cached tail so a slow network doesn't
read as a hang:
🦖 installing roar runtime for cp312 ... (one-time per Python; cached)
Tests:
- `tests/unit/test_roar_version.py` — `from roar import __version__`
must resolve to a non-empty string. Catches the regression at the
package level so it can't slip through.
- `tests/execution/runtime/test_sitecustomize_path_order.py` —
subprocess test: when roar isn't already importable, the entries
from ROAR_RUNTIME_PYTHONPATH land at the front of sys.path in
declared order. When roar IS already importable (the merged case),
the function early-returns and leaves sys.path alone.
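For item 1, the metadata lookup with a sentinel fallback might look like the sketch below. The function name `resolve_version` and the sentinel string are hypothetical; the commit message only states that `importlib.metadata.version("roar-cli")` is used with a fallback when metadata is absent.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name="roar-cli", fallback="0.0.0+unknown"):
    """Return the installed distribution's version, or a sentinel when metadata is absent.

    The fallback value is illustrative; the real sentinel is not shown in the PR text.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return fallback
```

A package `__init__.py` could then assign `__version__ = resolve_version()`, which always yields a non-empty string.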
971 unit tests passing, 1 pre-existing skipped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both lazy-install blockers fixed in 1.
…seam

Friction-journal caught the missing reader-side half of 74958a1: the tracker writes `python_version` and `python_implementation` into the inject JSON, but `DataLoaderService.load_python_data` was constructing `PythonInjectData` without extracting those keys. They defaulted to "" and `runtime_collector` then fell back to `platform.python_version()`, which is the host Python, not the traced child's. So `roar show @N` still reported the host's 3.13.12 for a 3.12 traced run.

Two missing lines in data_loader.py plus tests at the seam:

- `test_python_identity_keys_flow_through`: explicit reader-side test. python_version + python_implementation make it from a JSON payload into the model.
- `test_python_identity_keys_default_to_empty_for_older_logs`: back-compat. Older inject logs without the keys still load cleanly.
- `test_writer_reader_roundtrip_carries_python_identity`: the reviewer's specifically-requested coverage. The real tracker writes, the real loader reads, the version comes out non-empty. Would have caught both halves of the original asymmetry; if either side regresses, this test fails.

974 unit tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
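The writer/reader asymmetry is easy to reproduce in miniature. The sketch below is a simplified stand-in, not roar's code: `InjectIdentity`, `load_identity`, and `effective_python_version` are hypothetical names that mirror the roles of `PythonInjectData`, `load_python_data`, and the fallback in `runtime_collector`.

```python
import json
import platform
from dataclasses import dataclass


@dataclass
class InjectIdentity:
    """Simplified stand-in for the identity fields on PythonInjectData."""

    python_version: str = ""
    python_implementation: str = ""


def load_identity(raw_json):
    """Reader side: extract both keys, defaulting to "" for older logs."""
    payload = json.loads(raw_json)
    return InjectIdentity(
        python_version=str(payload.get("python_version", "")),
        python_implementation=str(payload.get("python_implementation", "")),
    )


def effective_python_version(identity):
    """Prefer the traced child's version; fall back to the host interpreter's."""
    return identity.python_version or platform.python_version()
```

If `load_identity` omitted the two `payload.get` lines, every log would look like an older log and the host fallback would always win, which is the bug this commit describes.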
Reviewer's catch addressed in `b1cd824` (just pushed). You were right — the writer/reader asymmetry was the textbook missing half. `tracker.write_log` was writing `python_version` / `python_implementation` into the inject JSON, but `DataLoaderService.load_python_data` wasn't extracting those keys when constructing `PythonInjectData`. They defaulted to `""`, so `runtime_collector` hit the `or platform.python_version()` fallback and stamped roar's host Python on the job. Two missing lines. Tests added at the seam — the third one is the round-trip you specifically asked for:
In-process can only round-trip with the test runner's interpreter, so it asserts the seam (non-empty, dot-separated version) rather than a specific value. The end-to-end "actual mismatched Python" test you described is still the strongest version; that's worth doing as an integration test, but it needs a second CPython on PATH in CI. Worth following up on after the merge.

974 unit tests, 1 skipped, ruff clean.
CI's repo-wide `ruff format --check` caught a few black-style line wraps in the lazy-install test file that my targeted format runs earlier in this branch missed. Pure formatting; no behavioral change.
…install `runtime_install_mode(start_dir: Path | None = None)` was passing a `Path` through to `config_get`, which is typed as `str | None`. mypy caught it in CI. One-line coercion at the call site. Full-repo mypy (361 source files) clean.
…able

CI's test-macos (macos-15-intel, x86_64) saw two integration-test timeouts on this branch: `test_dag_deep_nesting` and `test_register_private_without_project_binding_uses_current_user_scope`. Both run several `roar run` invocations in a single test and ride the 60s pytest-timeout. macOS arm64 and Linux all passed; the smoking gun is Intel macOS framework Python's notoriously slow startup combined with the probe subprocess this branch adds to every `roar run`.

The modal case for `roar run python foo.py` is that the target Python is exactly the one roar-cli itself runs under (especially in tests where the fixture provides `python_exe` = the test runner's interpreter). In that case the ABI matches by construction and the probe subprocess is pure overhead: N python startups per N-step test.

Adds a fast path that resolves both `command[0]` and `sys.executable` through `shutil.which` + `os.path.realpath` and returns early on match. Probe still runs when the target genuinely differs (the cross-Python case lazy-install was built for).

Verified locally:
- ruff check + ruff format --check (repo-wide)
- mypy roar (361 source files)
- full unit + execution test sweep (974 passed, 1 skipped)
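The fast path described in this commit can be sketched as a pure comparison of resolved paths. The helper names below are hypothetical and the real check may differ in detail; only the use of `shutil.which` and `os.path.realpath` comes from the commit message.

```python
import os
import shutil
import sys


def _resolve(executable):
    """Resolve a command name or path to its real on-disk location, or None."""
    found = shutil.which(executable)
    return os.path.realpath(found) if found else None


def is_same_interpreter(command, host_executable=None):
    """True when command[0] resolves to the interpreter roar itself runs under.

    In that case the ABI matches by construction and no probe subprocess is needed.
    """
    if not command:
        return False
    host = _resolve(host_executable or sys.executable)
    target = _resolve(command[0])
    return host is not None and target == host
```

Resolving through `realpath` means a symlink such as `python3` pointing at the host interpreter also takes the fast path.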
Summary
Stacked on #107. Closes the loop on the cross-Python `roar run` story: when the traced Python's ABI doesn't match roar-cli's bundled deps, lazy-install a matching runtime tree at `~/.cache/roar/runtime/<tag>/` and prepend it to `ROAR_RUNTIME_PYTHONPATH`. The traced process picks up ABI-correct `pydantic_core` / `blake3` from the cache, and the sitecustomize gate (whose detection is also refactored here) sees a matching SO on `sys.path` and lets backend dispatch proceed.

Trigger: `roar run`, not `roar init`. Init-time prefetch was tempting but the user might never invoke Python; `roar run` is the first moment a concrete target exists.
User-visible behavior
What changed
Gate refactor (#107 area)
The original ABI-tag check in `sitecustomize.py` parsed roar's bundled `.so` filenames. That's correct for the bundled-only case but blind to a lazy-installed runtime tree on the same `sys.path`. Replaced with `matching_compiled_pydantic_core(sys.path, expected_soabi)` — walks sys.path for a `pydantic_core/*.so` whose filename matches the running interpreter's SOABI. Composes naturally with lazy-install: a matching SO anywhere satisfies the gate.
The old `bundled_abi_tag` + `abi_minor_version` helpers are retained (still tested) for future use and to keep the `support.py` surface stable.
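The refactored gate check amounts to a directory walk over path entries. A minimal sketch under stated assumptions: `has_matching_extension` is a hypothetical name, and matching on `.<SOABI>.` inside a `.so` filename is one plausible reading of "filename matches the running interpreter's SOABI", not the confirmed logic in `support.py`.

```python
import os


def has_matching_extension(path_entries, expected_soabi, package="pydantic_core"):
    """Return True if any entry holds <package>/*.so built for expected_soabi.

    Illustrative stand-in for `matching_compiled_pydantic_core`.
    """
    if not expected_soabi:
        return False
    for entry in path_entries:
        if not entry:
            continue  # blank sys.path entries are skipped
        try:
            names = os.listdir(os.path.join(entry, package))
        except OSError:
            continue  # package not present under this entry
        for name in names:
            # e.g. _pydantic_core.cpython-312-x86_64-linux-gnu.so
            if name.endswith(".so") and f".{expected_soabi}." in name:
                return True
    return False
```

A caller would typically pass `sys.path` and `sysconfig.get_config_var("SOABI")`, so a match in either the bundled tree or the lazy-installed cache satisfies the check.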
Test plan
Out of scope
Merge sequencing
Stacked on `cg/abi-gate-backend-init` (PR #107). When #107 lands on `main`, this branch rebases cleanly. The diff against #107's branch is the meat of this PR; the diff against main shows both PRs' changes together.
🤖 Generated with Claude Code