fix(inject): gate backend dispatch when traced Python ABI != bundled #107

Merged: TrevorBasinger merged 1 commit into main from cg/abi-gate-backend-init on May 15, 2026

Conversation

@christophergeyer
Member

Summary

uv tool install roar-cli installs roar under one CPython (whatever uv picks, typically 3.13). When a user then runs roar run python3 … against a different interpreter (system 3.12, a venv 3.11, etc.), roar's injected backend init pulls compiled deps tagged for the install Python into the traced Python. That mismatch surfaces as ImportError: cannot import name 'Sentinel' from 'typing_extensions' deep in pydantic_core's import chain. The result is a 30-line traceback that looks like a tool bug, and one I've seen cleanly bisected in an engineer-feedback friction journal: "This is the moment a real user quits."

The injection mechanism itself is ABI-safe. The bomb is concentrated in two paths:

  • RuntimeImportController.initialize_selected_backend() — fires once during sitecustomize.
  • RuntimeImportController.handle_import() — fires on every traced-process import, can lazily load a matched backend's plugin.

Both end up importing the selected backend's plugin (Ray, OSMO), which pulls pydantic + pydantic_core and explodes.

This PR detects the ABI mismatch at sitecustomize entry and disables backend dispatch without disturbing the stdlib tracker hooks.

How it works

  1. Detect roar's bundled ABI: bundled_abi_tag(inject_dir) walks up to the enclosing site-packages and parses cpython-NNN from a known compiled dep's .so filename (pydantic_core preferred, falls back to blake3). Returns None if the layout doesn't look like a wheel install, in which case we don't gate.
  2. Compare to the running interpreter: abi_minor_version(...) extracts (major, minor) from both forms (cp313 from sys.implementation.cache_tag, cpython-313 from .so filename) into a comparable tuple.
  3. Gate: on mismatch under ROAR_WRAP=1, print an actionable stderr line and call _runtime_import_controller.disable_backend_dispatch(). Both initialize_selected_backend and handle_import short-circuit.
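For readers who want a concrete picture of steps 1 and 2, here is a minimal sketch. Only the function names bundled_abi_tag and abi_minor_version come from this PR; the regexes, the directory walk, and the assumption of a single-digit major version are illustrative guesses, not the merged implementation.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumed patterns: either tag form, "cp313" or "cpython-313".
_TAG_RE = re.compile(r"^(?:cpython-|cp)(\d)(\d+)$")
# Assumed filename shape, e.g. "_pydantic_core.cpython-313-x86_64-linux-gnu.so".
_SO_RE = re.compile(r"\.(cpython-\d+)-")


def abi_minor_version(tag: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) for 'cp313' or 'cpython-313', else None."""
    match = _TAG_RE.match(tag)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def bundled_abi_tag(inject_dir: Path) -> Optional[str]:
    """Walk up to site-packages and read the ABI tag from a compiled dep."""
    site_packages = next(
        (p for p in (inject_dir, *inject_dir.parents) if p.name == "site-packages"),
        None,
    )
    if site_packages is None:
        return None  # not a wheel-style layout, so the caller should not gate
    for package in ("pydantic_core", "blake3"):  # preferred dep first
        package_dir = site_packages / package
        if not package_dir.is_dir():
            continue
        for so_file in sorted(package_dir.glob("*.so")):
            match = _SO_RE.search(so_file.name)
            if match:
                return match.group(1)
    return None  # no recognizable compiled deps
```

Returning None for both "no site-packages" and "no compiled deps" keeps the caller's rule simple: gate only when a bundled tag was positively identified.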

What stays running on mismatch:

  • File-open wrapping (tracking_open)
  • Import-name recording (tracking_import, minus the backend dispatch)
  • Env-var read recording (patched_environ_get)
  • write_log atexit
  • Syscall-level capture (eBPF/preload/ptrace) — unaffected, it's a separate process

What's disabled on mismatch:

  • Ray driver/worker hooks
  • OSMO backend hooks
  • Anything else that registered a RuntimeImportAdapter (today: just those two)

These were effectively already gated on Python-ABI matching — they just failed loudly mid-pipeline instead of refusing at startup with a fix-it.
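The short-circuit can be pictured with a small stand-in class. This is a sketch under assumptions: ControllerSketch, its constructor arguments, and the loaded list are hypothetical, and only the three method names mirror the ones named in this PR.

```python
from typing import Callable, Dict, List, Optional


class ControllerSketch:
    """Hypothetical stand-in for RuntimeImportController, not roar's real class."""

    def __init__(
        self,
        selected: Optional[str],
        adapters: Dict[str, Callable[[], object]],
    ) -> None:
        self._selected = selected
        self._adapters = adapters  # top-level module name -> plugin loader
        self._dispatch_enabled = True
        self.loaded: List[str] = []

    def disable_backend_dispatch(self) -> None:
        """Turn both init paths into no-ops for the rest of the process."""
        self._dispatch_enabled = False

    def initialize_selected_backend(self) -> None:
        if not self._dispatch_enabled or self._selected is None:
            return
        self._load(self._selected)

    def handle_import(self, module_name: str) -> None:
        if not self._dispatch_enabled:
            return
        root = module_name.split(".", 1)[0]
        if root in self._adapters:
            self._load(root)

    def _load(self, name: str) -> None:
        if name not in self.loaded:
            self._adapters[name]()  # compiled deps would be imported here
            self.loaded.append(name)


def exploding_loader() -> object:
    # Simulates a wrong-ABI wheel being imported.
    raise ImportError("wrong-ABI wheel")


controller = ControllerSketch("ray", {"ray": exploding_loader})
controller.disable_backend_dispatch()
controller.initialize_selected_backend()  # returns without calling the loader
controller.handle_import("ray.data")      # likewise a no-op
```

A single boolean checked at the top of both entry points keeps the stdlib tracker hooks out of the picture, since they never go through the plugin loader.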

Stderr on mismatch

roar: traced Python is 3.12 but roar-cli was installed under Python 3.13.
  Backend integrations (Ray, OSMO) are disabled for this run.
  File I/O is still captured.
  To re-enable backends, reinstall under the matching Python:
    uv tool install --python python3.12 roar-cli --force
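A message of this shape could be assembled from the two version tuples roughly as follows. The helper name format_abi_mismatch is hypothetical; the wording is copied from the stderr output above.

```python
import sys
from typing import Tuple


def format_abi_mismatch(traced: Tuple[int, int], bundled: Tuple[int, int]) -> str:
    """Build the multi-line stderr notice for an ABI mismatch (hypothetical helper)."""
    traced_s = f"{traced[0]}.{traced[1]}"
    bundled_s = f"{bundled[0]}.{bundled[1]}"
    return "\n".join(
        [
            f"roar: traced Python is {traced_s} but roar-cli was installed "
            f"under Python {bundled_s}.",
            "  Backend integrations (Ray, OSMO) are disabled for this run.",
            "  File I/O is still captured.",
            "  To re-enable backends, reinstall under the matching Python:",
            f"    uv tool install --python python{traced_s} roar-cli --force",
        ]
    )


if __name__ == "__main__":
    # Write to stderr so the traced program's stdout is left untouched.
    print(format_abi_mismatch((3, 12), (3, 13)), file=sys.stderr)
```

Deriving the reinstall command from the traced interpreter's version means the suggested fix always targets the Python the user actually ran.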

Test plan

  • New unit tests for bundled_abi_tag (pydantic-core fixture, blake3 fallback, no-compiled-deps None, no-site-packages None) and abi_minor_version (both forms, unparseable inputs).
  • New unit tests for disable_backend_dispatch short-circuiting initialize_selected_backend and handle_import (the matched-backend path also writes to _environ, asserted unchanged).
  • Existing test_pth_pydantic_import.py::test_pth_import_does_not_require_pydantic still passes — the helper import in sitecustomize.py is hoisted above _runtime_tracker.install() so it doesn't go through the patched __import__.
  • Full tests/execution/ sweep (54 tests) clean.
  • ruff check + ruff format --check clean.

What this isn't

This is the gate, not the fix. The actionable message tells the user how to reinstall under the matching Python, but it's a manual step. The next PR is lazy per-ABI runtime install on roar run — keep uv tool install as the recommended path, install the right per-ABI runtime tree on demand, transparently. That's larger work (manifest split, install backend, locking, version-stamping). This PR rescues the modal first-run experience independently.

🤖 Generated with Claude Code

When `uv tool install roar-cli` installs roar under one CPython (e.g.,
3.13) and `roar run python3 ...` traces a different one (e.g., system
3.12), the injected backend init tries to load roar's bundled
`pydantic_core` (ABI-tagged for 3.13) into the wrong interpreter. The
chain crashes with `ImportError: cannot import name 'Sentinel' from
'typing_extensions'`, which looks like a user/library bug but is the
canonical "wrong-ABI wheel loaded into wrong Python" failure.

The injection mechanism itself is ABI-safe (it's just `.pth` +
sitecustomize). The bomb is in `initialize_selected_backend()` and the
matched-backend branch of `handle_import` — both end up importing the
selected backend's plugin, which pulls compiled deps.

This change:

- Parses the bundled ABI tag from a known compiled dep's `.so`
  filename (`pydantic_core`, falling back to `blake3`) and compares
  against `sys.version_info` at sitecustomize entry.
- On mismatch under `ROAR_WRAP=1`, prints an actionable stderr line
  with the reinstall command and calls a new
  `RuntimeImportController.disable_backend_dispatch()` to no-op both
  init paths. The stdlib tracker hooks (file opens, env reads,
  imported-module names) keep running, so file I/O capture and per-job
  Python metadata are unaffected.
- Leaves Ray/OSMO disabled-but-honest on mismatch instead of crashing
  three stack frames into pydantic.

When the bundled ABI can't be detected (no recognizable compiled deps
in site-packages — e.g., source checkouts, editable installs), behave
as today: don't gate. Better to crash on a real mismatch than to
falsely refuse.
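The gating decision described here reduces to a small predicate. The sketch below is an assumption about its shape: should_disable_backends is a hypothetical name, and the only facts taken from the commit message are the `ROAR_WRAP=1` condition and the "undetectable means don't gate" rule.

```python
import os
import sys
from typing import Mapping, Optional, Tuple


def should_disable_backends(
    bundled: Optional[Tuple[int, int]],
    running: Tuple[int, int],
    environ: Mapping[str, str],
) -> bool:
    """Return True only for a positively detected mismatch under ROAR_WRAP=1."""
    if environ.get("ROAR_WRAP") != "1":
        return False  # not running under the roar wrapper
    if bundled is None:
        return False  # bundled ABI unknown (e.g. editable install): behave as today
    return bundled != running


if __name__ == "__main__":
    running = (sys.version_info[0], sys.version_info[1])
    # (3, 13) stands in for whatever ABI detection returned.
    print(should_disable_backends((3, 13), running, os.environ))
```

Treating None as "no gate" errs toward the old behavior, which matches the stated preference for crashing on a real mismatch over refusing falsely.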

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@TrevorBasinger merged commit 518b969 into main on May 15, 2026
12 checks passed