fix(inject): gate backend dispatch when traced Python ABI != bundled #107
Merged
When `uv tool install roar-cli` installs roar under one CPython (e.g., 3.13) and `roar run python3 ...` traces a different one (e.g., system 3.12), the injected backend init tries to load roar's bundled `pydantic_core` (ABI-tagged for 3.13) into the wrong interpreter. The chain crashes with `ImportError: cannot import name 'Sentinel' from 'typing_extensions'`, which looks like a user/library bug but is the canonical "wrong-ABI wheel loaded into wrong Python" failure.

The injection mechanism itself is ABI-safe (it's just `.pth` + sitecustomize). The bomb is in `initialize_selected_backend()` and the matched-backend branch of `handle_import`: both end up importing the selected backend's plugin, which pulls compiled deps.

This change:

- Parses the bundled ABI tag from a known compiled dep's `.so` filename (`pydantic_core`, falling back to `blake3`) and compares against `sys.version_info` at sitecustomize entry.
- On mismatch under `ROAR_WRAP=1`, prints an actionable stderr line with the reinstall command and calls a new `RuntimeImportController.disable_backend_dispatch()` to no-op both init paths. The stdlib tracker hooks (file opens, env reads, imported-module names) keep running, so file I/O capture and per-job Python metadata are unaffected.
- Leaves Ray/OSMO disabled-but-honest on mismatch instead of crashing three stack frames into pydantic.

When the bundled ABI can't be detected (no recognizable compiled deps in site-packages, e.g., source checkouts or editable installs), behave as today: don't gate. Better to crash on a real mismatch than to falsely refuse.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5 tasks
TrevorBasinger
approved these changes
May 15, 2026
Summary
`uv tool install roar-cli` installs roar under one CPython (whatever `uv` picks, typically 3.13). When a user then runs `roar run python3 …` against a different interpreter (system 3.12, a venv 3.11, etc.), roar's injected backend init pulls compiled deps tagged for the install Python into the traced Python. That mismatch surfaces as `ImportError: cannot import name 'Sentinel' from 'typing_extensions'` deep in `pydantic_core`'s import chain. It is a 30-line traceback that looks like a tool bug, and one I've seen cleanly bisected in an engineer-feedback friction journal: "This is the moment a real user quits."

The injection mechanism itself is ABI-safe. The bomb is concentrated in two paths:

- `RuntimeImportController.initialize_selected_backend()`: fires once during sitecustomize.
- `RuntimeImportController.handle_import()`: fires on every traced-process import and can lazily load a matched backend's plugin.

Both end up importing the selected backend's plugin (Ray, OSMO), which pulls `pydantic` + `pydantic_core` and explodes.

This PR detects the ABI mismatch at sitecustomize entry and disables backend dispatch without disturbing the stdlib tracker hooks.
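As a rough illustration of the short-circuit described above, here is a minimal sketch of a controller whose two backend entry points become no-ops once dispatch is disabled. Only the class and method names come from this PR; the constructor arguments (`load_backend_plugin`, `backend_roots`) and the `_dispatch_enabled` flag are hypothetical stand-ins, and the real controller certainly does more.

```python
class RuntimeImportController:
    """Toy stand-in for the controller; only the gating flag is modelled."""

    def __init__(self, load_backend_plugin, backend_roots):
        # Both arguments are hypothetical: a callable standing in for the
        # plugin import that pulls compiled deps, and the top-level module
        # names that count as a "matched backend" (e.g. {"ray"}).
        self._load_backend_plugin = load_backend_plugin
        self._backend_roots = frozenset(backend_roots)
        self._dispatch_enabled = True

    def disable_backend_dispatch(self):
        """Turn both backend entry points into no-ops for this process."""
        self._dispatch_enabled = False

    def initialize_selected_backend(self):
        """Called once at sitecustomize time."""
        if not self._dispatch_enabled:
            return None
        return self._load_backend_plugin()

    def handle_import(self, module_name):
        """Called for every import in the traced process."""
        if not self._dispatch_enabled:
            return None
        if module_name.split(".")[0] not in self._backend_roots:
            return None
        return self._load_backend_plugin()


loaded = []
controller = RuntimeImportController(
    load_backend_plugin=lambda: loaded.append("plugin") or "plugin",
    backend_roots={"ray"},
)
controller.disable_backend_dispatch()
controller.initialize_selected_backend()
controller.handle_import("ray.util")
print(loaded)  # the plugin loader was never called
```

Keeping the check inside the controller, rather than at each call site, means any hook that still runs (such as import tracking) can keep calling `handle_import` unconditionally.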
How it works
- `bundled_abi_tag(inject_dir)` walks up to the enclosing `site-packages` and parses `cpython-NNN` from a known compiled dep's `.so` filename (`pydantic_core` preferred, falls back to `blake3`). Returns `None` if the layout doesn't look like a wheel install, in which case we don't gate.
- `abi_minor_version(...)` extracts `(major, minor)` from both forms (the wheel-style `cp313` and the `cpython-313` used by `sys.implementation.cache_tag` and `.so` filenames) into a comparable tuple.
- On mismatch under `ROAR_WRAP=1`, print an actionable stderr line and call `_runtime_import_controller.disable_backend_dispatch()`. Both `initialize_selected_backend` and `handle_import` short-circuit.

What stays running on mismatch:

- File opens (`tracking_open`)
- Imported-module names (`tracking_import`, minus the backend dispatch)
- Env reads (`patched_environ_get`)
- `write_log` at `atexit`
- The tracer (eBPF/preload/ptrace): unaffected, it's a separate process

What's disabled on mismatch:

- Backend dispatch through `RuntimeImportAdapter` (today: just those two, Ray and OSMO)

These were effectively already gated on Python-ABI matching; they just failed loudly mid-pipeline instead of refusing at startup with a fix-it.
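The detection step can be sketched roughly as follows. This is an assumption-laden approximation, not the PR's code: the function names `bundled_abi_tag` and `abi_minor_version` come from the description, while the regexes, the `abi_mismatch` helper, and the exact directory walk are hypothetical. It only looks at `.so` filenames, as the description does.

```python
import re
import sys
from pathlib import Path

# Accepts "cp313" or "cpython-313": one digit of major, the rest is minor.
_ABI_RE = re.compile(r"^(?:cp|cpython-)(\d)(\d+)$")
# Pulls "cpython-313" out of e.g. "_pydantic_core.cpython-313-x86_64-linux-gnu.so".
_SO_TAG_RE = re.compile(r"\.(cpython-\d+)[^/]*\.so$")


def abi_minor_version(tag):
    """Return (major, minor) for a tag such as 'cp313', or None if unparseable."""
    if not tag:
        return None
    match = _ABI_RE.match(tag)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def bundled_abi_tag(inject_dir):
    """Return the 'cpython-NNN' tag of a known compiled dep, or None."""
    inject_dir = Path(inject_dir).resolve()
    site_packages = next(
        (p for p in (inject_dir, *inject_dir.parents) if p.name == "site-packages"),
        None,
    )
    if site_packages is None:
        return None  # not a wheel-style layout
    for package in ("pydantic_core", "blake3"):  # preferred dep first
        package_dir = site_packages / package
        if not package_dir.is_dir():
            continue
        for so_path in sorted(package_dir.glob("*.so")):
            match = _SO_TAG_RE.search(so_path.name)
            if match:
                return match.group(1)
    return None  # no recognizable compiled deps


def abi_mismatch(inject_dir, running=sys.version_info[:2]):
    """Hypothetical helper: True only when a bundled ABI is detected and differs."""
    bundled = abi_minor_version(bundled_abi_tag(inject_dir))
    if bundled is None:
        return False  # undetectable: behave as before, don't gate
    return bundled != tuple(running)
```

Returning `False` when nothing can be detected mirrors the stated policy for source checkouts and editable installs: a real mismatch may still crash, but a working setup is never refused.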
Stderr on mismatch
Test plan
- Tests for `bundled_abi_tag` (pydantic-core fixture, blake3 fallback, no-compiled-deps `None`, no-site-packages `None`) and `abi_minor_version` (both forms, unparseable inputs).
- Tests for `disable_backend_dispatch` short-circuiting `initialize_selected_backend` and `handle_import` (the matched-backend path also writes to `_environ`, asserted unchanged).
- `test_pth_pydantic_import.py::test_pth_import_does_not_require_pydantic` still passes: the helper import in `sitecustomize.py` is hoisted above `_runtime_tracker.install()` so it doesn't go through the patched `__import__`.
- `tests/execution/` sweep (54 tests) clean.
- `ruff check` + `ruff format --check` clean.

What this isn't
This is the gate, not the fix. The actionable message tells the user how to reinstall under the matching Python, but it's a manual step. The next PR is lazy per-ABI runtime install on `roar run`: keep `uv tool install` as the recommended path and install the right per-ABI runtime tree on demand, transparently. That's larger work (manifest split, install backend, locking, version-stamping). This PR rescues the modal first-run experience independently.

🤖 Generated with Claude Code