In the modern ML ecosystem, dependencies are chaotic. You finally resolve all pip conflicts, launch your training job on a $30/hr H100 cluster, go to sleep, and wake up to find it crashed 5 minutes in because flash-attn wasn't compiled for your specific CUDA version, or xformers silently mismatched with your torch build.
Every day, avoidable errors like these waste huge amounts of money and time on expensive GPU hardware.
Enter env-doctor.
env-doctor is a local-first runtime compatibility and intelligence platform designed specifically for the complex HuggingFace + PyTorch + CUDA ecosystem. It operates on a simple but powerful premise:
"If one user faces a runtime failure due to environment issues, no other user will ever face it again."
Traditional package managers focus on "Can these packages be installed together?" (pip check).
env-doctor focuses on "Will this stack actually work at runtime on your exact hardware?"
We save you from Out-Of-Memory (OOM) errors, silent CUDA fallback performance drops, and undocumented API breaking changes before you even provision a GPU instance.
- 🛡️ Community-Driven Intelligence: Powered by a curated, GitHub-hosted intelligence database. When the community discovers a runtime incompatibility, it's vetted by our AI agents (powered by Watsonx Orchestrate) and pushed to the global database.
- 🧠 Smart VRAM Estimation: Stop guessing if a model will fit. Precise OOM detection accounting for model weights, quantization (int8, fp16, etc.), KV cache size, sequence lengths, and runtime fragmentation across backends like vllm, transformers, llama.cpp, and tgi.
- 🚀 Stable Stack Recommendations: Don't know which versions of torch, transformers, and cuda play nicely together today? env-doctor analyzes your OS and hardware to recommend community-tested, rock-solid dependency stacks.
- 🔍 Deep Compatibility Checking: Scans your requirements.txt or pyproject.toml against known ABI conflicts, CUDA mismatches, and undocumented breakages.
- 🤖 AI-Powered Bug Reporting: Run a failing script through env-doctor report-incompatibility. The tool captures the stack trace, system state, and outputs, then securely submits them to an MCP-powered Watsonx agent, which verifies that the failure is an environment issue and generates a new rule to protect everyone else.
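To make the VRAM estimation idea concrete, here is a minimal back-of-the-envelope sketch. It is not env-doctor's actual estimator: the `ModelShape` class, the `estimate_vram_bytes` function, and the `overhead` fraction are hypothetical names introduced for illustration, and the formula only covers weights plus KV cache for a standard decoder-only transformer. The shape values are those published for Llama-2-7B.

```python
from dataclasses import dataclass

# Bytes per element for a few common precisions.
BYTES_PER_ELEMENT = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0}


@dataclass
class ModelShape:
    """Hypothetical container for the numbers the estimate needs."""
    n_params: float   # total parameter count
    n_layers: int     # number of transformer layers
    n_kv_heads: int   # key/value heads (equals attention heads without GQA)
    head_dim: int     # dimension of each attention head


def kv_cache_bytes_per_token(shape: ModelShape, kv_dtype: str = "fp16") -> float:
    # One key and one value vector per layer, per KV head.
    return 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * BYTES_PER_ELEMENT[kv_dtype]


def estimate_vram_bytes(shape: ModelShape, seq_len: int, quant: str = "fp16",
                        kv_dtype: str = "fp16", batch_size: int = 1,
                        overhead: float = 0.0) -> float:
    """Weights + KV cache, scaled by an assumed fragmentation/overhead fraction."""
    weights = shape.n_params * BYTES_PER_ELEMENT[quant]
    kv_cache = kv_cache_bytes_per_token(shape, kv_dtype) * seq_len * batch_size
    return (weights + kv_cache) * (1.0 + overhead)


# Llama-2-7B: about 6.74B parameters, 32 layers, 32 KV heads, head_dim 128.
LLAMA_2_7B = ModelShape(n_params=6.74e9, n_layers=32, n_kv_heads=32, head_dim=128)

GIB = 1024 ** 3
needed = estimate_vram_bytes(LLAMA_2_7B, seq_len=32768, quant="fp16")
print(f"Estimated: {needed / GIB:.1f} GiB, fits in 24 GiB: {needed <= 24 * GIB}")
```

Under these assumptions, the fp16 weights take about 13.5 GB and a single 32k-token sequence adds 16 GiB of KV cache, so the total of roughly 28.6 GiB does not fit on a 24 GB card even before activations and allocator fragmentation are counted. A real estimator would also have to model runtime-specific behavior such as preallocated cache pools.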
Install env-doctor globally using uv (recommended) or pip:
pip install env-doctor-pypi
# or
uv tool install env-doctor-pypi
Pull the latest community intelligence locally so you can run checks completely offline:
env-doctor update-db
Analyze your project requirements for hidden runtime risks and CUDA mismatches:
env-doctor check requirements.txt
Output snippet:
🔴 Critical Issue: torch 2.1.0 ↔ flash-attn 2.5.0
Description: flash-attn 2.5.0 requires torch>=2.1.1. Will result in segmentation fault at runtime.
Workaround: Upgrade torch to 2.1.1+ or downgrade flash-attn.
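The kind of check behind the output above can be sketched as a lookup of pinned versions against a rule list. This is only an illustration under stated assumptions: the rule dictionary layout, `parse_pins`, and `find_issues` are hypothetical and do not reflect env-doctor's real database schema or code, and only exact `name==version` pins are handled.

```python
import re

# Hypothetical rule format mirroring the example output above.
RULES = [
    {
        "package": "flash-attn",
        "version": "2.5.0",
        "requires": "torch",
        "min_version": "2.1.1",
        "severity": "critical",
        "workaround": "Upgrade torch to 2.1.1+ or downgrade flash-attn.",
    },
]


def parse_version(text):
    """Turn '2.1.0' (or '2.1.0+cu121') into a comparable tuple like (2, 1, 0)."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def parse_pins(lines):
    """Collect exact 'name==version' pins from requirements-style lines."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # ranges, URLs, and options are out of scope for this sketch
        name, version = line.split("==", 1)
        pins[name.strip().lower().replace("_", "-")] = version.strip()
    return pins


def find_issues(pins, rules=RULES):
    """Return the rules whose package pin matches and whose dependency is too old."""
    issues = []
    for rule in rules:
        if pins.get(rule["package"]) != rule["version"]:
            continue
        dep_version = pins.get(rule["requires"])
        if dep_version and parse_version(dep_version) < parse_version(rule["min_version"]):
            issues.append(rule)
    return issues


requirements = ["torch==2.1.0", "flash-attn==2.5.0  # attention kernels"]
for issue in find_issues(parse_pins(requirements)):
    print(f"{issue['severity']}: {issue['requires']} is older than {issue['min_version']}, "
          f"required by {issue['package']} {issue['version']}")
```

A production checker would need full version-specifier handling (for example through the `packaging` library) and hardware-aware rules; the tuple comparison here is deliberately simplistic.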
Will Llama-2-7B fit on your 24GB RTX 3090 with a 32k context window using vLLM?
env-doctor vram --model meta-llama/Llama-2-7b-hf --runtime vllm --seq-len 32768 --quant fp16
Tell env-doctor what you have, and get a list of community-tested combinations:
env-doctor recommend
env-doctor gets smarter every day thanks to the community. If you encounter a bizarre, undocumented combination of packages that causes a runtime failure:
- Run your script via our reporter:
env-doctor report-incompatibility broken_script.py --submit
- Our local tool bundles the source, outputs, tracebacks, and environment snapshot into a Markdown report.
- The report is submitted to the env-doctor Watsonx Orchestrate Agent.
- The AI strictly verifies the failure is due to an environment/package incompatibility.
- If verified, the AI uses an MCP Server to automatically draft and commit a new rule to the GitHub compatibility database.
You just saved thousands of developers from debugging the exact same error.
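The local bundling step in the workflow above can be pictured roughly as follows. This is a sketch under assumptions, not the tool's implementation: `run_and_capture` and `build_report` are hypothetical helpers, the report layout is invented for illustration, and the submission step is left out entirely.

```python
import platform
import subprocess
import sys

FENCE = "`" * 3  # Markdown code fence, built here to keep this example readable


def run_and_capture(script_path):
    """Run a script with the current interpreter and capture its output streams."""
    result = subprocess.run(
        [sys.executable, script_path], capture_output=True, text=True
    )
    return result.stdout, result.stderr, result.returncode


def build_report(script_name, source, stdout, stderr, env=None):
    """Bundle source, outputs, traceback, and an environment snapshot as Markdown."""
    if env is None:
        env = {
            "python": platform.python_version(),
            "platform": platform.platform(),
        }
    lines = [f"# Incompatibility report: {script_name}", "", "## Environment"]
    lines += [f"- {key}: {value}" for key, value in sorted(env.items())]
    for title, body in (("Source", source), ("Stdout", stdout), ("Traceback", stderr)):
        lines += ["", f"## {title}", FENCE, body.rstrip(), FENCE]
    return "\n".join(lines) + "\n"


report = build_report(
    "broken_script.py",
    source="import flash_attn\n",
    stdout="",
    stderr="ImportError: undefined symbol (example traceback text)\n",
    env={"python": "3.10.12", "torch": "2.1.0"},
)
print(report)
```

Keeping the report as plain Markdown means it can be read by a person before anything is submitted, which matters because the bundle includes your script's source.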
- Runtimes Profiling: vllm, transformers, tgi, deepspeed, tensorrt-llm, llama.cpp, onnxruntime
- Target Systems: Linux (First-class), Windows (Beta)
- Hardware Profiling: NVIDIA GPUs (CUDA)
This project is licensed under the GNU AGPL v3 License. See the LICENSE file for details.