Open
Labels
bug: Something isn't working
Description
I am trying to run RF3 on a Google Colab T4 GPU, following the example in the IPD Design Pipeline Colab, but I consistently get this error when I run prediction on a batch of sequences:
INFO:rf3.inference_engines.rf3:[rank: 0] Loading checkpoint from /root/.foundry/checkpoints/rf3_foundry_01_24_latest_remapped.ckpt...
WARNING:atomworks.ml:Using element type for atom names of atomized tokens.
INFO: Using bfloat16 Automatic Mixed Precision (AMP)
INFO:lightning.pytorch.utilities.rank_zero:Using bfloat16 Automatic Mixed Precision (AMP)
INFO:rf3.inference_engines.rf3:[rank: 0] Outputs will be written to /content/Predictions.
INFO:rf3.inference_engines.rf3:[rank: 0] Found 2 structures to predict!
INFO:rf3.inference_engines.rf3:[rank: 0] Predicting structure 1/2: seq0
WARNING:atomworks.ml:Cached data not found for ALA at /net/tukwila/lschaaf/datahub/MACE-OMOL-Jul2025/mace_embeddings/A/ALA/ALA.pt
WARNING:atomworks.ml:Cached data not found for ARG at /net/tukwila/lschaaf/datahub/MACE-OMOL-Jul2025/mace_embeddings/A/ARG/ARG.pt
WARNING:atomworks.ml:Cached data not found for ASN at /net/tukwila/lschaaf/datahub/MACE-OMOL-Jul2025/mace_embeddings/A/ASN/ASN.pt
WARNING:atomworks.ml:Cached data not found for ASP at /net/tukwila/lschaaf/datahub/MACE-OMOL-Jul2025/mace_embeddings/A/ASP/ASP.pt
WARNING:atomworks.ml:Cached data not found for ATP at /net/tukwila/lschaaf/datahub/MACE-OMOL-Jul2025/mace_embeddings/A/ATP/ATP.pt
WARNING:atomworks.ml:Cached data not found for GLN at /net/tukwila/lschaaf/datahub/MACE-OMOL-Jul2025/mace_embeddings/G/GLN/GLN.pt
WARNING:atomworks.ml:Cached data not found for GLU at /net/tukwila/lschaaf/datahub/MACE-OMOL-Jul2025/mace_embeddings/G/GLU/GLU.pt
WARNING:atomworks.ml:Cached data not found for GLY at /net/tukwila/lschaaf/datahub/MACE-OMOL-Jul2025/mace_embeddings/G/GLY/GLY.pt
WARNING:atomworks.ml:Cached data not found for HIS at /net/tukwila/lschaaf/datahub/MACE-OMOL-Jul2025/mace_embeddings/H/HIS/HIS.pt
WARNING:atomworks.ml:Cached data not found for ILE at /net/tukwila/lschaaf/datahub/MACE-OMOL-Jul2025/mace_embeddings/I/ILE/ILE.pt
WARNING:atomworks.ml:Cached data not found for LEU at /net/tukwila/lschaaf/datahub/MACE-OMOL-Jul2025/mace_embeddings/L/LEU/LEU.pt
WARNING:atomworks.ml:Cached data not found for LYS at /net/tukwila/lschaaf/datahub/MACE-OMOL-Jul2025/mace_embeddings/L/LYS/LYS.pt
WARNING:atomworks.ml:Cached data not found for MET at /net/tukwila/lschaaf/datahub/MACE-OMOL-Jul2025/mace_embeddings/M/MET/MET.pt
WARNING:atomworks.ml:Cached data not found for PHE at /net/tukwila/lschaaf/datahub/MACE-OMOL-Jul2025/mace_embeddings/P/PHE/PHE.pt
WARNING:atomworks.ml:Cached data not found for PRO at /net/tukwila/lschaaf/datahub/MACE-OMOL-Jul2025/mace_embeddings/P/PRO/PRO.pt
WARNING:atomworks.ml:Cached data not found for SER at /net/tukwila/lschaaf/datahub/MACE-OMOL-Jul2025/mace_embeddings/S/SER/SER.pt
WARNING:atomworks.ml:Cached data not found for THR at /net/tukwila/lschaaf/datahub/MACE-OMOL-Jul2025/mace_embeddings/T/THR/THR.pt
WARNING:atomworks.ml:Cached data not found for TRP at /net/tukwila/lschaaf/datahub/MACE-OMOL-Jul2025/mace_embeddings/T/TRP/TRP.pt
WARNING:atomworks.ml:Cached data not found for TYR at /net/tukwila/lschaaf/datahub/MACE-OMOL-Jul2025/mace_embeddings/T/TYR/TYR.pt
WARNING:atomworks.ml:Cached data not found for VAL at /net/tukwila/lschaaf/datahub/MACE-OMOL-Jul2025/mace_embeddings/V/VAL/VAL.pt
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
[/tmp/ipykernel_391/4213100572.py](https://localhost:8080/#) in <cell line: 0>()
1 # Run RF3 prediction for all designed sequences
2 # Set an output directory to save predicted strucutres into local file system
----> 3 rf3_all_outputs = inference_engine.run(inputs=rf3_inputs, out_dir="Predictions", annotate_b_factor_with_plddt=True)
50 frames
[/usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/compiler.py](https://localhost:8080/#) in make_llir(self, src, metadata, options, capability)
339 if os.environ.get("TRITON_DISABLE_LINE_INFO", "0") == "0":
340 passes.llvmir.add_di_scope(pm)
--> 341 pm.run(mod)
342 # LLVM-IR (MLIR) -> LLVM-IR (LLVM)
343 llvm.init_targets()
RuntimeError: PassManager::run failed
After some searching, it sounds like this is an internal GPU issue rather than a problem with the code, but I find it odd that it happens every time I try to run RF3.
Is this a known bug, and do you have any advice for running RF3 on Colab?
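In case it helps with triage, here is a small diagnostic sketch that could be run in the same notebook. The log shows bfloat16 AMP being enabled, and my understanding is that bfloat16 is hardware-accelerated only on GPUs with compute capability 8.0 (Ampere) or newer, while a T4 reports 7.5. Whether that is what makes the Triton compile step fail is an assumption on my part; the traceback does not confirm it. The helper names `supports_native_bf16` and `describe_gpu` are hypothetical and not part of RF3 or foundry.

```python
def supports_native_bf16(capability):
    """Return True if a CUDA compute capability (major, minor) is 8.0 or newer.

    Assumption: native bfloat16 support starts with Ampere (major version 8).
    """
    major, _minor = capability
    return major >= 8


def describe_gpu():
    """Summarize the first visible CUDA device and its likely bfloat16 support."""
    try:
        import torch  # imported lazily so the helper above works without PyTorch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "No CUDA device is visible"
    name = torch.cuda.get_device_name(0)
    capability = torch.cuda.get_device_capability(0)
    verdict = "likely" if supports_native_bf16(capability) else "likely not"
    return (
        f"{name}: compute capability {capability[0]}.{capability[1]}, "
        f"native bfloat16 {verdict} available"
    )


if __name__ == "__main__":
    print(describe_gpu())
```

If the GPU turns out to be below 8.0, that would at least be consistent with the failure being reproducible on every run rather than a transient hardware fault.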
Thanks, Ben