Alphafold runs will not find the GPU #1029
Hi,

There might be an issue with the AlphaFold execution script. First, verify that the container can access the GPU properly:

```bash
docker run --rm -it --gpus all --entrypoint /bin/bash alphafold
```

Inside the container, check that nvidia-smi and the JAX library can see the GPU:

```bash
nvidia-smi
python -c "import jax; nmp = jax.numpy.ones((20000, 20000)); print('Device:', nmp.device()); result = jax.numpy.dot(nmp, nmp); print('Done')"
```

If these work normally, the issue might be with the docker-py library. You can verify this by running the following test:

```python
import unittest
import docker


class TestDocker(unittest.TestCase):
    def test_docker(self):
        client = docker.from_env()
        device_requests = [
            docker.types.DeviceRequest(
                driver="nvidia",
                capabilities=[["gpu"]],
            )
        ]
        logs = client.containers.run(
            "nvidia/cuda:12.2.2-runtime-ubuntu20.04",
            "nvidia-smi",
            runtime="nvidia",
            device_requests=device_requests,
            remove=True,
        )
        print(logs.decode("utf-8"))


if __name__ == "__main__":
    unittest.main()
```

If this test runs successfully and shows nvidia-smi output, look for other potential issues. If the test fails, the issue is likely with docker-py's GPU device recognition. You can fix this by modifying the AlphaFold script:

```python
# alphafold/docker/run_docker.py
# Original code - line 232
client = docker.from_env()
device_requests = [
    docker.types.DeviceRequest(driver='nvidia', capabilities=[['gpu']])
] if FLAGS.use_gpu else None

# Modified code
client = docker.from_env()
device_requests = (
    [docker.types.DeviceRequest(driver="nvidia", capabilities=[["gpu"]], count=-1)]
    if FLAGS.use_gpu
    else None
)
```

I encountered this issue when using docker-py==5.0.0 with the latest system Docker version. The exact cause is unclear, but it appears to be related to GPU device recognition between docker-py and the Docker daemon. The issue can be resolved by adding the count=-1 argument to the DeviceRequest, as shown above. If you're experiencing similar issues, try the modification shown in the code above. I hope this solution works for your case. If not, please let me know and we can explore other potential solutions. This issue appears to be version-specific between docker-py and the Docker daemon, so there might be alternative approaches worth investigating.
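(Editor's aside, not part of the original comment: for reference, a minimal sketch of the two ways docker-py's DeviceRequest can target GPUs — all available devices via count=-1, or specific devices via device_ids. The device ID below is a placeholder; the CUDA image tag is the one used in the test above.)

```python
import docker

client = docker.from_env()

# Request all available GPUs (count=-1 means "all", mirroring `docker run --gpus all`).
all_gpus = docker.types.DeviceRequest(
    driver="nvidia", count=-1, capabilities=[["gpu"]]
)

# Alternatively, request specific devices by ID (placeholder ID shown).
first_gpu_only = docker.types.DeviceRequest(
    driver="nvidia", device_ids=["0"], capabilities=[["gpu"]]
)

logs = client.containers.run(
    "nvidia/cuda:12.2.2-runtime-ubuntu20.04",
    "nvidia-smi",
    device_requests=[all_gpus],
    remove=True,
)
print(logs.decode("utf-8"))
```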
Thanks so much for the response. Modifying the run_docker.py script with the suggested change worked. I ran the recommended tests inside the container and those passed. I was not sure how to create the docker-py library test script within the container, so I could not run it there; running it outside the container just gave a bunch of errors.

Update: I do still have an issue with the minimization portion, but as others have noted the GPU isn't really necessary for the relaxation steps, so running relaxation on the CPU is an acceptable workaround.
Sometime in the past several months, my AlphaFold install stopped being able to find and use the GPU (NVIDIA RTX A4500; NVIDIA-SMI 535.183.06, Driver Version 535.183.06, CUDA Version 12.2).
I have been attempting a fresh install, and still no luck.
I am able to have Docker find the GPU using the following command:

```bash
docker run --rm --gpus all nvidia/cuda:12.2.2-cudnn8-runtime-ubuntu20.04 nvidia-smi
```
During the install I had to use the NVIDIA Docker cgroup issue fix referenced in the README (NVIDIA/nvidia-docker#1447 (comment)) and modify the Dockerfile according to another issue (#945).
When I submit a run I get the errors below. It will run, but only on the CPU, so it takes forever.
Any recommendations are welcome.
Thanks!
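(Editor's aside, not part of the original report: one quick way to confirm whether JAX inside the container is actually using the GPU or has silently fallen back to the CPU. The expected outputs in the comments are assumptions and may vary across JAX versions.)

```python
# Minimal sketch: report which backend JAX is using inside the AlphaFold container.
import jax

# "gpu" (or "cuda"/"rocm" on some JAX versions) means the GPU is visible;
# "cpu" means JAX fell back to the CPU and runs will be very slow.
print(jax.default_backend())

# Lists the devices JAX can see, e.g. one entry per visible GPU.
print(jax.devices())
```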