Issue

When running multi-GPU pretraining of MolMIM on AWS EC2, the following issues were observed (captured in the attached error log).

Error log: log.txt

How to replicate

Specific AWS EC2 configuration:

1. Launch a 4xA10 GPU instance or a 4xL40S GPU instance on AWS EC2.
2. Choose Deep Learning Base OSS Nvidia Driver GPU AMI (Ubuntu 22.04) 20241115 (ami-0f2ad13ff5f6b6f7c) as the AMI.
3. Run:
docker run --rm -it --gpus all -p 8888:8888 nvcr.io/nvidia/clara/bionemo-framework:1.10.1 "/bin/bash"
4. Try running MolMIM pretraining (a sketch of a multi-GPU launch is shown after this list).
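For reference, this is roughly the launch attempted in step 4. It is a minimal sketch only: the pretraining script path and the Hydra-style overrides are assumptions based on the BioNeMo 1.x examples layout inside the container, so the exact entrypoint under examples/ should be checked before running.

# Inside the container: attempt a 4-GPU MolMIM pretraining run.
# The script path and overrides below are assumptions; adjust them to
# whatever the container's examples/ tree actually provides.
cd /workspace/bionemo
python examples/molecule/molmim/pretrain.py \
  trainer.devices=4 \
  trainer.num_nodes=1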
Workaround

Switch to ami-075a0f15f2d44a65e (NVIDIA GPU-Optimized AMI); a launch sketch is included below. Note that some users may not have the liberty to switch AMIs, so it would be great to figure out what is going on here. Thanks very much!
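For anyone trying the workaround, this is a rough sketch of launching an equivalent instance from that AMI with the AWS CLI. The key name and security group are placeholders rather than values from the original report, and g5.12xlarge is assumed here as a 4x A10G instance type (g6e.12xlarge would be the 4x L40S equivalent).

# Sketch: launch a 4-GPU instance from the NVIDIA GPU-Optimized AMI.
# Key name and security group are placeholders; replace with your own values.
aws ec2 run-instances \
  --image-id ami-075a0f15f2d44a65e \
  --instance-type g5.12xlarge \
  --key-name my-key \
  --security-group-ids sg-0123456789abcdef0 \
  --count 1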
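To help figure out what differs between the two AMIs, it may be worth capturing the environment from inside the container on both images and attaching the output here. The commands below are only a sketch of the kind of information that would make the comparison easier, not an established debugging procedure for this issue.

# Sketch: collect driver, topology, and NCCL details inside the container
# on both the failing and the working AMI, then compare the outputs.
nvidia-smi
nvidia-smi topo -m
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.nccl.version())"
# Re-run pretraining with verbose NCCL logging to surface multi-GPU
# communication errors:
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=ALL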