
Issues with Conda + WSL deployment: GPU torch/dgl not specified, Python version, and low GPU utilization #9

@lg1243610235-cmyk

Description


Steps to Reproduce

First of all, thank you very much to ovo and all contributors for developing and maintaining this excellent software.
While deploying and running the project using conda + WSL, I encountered several issues related to environment configuration and GPU usage. I am reporting them here for reference and discussion.
Environment
OS: Windows + WSL
Environment manager: Conda
CUDA: 12.8
GPU: NVIDIA GeForce RTX 3060 Ti (properly detected in WSL)
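The environment details above can be captured from inside the activated conda environment with a short script. This is a sketch for illustration only: `describe_env` is a hypothetical helper, not part of ovo or rfdiffusion. It records the Python version, the torch version and the CUDA version torch was built against (`torch.version.cuda`, which is `None` for a CPU-only build), whether CUDA is usable, and whether dgl can move a small graph to the GPU (a CPU-only dgl build is expected to fail at that step).

```python
# Hypothetical diagnostic helper (not part of ovo or rfdiffusion): run it inside
# the activated conda environment to record the setup described above.
import importlib
import platform


def describe_env() -> dict:
    """Collect Python, torch, and dgl details; missing packages are reported as None."""
    info = {"python": platform.python_version(), "platform": platform.platform()}

    try:
        torch = importlib.import_module("torch")
    except Exception as exc:  # not installed, or a broken install
        info["torch"] = None
        info["torch_error"] = str(exc)
        return info

    info["torch"] = torch.__version__              # e.g. "2.2.2+cu118" for a CUDA 11.8 wheel
    info["torch_cuda_build"] = torch.version.cuda  # None for a CPU-only torch build
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)

    try:
        dgl = importlib.import_module("dgl")
    except Exception as exc:  # not installed, or incompatible with the installed torch
        info["dgl"] = None
        info["dgl_error"] = str(exc)
        return info

    info["dgl"] = dgl.__version__
    if info["cuda_available"]:
        # A CPU-only dgl build is expected to raise when a graph is moved to the GPU.
        try:
            dgl.graph(([0], [1])).to("cuda:0")
            info["dgl_cuda_ok"] = True
        except Exception as exc:
            info["dgl_cuda_ok"] = False
            info["dgl_cuda_error"] = str(exc)
    return info


if __name__ == "__main__":
    for key, value in describe_env().items():
        print(f"{key}: {value}")
```

Pasting this output into a report makes it clear at a glance whether a CPU-only torch or dgl build ended up in the environment.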
Issues

  1. rfdiffusion.yml does not explicitly specify GPU versions of torch and dgl
    In rfdiffusion.yml, GPU-enabled builds of torch and dgl are not explicitly specified.
    In practice:
    dgl 1.1.2 must be installed as its CUDA 11.8 build
    If the CPU-only builds of dgl/torch are installed by default, the GPU may not be used even though CUDA is available
    It may be helpful to pin the GPU builds explicitly in the environment file, for example:
    torch==2.2.2+cu118
  2. proteinmpnn-fastrelax.yml should explicitly specify Python 3.12
  3. The GPU version of rfdiffusion mainly uses the CPU at runtime
    After confirming that the GPU-enabled dependencies were installed, I ran the GPU version of rfdiffusion and observed that:
    GPU utilization is relatively low
    CPU utilization is high (observed via Windows Task Manager)
    The root cause is currently unclear.
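To quantify the low GPU utilization with something more direct than Windows Task Manager (whose default GPU graphs track engines such as 3D and Copy rather than CUDA compute, so they may understate CUDA load), utilization can be sampled with `nvidia-smi` from a second WSL terminal while rfdiffusion runs. This is a minimal sketch, assuming `nvidia-smi` is on the WSL `PATH`; `parse_utilization` and `sample_gpu_utilization` are hypothetical helpers, not part of ovo.

```python
# Hypothetical helpers (not part of ovo): sample GPU utilization via nvidia-smi
# from a second WSL terminal while rfdiffusion is running.
import shutil
import subprocess
import time

QUERY = ["--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"]


def parse_utilization(output: str) -> list[int]:
    """Turn nvidia-smi output (one percentage per GPU per line) into integers.

    Non-numeric values such as "[N/A]" are skipped.
    """
    return [int(v) for v in (line.strip() for line in output.splitlines()) if v.isdigit()]


def sample_gpu_utilization(samples: int = 10, interval_s: float = 1.0) -> list[list[int]]:
    """Return `samples` readings, each a list of utilization percentages (one per GPU)."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        raise RuntimeError("nvidia-smi was not found on PATH")
    readings = []
    for _ in range(samples):
        result = subprocess.run([exe, *QUERY], capture_output=True, text=True, check=True)
        readings.append(parse_utilization(result.stdout))
        time.sleep(interval_s)
    return readings


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; run this inside WSL with the NVIDIA driver installed")
    else:
        gpu0 = [r[0] for r in sample_gpu_utilization() if r]
        print("GPU 0 utilization (%):", gpu0)
        if gpu0:
            print("mean:", sum(gpu0) / len(gpu0))
```

Consistently low readings alongside high CPU load would support the observation above; the sketch only measures utilization and does not identify the cause.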

Thanks again to ovo and the maintainers for their great work!


Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)