
🐛[BUG]: Cannot detect all the GPUs #752

Closed
david5010 opened this issue Jan 8, 2025 · 3 comments
Labels
? - Needs Triage (Need team to review and classify), bug (Something isn't working)

Comments


david5010 commented Jan 8, 2025

Version

0.6.0

On which installation method(s) does this occur?

No response

Describe the issue

I'm currently using Earth2Mip on Docker with the Modulus image. On my PC I have 2 GPUs, and PyTorch can detect them without any issues. I want to speed up the ensemble forecast by leveraging both GPUs rather than just one. However, when I checked, Modulus isn't detecting both GPUs, just the first one. Are there any fixes I can apply so that it will detect both?
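
For reference, a quick check with the standard PyTorch API (run inside the same container) reports both devices:

    import torch

    print(torch.cuda.device_count())   # prints 2 on this machine
    print(torch.cuda.get_device_name(0))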

    # Import paths below are assumed from the earth2mip / Modulus layout
    # current at the time of this issue.
    import logging

    import torch.distributed
    from modulus.distributed import DistributedManager
    from earth2mip.networks import get_model
    from earth2mip.inference_ensemble import get_initializer, run_inference

    # Initialize the distributed environment; rank and world size are read
    # from the environment, and this process is pinned to its device.
    DistributedManager.initialize()
    device = DistributedManager().device
    group = torch.distributed.group.WORLD

    # `config` is the ensemble configuration parsed earlier (not shown).
    logging.info(f"Earth-2 MIP config loaded {config}")
    logging.info(f"Loading model onto device {device}")
    model = get_model(config.weather_model, device=device)
    logging.info("Constructing initializer data source")
    perturb = get_initializer(
        model,
        config,
    )
    logging.info("Running inference")
    run_inference(model, config, perturb, group)
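
As a sanity check, DistributedManager exposes rank, world_size, local_rank, and device properties, so the resolved process layout can be logged with something like:

    dm = DistributedManager()
    logging.info(
        f"rank={dm.rank} world_size={dm.world_size} "
        f"local_rank={dm.local_rank} device={dm.device}"
    )

If only one process was launched, this will report world_size=1.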

Minimum reproducible example

No response

Relevant log output

No response

Environment details

No response

david5010 added the ? - Needs Triage and bug labels on Jan 8, 2025
coreyjadams (Collaborator) commented

Hi @david5010 ,

How are you launching the code? Can you share the launch command and any errors you see? Also, can you share what you are expecting to see when you call run_inference?

david5010 (Author) commented

Hi,

I realized that it was a mistake on my end. I was expecting to see both GPUs being utilized, and it turns out I simply had to use torchrun instead of python to make it work.
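
For anyone hitting the same thing: DistributedManager initializes from the environment variables (RANK, WORLD_SIZE, LOCAL_RANK) that torchrun sets, so launching with plain python only ever creates a single process. Something along these lines, with an illustrative script name, spawns one process per GPU:

    # one process per GPU on a single machine with 2 GPUs
    torchrun --nproc_per_node=2 run_ensemble.py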

coreyjadams (Collaborator) commented

Glad you got it working!
