Failed to integrate with CUDA GPU #440

Open
LYK-love opened this issue Feb 7, 2025 · 1 comment


LYK-love commented Feb 7, 2025

Hi, I noticed that this codebase does not use the GPU for training, so I tried to add GPU support. I changed the code from

# spinup/algos/pytorch/sac/sac.py
# Create actor-critic module and target networks
# SAC concurrently learns a policy pi_\theta and two Q-functions Q_{\phi_1}, Q_{\phi_2}.
ac = actor_critic(env.observation_space, env.action_space, **ac_kwargs)
ac_targ = deepcopy(ac)

to

# ... set device to 'cuda'
device = torch.device(device if torch.cuda.is_available() else "cpu")
# Create actor-critic module and target networks
# SAC concurrently learns a policy pi_\theta and two Q-functions Q_{\phi_1}, Q_{\phi_2}.
ac = actor_critic(env.observation_space, env.action_space, **ac_kwargs).to(device)
ac_targ = deepcopy(ac).to(device)

However, when I run this, I get a segmentation fault:

[0136-ict-prxmx50038:2626350] *** Process received signal ***
[0136-ict-prxmx50038:2626350] Signal: Segmentation fault (11)
[0136-ict-prxmx50038:2626350] Signal code: Invalid permissions (2)
[0136-ict-prxmx50038:2626350] Failing at address: 0x7fffdf58ac5c
[0136-ict-prxmx50038:2626350] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x78dc26642520]
[0136-ict-prxmx50038:2626350] [ 1] [0x7fffdf58ac5c]
[0136-ict-prxmx50038:2626350] *** End of error message ***

When I set device to "cpu", it works as usual. Changing mlp = MLP(*args) to mlp = MLP(*args).to(device) is normally all that is needed in a typical PyTorch project, so I don't understand why this error occurs here.

I would appreciate any advice.
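
The trace above stops at libc and shows no Python frames, so it does not say which Python call triggered the fault. One way to narrow it down, as a minimal sketch: the standard-library `faulthandler` module can print the Python traceback when the process receives SIGSEGV. Where this goes in the training script is an assumption; it only needs to run before the crashing code.

```python
import faulthandler

# Ask the interpreter to dump the Python traceback of every thread to
# stderr when the process receives a fatal signal such as SIGSEGV.
faulthandler.enable(all_threads=True)

# ... the rest of the training script (assumed) would follow here ...
```

The same effect is available without editing the script by setting the environment variable PYTHONFAULTHANDLER=1 before launching it.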

codeseeker-dhruv commented

def sac(...):
    # mpi fork happens first
    ...

    env = make_env()
    # Choose the device only after the mpi fork
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    ac = actor_critic(...).to(device)  # CUDA is initialized here
    ac_targ = deepcopy(ac).to(device)
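
For reference, here is a self-contained sketch of the same ordering that runs outside spinup. The `build_networks` helper and the small MLP are hypothetical stand-ins for `actor_critic(...)`, not the repository's code. It also shows a step the snippet above leaves implicit: once the networks are on the GPU, every input tensor has to be created on (or moved to) the same device, and outputs have to come back to the CPU before they reach NumPy-based code.

```python
from copy import deepcopy

import torch
import torch.nn as nn


def build_networks(obs_dim, act_dim, hidden=64):
    # Hypothetical stand-in for spinup's actor_critic(...): a plain MLP.
    # The device is chosen inside the function, i.e. after any process setup.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    ac = nn.Sequential(
        nn.Linear(obs_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, act_dim),
    ).to(device)
    # A deepcopy of a module that is already on `device` stays on that device.
    ac_targ = deepcopy(ac)
    # The target network is not trained directly, so freeze its parameters.
    for p in ac_targ.parameters():
        p.requires_grad = False
    return ac, ac_targ, device


ac, ac_targ, device = build_networks(obs_dim=3, act_dim=1)

# Inputs must live on the same device as the module's parameters.
obs = torch.as_tensor([[0.1, 0.2, 0.3]], dtype=torch.float32, device=device)
with torch.no_grad():
    action = ac(obs)

# Move the result back to the CPU before handing it to NumPy-based code.
action_np = action.cpu().numpy()
```

If the rest of the training loop still builds CPU tensors (for example, batches sampled from a NumPy replay buffer), PyTorch normally raises a device-mismatch RuntimeError rather than a segfault, so this sketch does not by itself explain the crash reported above.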
