Multi-GPU Training - Two Suggested Improvements #11079
Hi! I'm not sure whether this is appropriate to open as an issue, but I can't find answers to two key questions about multi-GPU training:

1. Does one need to use the Trainer in order to take advantage of PL's multi-GPU training, or is the Trainer unnecessary as long as the models are LightningModules?
2. If nvidia-smi shows that my PL code isn't using the GPUs, how do I identify the cause?

cc @Borda
Replies: 1 comment
Hello!

The answer to the first question is yes: the whole purpose of the LightningModule is to interact with the Trainer. It exposes hooks such as training_step and configure_optimizers that the Trainer calls for you. If you are not sure whether to convert your code to Lightning, or are overwhelmed by the Trainer, we also have a lightweight version called LightningLite. It only bundles the accelerators (multi-GPU, TPU, etc.) and comes without a Trainer, so you can keep your existing training loop and nn.Modules.
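For reference, here is a minimal sketch of that LightningLite workflow. It assumes the PL 1.5-era import path pytorch_lightning.lite.LightningLite (this API was later renamed to Fabric); the toy model, data, and device flags are illustrative, not from the original discussion:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning.lite import LightningLite  # renamed to Fabric in later releases


class Lite(LightningLite):
    def run(self):
        # A plain nn.Module and optimizer -- no LightningModule, no Trainer.
        model = nn.Linear(32, 1)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        # setup() moves the model to the right device and wraps it for distributed training.
        model, optimizer = self.setup(model, optimizer)
        # setup_dataloaders() inserts the distributed sampler when running multi-GPU.
        dataset = TensorDataset(torch.randn(256, 32), torch.randn(256, 1))
        loader = self.setup_dataloaders(DataLoader(dataset, batch_size=16))
        model.train()
        for x, y in loader:
            optimizer.zero_grad()
            loss = nn.functional.mse_loss(model(x), y)
            self.backward(loss)  # replaces loss.backward()
            optimizer.step()


if __name__ == "__main__":
    # The same loop runs on 2 GPUs with DDP just by changing these flags.
    Lite(accelerator="gpu", devices=2, strategy="ddp").run()
```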
As for the second question: there could be multiple reasons, but the most likely is that you have not configured the Trainer to use the GPUs. Try passing the device flags to the Trainer explicitly, as in the sketch below.
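A hedged, self-contained example of that fix. The toy LightningModule and data are illustrative; the flags are the point. Trainer(gpus=2) is the 1.x spelling, while Trainer(accelerator="gpu", devices=2) is the equivalent on 1.5+ and the only spelling on 2.x:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning import LightningModule, Trainer


class ToyModel(LightningModule):
    # training_step and configure_optimizers are the hooks the Trainer calls.
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":
    dataset = TensorDataset(torch.randn(256, 32), torch.randn(256, 1))
    loader = DataLoader(dataset, batch_size=16)

    # Without these flags the Trainer stays on CPU, which is what nvidia-smi would show.
    trainer = Trainer(gpus=2, max_epochs=1)            # 1.x spelling
    # trainer = Trainer(accelerator="gpu", devices=2)  # 1.5+/2.x spelling
    trainer.fit(ToyModel(), loader)
```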
When you run the script, you should see a startup message along the lines of `GPU available: True, used: True` confirming that the GPUs were detected.