Not sure how to make DP and DDP work on a single-node, 2-GPU setup #268
Replies: 3 comments
-
I think the documentation and examples cover this case. Did those not work for you? The speedup with DP varies according to what you're doing. With DDP you (mostly) double the speed every time you double the number of GPUs.
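As a rough illustration of why DP alone may not speed things up: PyTorch's DataParallel splits each batch across the visible GPUs, so keeping the same batch size means each GPU does less work per step while the number of steps stays the same. Below is a minimal sketch of that trade-off; the toy dataset and batch sizes are assumptions for illustration, not from this thread.
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the real training data (illustrative only).
dataset = TensorDataset(torch.randn(1024, 3, 32, 32), torch.randint(0, 10, (1024,)))

# Single-GPU baseline: 32 samples per step on one device.
single_gpu_loader = DataLoader(dataset, batch_size=32)

# DP on 2 GPUs: DataParallel scatters each batch, so batch_size=32 means
# only 16 samples per GPU per step. Doubling the batch size keeps each GPU
# as busy as in the single-GPU run and halves the number of steps.
dp_loader = DataLoader(dataset, batch_size=64)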
-
Will reopen if you are still having issues.
-
Yes, this is most likely what you want. Otherwise you run the same batch on both accelerators.
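For reference, a minimal sketch of pairing a DataLoader with a DistributedSampler so each DDP process trains on its own shard of the data instead of the same batch. The toy dataset, num_replicas, rank, and batch size are assumptions for illustration; in a Lightning DDP run the replica count and rank normally come from the initialized process group.
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy stand-in for the real training data (illustrative only).
dataset = TensorDataset(torch.randn(1024, 3, 32, 32), torch.randint(0, 10, (1024,)))

# Each DDP process constructs a sampler with its own rank, so the two GPUs
# see disjoint shards of the dataset rather than identical batches.
sampler = DistributedSampler(dataset, num_replicas=2, rank=0)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)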
-
When I use DDP, the process hangs and no metric log files are created for me to view with TensorBoard; I just see two tf event files.
On the other hand, when I use DP, the code runs and I can see the loss going down in TensorBoard, but I don't see any accelerated training. Running with 1 GPU and running with DP on 2 GPUs gives the same training time of 12 minutes. I've tried different batch sizes, and there is virtually no difference in training time.
Do I have to create a DistributedSampler or do something else to see accelerated training using DP?
Code
My code is as follows:
import os
# Imports assumed from the Lightning/test_tube API of this era.
from test_tube import Experiment
from pytorch_lightning import Trainer

model = ConvNet()  # user-defined LightningModule
# most basic trainer, uses good defaults
exp = Experiment(save_dir=os.getcwd())
trainer = Trainer(experiment=exp, gpus=[0, 1], max_nb_epochs=20, distributed_backend='dp')
trainer.fit(model)
What's your environment?