How to train on multiple GPUs and then test on a single GPU? #12856
Unanswered
nian-liu
asked this question in DDP / multi-GPU / multi-node
Replies: 1 comment
-
With DDP, the complete script is launched on each device. I'd recommend creating two different scripts for your use case: one for training and another for testing. Also, in your `self.log('test_acc', self.accuracy)` there is no need to create …
-
Dear all,
I am using DDP to train and validate my model on multiple GPUs, then testing the model's performance at the end of training/validation. I logged the testing speed (seconds per image) and found that testing under DDP is somewhat slower than the speed I measured on a single GPU. Hence, I want to train on multiple GPUs but test on a single GPU. To do this, I first instantiate a trainer on multiple GPUs and fit the model, then instantiate another trainer on one GPU and run testing, as shown below:

I also log test speed and accuracy:

However, I found that the code always gets stuck at the end of testing:

I guess it's because `self.log` is waiting to synchronize between multiple GPUs. So I tried:

```python
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```

and

```python
self.log('test_time', test_time, on_step=False, on_epoch=True, sync_dist=False)
self.log("test_acc", accuracy, sync_dist=False)
```

and

```python
if self.trainer.is_global_zero:
    self.log("test_acc", accuracy, rank_zero_only=True)
```
However, nothing worked. Any suggestions?
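One detail worth flagging in the attempts above: `CUDA_VISIBLE_DEVICES` is only read when CUDA is first initialized, so setting it after `trainer.fit()` has no effect in the same process. A stdlib-only sketch (script name hypothetical) of handing the restriction to a fresh test process instead:

```python
import os
import subprocess
import sys

def single_gpu_env(gpu_index=0):
    """Copy of the current environment that exposes only one GPU."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env

def run_test_on_single_gpu(script="test_single_gpu.py", gpu_index=0):
    # Launch the (hypothetical) evaluation script in a fresh process.
    # The child initializes CUDA from scratch, so the visibility restriction
    # takes effect; mutating os.environ inside the already-running DDP
    # process does not, because CUDA was initialized long before testing.
    return subprocess.run([sys.executable, script], env=single_gpu_env(gpu_index))
```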