How to train on multiple GPUs and then test on a single GPU? #12856
Unanswered
nian-liu
asked this question in DDP / multi-GPU / multi-node
Replies: 1 comment
-
With DDP, the complete script is launched on each device. I'd recommend creating two different scripts for your use case: one for training and another for testing. Also, in your `self.log('test_acc', self.accuracy)` there is no need to create …
-
Dear all,
I am using DDP to train and validate my model on multiple GPUs, then testing the model's performance at the end of training/validation. I logged the testing speed (seconds per image) and found that testing under DDP is somewhat slower than the speed I measured on a single GPU. Hence, I want to train on multiple GPUs but test on a single GPU. To do this, I first instantiate a trainer on multiple GPUs and fit the model, then instantiate another trainer on one GPU and run testing, as shown below:

I also log test speed and accuracy:

However, I found that the code always gets stuck at the end of testing:

I guess it's because `self.log` is waiting to synchronize between multiple GPUs. So I tried:

```python
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```

and

```python
self.log('test_time', test_time, on_step=False, on_epoch=True, sync_dist=False)
self.log("test_acc", accuracy, sync_dist=False)
```

and

```python
if self.trainer.is_global_zero:
    self.log("test_acc", accuracy, rank_zero_only=True)
```
However, nothing worked. Any suggestions?
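One detail worth flagging in the attempts above: `CUDA_VISIBLE_DEVICES` is only read when CUDA is first initialized, so setting it after `trainer.fit()` has no effect in the same process. A stdlib-only sketch (script name hypothetical) of handing the restriction to a fresh test process instead:

```python
import os
import subprocess
import sys

def single_gpu_env(gpu_index=0):
    """Copy of the current environment that exposes only one GPU."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env

def run_test_on_single_gpu(script="test_single_gpu.py", gpu_index=0):
    # Launch the (hypothetical) evaluation script in a fresh process.
    # The child initializes CUDA from scratch, so the visibility restriction
    # takes effect; mutating os.environ inside the already-running DDP
    # process does not, because CUDA was initialized long before testing.
    return subprocess.run([sys.executable, script], env=single_gpu_env(gpu_index))
```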