RuntimeError: CUDA error: device-size assert triggered #9466

rentainhe · 2021-09-12T06:27:00Z

rentainhe
Sep 12, 2021

Hello! I have some problem when I'm using Data-parallel Multi-GPU Training with Pytorch-Lightning

Here's my trainer code:

  trainer = Trainer(
      max_epochs = 300,
      gpus = "1,2,3,4",
      precision = 16,
      accelerator = 'dp',
  )

I have about 8 GPUS: CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
I just want to train my model on specific GPUs 1,2,3,4

When gpus='1,2,3,4', it turns out:

RuntimeError: CUDA error: device-size assert triggered

But when gpus='0,1,2,3', there's no problems
When gpus='1,2', there's no problems

I'm very confused, I need some help, thanks a lot!

rentainhe · 2021-09-12T06:41:20Z

rentainhe
Sep 12, 2021
Author

I've tried some other settings

not work:

  trainer = Trainer(
      max_epochs = 300,
      gpus = "1,2,3,4",
      precision = 32,
      accelerator = 'dp',
  )

no errors

  trainer = Trainer(
      max_epochs = 300,
      gpus = "1,2",
      precision = 16,
      accelerator = 'dp',
  )

1 reply

rentainhe Sep 12, 2021
Author

I 've tried gpus='2,3,4,5', there's no problems ! ! !
OMG, there must be some problems with cuda:1

mestir · 2021-09-12T10:04:48Z

mestir
Sep 12, 2021

#- [ ] - - [x] ***@@

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

RuntimeError: CUDA error: device-size assert triggered #9466

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 2 comments 1 reply

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

RuntimeError: CUDA error: device-size assert triggered #9466

Uh oh!

Uh oh!

rentainhe Sep 12, 2021

Replies: 2 comments · 1 reply

Uh oh!

Uh oh!

rentainhe Sep 12, 2021 Author

Uh oh!

rentainhe Sep 12, 2021 Author

Uh oh!

mestir Sep 12, 2021

rentainhe
Sep 12, 2021

Replies: 2 comments 1 reply

rentainhe
Sep 12, 2021
Author

rentainhe Sep 12, 2021
Author

mestir
Sep 12, 2021