Trying to create tensor with negative dimension with ddp_sharded
#13418
Answered
by
akihironitta
Riccorl
asked this question in
DDP / multi-GPU / multi-node
I'm trying to fine-tune a Transformer model (XLM-R) on multi-gpu, using the
I'm running the latest PyTorch Lightning with PyTorch 1.10, on two V100s on a Power9-based architecture. I've tried both 16-bit and 32-bit precision. The optimizer I'm using is RAdam, from PyTorch. I can provide the code if needed. Here is the complete stack trace:
Answered by
akihironitta
Jun 30, 2022
Replies: 1 comment
Duplicate of #13431.
Answer selected by
Riccorl