Resume get RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor #522

Closed
LinSHP opened this issue Apr 25, 2019 · 5 comments

Comments

@LinSHP

LinSHP commented Apr 25, 2019

When I resume training, I get a RuntimeError:

Traceback (most recent call last):
  File "trainval_net.py", line 339, in <module>
    optimizer.step()
  File "/pytorch/lib/python3.6/site-packages/torch/optim/sgd.py", line 101, in step
    buf.mul_(momentum).add_(1 - dampening, d_p)
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor

I searched through the issues and found that issue #475 reports the same error, but that problem seems to have been fixed and merged already.
So I added back the lines that the merge request removed, and the problem went away.

Added lines:

  if args.cuda:
    fasterRCNN.cuda()

  if args.optimizer == "adam":
    lr = lr * 0.1
    optimizer = torch.optim.Adam(params)

So I think this RuntimeError is not fully fixed yet.

@AlexanderHustinx

AlexanderHustinx commented Apr 26, 2019

Did this error occur after you pulled the merged version?

The lines:

if args.cuda:
    fasterRCNN.cuda()

weren't actually removed; they were only moved up, so that the model is transferred to the GPU before the optimizer is initialized.

EDIT: The merge was for the pytorch-1.0 branch, by the way. Perhaps it should still be merged into the pytorch-0.4.0/main branch.

@LinSHP
Author

LinSHP commented Apr 26, 2019

Yes. I use the pytorch-1.0 branch, and I train and resume with --cuda. Before my edit, above the lines

  if args.optimizer == "adam":
    lr = lr * 0.1
    optimizer = torch.optim.Adam(params)

there was no

if args.cuda:
    fasterRCNN.cuda()

Those lines were below:

  if args.mGPUs:
    fasterRCNN = nn.DataParallel(fasterRCNN)

I added those lines above the "adam" block, and it worked.
I am new to PyTorch, so I don't know what actually causes the RuntimeError.

@AlexanderHustinx

Ah okay.
The latest version currently on GitHub does have the lines

if args.cuda:
    fasterRCNN.cuda()

above the optimizer initialization (see here).

The PyTorch documentation recommends moving the model to the GPU before constructing the optimizer. That way the optimizer's state ends up on the GPU along with the model's parameters.
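
To make the ordering concrete, here is a minimal sketch of a resume path. It is not the actual trainval_net.py code: the tiny nn.Linear model, the helper names build_model_and_optimizer and train_step, and the in-memory checkpoint dict are all made up for illustration. As far as I understand, optimizer.load_state_dict() casts the saved state (such as the SGD momentum buffers) to whatever device each parameter is on at that moment. If the model is moved to the GPU only afterwards, the parameters end up on the GPU while the buffers stay on the CPU, which would match the error above.

```python
import torch
import torch.nn as nn


def build_model_and_optimizer(device, checkpoint=None):
    """Hypothetical helper: build a model and optimizer, optionally resuming."""
    model = nn.Linear(4, 2)  # stand-in for fasterRCNN
    if checkpoint is not None:
        model.load_state_dict(checkpoint["model"])

    # 1) Move the model to the target device first ...
    model.to(device)

    # 2) ... then construct the optimizer over the (already moved) parameters ...
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    # 3) ... and only then restore the optimizer state, so its buffers are
    #    cast to the device the parameters live on.
    if checkpoint is not None:
        optimizer.load_state_dict(checkpoint["optimizer"])
    return model, optimizer


def train_step(model, optimizer, device):
    """Run one dummy optimization step on random input."""
    x = torch.randn(8, 4, device=device)
    loss = model(x).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


# Falls back to the CPU when no GPU is available, so the sketch runs anywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# First run: one step creates the momentum buffers, then "save" a checkpoint.
model, optimizer = build_model_and_optimizer(device)
train_step(model, optimizer, device)
checkpoint = {"model": model.state_dict(), "optimizer": optimizer.state_dict()}

# Resume: same ordering, now with the checkpoint restored.
resumed_model, resumed_optimizer = build_model_and_optimizer(device, checkpoint)
train_step(resumed_model, resumed_optimizer, device)
```

With this ordering, the resumed optimizer.step() should not hit the device mismatch, because each momentum buffer sits on the same device as its parameter.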

Are you still running into a problem or error?

@LinSHP
Author

LinSHP commented Apr 27, 2019

No, the problem is solved. Thanks for your reply. It seems I cloned the repo before the fix commit 'solve error when resuming training'. My fault.

Thx again.

@EMCP
Contributor

EMCP commented May 2, 2019

@LinSHP if the problem is solved, please go ahead and close the issue.

@LinSHP LinSHP closed this as completed Nov 21, 2021