Description
I was training with multiple GPUs on my own dataset, but when resuming training I got this error:
Loading pretrained weights from data/pretrained_model/vgg16_caffe.pth
loading checkpoint models/vgg16/virtual_sign_2019/faster_rcnn_1_3_1124.pth
loaded checkpoint models/vgg16/virtual_sign_2019/faster_rcnn_1_3_1124.pth
Traceback (most recent call last):
File "trainval_net.py", line 340, in <module>
optimizer.step()
File "/home/sy1806701/anaconda3/envs/pytorch/lib/python3.5/site-packages/torch/optim/sgd.py", line 101, in step
buf.mul_(momentum).add_(1 - dampening, d_p)
RuntimeError: The expanded size of the tensor (3) must match the existing size (25088) at non-singleton dimension 3
Environment:
PyTorch 0.4.0
CUDA 9.0
cuDNN 7.1.2
Python 3.5
GPUs: 4 x Tesla V100
Command line I used:
CUDA_VISIBLE_DEVICES=2,3,4,5 python trainval_net.py --dataset virtual_sign_2019 --net vgg16 --bs 32 --nw 16 --lr 0.001 --cuda --mGPUs --r True --checksession 1 --checkepoch 3 --checkpoint 1124
I have tried everything I can to solve this problem, including the related issues #515, #475, and #506, but the problem still exists. Is there any possible solution? Thanks.
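My understanding of the error (I may be wrong) is that SGD's state_dict pairs each momentum buffer with a parameter by position, so if the parameter ordering differs between saving and resuming (e.g. the model is wrapped in nn.DataParallel at a different point relative to optimizer construction), a buffer saved for one layer gets applied to a tensor of a different shape. A torch-free toy sketch of that failure mode, with purely illustrative shapes (25088 = 512 * 7 * 7 is vgg16's flattened conv-feature size):

```python
# Toy reproduction of the positional buffer/parameter mismatch.
# Each momentum buffer is matched to a parameter by its index, so a
# different parameter ordering at resume time pairs a buffer saved for
# one layer with a tensor of another shape -- the same class of failure
# as the RuntimeError in the traceback above.

def find_buffer_mismatches(saved_shapes, resume_shapes):
    """Return (index, buffer_shape, param_shape) for every positional
    pair whose shapes differ; each such pair would break the update."""
    return [
        (i, buf, p)
        for i, (buf, p) in enumerate(zip(saved_shapes, resume_shapes))
        if buf != p
    ]

# Hypothetical parameter order when the checkpoint was written...
saved = [(25088, 4096), (4096,), (3,)]
# ...and a different order after resuming (e.g. DataParallel wrapping
# happening at a different point relative to optimizer construction).
resumed = [(3,), (25088, 4096), (4096,)]

print(find_buffer_mismatches(saved, resumed))
# With identical ordering there is no mismatch:
assert find_buffer_mismatches(saved, saved) == []
```

If this diagnosis is right, making the resume path build the model, wrap it, and construct the optimizer in exactly the same order as the save path (and only then call load_state_dict on the optimizer) should avoid the mismatch.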