training problem #15
Same question.
Maybe you need to use a V100.
@Whiplash-18 I had the same problem. You have to use torch 1.4 to train the models, so you will need a GPU that supports CUDA 10.x.
Hi, @Whiplash-18. Thanks for your interest in our work. Yes, there was a bug in our earlier implementation. We solved it by using two optimizers to learn HDN and JLN, respectively. We've revised the code, and you can pull the latest release. It now supports higher PyTorch versions (>1.4).
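For anyone wondering what "two optimizers" can look like in practice, here is a minimal sketch. It is not the repository's actual code: `hdn` and `jln` are tiny hypothetical stand-ins for the two sub-networks, the data is random, and detaching the features between the two networks is an assumption made only to keep the two losses independent. The point it illustrates is that each network gets its own optimizer, and that both backward passes finish before either optimizer modifies parameters in place.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the two sub-networks (not the real HDN/JLN).
hdn = nn.Linear(4, 4)
jln = nn.Linear(4, 2)

# One optimizer per sub-network.
opt_hdn = torch.optim.Adam(hdn.parameters(), lr=1e-3)
opt_jln = torch.optim.Adam(jln.parameters(), lr=1e-3)

# Random toy data, for illustration only.
x = torch.randn(8, 4)
target_hdn = torch.randn(8, 4)
target_jln = torch.randn(8, 2)


def train_step():
    feats = hdn(x)
    loss_hdn = nn.functional.mse_loss(feats, target_hdn)

    # Assumption: detach so the second loss does not backpropagate into hdn.
    preds = jln(feats.detach())
    loss_jln = nn.functional.mse_loss(preds, target_jln)

    opt_hdn.zero_grad()
    opt_jln.zero_grad()

    # Finish every backward pass before any step(): step() updates the
    # parameters in place, which would invalidate a graph still waiting
    # to be backpropagated.
    loss_hdn.backward()
    loss_jln.backward()
    opt_hdn.step()
    opt_jln.step()
    return loss_hdn.item(), loss_jln.item()


print(train_step())
```

The ordering is the important part: calling `opt_hdn.step()` before `loss_jln.backward()` is safe here only because of the `detach()`. Without it, the second backward pass would need the pre-update weights.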
I hit the following problem when training the model on the Panoptic dataset. I am using torch 1.13 and CUDA 11.8.
File "/workspace/faster_voxel_pose/run/train.py", line 181, in <module>
main()
File "/workspace/faster_voxel_pose/run/train.py", line 151, in main
train_3d(config, model, optimizer, train_loader, epoch, final_output_dir, writer_dict)
File "/workspace/faster_voxel_pose/run/../lib/core/function.py", line 41, in train_3d
final_poses, poses, proposal_centers, loss_dict, input_heatmap = model(views=inputs, meta=meta, targets=targets,
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
return self.module(*inputs[0], **kwargs[0])
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/faster_voxel_pose/run/../lib/models/voxelpose.py", line 38, in forward
bbox_preds = self.pose_net(input_heatmaps, meta, cameras, resize_transform)
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/faster_voxel_pose/run/../lib/models/human_detection_net.py", line 94, in forward
proposal_heatmaps_1d = self.c2c_net(torch.flatten(feature_1d, 0, 1)).view(batch_size, self.max_people, -1)
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/faster_voxel_pose/run/../lib/models/cnns_1d.py", line 131, in forward
hm = self.output_hm(x)
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 313, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 309, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/fx/traceback.py", line 57, in format_stack
return traceback.format_stack()
(Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:114.)
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
File "/workspace/faster_voxel_pose/run/train.py", line 181, in <module>
main()
File "/workspace/faster_voxel_pose/run/train.py", line 151, in main
train_3d(config, model, optimizer, train_loader, epoch, final_output_dir, writer_dict)
File "/workspace/faster_voxel_pose/run/../lib/core/function.py", line 71, in train_3d
accu_loss.backward()
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/opt/conda/envs/faster_voxel_pose/lib/python3.9/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 32, 1]] is at version 7; expected version 5 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
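To show what this error means in isolation, here is a small reproduction that is unrelated to the repository's code. It is only a sketch: the tensor `w` and the in-place `add_` are illustrative stand-ins for a model parameter and an `optimizer.step()` call. Autograd saves `w` for the backward pass of `w * w`, so modifying `w` in place before calling `backward()` bumps its version counter and triggers the same `RuntimeError`.

```python
import torch

# A leaf tensor playing the role of a model parameter.
w = torch.ones(3, requires_grad=True)

# Forward pass: autograd saves w because d(w*w)/dw depends on w.
loss = (w * w).sum()

# In-place update between forward and backward, similar to what
# optimizer.step() does to parameters. This bumps w's version counter.
with torch.no_grad():
    w.add_(1.0)

try:
    loss.backward()
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)
```

In the traceback above, the tensor of shape [1, 32, 1] is at version 7 where version 5 was expected, which suggests it was modified in place twice after the forward pass saved it. That would be consistent with an optimizer step running between a forward pass and a later `backward()` call, though the log alone does not confirm where the updates came from.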