Hi @WXinlong, thanks for the wonderful work.
I want to fine-tune the pre-trained model on the downstream task of object detection. I used the MoCo v2 model pre-trained for 800 epochs here
I followed this process:
step 1: Install detectron2.
step 2: Convert a pre-trained MoCo model to detectron2's format:
python3 convert-pretrain-to-detectron2.py input.pth.tar output.pkl
step 3: Put the dataset under the "./datasets" directory, following the directory structure required by detectron2.
step 4: Run training.
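For readers wondering what the conversion in step 2 roughly does, here is a minimal sketch of the kind of key renaming such a script performs. It is not the repository's actual script: the `module.encoder_q.` prefix and the renaming rules are assumptions modeled on MoCo-style checkpoints, and `rename_key` is a hypothetical helper. It may help explain why keys such as `stem.fc.0` appear as unused in the log below: under these rules, any key outside a `layer*` block (including the projection-head `fc` layers) is given a `stem.` prefix.

```python
from typing import Optional

def rename_key(key: str, prefix: str = "module.encoder_q.") -> Optional[str]:
    """Map a MoCo-style parameter name to a detectron2-style ResNet name.

    Assumption: query-encoder weights live under `prefix`; anything else
    (e.g. the key encoder or the queue) is dropped by returning None.
    """
    if not key.startswith(prefix):
        return None
    key = key[len(prefix):]
    # Parameters outside the residual stages are attributed to the stem.
    if "layer" not in key:
        key = "stem." + key
    # torchvision layer1..layer4 correspond to detectron2 res2..res5.
    for t in (1, 2, 3, 4):
        key = key.replace(f"layer{t}", f"res{t + 1}")
    # In detectron2, each conv wrapper owns its norm layer.
    for t in (1, 2, 3):
        key = key.replace(f"bn{t}", f"conv{t}.norm")
    key = key.replace("downsample.0", "shortcut")
    key = key.replace("downsample.1", "shortcut.norm")
    return key

if __name__ == "__main__":
    for name in ("module.encoder_q.conv1.weight",
                 "module.encoder_q.layer1.0.bn1.weight",
                 "module.encoder_q.fc.0.weight",
                 "module.encoder_k.conv1.weight"):
        print(name, "->", rename_key(name))
```

If this is indeed how the checkpoint was converted, the `stem.fc.*` warning would be harmless, since the detection model has no use for the projection head.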
The only change I made was to use a single GPU rather than 8 GPUs.
I am getting the following error:
[08/31 12:42:12] fvcore.common.checkpoint WARNING: Some model parameters or buffers are not found in the checkpoint:
proposal_generator.rpn_head.anchor_deltas.{bias, weight}
proposal_generator.rpn_head.conv.{bias, weight}
proposal_generator.rpn_head.objectness_logits.{bias, weight}
roi_heads.box_predictor.bbox_pred.{bias, weight}
roi_heads.box_predictor.cls_score.{bias, weight}
roi_heads.res5.norm.{bias, running_mean, running_var, weight}
[08/31 12:42:12] fvcore.common.checkpoint WARNING: The checkpoint state_dict contains keys that are not used by the model:
stem.fc.0.{bias, weight}
stem.fc.2.{bias, weight}
[08/31 12:42:12] d2.engine.train_loop INFO: Starting training from iteration 0
[08/31 12:42:13] d2.engine.train_loop ERROR: Exception during training:
Traceback (most recent call last):
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/engine/defaults.py", line 493, in run_step
    self._trainer.run_step()
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/engine/train_loop.py", line 273, in run_step
    loss_dict = self.model(data)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/modeling/meta_arch/rcnn.py", line 154, in forward
    features = self.backbone(images.tensor)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/modeling/backbone/resnet.py", line 445, in forward
    x = self.stem(x)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/modeling/backbone/resnet.py", line 356, in forward
    x = self.conv1(x)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/livesense/Detectron2/detectron2/detectron2/layers/wrappers.py", line 88, in forward
    x = self.norm(x)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 519, in forward
    world_size = torch.distributed.get_world_size(process_group)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 638, in get_world_size
    return _get_group_size(group)
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 220, in _get_group_size
    _check_default_pg()
  File "/home/ubuntu/anaconda3/envs/detectron_env/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 210, in _check_default_pg
    assert _default_pg is not None, \
AssertionError: Default process group is not initialized
[08/31 12:42:13] d2.engine.hooks INFO: Total training time: 0:00:00 (0:00:00 on hooks)
[08/31 12:42:13] d2.utils.events INFO: iter: 0 lr: N/A max_mem: 207M
How can we run the training on a single GPU?
The logs are attached for details: log 3.23.54 PM.txt
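A note on the likely cause, offered as a sketch rather than a confirmed fix: the traceback ends in `torch/nn/modules/batchnorm.py` calling `torch.distributed.get_world_size`, which is what a synchronized batch-norm layer does during training. This suggests the config selects `SyncBN`, while a single-GPU launch does not initialize a `torch.distributed` process group. One possible workaround is to override the norm type to plain `BN` through detectron2's key/value overrides. The helpers below (`find_norm_keys`, `as_overrides`) are hypothetical, and the example config dict is an assumption about what the real config contains; only the `MODEL.RESNETS.NORM` key name comes from detectron2's config schema.

```python
from typing import Any, Dict, List

def find_norm_keys(cfg: Dict[str, Any], value: str = "SyncBN",
                   prefix: str = "") -> List[str]:
    """Return the dotted names of all config entries equal to `value`."""
    keys: List[str] = []
    for name, item in cfg.items():
        dotted = f"{prefix}.{name}" if prefix else name
        if isinstance(item, dict):
            # Recurse into nested config sections.
            keys.extend(find_norm_keys(item, value, dotted))
        elif item == value:
            keys.append(dotted)
    return keys

def as_overrides(keys: List[str], replacement: str = "BN") -> List[str]:
    """Build a flat [KEY, VALUE, KEY, VALUE, ...] override list."""
    overrides: List[str] = []
    for key in keys:
        overrides.extend([key, replacement])
    return overrides

# Hypothetical config fragment; the real config may differ.
example_cfg = {"MODEL": {"RESNETS": {"NORM": "SyncBN", "DEPTH": 50}}}

if __name__ == "__main__":
    print(" ".join(as_overrides(find_norm_keys(example_cfg))))
```

The resulting pairs have the form detectron2 accepts after the config file on the command line, or through `cfg.merge_from_list`. Note that plain `BN` computes statistics over the per-GPU batch only, so results with a small single-GPU batch may differ from the 8-GPU setting. Another option would be to initialize a one-process group with `torch.distributed.init_process_group` before training, so that `get_world_size` returns 1.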