Error when using default weights "droid.pth" as pretrained weights #52

Open
YznMur opened this issue Jun 13, 2022 · 14 comments

YznMur commented Jun 13, 2022

Hi @zachteed @xhangHU
I couldn't use your weights "droid.pth" for training. I get this error:

Traceback (most recent call last):
  File "train.py", line 189, in <module>
    mp.spawn(train, nprocs=args.gpus, args=(args,))
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/trainer/droidslam/train.py", line 60, in train
    model.load_state_dict(torch.load(args.ckpt))
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1482, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DistributedDataParallel:
        size mismatch for module.update.weight.2.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 128, 3, 3]).
        size mismatch for module.update.weight.2.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([2]).
        size mismatch for module.update.delta.2.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 128, 3, 3]).
        size mismatch for module.update.delta.2.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([2]). 

I am trying to train the model on KITTI. These are the parameters I am using:

 clip=2.5,  edges=24, fmax=96.0, fmin=8.0, gpus=4, iters=15, lr=5e-05, n_frames=7, noise=False, restart_prob=0.2, scale=False, steps=250000, w1=10.0, w2=0.01, w3=0.05, world_size=4
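A quick way to see where the mismatch comes from is to print the parameter shapes stored in the checkpoint before building the model. A minimal sketch; here a tiny synthetic state_dict (keys copied from the error message above) stands in for droid.pth:

```python
import io
import torch

# Synthetic stand-in for droid.pth: the real file stores 3 output channels
# for update.weight.2/update.delta.2, while train.py's model builds 2.
state = {
    "module.update.weight.2.weight": torch.zeros(3, 128, 3, 3),
    "module.update.delta.2.weight": torch.zeros(3, 128, 3, 3),
}
buf = io.BytesIO()
torch.save(state, buf)
buf.seek(0)

# In practice: ckpt = torch.load("droid.pth", map_location="cpu")
ckpt = torch.load(buf, map_location="cpu")
for name, tensor in ckpt.items():
    print(name, tuple(tensor.shape))
```

Comparing these shapes against `model.state_dict()` pinpoints exactly which layers disagree.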


YznMur commented Jun 13, 2022

I figured it out by making some changes in class UpdateModule(nn.Module) so the output channels match the checkpoint:

        self.weight = nn.Sequential(
            nn.Conv2d(128, 128, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 3, 3, padding=1),
            GradientClip(),
            nn.Sigmoid())

and also:

        self.delta = nn.Sequential(
            nn.Conv2d(128, 128, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 3, 3, padding=1),
            GradientClip())

If you have any advice about training the model on KITTI or the training config, it would be appreciated!
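An alternative to editing UpdateModule is to keep the model as-is and load only the checkpoint entries whose shapes match, letting the mismatched heads train from scratch. A minimal sketch; the two-layer Sequential below is a hypothetical stand-in for the real DROID-SLAM network, not its actual architecture:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model: first conv has 2 output channels (as built by
# train.py), so the checkpoint's 3-channel weights for it will not fit.
model = nn.Sequential(
    nn.Conv2d(128, 2, 3, padding=1),
    nn.Conv2d(2, 2, 1),
)
# Synthetic checkpoint: "0.*" entries mismatch in shape, "1.weight" matches.
ckpt = {
    "0.weight": torch.zeros(3, 128, 3, 3),
    "0.bias": torch.zeros(3),
    "1.weight": torch.zeros(2, 2, 1, 1),
}

# Keep only entries present in the model with identical shapes.
model_state = model.state_dict()
filtered = {k: v for k, v in ckpt.items()
            if k in model_state and v.shape == model_state[k].shape}

# strict=False tolerates the dropped keys; it returns which ones were skipped.
missing, unexpected = model.load_state_dict(filtered, strict=False)
```

The `missing` list tells you exactly which layers will start from random initialization, which is useful to sanity-check before a long training run.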

@felipesce

Why do we need to change the model shape for training vs inference?


CH-901 commented Feb 3, 2023

@YznMur Hi, how do you train on KITTI, using RGB or RGB-D? If RGB-D is used, how do you obtain depth images for KITTI? Thank you!


YznMur commented Feb 3, 2023

@CH-901 I used RGB-D.
You can find depth images for the KITTI sequences here: https://www.cvlibs.net/datasets/kitti/eval_depth_all.php
This may help you:

Odometry Nr. Raw sequence name Start End
00: 2011_10_03_drive_0027 000000 004540
01: 2011_10_03_drive_0042 000000 001100
02: 2011_10_03_drive_0034 000000 004660
03: 2011_09_26_drive_0067 000000 000800
04: 2011_09_30_drive_0016 000000 000270
05: 2011_09_30_drive_0018 000000 002760
06: 2011_09_30_drive_0020 000000 001100
07: 2011_09_30_drive_0027 000000 001100
08: 2011_09_30_drive_0028 001100 005170
09: 2011_09_30_drive_0033 000000 001590
10: 2011_09_30_drive_0034 000000 001200
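The mapping above can be kept as a lookup table when pairing odometry frames with raw-sequence depth maps. A sketch (frame ranges taken from the table, treated as inclusive; the helper name is illustrative):

```python
# Odometry sequence -> (raw drive name, start frame, end frame), per the
# table above. Sequence 08 starts at raw frame 001100, not 000000.
ODOM_TO_RAW = {
    "00": ("2011_10_03_drive_0027", 0, 4540),
    "01": ("2011_10_03_drive_0042", 0, 1100),
    "02": ("2011_10_03_drive_0034", 0, 4660),
    "03": ("2011_09_26_drive_0067", 0, 800),
    "04": ("2011_09_30_drive_0016", 0, 270),
    "05": ("2011_09_30_drive_0018", 0, 2760),
    "06": ("2011_09_30_drive_0020", 0, 1100),
    "07": ("2011_09_30_drive_0027", 0, 1100),
    "08": ("2011_09_30_drive_0028", 1100, 5170),
    "09": ("2011_09_30_drive_0033", 0, 1590),
    "10": ("2011_09_30_drive_0034", 0, 1200),
}

def raw_frame(seq, odom_idx):
    """Map an odometry frame index to (raw drive name, raw frame index)."""
    drive, start, end = ODOM_TO_RAW[seq]
    raw = start + odom_idx
    assert start <= raw <= end, "frame index out of range for this drive"
    return drive, raw
```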


CH-901 commented Feb 6, 2023

@YznMur Thank you for your reply. I found the depth data. If only RGB training is used, what value should be passed for disp0 in the training call model(Gs, images, disp0, intrinsics0, graph, num_steps=args.iters, fixedp=2)?
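Not an authoritative answer, but a common convention in DROID-SLAM-style code is to initialize the inverse depth to a constant when no sensor depth is available. A sketch under that assumption; the shapes below (batch, frame count, 1/8 feature resolution) are illustrative, not taken from train.py:

```python
import torch

# Hypothetical dimensions: 7-frame window, 384x512 input images, and depth
# predicted at 1/8 resolution, matching the feature map scale.
batch, n_frames, H, W = 1, 7, 384, 512

# Constant unit inverse depth as a neutral starting point; the update
# operator then refines it over num_steps iterations.
disp0 = torch.ones(batch, n_frames, H // 8, W // 8)
```

Whether train.py expects this exact shape is worth verifying against the dataloader.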


YznMur commented Feb 6, 2023


CH-901 commented Mar 13, 2023

@YznMur Thanks. What is the RPE of the model after training on the KITTI dataset in your experiment? My output seems random, although the loss is decreasing.


YznMur commented Mar 13, 2023

Hi @CH-901
About the loss, I faced the same problem; it was meaningless :(


YznMur commented Mar 31, 2023

Hi @CH-901
Did you manage to solve the problem with training?


CH-901 commented Apr 11, 2023

I haven't solved this problem @YznMur

@LinMenwill

Hi @YznMur
I downloaded the depth images for the KITTI sequences from https://www.cvlibs.net/datasets/kitti/eval_depth_all.php, but each sequence of depth images seems to be missing files 000000.png to 000004.png. Have you encountered the same issue?


YznMur commented Jun 8, 2023

Hi @LinMenwill.
Yes, just take this into consideration when you are preparing the train and eval lists.

@LinMenwill

@YznMur Thanks

@LinMenwill

@YznMur I found that the depth data downloaded from https://www.cvlibs.net/datasets/kitti/eval_depth_all.php lacks the sequence "03: 2011_09_26_drive_0067", and the depth data appears to be sparse. How did you convert this sparse depth data into a denser representation, or did you use the sparse depth for training directly?
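For reference, the KITTI depth benchmark stores depth as uint16 PNGs where depth_in_meters = pixel_value / 256.0 and 0 marks missing pixels. One simple densification is nearest-neighbor filling; below is a naive pure-numpy sketch (a real pipeline would use something faster, e.g. scipy's distance transform, but the logic is the same):

```python
import numpy as np

def densify_nearest(depth):
    """Fill each invalid (0) pixel with the value of the nearest valid one.

    Brute-force O(invalid * valid); for illustration only.
    """
    valid = depth > 0
    if not valid.any():
        return depth
    ys, xs = np.nonzero(valid)     # coordinates of valid depth samples
    vals = depth[ys, xs]
    out = depth.copy()
    iy, ix = np.nonzero(~valid)
    for y, x in zip(iy, ix):
        d2 = (ys - y) ** 2 + (xs - x) ** 2
        out[y, x] = vals[np.argmin(d2)]  # copy the closest sample
    return out

# Tiny synthetic sparse map: two valid samples, everything else missing.
sparse = np.zeros((4, 4), dtype=np.float32)
sparse[0, 0] = 5.0
sparse[3, 3] = 10.0
dense = densify_nearest(sparse)
```

Whether dense filled depth or the sparse map with a validity mask trains better is an open question in this thread; the authors' TartanAir training data is dense, which is one argument for densifying.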
