Error when using default weights "droid.pth" as pretrained weights #52

Open
YznMur opened this issue Jun 13, 2022 · 14 comments

YznMur commented Jun 13, 2022

Hi @zachteed @xhangHU
I couldn't use your weights "droid.pth" for training. I get this error:

Traceback (most recent call last):
  File "train.py", line 189, in <module>
    mp.spawn(train, nprocs=args.gpus, args=(args,))
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/trainer/droidslam/train.py", line 60, in train
    model.load_state_dict(torch.load(args.ckpt))
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1482, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DistributedDataParallel:
        size mismatch for module.update.weight.2.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 128, 3, 3]).
        size mismatch for module.update.weight.2.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([2]).
        size mismatch for module.update.delta.2.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 128, 3, 3]).
        size mismatch for module.update.delta.2.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([2]). 

I am trying to train the model on KITTI. These are the parameters I am using:

 clip=2.5,  edges=24, fmax=96.0, fmin=8.0, gpus=4, iters=15, lr=5e-05, n_frames=7, noise=False, restart_prob=0.2, scale=False, steps=250000, w1=10.0, w2=0.01, w3=0.05, world_size=4
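A quick way to see where the mismatch comes from is to print the parameter shapes stored in the checkpoint before building the model. A minimal sketch; here a tiny synthetic state_dict (keys copied from the error message above) stands in for droid.pth:

```python
import io
import torch

# Synthetic stand-in for droid.pth: the real file stores 3 output channels
# for update.weight.2/update.delta.2, while train.py's model builds 2.
state = {
    "module.update.weight.2.weight": torch.zeros(3, 128, 3, 3),
    "module.update.delta.2.weight": torch.zeros(3, 128, 3, 3),
}
buf = io.BytesIO()
torch.save(state, buf)
buf.seek(0)

# In practice: ckpt = torch.load("droid.pth", map_location="cpu")
ckpt = torch.load(buf, map_location="cpu")
for name, tensor in ckpt.items():
    print(name, tuple(tensor.shape))
```

Comparing these shapes against `model.state_dict()` pinpoints exactly which layers disagree.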


YznMur commented Jun 13, 2022

I figured it out by making some changes in class UpdateModule(nn.Module) so the output channels match the checkpoint:

        self.weight = nn.Sequential(
            nn.Conv2d(128, 128, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 3, 3, padding=1),
            GradientClip(),
            nn.Sigmoid())

and also:

        self.delta = nn.Sequential(
            nn.Conv2d(128, 128, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 3, 3, padding=1),
            GradientClip())

If you have any advice about training the model on KITTI or the training config, it would be appreciated!
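An alternative to editing UpdateModule is to keep the model as-is and load only the checkpoint entries whose shapes match, letting the mismatched heads train from scratch. A minimal sketch; the two-layer Sequential below is a hypothetical stand-in for the real DROID-SLAM network, not its actual architecture:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model: first conv has 2 output channels (as built by
# train.py), so the checkpoint's 3-channel weights for it will not fit.
model = nn.Sequential(
    nn.Conv2d(128, 2, 3, padding=1),
    nn.Conv2d(2, 2, 1),
)
# Synthetic checkpoint: "0.*" entries mismatch in shape, "1.weight" matches.
ckpt = {
    "0.weight": torch.zeros(3, 128, 3, 3),
    "0.bias": torch.zeros(3),
    "1.weight": torch.zeros(2, 2, 1, 1),
}

# Keep only entries present in the model with identical shapes.
model_state = model.state_dict()
filtered = {k: v for k, v in ckpt.items()
            if k in model_state and v.shape == model_state[k].shape}

# strict=False tolerates the dropped keys; it returns which ones were skipped.
missing, unexpected = model.load_state_dict(filtered, strict=False)
```

The `missing` list tells you exactly which layers will start from random initialization, which is useful to sanity-check before a long training run.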

@felipesce

Why do we need to change the model shape for training vs inference?


CH-901 commented Feb 3, 2023

@YznMur Hi, how do you train on KITTI, using RGB or RGB-D? If RGB-D is used, how do you obtain depth images for KITTI? Thank you!


YznMur commented Feb 3, 2023

@CH-901 I used RGB-D.
You can find depth images for the KITTI sequences here: https://www.cvlibs.net/datasets/kitti/eval_depth_all.php
This may help you:

Odometry Nr. Raw sequence name Start End
00: 2011_10_03_drive_0027 000000 004540
01: 2011_10_03_drive_0042 000000 001100
02: 2011_10_03_drive_0034 000000 004660
03: 2011_09_26_drive_0067 000000 000800
04: 2011_09_30_drive_0016 000000 000270
05: 2011_09_30_drive_0018 000000 002760
06: 2011_09_30_drive_0020 000000 001100
07: 2011_09_30_drive_0027 000000 001100
08: 2011_09_30_drive_0028 001100 005170
09: 2011_09_30_drive_0033 000000 001590
10: 2011_09_30_drive_0034 000000 001200
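The mapping above can be kept as a lookup table when pairing odometry frames with raw-sequence depth maps. A sketch (frame ranges taken from the table, treated as inclusive; the helper name is illustrative):

```python
# Odometry sequence -> (raw drive name, start frame, end frame), per the
# table above. Sequence 08 starts at raw frame 001100, not 000000.
ODOM_TO_RAW = {
    "00": ("2011_10_03_drive_0027", 0, 4540),
    "01": ("2011_10_03_drive_0042", 0, 1100),
    "02": ("2011_10_03_drive_0034", 0, 4660),
    "03": ("2011_09_26_drive_0067", 0, 800),
    "04": ("2011_09_30_drive_0016", 0, 270),
    "05": ("2011_09_30_drive_0018", 0, 2760),
    "06": ("2011_09_30_drive_0020", 0, 1100),
    "07": ("2011_09_30_drive_0027", 0, 1100),
    "08": ("2011_09_30_drive_0028", 1100, 5170),
    "09": ("2011_09_30_drive_0033", 0, 1590),
    "10": ("2011_09_30_drive_0034", 0, 1200),
}

def raw_frame(seq, odom_idx):
    """Map an odometry frame index to (raw drive name, raw frame index)."""
    drive, start, end = ODOM_TO_RAW[seq]
    raw = start + odom_idx
    assert start <= raw <= end, "frame index out of range for this drive"
    return drive, raw
```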


CH-901 commented Feb 6, 2023

@YznMur Thank you for your reply. I found the depth data. If only RGB training is used, what value should be passed for disp0 in the training call model(Gs, images, disp0, intrinsics0, graph, num_steps=args.iters, fixedp=2)?
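Not an authoritative answer, but a common convention in DROID-SLAM-style code is to initialize the inverse depth to a constant when no sensor depth is available. A sketch under that assumption; the shapes below (batch, frame count, 1/8 feature resolution) are illustrative, not taken from train.py:

```python
import torch

# Hypothetical dimensions: 7-frame window, 384x512 input images, and depth
# predicted at 1/8 resolution, matching the feature map scale.
batch, n_frames, H, W = 1, 7, 384, 512

# Constant unit inverse depth as a neutral starting point; the update
# operator then refines it over num_steps iterations.
disp0 = torch.ones(batch, n_frames, H // 8, W // 8)
```

Whether train.py expects this exact shape is worth verifying against the dataloader.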


YznMur commented Feb 6, 2023


CH-901 commented Mar 13, 2023

@YznMur Thanks. What is the RPE of the model after training on the KITTI dataset in your experiment? My output seems random, although the loss is decreasing.


YznMur commented Mar 13, 2023

Hi @CH-901
About the loss, I faced the same problem; it was meaningless :(


YznMur commented Mar 31, 2023

Hi @CH-901
Did you manage to solve the problem with training?


CH-901 commented Apr 11, 2023

I haven't solved this problem @YznMur

@LinMenwill

Hi @YznMur
I downloaded the depth images for the KITTI sequences from https://www.cvlibs.net/datasets/kitti/eval_depth_all.php, but each sequence of depth images seems to be missing files 000000.png to 000004.png. Have you encountered the same issue?


YznMur commented Jun 8, 2023

Hi @LinMenwill.
Yes, just take this into consideration when you are preparing the train and eval lists.

@LinMenwill

@YznMur Thanks

@LinMenwill

@YznMur I found that the depth data downloaded from https://www.cvlibs.net/datasets/kitti/eval_depth_all.php lacks the sequence "03: 2011_09_26_drive_0067", and the depth data appears to be sparse. How did you convert this sparse depth data into a denser representation, or did you use the sparse depth for training directly?
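For reference, the KITTI depth benchmark stores depth as uint16 PNGs where depth_in_meters = pixel_value / 256.0 and 0 marks missing pixels. One simple densification is nearest-neighbor filling; below is a naive pure-numpy sketch (a real pipeline would use something faster, e.g. scipy's distance transform, but the logic is the same):

```python
import numpy as np

def densify_nearest(depth):
    """Fill each invalid (0) pixel with the value of the nearest valid one.

    Brute-force O(invalid * valid); for illustration only.
    """
    valid = depth > 0
    if not valid.any():
        return depth
    ys, xs = np.nonzero(valid)     # coordinates of valid depth samples
    vals = depth[ys, xs]
    out = depth.copy()
    iy, ix = np.nonzero(~valid)
    for y, x in zip(iy, ix):
        d2 = (ys - y) ** 2 + (xs - x) ** 2
        out[y, x] = vals[np.argmin(d2)]  # copy the closest sample
    return out

# Tiny synthetic sparse map: two valid samples, everything else missing.
sparse = np.zeros((4, 4), dtype=np.float32)
sparse[0, 0] = 5.0
sparse[3, 3] = 10.0
dense = densify_nearest(sparse)
```

Whether dense filled depth or the sparse map with a validity mask trains better is an open question in this thread; the authors' TartanAir training data is dense, which is one argument for densifying.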
