
A little performance drop when running this code, asking for the HigherHRNet version #17

Open
cucdengjunli opened this issue Dec 20, 2022 · 18 comments

Comments

@cucdengjunli commented Dec 20, 2022

[Screenshots: my experiment results vs. the official results reported in the paper]

Dear authors:
Thank you for your paper and code. When I run this project to reproduce your work, my result is about 2 mm worse than reported. Could you explain why?

Does your released code correspond to this setting: [5 views; mask; weights]?

My conda environment is shown in the screenshots below. My GPU is an RTX 3090, with CUDA 11.3 and torch 1.11.0.

[Screenshots: conda environment]

@cucdengjunli (Author)

This is my validation result on Panoptic. My training configuration is the same as /Faster-VoxelPose-main/configs/panoptic/jln64.yaml.

[Screenshot: validation results]

@cucdengjunli changed the title from "a little accuracy loss when running this code" to "A little performance drop when running this code" on Dec 20, 2022
@cucdengjunli (Author) commented Dec 21, 2022

The backbone you provide is ResNet, but the backbone mentioned in the paper is HRNet; maybe this is the reason. Let me swap the backbone and see the result.

Your core/config.py shows that you use HigherHRNet.
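For reference, a rough sketch of pointing the experiment config at a HigherHRNet backbone. The key names and checkpoint path below are guesses for illustration only; check core/config.py and configs/panoptic/jln64.yaml for the real ones.

```python
import yaml

# Load the released Panoptic config and swap the 2D backbone entry.
with open("configs/panoptic/jln64.yaml") as f:
    cfg = yaml.safe_load(f)

# Assumed key names -- the release ships with a Pose ResNet backbone by default.
cfg["BACKBONE_MODEL"] = "higher_hrnet"
cfg.setdefault("NETWORK", {})["PRETRAINED_BACKBONE"] = "models/pose_higher_hrnet_w32_512.pth"  # assumed path

# Write a separate config so the original stays untouched.
with open("configs/panoptic/jln64_hrnet.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```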

@cucdengjunli changed the title from "A little performance drop when running this code" to "A little performance drop when running this code, asking for the HigherHRNet version" on Dec 21, 2022
@gpastal24

Hi, how did you manage to train the model on an RTX 30-series GPU? Did you make any changes to the code?

@cucdengjunli (Author)

> Hi, how did you manage to train the model on an RTX 30-series GPU? Did you make any changes to the code?

microsoft/voxelpose-pytorch#19

I tried this and it worked.

@gpastal24 commented Feb 7, 2023

> > Hi, how did you manage to train the model on an RTX 30-series GPU? Did you make any changes to the code?
>
> microsoft/voxelpose-pytorch#19
>
> I tried this and it worked.

I did something similar, eventually. I returned (and backpropagated) the total loss at each iteration, and I got an 18.6 mm 3D error.

```python
# Backpropagate the combined ("total") loss at every batch iteration.
loss = loss_dict["total"]
loss_2d = loss_dict["2d_heatmaps"]
loss_1d = loss_dict["1d_heatmaps"]
loss_bbox = loss_dict["bbox"]
loss_joint = loss_dict["joint"]

# Update the running-average meters used for logging.
losses.update(loss.item())
losses_2d.update(loss_2d.item())
losses_1d.update(loss_1d.item())
losses_bbox.update(loss_bbox.item())
losses_joint.update(loss_joint.item())

optimizer.zero_grad()
loss.backward()
optimizer.step()
```


@Mosh-Wang

> I did something similar, eventually. I returned (and backpropagated) the total loss at each iteration, and I got an 18.6 mm 3D error.

Hello, may I ask exactly how you changed the loss part? How is loss_dict defined?

@gpastal24

@Mosh-Wang I just changed the code to backprop the total loss at every batch iteration. The loss_dict is returned by the FVP model. I didn't do anything fancy.

```python
loss_dict = {
    "2d_heatmaps": loss_2d,
    "1d_heatmaps": loss_1d,
    "bbox": 0.1 * loss_bbox,
    "joint": loss_joint,
    "total": loss_2d + loss_1d + 0.1 * loss_bbox + loss_joint,
}
```

@Mosh-Wang commented Mar 31, 2023

> @Mosh-Wang I just changed the code to backprop the total loss at every batch iteration. The loss_dict is returned by the FVP model. I didn't do anything fancy.

Thank you very much for your reply. May I ask one more question? When training on Panoptic, I use TRAIN_HEATMAP_SRC: 'image' and TEST_HEATMAP_SRC: 'image' from the original config, and I get the error below. Do you also use this setting, or have you changed it? What do you think is the cause?

[Screenshot: error traceback]

@gpastal24 commented Mar 31, 2023

@Mosh-Wang

Change `input_heatmap = torch.zeros((1, 1, 1))` to `input_heatmaps`, or change `'input_heatmap': input_heatmaps,` to `'input_heatmap': input_heatmap,`. They are not used anyway if you are using the images for training and testing. Just make a quick check, before waiting for a whole epoch, that this resolves the problem for both training and validating.
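For concreteness, here is a minimal sketch of that rename, assuming the mismatch sits where the dataset builds its return dict; the surrounding code is illustrative, not copied from the repo.

```python
import torch

# Option 1: rename the placeholder tensor so it matches the key built below.
input_heatmaps = torch.zeros((1, 1, 1))

# Option 2 (alternative): keep the original variable name and reference it in the dict:
# input_heatmap = torch.zeros((1, 1, 1))

sample = {
    'input_heatmap': input_heatmaps,  # with option 2: 'input_heatmap': input_heatmap,
}

# The placeholder is never consumed when HEATMAP_SRC is 'image'; the two names
# only need to agree so the dataloader can collate the batch without a NameError.
```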

@gpastal24

@Mosh-Wang Regarding the other question: I don't know if it matters that much. If you try both approaches, would you be kind enough to let us know whether training the 2D network as well increases the performance of the method?

@zaie commented Apr 26, 2023

Maybe the omitted loss_off causes the performance drop.
#26
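If someone wants to test that hypothesis, a rough sketch of re-adding the offset term to the objective might look like the following. Note that `loss_off` is assumed to come from a restored offset branch and is not part of the current release.

```python
def build_loss_dict(loss_2d, loss_1d, loss_bbox, loss_joint, loss_off):
    """Illustrative only: same weighting as the released code, plus the offset term."""
    total = loss_2d + loss_1d + 0.1 * loss_bbox + loss_joint + loss_off
    return {
        "2d_heatmaps": loss_2d,
        "1d_heatmaps": loss_1d,
        "bbox": 0.1 * loss_bbox,
        "joint": loss_joint,
        "offset": loss_off,
        "total": total,
    }
```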

@cucdengjunli (Author)

I reproduced the HigherHRNet version of the backbone.

@gpastal24

@cucdengjunli

> I reproduced the HigherHRNet version of the backbone.

Did you get the same results as the paper?

@AlvinYH (Owner) commented Jul 23, 2023

Hi, @cucdengjunli. Thanks for your interest in our work. We've modified the code, and you can pull the recent release. Yes, we made several changes to the model architecture (removed the offset branch and reduced the feature dimension in the weight_net), so the experimental results are slightly different from those in the original paper. Specifically, on the Panoptic dataset the MPJPE increases a little (+0.15 mm), while the new model yields an improvement of 1.44 in terms of AP25. You can download the pre-trained checkpoints. We'll revise our paper to specify these alterations. Also, thanks for pointing out our mistake: we did use Pose ResNet for training on the Panoptic dataset instead of HigherHRNet. We'll fix this typo in the final version. Using HigherHRNet is expected to further reduce the errors.

@cucdengjunli (Author)

> > I reproduced the HigherHRNet version of the backbone.
>
> Did you get the same results as the paper?

Yes, MPJPE@500mm: 17.966.

@cucdengjunli (Author)

> Hi, @cucdengjunli. Thanks for your interest in our work. We've modified the code, and you can pull the recent release. […]

thank you!

@CodeCrusader66

> Hi, @cucdengjunli. Thanks for your interest in our work. We've modified the code, and you can pull the recent release. […]

I have also reproduced the HigherHRNet version of the code 😄. The result is the same as reported in your paper. May I send you a merge request?
