
The confidence issue still exists. #46

Open
ashutosh1807 opened this issue May 9, 2022 · 4 comments

Comments

@ashutosh1807

ashutosh1807 commented May 9, 2022

If we train the NLSPN model from scratch then, per the instructions, we do not need the legacy mode, but conceptually this may lead to wrong calculations.
We have three inputs: the per-pixel confidence, the per-pixel neighbour affinities, and the per-pixel offsets defined w.r.t. the 8 anchors of the 3x3 kernel centred on each pixel.
If we don't use legacy, then the confidence calculation for the neighbours uses the wrong offsets, because the offsets are defined w.r.t. the anchors of the 3x3 kernel around the pixel, whereas for this computation they should be defined w.r.t. the pixel itself.
If we do use legacy, then the in-place adjustment of offset_tmp (which is a reference) makes the offsets relative to the centre pixel, which in turn is a wrong assumption when the deformable convolution is applied.
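To make the anchor issue concrete, here is a minimal sketch (my own illustration, not the repository's code; the tensor names and toy values are made up) of how offsets predicted relative to the 3x3 kernel anchors must be shifted before they can be interpreted relative to the centre pixel, as a 1x1 deformable convolution would interpret them:

```python
import torch

# Anchor displacements (dy, dx) of a 3x3 kernel around the centre pixel,
# in row-major order; index 4 is the centre (0, 0).
anchors = torch.tensor(
    [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)], dtype=torch.float32
)  # shape (9, 2)

# Toy learned offsets, defined w.r.t. the 3x3 anchors (hypothetical values).
offsets_3x3 = torch.full((9, 2), 0.25)

# Equivalent offsets w.r.t. the centre pixel, as a 1x1 kernel would need:
# each neighbour's anchor displacement must be added back, otherwise every
# sample point collapses toward the centre.
offsets_1x1 = offsets_3x3 + anchors

# The top-left neighbour samples at (-1 + 0.25, -1 + 0.25) from the centre.
print(offsets_1x1[0])  # tensor([-0.7500, -0.7500])
```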

@zzangjinsun
Owner

Hi, I will check the code again, thanks!

@ashutosh1807
Author

Hi @zzangjinsun, any updates on this?

@zzangjinsun
Owner

zzangjinsun commented May 31, 2022

I will be busy until the end of the semester. Will check the code during the summer break! (Let's keep this issue open for a while...)

@nchodosh

nchodosh commented Sep 9, 2023

Hi,

First off, nice work, and thanks for publishing the code!

I just wanted to ping this discussion, since what @ashutosh1807 said about your original implementation seems correct. The offsets are relative to the standard kernel anchor positions, so when you switch to 1x1 convolutions, you must adjust them to account for that.

I think this becomes clear with this alternative implementation of lines 113-144, which avoids the loop and the 1x1 convolutions:

from torchvision.ops import deform_conv2d

# each output channel corresponds to one iteration of the original loop
w_full = torch.eye(9).reshape(9, 1, 3, 3)
b_full = torch.zeros(9)
conf_aff = deform_conv2d(confidence, offset, w_full, b_full, padding=1)
# drop the zero-offset output channel for the reference pixel itself
conf_aff = torch.cat((conf_aff[:, : self.idx_ref], conf_aff[:, (self.idx_ref + 1) :]), dim=1)

From my tests, this gives the same results as your original (legacy) implementation.

3 participants