NAN or INF problem #29

AbyssHedgehog · 2023-05-21T03:34:17Z

Dear author, I recently tried to run your code on my server, but I get 'NAN or INF found in tensor' when I start training.
I am using torch==1.4.0, cudatoolkit==10.1, and torchvision==0.5.0; everything else matches requirements.txt.
For reasons on my side I can only use a single GPU, so I changed GPUS: '0,1' to GPUS: '0'. I also reduced NUM_DATA: 1000 to NUM_DATA: 500.
I trained the model for 30 epochs, but it still shows 'NAN or INF found in tensor'.
In other people's issues I saw that you seem to have already solved this problem. Is this the new code that fixes it?

AlvinYH · 2023-07-23T15:53:38Z

Thanks for your interest in our work. We've updated the code, so please pull the latest release. The NaN problem sometimes occurs when no valid proposal is detected in HDN, which results in the loss being computed between None tensors. We have added a conditional check to handle this case.
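
For readers hitting the same error, the kind of conditional check described above might look roughly like the sketch below. This is not the repository's actual fix: `proposal_loss` and `total_loss` are hypothetical names, plain Python floats stand in for tensors, and mean squared error stands in for whatever loss the model really uses. The only point is the guard: when there are no valid proposals, the extra loss term is skipped instead of being computed on empty or None inputs.

```python
import math
from typing import Optional, Sequence


def proposal_loss(pred: Sequence[float], target: Sequence[float]) -> Optional[float]:
    """Mean squared error over proposals, or None when there are no proposals.

    Hypothetical helper: MSE is only a stand-in for the real loss.
    """
    if len(pred) == 0:
        # No valid proposals: there is nothing to average, so report "no loss"
        # rather than dividing by zero or returning NaN.
        return None
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(base_loss: float, pred: Sequence[float], target: Sequence[float]) -> float:
    """Add the proposal loss only when it exists and is finite."""
    extra = proposal_loss(pred, target)
    if extra is None or not math.isfinite(extra):
        # Conditional guard: fall back to the base loss alone.
        return base_loss
    return base_loss + extra
```

With tensors the same idea applies: check whether any proposals exist before computing the loss term, since averaging over an empty tensor in PyTorch yields NaN.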
