Mismatch in number of Detections with TFRT onnx inference VS pytorch, pth file #83

Open
Allamrahul opened this issue Feb 10, 2023 · 5 comments


Allamrahul commented Feb 10, 2023

Dataset: I am using a custom dataset with .npy files and annotations. I followed all the steps required for custom dataset preparation, and I get good results with PyTorch: 90% mAP on my eval set.

However, once I convert the .pth file to ONNX format using exporter.py, for every point cloud in my eval dataset I see fewer detections from TensorRT (written TFRT in this thread) inference with the C++ script than I get from PyTorch with the .pth file.

The export uses exporter.py and simplifier_onnx.py. However, both scripts are hardcoded for the 3 classes of the KITTI dataset, and I have just one class to detect. Hence, I referred to the following commits to make the ONNX export work: https://github.com/NVIDIA-AI-IOT/CUDA-PointPillars/pull/77/commits. After this, I was able to export, but I then hit the following issue: #82. I resolved this by tinkering with the export script, as mentioned in the following comment: #77 (comment). After this, my detections using TFRT with the ONNX file were at least a subset of what I was seeing with PyTorch, but not the whole set. There is a clear delta between TFRT with the .onnx file and PyTorch with the .pth file in the majority of my eval set, as the following table shows:

Bounding box delta comparison: PyTorch .pth vs. TensorRT (TFRT) .onnx

| File | PyTorch .pth | TFRT C++ using .onnx file | Delta |
| --- | --- | --- | --- |
| 000000.npy | tensor([[9.6498, 1.1609, 1.9397, 0.2856, 0.4898, 2.8947, 6.2814], [24.8358, 1.3459, 2.5912, 0.2332, 0.4984, 3.0438, 6.2827], [24.9936, -10.4810, 3.2429, 0.2568, 0.4702, 3.1647, 6.2816], [9.8542, -10.6894, 2.1888, 0.4316, 0.4553, 2.7412, 6.2486]], device='cuda:0') | 24.8358 1.34592 2.59117 0.23324 0.498444 3.04383 6.28266 0 0.46325 ; 24.9936 -10.481 3.24294 0.256755 0.47017 3.16474 6.28156 0 0.445165 ; 9.8573 -10.6925 2.17166 0.433223 0.452724 2.7258 6.24912 0 0.445157 | 1 |
| 000001.npy | tensor([[9.6501, 1.1778, 1.8507, 0.2533, 0.4935, 2.7208, 6.2741], [24.9947, -10.4883, 3.0557, 0.2706, 0.4838, 3.0915, 6.2594], [24.8404, 1.3479, 2.6033, 0.2287, 0.4947, 3.0391, 6.2825], [9.8570, -10.6883, 2.1663, 0.4322, 0.4521, 2.7124, 6.2346]], device='cuda:0') | 9.65337 1.1817 1.80798 0.248034 0.493837 2.66008 6.27361 0 ; 24.9947 -10.4883 3.05572 0.270619 0.483843 3.09145 6.25942 0 0.670895 ; 24.8404 1.34787 2.60326 0.228719 0.494724 3.03909 6.28252 0 0.459299 ; 9.8545 -10.6925 2.1472 0.438129 0.448904 2.7132 6.23376 0 0.424986 | 0 |
| 000002.npy | tensor([[9.6042, 1.1503, 2.0593, 0.2839, 0.4955, 2.9902, 6.3128], [24.7882, 1.3638, 2.6522, 0.2538, 0.5039, 3.1623, 6.2903], [9.7436, -10.6760, 2.1350, 0.3712, 0.4578, 2.6609, 6.2507], [24.9494, -10.5134, 3.2150, 0.2888, 0.4944, 3.3462, 6.2143]], device='cuda:0') | 9.74478 -10.6817 2.1041 0.374984 0.453993 2.63108 6.25019 0 0.532783 ; 24.9494 -10.5134 3.21504 0.288844 0.494413 3.34624 6.21432 0 0.515557 ; 0.309276 -10.6853 2.08503 0.458935 0.413923 3.13058 6.09365 0 0.412784 | 1 |
| 000003.npy | tensor([[9.5610, -10.4589, 2.1206, 0.4139, 0.4505, 2.7193, 6.2802], [24.3758, 1.7272, 2.6000, 0.2396, 0.4966, 3.0571, 6.1985], [24.7097, -10.1406, 3.0566, 0.2619, 0.4718, 3.0835, 6.2728], [9.2311, 1.3354, 1.8251, 0.2543, 0.4891, 2.7015, 6.2441], [8.9262, 7.8720, 2.1033, 0.3872, 0.4424, 2.7067, 6.3819]], device='cuda:0') | 9.56115 -10.4598 2.09798 0.418282 0.448642 2.68469 6.27597 0 0.735731 ; 24.3758 1.72724 2.59998 0.239596 0.496643 3.05714 6.19854 0 0.629267 ; 24.7097 -10.1406 3.0566 0.26186 0.471776 3.08349 6.27275 0 0.585723 ; 9.21606 1.33047 1.82858 0.254299 0.490583 2.66956 6.23728 0 0.471899 | 1 |
| 000004.npy | tensor([[6.4732, 2.6481, 1.7006, 0.2879, 0.4678, 2.6444, 6.3118], [21.4290, 4.8774, 2.5937, 0.2325, 0.5022, 3.1258, 6.4040], [23.1383, -6.8599, 2.7714, 0.2839, 0.4960, 3.0160, 6.3080], [8.1175, -8.9831, 2.2486, 0.3856, 0.4450, 2.7676, 6.3550]], device='cuda:0') | 23.1383 -6.85986 2.77142 0.283893 0.495966 3.01596 6.30801 0 0.580739 ; 8.11463 -8.9818 2.12152 0.396575 0.436063 2.65015 6.35895 0 0.429396 | 2 |
| 000005.npy | tensor([[5.5251, 2.7731, 1.6679, 0.3284, 0.4662, 2.6940, 6.2788], [20.4834, 5.0487, 2.5489, 0.2769, 0.5241, 3.1817, 6.4027], [7.3220, -8.8810, 2.1011, 0.4506, 0.4281, 2.6641, 6.3688], [22.2850, -6.6383, 2.6867, 0.2744, 0.4986, 3.0367, 6.3119]], device='cuda:0') | 7.32207 -8.88152 2.0861 0.445914 0.430497 2.6552 6.36896 0 0.696223 | 3 |
| 000006.npy | tensor([[18.0280, 4.9469, 2.4509, 0.3035, 0.5205, 3.1520, 6.3221], [19.8413, -6.7181, 2.7475, 0.3097, 0.5246, 3.2910, 6.3001], [3.1871, 2.6373, 1.7287, 0.4621, 0.4224, 2.9021, 6.3156], [4.8621, -8.9172, 1.8402, 0.4540, 0.3952, 2.5332, 6.3420], [32.0742, 7.1384, 3.3039, 0.2361, 0.4806, 3.3647, 6.4108], [21.2824, 12.1162, 3.6256, 0.2676, 0.4659, 3.5638, 6.5643], [0.6082, 4.4304, 1.8762, 0.4470, 0.4348, 3.4172, 6.2065]], device='cuda:0') | 4.85492 -8.92965 1.819 0.460386 0.396642 2.5153 6.34298 0 0.494817 | 6 |
| 000007.npy | tensor([[18.2038, -6.8837, 2.5308, 0.3099, 0.5277, 3.1208, 6.3168], [16.5025, 4.7925, 2.3577, 0.3065, 0.5248, 3.0787, 6.3005], [1.5735, 2.6487, 1.6249, 0.5034, 0.4109, 2.6605, 6.3160], [2.2250, 2.7058, 1.8312, 0.4703, 0.4060, 3.0384, 6.3380], [3.2350, -8.9478, 1.8462, 0.4438, 0.4085, 2.5771, 6.3109], [19.7396, 11.9755, 3.2925, 0.2890, 0.5000, 3.6453, 6.5671], [3.5311, 2.8095, 2.3147, 0.4571, 0.4455, 4.2559, 6.3274], [30.5054, 6.8140, 3.3753, 0.2804, 0.5016, 3.6093, 6.2777]], device='cuda:0') | 18.2057 -6.88499 2.4907 0.307031 0.527094 3.07328 6.31815 0 0.636754 ; 16.502 4.79033 2.33373 0.299566 0.523598 3.0561 6.3044 0 0.532995 ; 1.56738 2.64373 1.68283 0.506594 0.412098 2.66617 6.31967 0 0.51762 ; 3.22002 -8.95614 1.8366 0.449459 0.409571 2.56386 6.3068 0 0.431358 ; 2.2279 2.70934 1.85016 0.464891 0.40516 3.07841 6.33425 0 0.391239 ; 19.7397 11.9755 3.29258 0.28902 0.499917 3.64496 6.56848 0 0.381675 | 2 |
| 000008.npy | tensor([[8.7021, -7.9169, 2.6375, 0.3647, 0.4888, 3.5404, 6.2655], [7.7196, 3.7774, 2.3025, 0.4060, 0.4704, 3.2993, 6.2707], [22.8483, -6.6640, 3.5341, 0.3350, 0.5277, 4.1040, 6.3141], [21.7832, 5.1120, 2.8534, 0.2781, 0.5178, 3.2145, 6.1912], [3.2359, -8.4495, 2.0291, 0.4187, 0.4105, 3.2451, 6.2915]], device='cuda:0') | 8.70127 -7.92042 2.62612 0.36539 0.486129 3.51703 6.26476 0 0.864963 ; 7.6994 3.79393 2.24546 0.40736 0.469539 3.21603 6.25044 0 0.73586 ; 22.8483 -6.66398 3.53411 0.335008 0.527745 4.10398 6.31413 0 0.605781 ; 21.7832 5.11193 2.85462 0.278421 0.517415 3.21271 6.21329 0 0.508611 | 1 |
| 000009.npy | tensor([[19.5711, 4.7877, 2.6956, 0.3077, 0.5412, 3.3734, 6.2451], [6.3672, -8.0972, 2.7778, 0.4181, 0.4778, 4.1039, 6.2421], [5.4901, 3.6080, 2.3323, 0.4340, 0.4502, 3.7175, 6.2740], [20.3728, -7.0433, 3.3803, 0.3514, 0.5351, 4.1972, 6.3070], [26.6330, 11.8861, 3.9950, 0.3089, 0.5019, 4.1503, 6.6127]], device='cuda:0') | 5.47306 3.61103 2.394 0.432978 0.453338 3.80027 6.32163 0 0.714706 ; 19.5717 4.78751 2.71062 0.308163 0.539413 3.36241 6.27686 0 0.621834 ; 6.35329 -8.10289 2.76789 0.422266 0.47866 4.13415 6.24032 0 0.606208 | 2 |
| 000010.npy | tensor([[18.3196, 4.6323, 3.2815, 0.3700, 0.5370, 4.5950, 6.3164], [5.0913, -8.1561, 2.6470, 0.4329, 0.4667, 4.0704, 6.2747], [19.1831, -7.1906, 3.3499, 0.3578, 0.5279, 4.2080, 6.3127], [2.5482, 4.3696, 1.6065, 0.4281, 0.3918, 2.8003, 6.2634]], device='cuda:0') | 5.08485 -8.16716 2.64149 0.431825 0.466464 4.03816 6.27571 0 0.731938 ; 19.1846 -7.19002 3.2872 0.352221 0.529464 4.08496 6.31286 0 0.591408 | 2 |
| 000011.npy | tensor([[15.3577, -7.3005, 3.0413, 0.3812, 0.5104, 4.2909, 6.3159], [0.6093, 3.4074, 1.9033, 0.5056, 0.4306, 3.3583, 6.1790], [14.5397, 4.4909, 3.0513, 0.3723, 0.5222, 4.3821, 6.2383], [30.4700, -6.2796, 4.0225, 0.2914, 0.4843, 3.8403, 6.3179], [29.6795, 5.5980, 4.0535, 0.2816, 0.4877, 3.9741, 6.2869]], device='cuda:0') | 0.594493 3.41456 2.11992 0.502219 0.441799 3.74912 6.17387 0 0.828488 ; 15.3587 -7.29961 2.99875 0.375657 0.512654 4.18005 6.31556 0 0.798267 ; 30.47 -6.27963 4.02255 0.29143 0.484331 3.84032 6.31788 0 0.434042 | 2 |
| 000012.npy | tensor([[11.2944, 4.3980, 3.0133, 0.3911, 0.5198, 4.6365, 6.2670], [26.4576, 5.3648, 3.6263, 0.3002, 0.5062, 3.8833, 6.3176], [12.0963, -7.3715, 3.0630, 0.3846, 0.5122, 4.3017, 6.2922], [8.1463, -12.5014, 2.9129, 0.3691, 0.4980, 3.9686, 6.1562], [27.1433, -6.4810, 3.9175, 0.3048, 0.5110, 3.9699, 6.3372], [18.4373, 11.4960, 3.7129, 0.3159, 0.4918, 4.2750, 6.4670]], device='cuda:0') | 8.14566 -12.506 2.84502 0.364298 0.498799 3.84938 6.15557 0 0.378752 ; 12.0904 -7.37816 2.90519 0.378811 0.516209 4.00902 6.29017 0 0.376648 | 4 |
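
The Delta column can be reproduced mechanically, which helps when checking a larger eval set. This is a minimal sketch, assuming each TFRT detection is a whitespace-separated record whose first three values are the box center, with records separated by `;` as in the table above. The helper names `parse_trt_boxes` and `unmatched_boxes` and the 0.5 matching tolerance are hypothetical and not part of the repository.

```python
import math

def parse_trt_boxes(text):
    """Split ';'-separated records into lists of floats, skipping empty records."""
    boxes = []
    for record in text.split(";"):
        fields = record.split()
        if fields:
            boxes.append([float(v) for v in fields])
    return boxes

def unmatched_boxes(pt_boxes, trt_boxes, tol=0.5):
    """Return the PyTorch boxes with no TFRT box center within `tol` (assumed tolerance)."""
    missing = []
    for box in pt_boxes:
        nearest = min(
            (math.dist(box[:3], other[:3]) for other in trt_boxes),
            default=math.inf,
        )
        if nearest > tol:
            missing.append(box)
    return missing

# Values for 000000.npy copied from the table above.
pt = [
    [9.6498, 1.1609, 1.9397, 0.2856, 0.4898, 2.8947, 6.2814],
    [24.8358, 1.3459, 2.5912, 0.2332, 0.4984, 3.0438, 6.2827],
    [24.9936, -10.4810, 3.2429, 0.2568, 0.4702, 3.1647, 6.2816],
    [9.8542, -10.6894, 2.1888, 0.4316, 0.4553, 2.7412, 6.2486],
]
trt = parse_trt_boxes(
    "24.8358 1.34592 2.59117 0.23324 0.498444 3.04383 6.28266 0 0.46325 ; "
    "24.9936 -10.481 3.24294 0.256755 0.47017 3.16474 6.28156 0 0.445165 ; "
    "9.8573 -10.6925 2.17166 0.433223 0.452724 2.7258 6.24912 0 0.445157"
)
missing = unmatched_boxes(pt, trt)
print(len(missing))  # 1, matching the Delta column for 000000.npy
```

Matching on box centers rather than comparing counts also reveals cases where TFRT reports a box that PyTorch does not, which a plain count difference would hide.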

Please let me know if you know something that could help me.

Allamrahul changed the title from "Number of Detections with TFRT inference cpp script are lesser than what I see with pytorch, pth file" to "Mismatch in number of Detections with TFRT onnx inference VS pytorch, pth file" on Feb 10, 2023
@Allamrahul
Author

I see the same behavior with the KITTI dataset as well, as follows:
[image]
Can anyone confirm whether this is expected behavior, or is it not supposed to happen?

@KwangjinChoi

Hello, can you tell me how much the 3D detection performance drops?

@Allamrahul
Author

Hi, as shown in my initial comment, the delta between PyTorch with the .pth file and TFRT inference is as large as 6 boxes (000006.npy). I have about 30 evaluation point clouds and I see this drop in 90% of them. Is there anything I can do to avoid this?
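
One thing that may be worth ruling out is a difference in post-processing settings between the two pipelines. If one side filters detections with a higher score threshold than the other, its output tends to be a subset of the other's. The following only illustrates that effect and is not a diagnosis of this issue: the scores and both threshold values are made up, and `filter_by_score` is a hypothetical helper.

```python
def filter_by_score(detections, score_thresh):
    """Keep (label, score) pairs whose score is at least score_thresh."""
    return [d for d in detections if d[1] >= score_thresh]

# Hypothetical raw detections, identical for both pipelines before filtering.
raw = [("box_a", 0.82), ("box_b", 0.47), ("box_c", 0.31), ("box_d", 0.12)]

loose = filter_by_score(raw, 0.1)   # assumed lower threshold
strict = filter_by_score(raw, 0.4)  # assumed higher threshold

# Everything kept by the stricter filter is also kept by the looser one;
# the boxes kept only by the looser filter form the "delta".
delta = [d for d in loose if d not in strict]
print(len(loose), len(strict), len(delta))
```

Comparing the score and NMS thresholds actually configured on each side would show whether this applies here.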

@wangxj2014

I also encountered the same problem. Is there any way to solve this problem?

@Dreamdreams8

The same problem. Has anyone solved it?
