Generating Belief Maps using train2/train.py #242

BazUCD · 2022-04-06T17:27:59Z

Hi, I am attempting to run the training script and generate the belief maps from train2/train.py in order to debug, but I am getting this error:

start: 18:18:30.781464
load data: ['/home/user/Downloads/Spanner2']
load data:
training data: 2000 batches
load models
ready to train!
Traceback (most recent call last):
File "train.py", line 606, in
_runnetwork(epoch,trainingdata)
File "train.py", line 422, in _runnetwork
for batch_idx, targets in enumerate(train_loader):
File "/home/user/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in next
data = self._next_data()
File "/home/user/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/home/user/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/home/user/.local/lib/python3.6/site-packages/torch/_utils.py", line 434, in reraise
raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/user/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/home/user/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/user/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 49, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/user/catkin_pcl_new/src/dope/scripts/train2/utils_dope.py", line 321, in getitem
save=False,
File "/home/user/catkin_pcl_new/src/dope/scripts/train2/utils_dope.py", line 593, in CreateBeliefMap
p = [point[numb_point][1],point[numb_point][0]]
IndexError: list index out of range

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 17249) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/user/.local/lib/python3.6/site-packages/torch/distributed/launch.py", line 193, in
main()
File "/home/user/.local/lib/python3.6/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home/user/.local/lib/python3.6/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home/user/.local/lib/python3.6/site-packages/torch/distributed/run.py", line 713, in run
)(*cmd_args)
File "/home/user/.local/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 131, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/user/.local/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

train.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2022-04-06_18:18:39
host : user-User
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 17249)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

I am unsure what is causing this error, as I have the correct version of PyTorch installed based on requirements.txt. Are there any common mistakes I could be making?

TontonTremblay · 2022-04-07T03:55:52Z

Could you share an example of a JSON file from your dataset? In the line

p = [point[numb_point][1],point[numb_point][0]]

it looks like point is empty or its dimensions are wrong. @mintar refactored the data format a little, and I did not check whether it is still compatible with train2/train.py. I will try to check soon.

@BazUCD Did you try to use the original training script?

BazUCD · 2022-04-07T08:27:06Z

Hi @TontonTremblay, thanks for the quick reply. Here's an example of my .json files, with the associated png as well:

I've used the original training script and generated some weights, but was unable to detect anything. Following your recommendation in #238, I have been trying to generate the belief maps using train2.

TontonTremblay · 2022-04-09T15:34:08Z

This looks correct, but your object has a symmetry in it. You should look into Martin's notes on handling objects with symmetries: https://github.com/NVlabs/Deep_Object_Pose/tree/master/scripts/nvisii_data_gen#handling-objects-with-symmetries

andrewyguo · 2022-06-07T00:47:12Z

I encountered a similar issue. The training script expects the "projected_cuboid" field to contain 9 points, the last one being the point under "projected_cuboid_centroid".
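To confirm whether a dataset is affected, a quick scan of the annotation files can help. The following is only a sketch: it assumes each annotation JSON holds a top-level "objects" list (an assumption, not stated in this thread) whose entries carry the "projected_cuboid" field, and the function names are made up for illustration.

```python
import json
from pathlib import Path

# 8 cuboid corners plus the centroid, as described in the comment above.
EXPECTED_POINTS = 9


def count_cuboid_points(annotation):
    """Return the number of projected_cuboid points for each object.

    Assumes the annotation is a dict with an "objects" list (hypothetical layout).
    """
    return [
        len(obj.get("projected_cuboid", []))
        for obj in annotation.get("objects", [])
    ]


def find_short_annotations(dataset_dir):
    """List (path, counts) for every JSON file with an unexpected point count."""
    bad = []
    for path in sorted(Path(dataset_dir).rglob("*.json")):
        with open(path) as f:
            annotation = json.load(f)
        if not isinstance(annotation, dict):
            continue  # skip files that are not annotation dictionaries
        counts = count_cuboid_points(annotation)
        if any(c != EXPECTED_POINTS for c in counts):
            bad.append((path, counts))
    return bad
```

Running find_short_annotations on the dataset folder and seeing counts of 8 would suggest the centroid is missing from "projected_cuboid".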

In your case, you can add something like projected_cuboid_keypoints.append(obj['projected_cuboid_centroid']) right below line 228 in utils_dope.py. I did this and it worked for me.
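As a standalone illustration of that workaround, here is a minimal sketch. It is not the actual code in utils_dope.py: the helper name keypoints_with_centroid is hypothetical, and it only assumes that obj is a dictionary with the "projected_cuboid" and "projected_cuboid_centroid" fields mentioned above.

```python
def keypoints_with_centroid(obj):
    """Return the projected cuboid keypoints with the centroid as the 9th point.

    `obj` is assumed to be one object entry from an annotation file.
    """
    # Copy the points so the original annotation is left untouched.
    keypoints = [list(p) for p in obj["projected_cuboid"]]
    # Only append when the centroid is missing, i.e. exactly 8 corners are present.
    if len(keypoints) == 8:
        keypoints.append(list(obj["projected_cuboid_centroid"]))
    return keypoints
```

Guarding on the length keeps the helper safe for datasets that already store 9 points, so the centroid is never appended twice.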
