Generating Belief Maps using train2/train.py #242
Could you share an example of the JSON file you are using in your dataset? It looks like the point variable is empty or its dimensions are wrong. @mintar refactored the data format a little, and I did not check whether it is still compatible with train2/train.py, but I will try to check soon. @BazUCD, did you try the original training script?
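A quick way to find annotations with an empty or short keypoint list is to scan the dataset before training. The sketch below assumes each frame's .json has an "objects" list whose entries store 2D keypoints under "projected_cuboid" as [x, y] pairs, and that train2 needs 9 points per object (8 cuboid corners plus the centroid). The function name find_bad_annotations is hypothetical, and both the key and the expected count are parameters in case your exporter writes a different layout.

```python
import json
from pathlib import Path

# Assumption: 8 cuboid corners + 1 centroid. Change this if your train2 version
# reads the centroid from a separate key or expects a different count.
EXPECTED_POINTS = 9


def find_bad_annotations(root, key="projected_cuboid", expected=EXPECTED_POINTS):
    """Return (file, object_index, n_points) for every object with the wrong keypoint count.

    Hypothetical helper; assumes annotations look like {"objects": [{key: [[x, y], ...]}, ...]}.
    """
    problems = []
    for path in sorted(Path(root).rglob("*.json")):
        with open(path) as f:
            data = json.load(f)
        # Skip files that are not per-frame annotations (e.g. camera or object settings).
        if not isinstance(data, dict) or "objects" not in data:
            continue
        for i, obj in enumerate(data["objects"]):
            points = obj.get(key, [])  # a missing key counts as zero points
            if len(points) != expected:
                problems.append((str(path), i, len(points)))
    return problems
```

For example, find_bad_annotations("/home/user/Downloads/Spanner2") returns a list of (file, object index, point count) tuples; an empty list means every object has the expected number of points under the assumed key.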
Hi @TontonTremblay, thanks for the quick reply. Here's an example of my .json files with the associated PNG as well: I've used the original training script and generated some weights, but was unable to detect anything, so following your recommendation in #238 I have been trying to generate the belief maps using train2.
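For context, DOPE's ground-truth belief maps are heat maps with a 2D Gaussian placed at each projected keypoint. The pure-Python sketch below illustrates that idea only; it is not train2's CreateBeliefMap, and the function name, the (x, y) input format and the sigma default are assumptions. Maps are indexed [row][col], i.e. [y][x], which is presumably why the CreateBeliefMap line in the traceback swaps the two coordinates of each point.

```python
import math


def belief_maps(points, width, height, sigma=2.0):
    """Return one height x width map per keypoint, each a 2D Gaussian.

    points: list of (x, y) pixel coordinates (assumed input format).
    Values lie in [0, 1], peaking at 1.0 when a keypoint falls exactly on a pixel.
    """
    maps = []
    for x0, y0 in points:
        # Row index is y and column index is x, so maps[k][y][x].
        m = [
            [math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        maps.append(m)
    return maps


# Two keypoints on a small 5 x 4 image.
maps = belief_maps([(2, 1), (0, 0)], width=5, height=4, sigma=1.0)
print(maps[0][1][2])  # 1.0: the peak sits at row y=1, column x=2
```

One map per keypoint means an object with fewer keypoints than the training code expects yields fewer maps, so any code that loops over a fixed number of points will index past the end of the list.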
This looks correct, but your object has a symmetry. You should look into this guide from Martin: https://github.com/NVlabs/Deep_Object_Pose/tree/master/scripts/nvisii_data_gen#handling-objects-with-symmetries
I encountered a similar issue. The training script expects the In your case, you can add something like
Hi, I am attempting to run the training script and generate the belief maps from train2/train.py in order to debug, but I am getting this error:
start: 18:18:30.781464
load data: ['/home/user/Downloads/Spanner2']
load data:
training data: 2000 batches
load models
ready to train!
Traceback (most recent call last):
File "train.py", line 606, in <module>
_runnetwork(epoch,trainingdata)
File "train.py", line 422, in _runnetwork
for batch_idx, targets in enumerate(train_loader):
File "/home/user/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/home/user/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/home/user/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/home/user/.local/lib/python3.6/site-packages/torch/_utils.py", line 434, in reraise
raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/user/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/home/user/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/user/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/user/catkin_pcl_new/src/dope/scripts/train2/utils_dope.py", line 321, in __getitem__
save=False,
File "/home/user/catkin_pcl_new/src/dope/scripts/train2/utils_dope.py", line 593, in CreateBeliefMap
p = [point[numb_point][1],point[numb_point][0]]
IndexError: list index out of range
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 17249) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/user/.local/lib/python3.6/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/home/user/.local/lib/python3.6/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home/user/.local/lib/python3.6/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home/user/.local/lib/python3.6/site-packages/torch/distributed/run.py", line 713, in run
)(*cmd_args)
File "/home/user/.local/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/user/.local/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
train.py FAILED
Failures:
<NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
time : 2022-04-06_18:18:39
host : user-User
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 17249)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
I am unsure what is causing this error, as I have the correct versions of PyTorch installed according to requirements.txt. Are there any common mistakes I could be making?
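The last frame of the traceback fails on p = [point[numb_point][1],point[numb_point][0]]. That is the IndexError Python raises when an object's point list holds fewer entries than the loop index, for example an empty list, so the PyTorch version is probably not the cause. The sketch below reproduces that failure with hypothetical helpers: keypoint_rc mirrors the indexing in the traceback, and checked_keypoints adds an up-front length check so a malformed annotation produces a descriptive error instead of a bare IndexError. The expected count of 9 used in the example is an assumption.

```python
def keypoint_rc(points, numb_point):
    """Hypothetical helper mirroring the traceback: swap (x, y) into (row, col)."""
    return [points[numb_point][1], points[numb_point][0]]


def checked_keypoints(points, n_expected):
    """Hypothetical guard: fail with a clear message before indexing past the end."""
    if len(points) < n_expected:
        raise ValueError(
            f"expected at least {n_expected} keypoints per object, got {len(points)}"
        )
    return [keypoint_rc(points, i) for i in range(n_expected)]


# Reproduce the failure mode from the traceback with an empty point list.
try:
    keypoint_rc([], 0)
except IndexError as err:
    print("IndexError:", err)  # prints "IndexError: list index out of range"

# With the guard, a short list is reported explicitly.
try:
    checked_keypoints([[10, 20]] * 3, 9)
except ValueError as err:
    print(err)  # prints "expected at least 9 keypoints per object, got 3"
```

Running a check like this inside the dataset's loading code (or over the .json files beforehand) points at the offending annotation rather than failing deep inside a DataLoader worker.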