Training of a Custom Object Cont. #137
The camera info published by the RealSense camera looks correct. However, the dimensions are wrong. The config_pose.yaml file expects centimeters, and your chair is definitely not 2147 x 3258 x 2171 centimeters big. Maybe multiply them by the 0.03 value from the fixed_model_transform? By the way: don't take photos with your mobile phone if possible.
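For illustration, this is roughly what the relevant entry in config_pose.yaml could look like; the "chair" key and the numbers are placeholders, assuming the raw values above get scaled by the 0.03 factor from the fixed_model_transform:

```yaml
# Hypothetical config_pose.yaml excerpt; "chair" and the numbers are placeholders.
# Cuboid dimensions are expected in centimeters, here the raw values
# 2147 x 3258 x 2171 scaled by the 0.03 factor from fixed_model_transform.
dimensions: {
    "chair": [64.4, 97.7, 65.1],
}
```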
I have updated my previous post with the images exported as you suggested, replaced the phone pictures with the text directly, and multiplied the cuboid_dimensions by 0.03, but it still shows detected_objects=[]. The size of the images ROS takes is 640x480 as per the exports. What would be the next debugging step?
Can you look at the belief maps?
I am not sure how to do this.
I am working on a new version of DOPE, and it includes this debug output.
I upgraded my DOPE setup to UE 4.24 and that's what I've been working with; I haven't finished upgrading to 4.25 yet. Related to the thread: first thing tomorrow morning I will re-export my dataset with multi-colored materials and better camera angles and start training. While that runs, I will add this function to the code so I can see the maps you are looking for.
Now that looks much better! Did you re-measure whether the dimensions fit your actual chair? Just to be sure. BTW, you can also center the object directly in MeshLab.
Thanks! I actually measured the chairs in the office this time to make a realistic object, not just "some 3D model", hehe! I am replacing the actors' meshes with the new chairs and then I will re-export + retrain.
I have been training for a day now; it's at epoch 13 with a very low loss. Crossing fingers that by epoch 60 we see better results... My current questions: are 13 epochs and such a low loss a good or bad sign, and is 13 epochs too soon to test on ROS?
Bad sign.
Not too soon. Looking forward to the belief maps!
Could you please upload a small sample of your dataset (like 100 random images, your mesh + all the relevant yaml files) somewhere?
Thanks! Could you also upload the chair STL model?
Done!
Just the STL model of the chair would be enough.
It's all there 👍
Something is still seriously wrong with your fixed_model_transform. With your fixed_model_transform applied, nothing happens:
I have fixed it:
Now you can switch the mesh on and off. However, I'm not sure whether that's actually a problem; after all, your other json files look ok. Maybe it's better to delete the *.cs.png files from your dataset, I'm not sure if they confuse DOPE. Also maybe try again without the color randomization. Other than that, I'm a bit lost. Maybe the problem is that most projected corners do not lie on (or near) the object (see my screenshot above). @TontonTremblay: Do you think that's the problem?
Ok, so I have learned how to "fix" the fixed_model_transform. With this setup, the last row of the fixed model transform holds the coordinates of where the object stands in the world, so I can manually modify that one, I guess? The network doesn't need to know that the object is not in the "center" of the world/scene anyway. I will then retrain with the new setup you indicate and without color randomization (maybe a subtle randomization between black and dark gray). Edit: Do you think the Segmentation Class Id should be the same across all json files?
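For reference, a hypothetical sketch of what such a fixed_model_transform entry in _object_settings.json can look like; all values are placeholders. The matrix is stored row-major, the diagonal carries the export scale (the 0.03 mentioned earlier), and the last row holds the translation, here zero so the object sits at the origin:

```json
"fixed_model_transform": [
    [0.03, 0.0,  0.0,  0.0],
    [0.0,  0.03, 0.0,  0.0],
    [0.0,  0.0,  0.03, 0.0],
    [0.0,  0.0,  0.0,  1.0]
]
```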
By epoch 3, the loss is already very low... I will wait a bit more, and then generate the feature maps again. Maybe, given the point you made about the projected corners not lying on the object, I can try fixing it by training on the chair model without the recliner, so that it is at the level of the arm-rests; I will play around with this. I will also share a Dropbox link with the new/current dataset sample and config again at the end of the day when I test.
Maybe it's better if you only train on the black parts of the chair. The metal parts are very thin, so that's probably the bigger problem.
I have been training and I think I found the error, since the loss is looking a lot more normal (is 0.0025 good?). I will test today after 20 epochs and show you the results + belief maps. By the way, do we have any records of the rate (Hz) at which DOPE runs?
Yes, that looks good. Also your belief maps look perfect! How did you fix the problem?
That should still work in OpenCV 4, so you could try changing that line to `elif CuboidPNPSolver.cv2majorversion >= 3:`
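For context, a rough sketch of the idea (paraphrased, not a verbatim copy of cuboid_pnp_solver.py): OpenCV 4 still provides cv2.SOLVEPNP_ITERATIVE, so letting the OpenCV-3 branch also cover version 4 should be enough:

```python
import cv2

# Paraphrased version check, not the exact DOPE source.
cv2majorversion = int(cv2.__version__.split('.')[0])

if cv2majorversion == 2:
    # OpenCV 2 used a different constant name.
    pnp_algorithm = cv2.CV_ITERATIVE
elif cv2majorversion >= 3:  # was "== 3"; ">= 3" also covers OpenCV 4
    pnp_algorithm = cv2.SOLVEPNP_ITERATIVE
```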
That still looks like the solvePnP step failed for some reason. DOPE is tested with OpenCV 3.3.1. It's probably best if you just use the Docker image, at least for now.
I did that, but it still gave me an error. Related to how I made it work: I falsely assumed from the documentation that the class name given in the TagActor is something independent, and that the folder name would be the class name given in the --object parameter. Before, I passed the folder name; after, I passed the class name from the TagActor (or just skipped the whole parameter, since I'm only feeding one object in the folders). I have been using the docker image to train (on a cluster), and testing the network directly on a separate laptop (host, not container) which has ROS, etc.
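To illustrate the before/after (hypothetical command lines; flags other than --object are from memory and should be checked against train.py --help):

```
# Before (wrong): passing the dataset folder name
python train.py --data /path/to/dataset --object chair_dataset_folder

# After (works): passing the class name set in the TagActor of the UE4 actor
python train.py --data /path/to/dataset --object chair
```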
Yes, this looks good. Probably v4 and v5 are the two bbox corners that are "in the air" furthest away from the chair, so they are the hardest to predict. But I believe these results should be more than good enough for object detection.
That's easy: in your config_pose, provide the scaling factor to meters:

```yaml
# optional: if the specified meshes are not in meters, provide a scale here
# (e.g. if the mesh is in centimeters, scale should be 0.01); default scale: 1.0
mesh_scales: {
    "cracker": 0.01,
    "gelatin": 0.01,
    "meat": 0.01,
    "mustard": 0.01,
    "soup": 0.01,
    "sugar": 0.01,
    "bleach": 0.01,
}
```
Congratulations! When you have found the transformation, you should put it into your config. Do you mind uploading the trained weights and your mesh file somewhere? We have the same chairs in our office and I would love to try it out myself.
Hello! Of course! Here is the dropbox with the videos of my performance and the weights + config: https://www.dropbox.com/sh/dz5wybaqf2evzco/AABBMsfJXeLK46FUBzra5KzUa?dl=0 Let me know if you have any feedback. Is the visualization where it tries to fit the mesh with the real model always blinking like this? It would be cool if there was some post-processing step. Do you know if I can enable the
Also, with two chairs (multiple objects), is it supposed to show multiple poses at the same time? It only shows one at a time (second video).
Additionally, sometimes the mask export doesn't work for me: some 3D models are properly exported with a white mask, while others, for example the base models from UE4 (this one is a cylinder), do not export a mask (as shown below, they export black and merge with the background == no mask?). Edit: The only way I got this to work 👍 and export a mask is to go back to Blender and merge the "new" object with an "old" object, then export the .fbx and import it into UE4; then the model has some property I don't know about which old objects have and new ones don't, and the mask shows when exporting (but then the cylinder model inside UE4 should have it...). I made an issue to ask about it in the meantime at NVIDIA/Dataset_Synthesizer#86, but I am not very hopeful they will answer.
@thangt Hey Thang, have you seen this problem before? Very cool results. It is very cool to see things come together like this :P
Very nice! Could you please also upload your chair.stl (or even better, a mesh with colors)?
I fixed that in #120, but @TontonTremblay still has to merge that PR. (@TontonTremblay: could you please also merge #121 while you're at it?).
That's not merged into master yet, and I haven't opened a pull request since it's a rather large change and I don't know if @TontonTremblay wants it here. So for now, you'll have to use the "hybrit" branch of my fork: https://github.com/mintar/Deep_Object_Pose/tree/hybrit . This will mainly correct the depth value of the objects and remove some false positives (like the huge chair around 0:30 of 20200908_205148.mp4):
Yes, it's supposed to detect both. You can see that it briefly detects both at around 0:58. DOPE can definitely work better with tabletop objects than in your videos, but I guess your objects are quite hard for DOPE to detect, so I'm happy it works as well as it does.
I have added an STL and an FBX to the dropbox link so you get both, just in case. They should be colored, but they never arrived colored in UE4 (the materials import was quite an adventure).
Okay, I will check this one out then. I guess it is a 4.22 version? I will have to update it to 4.24. Also, it would be cool to be able to test the results after training without ROS per se. I saw a related post, but I am trying to implement a solution where a docker-compose file creates a container that replays a rosbag, and another one which feeds from that and processes the feed (say, DOPE or some other network).
It doesn't matter. The UE4 version is only relevant for generating the training data. Once you have the images, you can use any DOPE version you want with them.
Ah, whoops.
Hello, I measured the prediction time and it averages around 470-560 ms.
This is a scary question, but is there a way to speed it up? I am wondering if a prediction time of around 100 ms is possible... Edit: I might be testing on a CPU, I will double check.
Python 3 timing:
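In case it helps, a minimal generic sketch (not DOPE-specific; the tiny stand-in model and the input size are placeholders) for checking whether inference actually runs on the GPU and for timing a single forward pass:

```python
import time
import torch

# Stand-in for the real network; replace with the loaded DOPE model.
model = torch.nn.Conv2d(3, 9, kernel_size=3)
device = "cuda" if torch.cuda.is_available() else "cpu"
print("running on:", device)  # if this prints 'cpu', that would explain ~500 ms

model = model.to(device).eval()
x = torch.randn(1, 3, 480, 640, device=device)

with torch.no_grad():
    model(x)  # warm-up pass
    if device == "cuda":
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    model(x)
    if device == "cuda":
        torch.cuda.synchronize()
print(f"forward pass: {(time.perf_counter() - t0) * 1000:.1f} ms")
```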
Also, I see that there is an X11 mount in the run-with-docker script, but I can't find how to VNC into the container. Should I install a VNC server myself, etc.?
Re: #137 (comment): I just merged the MR, I am sorry I did not do it earlier. Your problems with flickering should be solved now.
I tried to mess around with NDDS and I am a bit confused about the GroupActorManager (GAM); I can't find docs for it. Questions:
To provide an update for everyone: related to the question above, I ended up finding a way to modify the mesh of the generated objects, but decided not to go in that direction, based on my doubts below. My dataset generation takeaways so far:

🔵 Setup 1: static chairs + OrbitalMovement camera = DOPE worked
🔵 Setup 2: RandomRotation, RandomMovement chairs + OrbitalMovement camera = DOPE worked, BUT the results come out as None None None None quaternions...
🔵 Setup 3 (current): mixed the RandomRotation and static datasets and I am training again. I strongly refuse to believe that we cannot train with RandomRotation.

❓ Question 1
❓ Question 2
🔵 What losses have you achieved yourself in your previous trainings?
I am trying to train on some chairs, for which I set up 4 different photorealistic scenes in UE4.
My dataset is made of 20k images per location, with 5 locations per scene and 4 scenes: 20k x 5 x 4 = 400k images.
A sample of what my dataset looks like:
Right now the export size is 512x512. Should I be increasing the size of the pictures the network takes in? I have an OrbitalMovement on the camera that goes from 200 to 500, so I get some really close-up photos and some distant ones, as shown here:
Examples of the pictures that ROS is seeing are:
Some questions that are cluttering my mind:
Additional Side info:
SceneCapturer info:
Thanks for taking the time to help me.
Friendly regards,