Gt utilities (princeton-vl#83)
* Update GT utilities.

* Overhaul GT visualization for blender's built-in GT

* Refactor.

* Flip camera pose axes to be consistent with computer vision.

* Save camera parameters during render step, not during GT.

* Fix bug with blender's built-in segmentation masks.

* Flatten object data json. Remove redundant information.

* Make built-in GT metadata and OpenGL metadata consistent.

* Misc.

* Update GroundTruthAnnotations.md

* Update GroundTruthAnnotations.md

* Update GroundTruthAnnotations.md

* Update README.md

* Update requirements.txt

* Update GroundTruthAnnotations.md

* Update GroundTruthAnnotations.md

* Update GroundTruthAnnotations.md
lahavlipson authored Jul 3, 2023
1 parent 97b8a41 commit aa22d4d
Showing 20 changed files with 354 additions and 192 deletions.
106 changes: 78 additions & 28 deletions GroundTruthAnnotations.md
@@ -1,20 +1,28 @@
# Ground-Truth Annotations

### Changelog

- 07/03/23: Add specification for Blender's built-in annotations. Save built-in annotations as numpy arrays. Add more information to Objects_XXXX_XX_XX.json. Significant changes to built-in segmentation masks (fixes & filenames). Improve visualizations for built-in annotations. Always save camera parameters in frames/. Update docs.

### Agenda

- Save forward and backward flow for both built-in and advanced annotations.
- Compute flow occlusion using forward-backward consistency.
- Export scene geometry in .ply format.

**Want annotations that we don't currently support? [Fill out a request!](https://github.com/princeton-vl/infinigen/issues/new?assignees=&labels=&projects=&template=request.md&title=%5BREQUEST%5D)**

## Default Annotations from Blender

Infinigen can produce some dense annotations using Blender's built-in render passes. Users may prefer these annotations over our extended annotation system, since they require only the bare-minimum installation and can be produced without a GPU.

These annotations are produced when using the `--pipeline_configs blender_gt` ground truth extraction config in [manage_datagen_jobs.py](/README.md#generate-images-in-one-command), or can be done manually as shown in the final step of the [Hello-World](/README.md#generate-a-scene-step-by-step) example.

### Specification
## Advanced Annotation Pipeline :large_blue_diamond:

**Coming Soon**
We also provide a separate pipeline for extracting the full set of annotations from each image or scene. Features only supported using this annotation method will be denoted with :large_blue_diamond:.

## OpenGL-Based Annotation Pipeline

We also provide a separate pipeline for extracting the full set of annotations from each image or scene.

This section will allow you to use our own `--pipeline_configs opengl_gt` ground truth extraction config, which provides additional labels such as occlusion boundaries, sub-object segmentation, 3D flow and easy 3D bounding boxes. If you do not need these features, we recommend using the [default annotations](#default-annotations-from-blender). This section is intended for computer vision researchers and power-users.
This will allow you to use our own `--pipeline_configs opengl_gt` ground truth extraction config, which provides additional labels such as occlusion boundaries, sub-object segmentation, 3D flow and easy 3D bounding boxes. If you do not need these features, we recommend using the [default annotations](#default-annotations-from-blender). This section is intended for computer vision researchers and power-users.

### Installation

@@ -77,7 +85,7 @@ python tools/summarize.py outputs/helloworld # creating outputs/helloworld/summa
python tools/ground_truth/segmentation_lookup.py outputs/helloworld 1 --query cactus
```

### Specification
## Specification

**File structure:**

@@ -103,7 +111,7 @@ The resulting `<output-folder>/summary.json` will contain all file paths in the
`<rig>` and `<sub-cam>` are typically both "00" in the monocular setting; `<file-ext>` is typically "npy" or "png" for the actual data and the visualization, respectively; `<frame>` is a 0-padded 4-digit number, e.g. "0013". `<type>` can be "SurfaceNormal", "Depth", etc. For example
`summary_json["SurfaceNormal"]["npy"]["00"]["00"]["0001"]` -> `'frames/SurfaceNormal_0001_00_00.npy'`
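
As a minimal sketch (not part of the repository's tooling), the lookup above can also be done programmatically; this assumes the Hello-World output folder and the `SurfaceNormal` entry described below.

```
import json
from pathlib import Path

import numpy as np

folder = Path("outputs/helloworld")  # assumed output folder from the Hello-World example
summary_json = json.loads((folder / "summary.json").read_text())

# e.g. the surface-normal array for camera rig "00", sub-camera "00", frame "0001"
rel_path = summary_json["SurfaceNormal"]["npy"]["00"]["00"]["0001"]
normals = np.load(folder / rel_path)  # e.g. 'frames/SurfaceNormal_0001_00_00.npy'
```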

*Note: Currently our ground-truth has only been tested for the aspect-ratio 16-9.*
*Note: Currently our advanced ground-truth has only been tested for a 16:9 aspect ratio.*

**Depth**

@@ -126,6 +134,8 @@ python tools/ground_truth/rigid_warp.py <folder> <first-frame> <second-frame>

Surface Normals are stored as a 1080 x 1920 x 3 32-bit floating point numpy array.

The coordinate system for the surface normals is +X -> Right, +Y -> Up, +Z -> Backward.

*Path:* `summary_json["SurfaceNormal"]["npy"]["00"]["00"]["0001"]` -> `frames/SurfaceNormal_0001_00_00.npy`

*Visualization:* `summary_json["SurfaceNormal"]["png"]["00"]["00"]["0001"]` -> `frames/SurfaceNormal_0001_00_00.png`
@@ -134,7 +144,7 @@ Surface Normals are stored as a 1080 x 1920 x 3 32-bit floating point numpy arra
<img src="images/gt_annotations/SurfaceNormal_0001_00_00.png" width="400" />
</p>
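
If you prefer to work in the computer-vision camera convention used for the saved camera poses (+X -> Right, +Y -> Down, +Z -> Forward), a minimal sketch of the axis flip is below; it assumes only the conventions stated above and is not part of the shipped tools.

```
import numpy as np

normals = np.load("frames/SurfaceNormal_0001_00_00.npy")  # (1080, 1920, 3) float32, +X right / +Y up / +Z backward
normals_cv = normals * np.array([1.0, -1.0, -1.0], dtype=np.float32)  # flip Y and Z -> +Y down / +Z forward
```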

**Occlusion Boundaries**
### Occlusion Boundaries :large_blue_diamond:

Occlusion Boundaries are stored as a 2160 x 3840 png, with 255 indicating a boundary and 0 otherwise.

@@ -144,14 +154,16 @@ Occlusion Boundaries are stored as a 2160 x 3840 png, with 255 indicating a boun
<img src="images/gt_annotations/OcclusionBoundaries_0001_00_00.png" width="400" />
</p>

**Optical Flow / Scene Flow**
### Optical Flow

Optical Flow / Scene Flow is stored as a 2160 x 3840 x 3 32-bit floating point numpy array.

*Note: The values won't be meaningful if this is the final frame in a series, or in the single-view setting.*

Channels 1 & 2 are standard optical flow. Note that the units of optical flow are in pixels measured in the resolution of the *original image*. So if the rendered image is 1080 x 1920, you would want to average-pool this array by 2x.
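
A small sketch of the 2x average-pooling suggested above (an illustrative helper, not shipped code; it assumes the array dimensions are divisible by 2):

```
import numpy as np

flow3d = np.load("frames/Flow3D_0001_00_00.npy")  # (2160, 3840, 3) float32
uv = flow3d[..., :2]                              # channels 1 & 2: optical flow in pixels
h, w, c = uv.shape
uv_pooled = uv.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))  # (1080, 1920, 2)
```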

**3D Motion Vectors** :large_blue_diamond:

Channel 3 is the depth change between this frame and the next.

To see an example of how optical flow can be used to warp one frame to the next, run
@@ -162,61 +174,70 @@ python tools/ground_truth/optical_flow_warp.py <folder> <frame-number>

*Path:* `summary_json["Flow3D"]["npy"]["00"]["00"]["0001"]` -> `frames/Flow3D_0001_00_00.npy`

*Visualization:* `summary_json["Flow3D"]["png"]["00"]["00"]["0001"]` -> `frames/ObjectSegmentation_0001_00_00.png`
*Visualization:* `summary_json["Flow3D"]["png"]["00"]["00"]["0001"]` -> `frames/Flow3D_0001_00_00.png`

For the built-in versions, replace `Flow3D` with `Flow`.

**Optical Flow Occlusion**
### Optical Flow Occlusion :large_blue_diamond:

The mask of occluded pixels for the aforementioned optical flow is stored as a 2160 x 3840 png, with 255 indicating a co-visible pixel and 0 otherwise.

*Note: This mask is computed by comparing the face-ids on the triangle meshes at either end of each flow vector. Infinigen meshes often contain multiple faces per-pixel, resulting in frequent false-negatives (negative=occluded). These false-negatives are generally distributed uniformly over the image (like salt-and-pepper noise), and can be reduced by max-pooling the occlusion mask down to the image resolution.*

*Path/Visualization:* `summary_json["Flow3DMask"]["png"]["00"]["00"]["0001"]` -> `frames/Flow3DMask_0001_00_00.png`
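
A sketch of the max-pooling suggested in the note above, reducing the mask to a 1080 x 1920 render resolution; it assumes Pillow is available (any png reader works) and dimensions divisible by 2.

```
import numpy as np
from PIL import Image  # assumed available; any png reader works

mask = np.array(Image.open("frames/Flow3DMask_0001_00_00.png"))  # 255 = co-visible, 0 = occluded
mask = mask if mask.ndim == 2 else mask[..., 0]                  # keep a single channel
h, w = mask.shape
covisible = mask.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3)) == 255  # (1080, 1920) boolean
```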

**Camera Intrinsics**
### Camera Intrinsics

Infinigen renders images using a pinhole camera model. The resulting camera intrinsics for each frame are stored as a 3 x 3 numpy matrix.

*Path:* `summary_json["Camera Intrinsics"]["npy"]["00"]["00"]["0001"]` -> `saved_mesh/frame_0001/cameras/K_0001_00_00.npy`
*Path:* `summary_json["Camera Intrinsics"]["npy"]["00"]["00"]["0001"]` -> `frames/K_0001_00_00.npy`

**Camera Extrinsics**
### Camera Extrinsics

The camera pose is stored as a 4 x 4 numpy matrix mapping from camera coordinates to world coordinates.

As is standard in computer vision, the assumed world coordinate system in the saved camera poses is +X -> Right, +Y -> Down, +Z -> Forward. This differs from Blender's internal representation of geometry, in which the Y and Z axes are flipped.

*Path:* `summary_json["Camera Pose"]["npy"]["00"]["00"]["0001"]` -> `saved_mesh/frame_0001/cameras/T_0001_00_00.npy`
*Path:* `summary_json["Camera Pose"]["npy"]["00"]["00"]["0001"]` -> `frames/T_0001_00_00.npy`

**Panoptic Segmentation and 3D Bounding Boxes**
### Panoptic Segmentation

Infinigen saves three types of semantic segmentation masks: 1) Object Segmentation, 2) Tag Segmentation, and 3) Instance Segmentation.

*Object Segmentation* distinguishes individual blender objects, and is stored as a 2160 x 3840 32-bit integer numpy array. The association between each integer in the mask and the related object is stored in Objects_XXXX_XX_XX.json. The definition of "object" is imposed by Blender; generally large or complex assets such as the terrain, trees, or animals are considered one singular object, while a large number of smaller assets (e.g. grass, coral) may be grouped together if they are using instanced-geometry for their implementation.
*Object Segmentation* distinguishes individual blender objects, and is stored as a 2160 x 3840 32-bit integer numpy array. Each integer in the mask maps to an object in Objects_XXXX_XX_XX.json with the same value for the `"object_index"` field. The definition of "object" is imposed by Blender; generally large or complex assets such as the terrain, trees, or animals are considered a single object, while a large number of smaller assets (e.g. grass, coral) may be grouped together if they are implemented with instanced geometry.

*Tag Segmentation* distinguishes vertices based on their semantic tags, and is stored as a 2160 x 3840 64-bit integer numpy array. Infinigen tags all vertices with an integer which can be associated with a list of semantic labels in `MaskTag.json`. Compared to Object Segmentation, Infinigen's tagging system is less automatic but much more flexible. Missing features in the tagging system are usually possible and straightforward to implement, whereas in the automatically generated Object Segmentation they are not.
*Instance Segmentation* distinguishes individual instances of a single object from one another (e.g. separate blades of grass, separate ferns, etc.), and is stored as a 2160 x 3840 32-bit integer numpy array. Each integer in this mask is the *instance-id* for a particular instance, which is unique for that object as defined in the Object Segmentation mask and Objects_XXXX_XX_XX.json.

*Instance Segmentation* distinguishes individual instances of a single object from one another (e.g. separate blades of grass, separate ferns, etc.), and is stored as a 2160 x 3840 32-bit integer numpy array. Each integer in this mask is the *instance-id* for a particular instance, which is unique for that object as defined in the Object Segmentation mask and Objects_XXXX_XX_XX.json. The **3D bounding boxes** for each instance are also defined in `Objects_XXXX_XX_XX.json`.

*Paths:*

`summary_json["ObjectSegmentation"]["npy"]["00"]["00"]["0001"]` -> `frames/ObjectSegmentation_0001_00_00.npy`

`summary_json["TagSegmentation"]["npy"]["00"]["00"]["0001"]` -> `frames/TagSegmentation_0001_00_00.npy`

`summary_json["InstanceSegmentation"]["npy"]["00"]["00"]["0001"]` -> `frames/InstanceSegmentation_0001_00_00.npy`

`summary_json["Objects"]["json"]["00"]["00"]["0001"]` -> `frames/Objects_0001_00_00.json`

`summary_json["Mask Tags"][<frame>]` -> `fine/MaskTag.json`

*Visualizations:*

`summary_json["ObjectSegmentation"]["png"]["00"]["00"]["0001"]` -> `frames/ObjectSegmentation_0001_00_00.png`

`summary_json["TagSegmentation"]["png"]["00"]["00"]["0001"]` -> `frames/TagSegmentation_0001_00_00.png`

`summary_json["InstanceSegmentation"]["png"]["00"]["00"]["0001"]` -> `frames/InstanceSegmentation_0001_00_00.png`

Generally, most useful panoptic segmentation masks can be constructed by combining the aforementioned three arrays in some way. As an example, to visualize the 2D and 3D bounding boxes for objects with the *blender_rock* semantic tag in the hello world scene, run
#### **Tag Segmentation** :large_blue_diamond:

*Tag Segmentation* distinguishes vertices based on their semantic tags, and is stored as a 2160 x 3840 64-bit integer numpy array. Infinigen tags all vertices with an integer which can be associated with a list of semantic labels in `MaskTag.json`. Compared to Object Segmentation, Infinigen's tagging system is less automatic but much more flexible. Missing features in the tagging system are usually possible and straightforward to implement, whereas in the automatically generated Object Segmentation they are not.

*Paths:*

`summary_json["TagSegmentation"]["npy"]["00"]["00"]["0001"]` -> `frames/TagSegmentation_0001_00_00.npy`

`summary_json["Mask Tags"][<frame>]` -> `fine/MaskTag.json`

*Visualization:*

`summary_json["TagSegmentation"]["png"]["00"]["00"]["0001"]` -> `frames/TagSegmentation_0001_00_00.png`

Generally, most useful panoptic segmentation masks can be constructed by combining the aforementioned three arrays in some way. As an example, to visualize the 2D and [3D bounding boxes](#object-metadata-and-3d-bounding-boxes) for objects with the *blender_rock* semantic tag in the hello world scene, run
```
python tools/ground_truth/segmentation_lookup.py outputs/helloworld 1 --query blender_rock --boxes
python tools/ground_truth/bounding_boxes_3d.py outputs/helloworld 1 --query blender_rock
@@ -247,3 +268,32 @@ python tools/ground_truth/segmentation_lookup.py outputs/helloworld 1 --query wa
<p align="center">
<img src="images/gt_annotations/caves.png" width="400" /> <img src="images/gt_annotations/warped_rocks.png" width="400" />
</p>

### Object Metadata and 3D bounding boxes

Each item in `Objects_0001_00_00.json` also contains additional metadata about the corresponding object:
```
# Load object metadata (assumes summary_json was loaded as shown earlier)
import json
from pathlib import Path

object_data = json.loads((Path("outputs/helloworld") / summary_json["Objects"]["json"]["00"]["00"]["0001"]).read_text())
# select the n-th object (n is any valid index into the list)
obj = object_data[n]
obj["children"] # list of object indices for children
obj["object_index"] # object index, for lookup in the Object Segmentation mask
obj["num_verts"] # number of vertices
obj["num_faces"] # number of faces (n-gons, not triangles)
obj["name"] # obvious
obj["unapplied_modifiers"] # names of unapplied blender modifiers
obj["materials"] # materials used
```

More fields :large_blue_diamond:
```
obj["tags"] # list of tags which appear on at least one vertex
obj["min"] # min-corner of bounding box, in object coordinates
obj["max"] # max-corner of bounding box, in object coordinates
obj["model_matrices"] # mapping from instance-ids to 4x4 obj->world transformation matrices
```

The **3D bounding box** for each instance can be computed using `obj["min"]`, `obj["max"]`, `obj["model_matrices"]`. For an example, refer to [the bounding_boxes_3d.py example above](#tag-segmentation-large_blue_diamond).
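
A hedged sketch of that computation, continuing from the `obj` record loaded above; the choice of instance is arbitrary, and the mapping is assumed to be stored as a JSON object keyed by instance-id.

```
import itertools
import numpy as np

lo, hi = np.array(obj["min"]), np.array(obj["max"])            # object-space bounding box corners
corners = np.array(list(itertools.product(*zip(lo, hi))))      # (8, 3) corners in object coordinates
model = np.array(next(iter(obj["model_matrices"].values())))   # 4x4 obj -> world matrix of one instance
corners_h = np.concatenate([corners, np.ones((8, 1))], axis=1) # homogeneous coordinates
corners_world = (model @ corners_h.T).T[:, :3]                 # (8, 3) corners in world coordinates
```
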
12 changes: 9 additions & 3 deletions README.md
@@ -97,11 +97,13 @@ Install [WSL2](https://infinigen.org/docs/installation/intro#setup-for-windows)
:warning: **Known issue**: We are actively fixing an issue that causes commands not to be reproducible on many platforms. The same command may produce multiple rearranged scenes with different runtimes and memory requirements.

<p align="center">
<img src="images/Image0048_00_00.png" width="330" />
<img src="images/Depth0048_00_00.png" width="330" />
<img src="images/Image0048_00_00.png" width="350" />
<img src="images/Depth0048_00_00.png" width="350" />
<img src="images/SurfaceNormal_0001_00_00.png" width="350" />
<img src="images/InstanceSegmentation_0001_00_00.png" width="350" />
</p>

This guide will show you how to generate an image and its corresponding depth ground-truth, similar to those shown above.
This guide will show you how to generate an image and its corresponding ground-truth, similar to those shown above.

#### Generate a scene step by step
Infinigen generates scenes by running multiple tasks (usually executed automatically, like in [Generate image(s) in one command](#generate-images-in-one-command)). Here we will run them one by one to demonstrate. These commands take approximately 10 minutes and 16GB of memory to execute on an M1 Mac or Linux Desktop.
@@ -125,6 +127,10 @@ $BLENDER -noaudio --background --python generate.py -- --seed 0 --task render -g

Output logs should indicate what the code is working on. Use `--debug` for even more detail. After each command completes you can inspect its `--output_folder` for results, including running `$BLENDER outputs/helloworld/coarse/scene.blend` or similar to view blender files. We hide many meshes by default for viewport stability; to view them, click "Render" or use the UI to unhide them.

#### [Extended ground-truth & documentation](./GroundTruthAnnotations.md)

We also provide an optional, separate pipeline for extracting the full set of annotations from each image or scene. Refer to [GroundTruthAnnotations.md](./GroundTruthAnnotations.md) for compilation instructions, data format specifications, and an extended "Hello World".

#### Generate image(s) in one command

We provide `tools/manage_datagen_jobs.py`, a utility which runs these or similar steps automatically.
Binary file added images/InstanceSegmentation_0001_00_00.png
Binary file added images/SurfaceNormal_0001_00_00.png
15 changes: 10 additions & 5 deletions process_mesh/blender_object.cpp
@@ -73,10 +73,15 @@ json BaseBlenderObject::compute_bbox(const std::vector<unsigned int> &indices, c
const std::set<int> unique_tags(tag_lookup.begin(), tag_lookup.end());

json output = {
{"model matrices", json_serializable_model_matrices},
{"model_matrices", json_serializable_model_matrices},
{"tags", std::vector<int>(unique_tags.begin(), unique_tags.end())},
{"name", name},
{"object index", obj_index}
{"name", info.name},
{"num_verts", info.num_verts},
{"num_faces", info.num_faces},
{"children", info.children},
{"materials", info.materials},
{"unapplied_modifiers", info.unapplied_modifiers},
{"object_index", info.index}
};

if ((num_verts > 0) && ((max - min).norm() > 1e-4)){
@@ -91,7 +96,7 @@ json BaseBlenderObject::compute_bbox(const std::vector<unsigned int> &indices, c
}

BaseBlenderObject::BaseBlenderObject(const BufferArrays &current_buf, const BufferArrays &next_buf, const std::vector<InstanceID> &instance_ids, const ObjectInfo& object_info, const ObjectType tp, int attrib_stride)
: num_verts(current_buf.indices.size()), type(tp), name(object_info.name), num_instances(instance_ids.size()), obj_index(object_info.index) {
: num_verts(current_buf.indices.size()), type(tp), info(object_info), num_instances(instance_ids.size()) {

const std::vector<Eigen::Matrix4f> &model_matrices = current_buf.get_instances(instance_ids);
const std::vector<Eigen::Matrix4f> &model_matrices_next = next_buf.get_instances(instance_ids);
@@ -172,7 +177,7 @@ MeshBlenderObject::~MeshBlenderObject(){}

void MeshBlenderObject::draw(Shader &shader) const {
const auto t1 = std::chrono::high_resolution_clock::now();
shader.setInt("object_index", obj_index);
shader.setInt("object_index", info.index);
glBindVertexArray(VAO);
glDrawElementsInstanced(GL_LINES_ADJACENCY, num_verts, GL_UNSIGNED_INT, 0, num_instances);
glCheckError();
26 changes: 16 additions & 10 deletions process_mesh/blender_object.hpp
@@ -17,18 +17,25 @@ using json = nlohmann::json;

struct ObjectInfo
{
int index, num_instances;
int index, num_instances, num_faces, num_verts;
std::string name, type, mesh_id, npz_filename;
std::vector<int> children;
std::vector<std::string> materials, unapplied_modifiers;

ObjectInfo(){}

ObjectInfo(nlohmann::json_abi_v3_11_2::json instance_item) :
name(instance_item["object_name"].get<std::string>()),
type(instance_item["object_type"].get<std::string>()),
index(instance_item["object_idx"].get<int>()),
num_instances(instance_item["num_instances"].get<int>()),
mesh_id(instance_item["mesh_id"].get<std::string>()),
npz_filename(instance_item["filename"].get<std::string>()){}
ObjectInfo(const json instance_item) :
name(instance_item["object_name"]),
type(instance_item["object_type"]),
index(instance_item["object_idx"]),
num_instances(instance_item["num_instances"]),
num_faces(instance_item["num_faces"]),
num_verts(instance_item["num_verts"]),
children(instance_item["children"]),
materials(instance_item["materials"]),
unapplied_modifiers(instance_item["unapplied_modifiers"]),
mesh_id(instance_item["mesh_id"]),
npz_filename(instance_item["filename"]){}

};

@@ -54,8 +61,7 @@ class BaseBlenderObject

json bounding_box;
const ObjectType type;
const std::string name;
const int obj_index;
const ObjectInfo info;

BaseBlenderObject(const BufferArrays &current_buf, const BufferArrays &next_buf, const std::vector<InstanceID> &instance_ids, const ObjectInfo& object_info, const ObjectType tp, int attrib_stride);
virtual ~BaseBlenderObject();
