Authors: Martin Simon, Stefan Milz, Karl Amende, Horst-Michael Gross
PDF, code-pytorch
-
The authors present a single-stage, LiDAR-only model for 3D object localization of vehicles in bird's eye view (BEV).
-
The point cloud data is converted into a BEV map, which they also call a BEV RGB-map - R: density channel, G: height channel, B: intensity channel. The BEV covers the front 80m x 40m of the point cloud. The size of the covering area, i.e. the grid map, is defined by n = 1024 and m = 512, so the grid resolution is about g = 8cm (80m / 1024 ≈ 0.078m). In summary, the covering area is:
P_Ω = {P = [x, y, z]^T | x ∈ [0m, 40m], y ∈ [−40m, 40m], z ∈ [−2m, 1.25m]}
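A minimal sketch of this rasterization, assuming the point cloud arrives as an (N, 4) numpy array of [x, y, z, intensity] (KITTI convention); the log-scaled density normalization follows the paper, while the function name and the height normalization are illustrative:

```python
import numpy as np

def pointcloud_to_bev_rgb(points, n=1024, m=512,
                          x_range=(0.0, 40.0), y_range=(-40.0, 40.0),
                          z_range=(-2.0, 1.25)):
    """Rasterize an (N, 4) point cloud [x, y, z, intensity] into the
    3-channel BEV RGB-map (R: density, G: max height, B: max intensity)."""
    x, y, z, intensity = points.T
    # keep only points inside the covering area P_Omega
    mask = ((x >= x_range[0]) & (x < x_range[1]) &
            (y >= y_range[0]) & (y < y_range[1]) &
            (z >= z_range[0]) & (z <= z_range[1]))
    x, y, z, intensity = x[mask], y[mask], z[mask], intensity[mask]

    # metric coords -> grid indices: n cells across y (80m), m cells across x (40m)
    col = ((y - y_range[0]) / (y_range[1] - y_range[0]) * n).astype(int).clip(0, n - 1)
    row = ((x - x_range[0]) / (x_range[1] - x_range[0]) * m).astype(int).clip(0, m - 1)

    bev = np.zeros((m, n, 3), dtype=np.float32)
    counts = np.zeros((m, n), dtype=np.float32)
    np.add.at(counts, (row, col), 1.0)

    # G: max height per cell (normalized to [0, 1]), B: max intensity per cell
    np.maximum.at(bev[:, :, 1], (row, col), (z - z_range[0]) / (z_range[1] - z_range[0]))
    np.maximum.at(bev[:, :, 2], (row, col), intensity)
    # R: normalized point density, log-scaled as in the paper
    bev[:, :, 0] = np.minimum(1.0, np.log(counts + 1) / np.log(64))
    return bev
```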
-
Now, YOLO-v2 is applied on the BEV map to predict, for each grid cell and anchor, the object geometry (6: x, y, w, l, im, re), a confidence score (1), and class probabilities (7). Because the object orientation is regressed via the complex pair (im, re) alongside the location, they name the model Complex-YOLO. With 5 anchors, each cell outputs 5 x (6 + 1 + 7) = 70 values, so the output map is W x H x C = 32 x 16 x 70.
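A minimal sketch of such a detection head in PyTorch, assuming a YOLOv2-style backbone that downsamples the 512 x 1024 BEV map by 32 to a 16 x 32 feature map; the 1024-channel feature depth is an assumption, not from the paper:

```python
import torch
import torch.nn as nn

NUM_ANCHORS, BOX_PARAMS, NUM_CLASSES = 5, 6, 7   # (x, y, w, l, im, re), conf, classes

# 1x1 conv producing 5 * (6 + 1 + 7) = 70 output channels per cell
head = nn.Conv2d(in_channels=1024,               # assumed backbone feature depth
                 out_channels=NUM_ANCHORS * (BOX_PARAMS + 1 + NUM_CLASSES),
                 kernel_size=1)

features = torch.randn(1, 1024, 16, 32)          # fake backbone output (N, C, H, W)
out = head(features)                             # -> (1, 70, 16, 32)
out = out.view(1, NUM_ANCHORS, BOX_PARAMS + 1 + NUM_CLASSES, 16, 32)

x, y, w, l = out[:, :, 0], out[:, :, 1], out[:, :, 2], out[:, :, 3]
im, re = out[:, :, 4], out[:, :, 5]              # complex angle pair
conf = torch.sigmoid(out[:, :, 6])               # objectness score
cls_prob = torch.softmax(out[:, :, 7:], dim=2)   # 7 class probabilities
```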
-
As the height of the object is not regressed, only the object position and orientation are detected. The orientation is calculated from the im and re parameters as follows:
obj_ori = arctan2(im, re)
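A tiny sketch of this angle encoding and decoding, assuming a ground-truth yaw angle in radians (helper names are illustrative):

```python
import math

def encode_yaw(yaw):
    """Encode yaw as the complex pair (im, re) that the network regresses."""
    return math.sin(yaw), math.cos(yaw)

def decode_yaw(im, re):
    """Recover yaw from the regressed pair; arctan2 resolves the full [-pi, pi) range."""
    return math.atan2(im, re)
```

Regressing (im, re) instead of the raw angle avoids the discontinuity at ±π that a direct angle regression would suffer from.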
-
The Complex-YOLO loss function extends the YOLO loss function with an Euler regression term for the angle:
loss = loss_yolo + loss_euler
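A minimal sketch of the Euler term under these definitions, assuming per-cell prediction tensors; the YOLO-style weighting factor lambda_coord and the exact object masking are assumptions:

```python
import torch

def euler_loss(pred_im, pred_re, gt_yaw, obj_mask, lambda_coord=5.0):
    """Squared error on the complex angle pair, only where an object is assigned."""
    gt_im, gt_re = torch.sin(gt_yaw), torch.cos(gt_yaw)
    err = (pred_im - gt_im) ** 2 + (pred_re - gt_re) ** 2
    return lambda_coord * (obj_mask * err).sum()
```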