Paper | 3D Bounding Box Estimation Using Deep Learning and Geometry |
---|---|
Author (affiliation) | Arsalan Mousavian (GMU) |
Venue/Year | 2016~2017, paper |
Keywords | KITTI, Pascal 3D+, single camera, fitting a 3D BBox to a 2D image |
References | 3D Vehicle Detection, Github |
Code | Keras, TF, pyTorch |
Goal: 3D object detection and pose estimation from a single image
The first network output estimates the 3D object orientation using a novel hybrid discrete-continuous loss, which significantly outperforms the L2 loss.
The second output regresses the 3D object dimensions, which have relatively little variance compared to alternatives and can often be predicted for many object types.
3D object detection recovers both the 6 DoF pose and the dimensions of an object from an image.
Of the six degrees of freedom (6 DoF), three describe position and the other three describe orientation.
Goal: we propose a method that estimates
our method is based on several important insights
We introduce three additional performance metrics measuring the 3D box accuracy:
In summary, the main contributions of our paper include:
Premise: we use the fact that the perspective projection of a 3D bounding box should fit tightly within its 2D detection window.
We assume that the 2D object detector has been trained to produce boxes that correspond to the bounding box of the projected 3D box.
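The 3D bounding box is described by its center $$T = [t_x, t_y, t_z]^T$$, dimensions $$D = [d_x, d_y, d_z]$$, and orientation $$R(\theta, \phi, \alpha)$$, parameterized by the azimuth, elevation, and roll angles.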
Given the pose of the object in the camera coordinate frame $$(R, T) \in SE(3)$$ and the camera intrinsics matrix $$K$$, the projection of a 3D point $$X_0 = [X, Y, Z, 1]^T$$ in the object’s coordinate frame into the image $$x = [x, y, 1]^T$$ is:
$$ x = K [R \; T] X_0 $$
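As a minimal sketch of the projection equation, the plain-Python snippet below projects one object-frame point into the image. The intrinsics values, the `rotation_y` helper (rotation about the camera Y axis only), and the `project` function name are illustrative assumptions and are not taken from the paper.

```python
import math

def rotation_y(yaw):
    """Rotation about the camera Y axis by `yaw` radians (assumed parameterization)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

def project(K, R, T, X):
    """Project object-frame point X = (X, Y, Z) to a pixel via x = K [R T] X_0."""
    # Camera-frame coordinates: R X + T
    cam = [sum(R[i][j] * X[j] for j in range(3)) + T[i] for i in range(3)]
    # Apply the intrinsics to get homogeneous pixel coordinates
    h = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    # Perspective division
    return h[0] / h[2], h[1] / h[2]

# Hypothetical intrinsics: focal length 700 px, principal point (600, 180)
K = [[700.0, 0.0, 600.0],
     [0.0, 700.0, 180.0],
     [0.0, 0.0, 1.0]]
R = rotation_y(0.0)          # object axes aligned with the camera axes
T = [0.0, 0.0, 10.0]         # object origin 10 units in front of the camera

u, v = project(K, R, T, [1.0, 0.5, 0.0])
print(u, v)  # 670.0 215.0
```

The division by the third homogeneous coordinate is what makes the mapping a perspective projection rather than a linear one.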
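Assuming that the origin of the object coordinate frame is at the center of the 3D bounding box and the object dimensions $$D$$ are known, the eight box corners lie at $$[\pm d_x/2, \pm d_y/2, \pm d_z/2]^T$$.
Constraint: The requirement that the 3D bounding box fits tightly into the 2D detection window means that each side of the 2D bounding box must be touched by the projection of at least one of the 3D box corners.
For example, if the corner $$X_0 = [d_x/2, -d_y/2, d_z/2]^T$$ projects onto the left side of the 2D box, at coordinate $$x_{min}$$, this point-to-side correspondence constraint results in the equation:
$$ x_{min} = \left( K [R \; T] [d_x/2, -d_y/2, d_z/2, 1]^T \right)_x $$
where $$(\cdot)_x$$ refers to the x coordinate from the perspective projection. Similar equations can be derived for the remaining 2D box side parameters $$x_{max}, y_{min}, y_{max}$$.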
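The tight-fit constraint can be sketched in plain Python as follows, assuming the box is centered at the object origin so that its corners are $$(\pm d_x/2, \pm d_y/2, \pm d_z/2)$$. The function names `box_corners` and `tight_2d_box`, the intrinsics, and the pose values are hypothetical and chosen only so the numbers come out cleanly.

```python
from itertools import product

def project(K, R, T, X):
    """Project object-frame point X to a pixel via x = K [R T] X_0."""
    cam = [sum(R[i][j] * X[j] for j in range(3)) + T[i] for i in range(3)]
    h = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    return h[0] / h[2], h[1] / h[2]

def box_corners(D):
    """Eight corners of a box centered at the object origin, D = (d_x, d_y, d_z)."""
    dx, dy, dz = D
    return [(sx * dx / 2, sy * dy / 2, sz * dz / 2)
            for sx, sy, sz in product((1, -1), repeat=3)]

def tight_2d_box(K, R, T, D):
    """Tightest 2D box (x_min, y_min, x_max, y_max) around the projected corners."""
    pts = [project(K, R, T, X) for X in box_corners(D)]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return min(xs), min(ys), max(xs), max(ys)

# Hypothetical intrinsics and pose (identity rotation, 10 units ahead)
K = [[700.0, 0.0, 600.0],
     [0.0, 700.0, 180.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 10.0]
D = (2.0, 2.0, 4.0)

x_min, y_min, x_max, y_max = tight_2d_box(K, R, T, D)
print((x_min, y_min, x_max, y_max))  # (512.5, 92.5, 687.5, 267.5)

# Corners whose projected x coordinate equals x_min, i.e. corners touching the left side
left_corners = [X for X in box_corners(D) if project(K, R, T, X)[0] == x_min]
```

With an axis-aligned box two corners share the same projected x coordinate, so `left_corners` holds two entries here; for a generic rotation a single corner touches each side.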
In total the sides of the 2D bounding box provide four constraints on the 3D bounding box.
This is not enough to constrain the nine degrees of freedom (DoF): three for translation, three for rotation, and three for box dimensions.
There are several different geometric properties we could estimate from the visual appearance of the box to further constrain the 3D box.
The main criterion is that they should be tied strongly to the visual appearance and further constrain the final 3D box.
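The counting argument can be checked numerically. The sketch below assumes, purely for illustration, that the rotation $$R$$ and the dimensions $$D$$ are already known and that we know which corner touches which side of the 2D box (both are taken from synthetic ground truth here). Under those assumptions only the three unknowns of $$T$$ remain, each side constraint becomes linear in $$T$$, and four constraints are enough. The least-squares solve via normal equations and all function names are choices made for this example, not the paper's stated implementation.

```python
import math
from itertools import product

def rotation_y(yaw):
    """Rotation about the camera Y axis (assumed parameterization)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def project(K, R, T, X):
    cam = [r + t for r, t in zip(matvec(R, X), T)]
    h = matvec(K, cam)
    return h[0] / h[2], h[1] / h[2]

def solve3(M, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    A = [row[:] + [rhs] for row, rhs in zip(M, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 4):
                A[r][c] -= f * A[col][c]
    t = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        t[i] = (A[i][3] - sum(A[i][j] * t[j] for j in range(i + 1, 3))) / A[i][i]
    return t

def solve_translation(K, R, box, touching):
    """box = (x_min, y_min, x_max, y_max); touching = corners on those sides, same order."""
    rows, rhs = [], []
    for side, (value, X) in enumerate(zip(box, touching)):
        k_row = K[side % 2]  # row 0 of K for x sides, row 1 for y sides
        # (k_row - value * K[2]) . (R X + T) = 0 is linear in T
        a = [k_row[j] - value * K[2][j] for j in range(3)]
        rows.append(a)
        rhs.append(-sum(ai * ri for ai, ri in zip(a, matvec(R, X))))
    # Least squares through the normal equations (A^T A) T = A^T b
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(3)]
    return solve3(AtA, Atb)

# Synthetic ground truth (all values hypothetical)
K = [[700.0, 0.0, 600.0], [0.0, 700.0, 180.0], [0.0, 0.0, 1.0]]
R = rotation_y(0.3)
D = (1.6, 1.5, 3.9)
T_true = [1.0, 0.5, 12.0]

corners = [(sx * D[0] / 2, sy * D[1] / 2, sz * D[2] / 2)
           for sx, sy, sz in product((1, -1), repeat=3)]
pts = [project(K, R, T_true, X) for X in corners]
left = min(range(8), key=lambda i: pts[i][0])
top = min(range(8), key=lambda i: pts[i][1])
right = max(range(8), key=lambda i: pts[i][0])
bottom = max(range(8), key=lambda i: pts[i][1])

box = (pts[left][0], pts[top][1], pts[right][0], pts[bottom][1])
touching = [corners[left], corners[top], corners[right], corners[bottom]]
T_est = solve_translation(K, R, box, touching)
print(T_est)  # should match T_true up to floating-point error
```

If $$R$$ or $$D$$ were also unknown, the same four equations would have to determine up to nine unknowns, which is why additional quantities must be estimated from the image.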