| Title | Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite |
|---|---|
| Authors (affiliation) | Andreas Geiger, Philip Lenz (Karlsruhe Institute of Technology), Raquel Urtasun (Toyota Technological Institute at Chicago) |
| Venue/Year | CVPR 2012, paper |
| Keywords | |
| Dataset (sensors)/Model | |
| Related work | |
| References | homepage, GitBook notes |
| Code | Download stereo 2015/flow 2015/scene flow 2015 data set (2 GB) |
Existing datasets vs. the KITTI dataset: the main challenges in building KITTI were

- the collection of large amounts of data in real time,
- the calibration of diverse sensors working at different rates,
- the generation of ground truth minimizing the amount of supervision required.
Our 3D object detection and orientation estimation benchmark is split into three parts:
First, we evaluate classical 2D object detection by measuring performance using the well-established average precision (AP) metric as described in [16].
Detections are iteratively assigned to ground truth labels starting with the largest overlap, measured by bounding box intersection over union.
We require true positives to overlap by more than 50% and count multiple detections of the same object as false positives.
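As a concrete reading of this protocol, here is a minimal sketch (not the official KITTI devkit; the function names and the 11-point interpolated AP of [16] are assumptions on my part):

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def match_detections(dets, gts, thresh=0.5):
    """Greedy assignment for one image, largest overlap first; a second
    detection on an already-claimed ground truth box stays a false positive."""
    pairs = sorted(((iou(d, g), i, j)
                    for i, d in enumerate(dets)
                    for j, g in enumerate(gts)), reverse=True)
    tp = np.zeros(len(dets), dtype=bool)
    used = np.zeros(len(gts), dtype=bool)
    for overlap, i, j in pairs:
        if overlap <= thresh:           # true positives must overlap > 50%
            break
        if not tp[i] and not used[j]:
            tp[i], used[j] = True, True
    return tp

def average_precision(scores, tp, num_gt):
    """11-point interpolated AP over recall levels {0, 0.1, ..., 1} ([16])."""
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    recall = cum_tp / num_gt
    precision = cum_tp / np.arange(1, len(tp) + 1)
    return float(np.mean([precision[recall >= r].max() if (recall >= r).any()
                          else 0.0 for r in np.linspace(0, 1, 11)]))
```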
We assess the performance of jointly detecting objects and estimating their 3D orientation using a novel measure called the average orientation similarity (AOS):

$$AOS = \frac{1}{11} \sum_{r \in \{0, 0.1, \dots, 1\}} \max_{\tilde{r} : \tilde{r} \geq r} s(\tilde{r})$$

Here, $r = TP / (TP + FN)$ is the PASCAL object detection recall, where detected 2D bounding boxes are correct if they overlap by at least 50% with a ground truth bounding box. The orientation similarity $s \in [0, 1]$ at recall $r$ is a normalized ([0..1]) variant of the cosine similarity, defined as

$$s(r) = \frac{1}{|D(r)|} \sum_{i \in D(r)} \frac{1 + \cos \Delta_\theta^{(i)}}{2} \, \delta_i$$

where $D(r)$ denotes the set of all object detections at recall rate $r$ and $\Delta_\theta^{(i)}$ is the difference in angle between the estimated and ground truth orientation of detection $i$. To penalize multiple detections which explain a single object, we set $\delta_i = 1$ if detection $i$ has been assigned to a ground truth bounding box and $\delta_i = 0$ if it has not.
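A small sketch of AOS under the definitions above, reusing the true-positive flags produced by `match_detections` in the previous snippet (all names are mine; angles in radians):

```python
import numpy as np

def average_orientation_similarity(scores, tp, angle_err, num_gt):
    """AOS = mean over recall levels {0, 0.1, ..., 1} of max_{r' >= r} s(r').
    'angle_err' is the orientation error per detection in radians;
    delta_i is implemented by zeroing the similarity of unmatched detections."""
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(tp, dtype=float)[order]
    sim = (1.0 + np.cos(np.asarray(angle_err)[order])) / 2.0 * tp
    recall = np.cumsum(tp) / num_gt
    s = np.cumsum(sim) / np.arange(1, len(tp) + 1)   # s(r) at each cutoff
    return float(np.mean([s[recall >= r].max() if (recall >= r).any()
                          else 0.0 for r in np.linspace(0, 1, 11)]))
```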
We evaluate object detection as well as joint detection and orientation estimation using average precision (AP) and average orientation similarity (AOS) as described in Sec. 2.5, for the three detector training settings listed below.
Our benchmark extracted from the full dataset comprises 12,000 images with 40,000 objects.
We first subdivide the training set into 16 orientation classes and use 100 non-occluded examples per class for training the part-based object detector of [18] in three different settings (see the bin-assignment sketch after this list):

- **variable**: the model is trained in an unsupervised fashion, with component assignments left latent;
- **fixed init**: the components are initialized to the 16 classes but allowed to vary during optimization;
- **fixed**: the components are initialized to the 16 classes and the latent variables are additionally fixed.
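The 16 orientation classes amount to uniformly binning the ground-truth yaw angle; a plausible discretization (the exact binning convention is an assumption, the notes do not spell it out):

```python
import numpy as np

def orientation_bin(yaw, num_bins=16):
    """Map a yaw angle (radians) to one of num_bins uniform classes,
    with bin 0 centered at yaw = 0."""
    width = 2 * np.pi / num_bins
    return int(((yaw + width / 2) % (2 * np.pi)) // width)
```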
We evaluate all non- and weakly-occluded (< 20%) objects which are neither truncated nor smaller than 40 px in height.
We do not count detecting truncated or occluded objects as false positives.
For our object detection experiment, we require a bounding box overlap of at least 50%; results are shown in Fig. 6(a).
For detection and orientation estimation we require the same overlap and plot the average orientation similarity (Eq. 5) over recall for the two unsupervised variants (Fig. 6(b)).
Note that precision is an upper bound on the average orientation similarity: every term in s(r) is at most 1 and is non-zero only for true positives, so s(r) can never exceed the precision at the same recall.
Overall, we could not find any substantial difference between the part-based detector variants we investigated.
All of them achieve high precision, while the recall seems to be limited by some hard to detect objects.
We plan to extend our online evaluation to more complex scenarios such as semi-occluded or truncated objects and other object classes like vans, trucks, pedestrians and cyclists.
Finally, we also evaluate object orientation estimation.
We extract 100 car instances per orientation bin, using 16 orientation bins.
We compute HOG features [12] on all cropped and resized bounding boxes with 19×13 blocks, 8×8-pixel cells, and 12 orientation bins.
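A sketch of this feature computation using scikit-image as a stand-in for [12]; the 160×112 crop size is a back-calculation so that 8×8-pixel cells grouped into 2×2-cell blocks produce a 19×13 block grid, which may not match the paper's exact block layout:

```python
from skimage.feature import hog
from skimage.transform import resize

def orientation_features(crop_gray):
    """HOG on a cropped bounding box, resized to 160x112 px so that
    8x8-pixel cells with 2x2-cell blocks yield 19x13 blocks, 12 bins."""
    patch = resize(crop_gray, (112, 160))        # (rows, cols)
    # resulting feature length: 13 * 19 * 2 * 2 * 12 = 11856
    return hog(patch, orientations=12, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)
```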
We evaluate multiple classification and regression algorithms and report average orientation similarity (Eq. 5).
Table 3 shows our results.
We found that for the classification task SVMs [11] clearly outperform nearest neighbor classification.
For the regression task, Gaussian Process regression [36] performs best.
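A rough scikit-learn stand-in for this comparison ([11] and [36] refer to different implementations; the dummy data, the (sin, cos) encoding of the angle, and the kernel choices are all assumptions on my part):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)                   # dummy stand-in data
X_train, X_test = rng.normal(size=(160, 512)), rng.normal(size=(16, 512))
yaw_train = rng.uniform(-np.pi, np.pi, size=160)
bins_train = ((yaw_train + np.pi) // (2 * np.pi / 16)).astype(int)

# Classification over the 16 discrete orientation bins.
svm = SVC(kernel='rbf').fit(X_train, bins_train)
nn = KNeighborsClassifier(n_neighbors=1).fit(X_train, bins_train)
print(svm.predict(X_test), nn.predict(X_test))

# Regression on (sin, cos) of the yaw avoids the 2*pi wrap-around;
# the continuous angle is recovered with arctan2.
gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(
    X_train, np.column_stack([np.sin(yaw_train), np.cos(yaw_train)]))
sc = gp.predict(X_test)
yaw_pred = np.arctan2(sc[:, 0], sc[:, 1])
```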