Paper title | Efficient Deep Learning for Stereo Matching |
---|---|
Author (affiliation) | Wenjie Luo (University of Toronto) |
Venue/Year | CVPR2016, paper |
Keywords | Luo2016 |
Dataset (sensor)/Model | KITTI |
Related work | Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches 2015, code |
References | Homepage, CVPR2016, KITTI_LB |
Code | Code |
Challenges of stereo-based depth estimation
-Dealing with occlusions,
Existing approaches: Many approaches have been developed that try to aggregate information from local matches.
Problem with the cost functions of existing methods: However, all these approaches employ cost functions that are
learn how to match for the task of stereo estimation [30, 28].
[30] J. Zbontar and Y. LeCun. Stereo matching by training a convolutional neural network to compare image patches. arXiv preprint arXiv:1510.05970, 2015.
[28] S. Zagoruyko and N. Komodakis. Learning to compare image patches via convolutional neural networks. In CVPR, 2015.
Recent CNN-based work (binary classification): Current approaches learn the parameters of the matching network by treating the problem as binary classification: given a patch in the left image, the task is to predict whether a patch in the right image is the correct match.
[29-Zbontar2015] performs well but needs on the order of a minute per prediction: While [29] showed great performance in challenging benchmarks such as KITTI [11], it is computationally very expensive, requiring a minute of computation on the GPU.
This is due to the fact that they exploited a siamese architecture followed by concatenation and further processing via a few more layers to compute the final score.
[29-Zbontar2015] J. Zbontar and Y. LeCun. Computing the stereo matching cost with a convolutional neural network. In CVPR, 2015
The proposed method predicts in under a second: In contrast, in this paper we propose a matching network which is able to produce very accurate results in less than a second of GPU computation.
To reach sub-second prediction: Towards this goal,
We train our network by treating the problem as multi-class classification, where the classes are all possible disparities.
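As a rough illustration of the multi-class view, the sketch below assumes the network yields one matching score per candidate disparity for a given pixel and turns those scores into a distribution with a softmax. The scores are made up, and `softmax` and `disparity_posterior` are hypothetical helper names, not the authors' code.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of matching scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def disparity_posterior(scores):
    """Return the distribution over candidate disparities and its most likely class."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return probs, best

# Hypothetical scores for disparities 0..4 at a single pixel (illustrative values only).
scores = [0.1, 0.3, 2.5, 0.2, -1.0]
probs, best = disparity_posterior(scores)
print("most likely disparity:", best)
print("probabilities:", [round(p, 3) for p in probs])
```

Each disparity is one class, so a single softmax normalizes across all candidates at once instead of scoring every left/right patch pair independently.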
Figure 1: To learn informative image patch representations we employ a siamese network which extracts marginal distributions over all possible disparities for each pixel.
Performance evaluation on KITTI: We demonstrate the effectiveness of our approach on the challenging KITTI benchmark and show competitive results when exploiting smoothing techniques.
Code download: Our code and data can be found online at: http://www.cs.toronto.edu/deepLowLevelVision.
Early learning based approaches focused on correcting an initially computed matching cost [16, 17].
[16] D. Kong and H. Tao. A method for learning matching errors for stereo computation. In BMVC, 2004.
[17] D. Kong and H. Tao. Stereo matching via learning multiple experts behaviors. In BMVC, 2006
Learning has also been utilized to tune the hyper-parameters of the energy-minimization task.
Slanted plane models model groups of pixels with slanted 3D planes.
Widely used for autonomous driving because robustness is the main goal: They are very competitive in autonomous driving scenarios, where robustness is key.
They have a long history, dating back to [2], and were shown to be very successful on the Middlebury benchmark [22, 15, 3, 24] as well as on KITTI [25, 26, 27].
[2] S. Birchfield and C. Tomasi. Multiway cut for stereo and motion with slanted surfaces. In CVPR, 1999.
[25] K. Yamaguchi, T. Hazan, D. McAllester, and R. Urtasun. Continuous markov random fields for robust stereo estimation. In ECCV, 2012.
[26] K. Yamaguchi, D. McAllester, and R. Urtasun. Robust monocular epipolar flow estimation. In CVPR, 2013.
[27] K. Yamaguchi, D. McAllester, and R. Urtasun. Efficient joint segmentation, occlusion labeling, stereo and flow estimation. In ECCV, 2014.
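To make the slanted plane idea above concrete, the sketch below assumes one common parameterization in which every pixel (u, v) of a segment gets its disparity from a plane d = a*u + b*v + c. The plane coefficients and pixel coordinates are invented for illustration, and `plane_disparity` is a hypothetical helper, not taken from any of the cited papers.

```python
def plane_disparity(plane, u, v):
    """Disparity at pixel (u, v) under a slanted plane d = a*u + b*v + c."""
    a, b, c = plane
    return a * u + b * v + c

# Hypothetical road-like segment: disparity grows with the image row v
# (lower rows are closer to the camera), and does not depend on the column u.
road = (0.0, 0.25, -10.0)
pixels = [(100, 200), (100, 240), (300, 240)]
disparities = [plane_disparity(road, u, v) for u, v in pixels]
print(disparities)
```

Because three numbers describe a whole group of pixels, a few noisy local matches have limited influence on the result, which fits the robustness argument made for these models.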
Holistic models which solve jointly many tasks have also been explored.
Advantage: many tasks in low-level and high-level vision are related, and thus one can benefit from solving them together.
For example, [5-2010, 6-2011, 4-2012, 18-2014, 13-2015] jointly solved for stereo and semantic segmentation.
Guney and Geiger [12] investigated the utility of high-level vision tasks such as object recognition and semantic segmentation for stereo matching.
[12] F. Guney and A. Geiger. Displets: Resolving stereo ambiguities using object knowledge. In CVPR, 2015.
Confidence of each match matters: Estimating the confidence of each match is key when employing stereo estimates as a part of a pipeline.
Learning-based confidence estimation: Learning methods were successfully applied to this task, e.g., [14, 23].
[14] R. Haeusler, R. Nair, and D. Kondermann. Ensemble learning for confidence measures in stereo vision. In CVPR, 2013
[23] A. Spyropoulos, N. Komodakis, and P. Mordohai. Learning to detect ground control points for improving the accuracy of stereo matching. In CVPR, 2014.
Convolutional neural networks (CNNs) have been shown to perform very well on image classification, object detection, and semantic segmentation, as well as optical flow prediction [10].
[10] P. Fischer, A. Dosovitskiy, E. Ilg, P. Häusser, C. Hazirbas, and V. Golkov. FlowNet: Learning Optical Flow with Convolutional Networks. In ICCV, 2015.
Prior work: In the context of stereo estimation, [29-Zbontar2015] utilize a CNN to compute the matching cost between two image patches: a siamese network which takes same-sized left and right image patches, with a few fully-connected layers on top to predict the matching cost.
Similar work: In similar spirit to [29],
[29-Zbontar2015] J. Zbontar and Y. LeCun. Computing the stereo matching cost with a convolutional neural network. In CVPR, 2015
[28] S. Zagoruyko and N. Komodakis. Learning to compare image patches via convolutional neural networks. In CVPR, 2015.
Our work is most similar to [29, 28] with two main differences.
First, we propose to learn a probability distribution over all disparity values using a smooth target distribution.
As a consequence we are able to capture correlations between the different disparities implicitly.
This contrasts with [29], which performs independent binary predictions on image patches.
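The sketch below illustrates what a smooth target distribution can look like and why it couples neighbouring disparities. It assumes a target that puts most of its mass on the ground-truth disparity and a little on its immediate neighbours, trained with cross-entropy against a softmax over matching scores. The weights 0.5 / 0.2 / 0.05, the scores, and the helper names are illustrative assumptions, not values stated in these notes.

```python
import math

def smooth_target(gt, num_disp, weights=(0.5, 0.2, 0.05)):
    """Put weights[k] on disparities k steps away from gt, zero elsewhere.

    Near the ends of the disparity range some mass falls outside and is dropped.
    """
    target = [0.0] * num_disp
    for d in range(num_disp):
        k = abs(d - gt)
        if k < len(weights):
            target[d] = weights[k]
    return target

def log_softmax(scores):
    """Log of the softmax, computed with the log-sum-exp trick."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def cross_entropy(target, scores):
    """Cross-entropy between the target and the softmax of the scores."""
    return -sum(t * lp for t, lp in zip(target, log_softmax(scores)))

num_disp, gt = 8, 3
target = smooth_target(gt, num_disp)

# Two hypothetical predictions: one peaks 1 disparity away from gt, one 4 away.
near_miss = [5.0 if d == 4 else 0.0 for d in range(num_disp)]
far_miss = [5.0 if d == 7 else 0.0 for d in range(num_disp)]
loss_near = cross_entropy(target, near_miss)
loss_far = cross_entropy(target, far_miss)
print(round(loss_near, 3), round(loss_far, 3))
```

With a one-hot target both wrong predictions would be penalized identically; with the smooth target the near miss receives the lower loss, which is one way the correlation between adjacent disparities enters training.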
Second, on top of the convolution layers we use a simple dot-product layer to join the two branches of the network.
This allows us to perform computation that is orders of magnitude faster.
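A minimal sketch of why a dot-product join is cheap, assuming each branch has already produced one feature vector per pixel: the features are computed once per image, and scoring a candidate disparity is then just an inner product, rather than a pass through further layers for every patch pair. The 2-D features below are toy values, `scanline_scores` is a hypothetical helper, and the convention that left pixel x matches right pixel x - d is an assumption of this example.

```python
def dot(a, b):
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))

def scanline_scores(left_feats, right_feats, max_disp):
    """scores[x][d] = <left_feats[x], right_feats[x - d]>, or -inf if x - d < 0."""
    scores = []
    for x, f_left in enumerate(left_feats):
        row = []
        for d in range(max_disp + 1):
            xr = x - d
            row.append(dot(f_left, right_feats[xr]) if xr >= 0 else float("-inf"))
        scores.append(row)
    return scores

# Toy unit-norm features for one scanline. The right scanline is the left one
# shifted by one pixel, so the true disparity is 1 wherever a match exists.
left = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
right = left[1:] + [[0.0, 0.0]]

scores = scanline_scores(left, right, max_disp=2)
winners = [row.index(max(row)) for row in scores]
print(winners)
```

The per-pixel rows of `scores` are exactly the inputs a softmax over disparities would consume, so the expensive part of the network runs once per image regardless of how many disparities are considered.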
Concurrent work [30, 7] also uses a dot-product layer
We note that concurrent work [30, 7], unpublished at the time of submission of our paper, also introduced a dot-product layer.
[30] J. Zbontar and Y. LeCun. Stereo matching by training a convolutional neural network to compare image patches. arXiv preprint arXiv:1510.05970, 2015
[7] Z. Chen, X. Sun, L. Wang, Y. Yu, and C. Huang. A deep visual correspondence embedding model for stereo matching costs. In ICCV, 2015