2016-Efficient Deep Learning for Stereo Matching

논문명 Efficient Deep Learning for Stereo Matching
저자(소속) Wenjie Luo (\toronto Uni.)
학회/년도 CVPPR2016, 논문
키워드 Luo2016,
데이터셋(센서)/모델 KITTI
관련연구 Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches 2015, code
참고 홈페이지, CVPR2016, KITTI_LB
코드 Code

DL4SM

  • 기존 분단위 소요 시간을 초단위로 줄임
  • We train our network by treating the problem as multi-class classification, where the classes are all possible disparities

1. Introducion

  • stereo 기반 깊이 증정의 챌린지들
    -Dealing with cclusions,

    • large saturated areas and
    • repetitive patterns
  • 기존 접근 방법들 Many approaches have been developed that try to aggregate information from local matches.

    • Cost aggregation,
      • for example, averages disparity estimates in a local neighborhood.
      • Similarly, semi-global block matching and Markov random field based methods combine pixelwise predictions and local smoothness into an energy function.
  • 기존 방법의 cost functions 문제점 However all these approaches employ cost functions that are

    • hand crafted,
    • where only a linear combination of features is learned from data.

1.1 최근 CNN기반 방법들

  • 최근 CNN을 이용하여 learn how to match for the task of stereo estimation[30, 28].
[30] J. Zbontar and Y. LeCun. Stereo matching by training a convolutional neural network to compare image patches. arXiv preprint arXiv:1510.05970, 2015. 
[28] S. Zagoruyko and N. Komodakis. Learning to compare image patches via convolutional neural networks. In CVPR,2015
  • 최근 CNN기반 연구는 이진 분류 문제 푸는 방식을 이용하여 네트워크 파라미터를 학습하게 한다. Current approaches learn the parameters of the matching network by treating the problem as binary classification;

    • 외쪽 이미지 Patch에서 오른쪽 이미지 patch 예측 Given a patch in the left image, the task is to predict if a patch in the right image is the correct match.
  • [29-Zbontar2015]가 좋은 성능을 보이지만 예측시 분단위 시간 필요 While [29] showed great performance in challenging benchmarks such as KITTI [11], it is computationally very expensive,requiring a minute of computation in the GPU.

    • 이유는 siamese architecture를 사용해서 이다. This is due to the fact that they exploited a siamese architecture followed by concatenation and further processing via a few more layers to compute the final score
[29-Zbontar2015] J. Zbontar and Y. LeCun. Computing the stereo matching cost with a convolutional neural network. In CVPR, 2015

1.2 제안하는 CNN기반 방법

  • 제안 방식은 초단위 예측 가능 In contrast, in this paper we propose a matching network which is able to produce very accurate results in less than a second of GPU computation.

  • 초단위 예측을 위해 Towards this goal,

    • we exploit a product layer which simply computes the inner product between the two representations of a siamese architecture.
  • We train our network by treating the problem as multi-class classification, where the classes are all possible disparities.

    • This allows us to get calibrated scores, which result in much better matching performance when compared to [29].

Figure 1: To learn informative image patch representations we employ a siamese network which extracts marginal distributions over all possible disparities for each pixel.
  • KITTI이용 성능 평가 We demonstrate the effectiveness of our approach on the challenging KITTI benchmark and show competitive results when exploiting smoothing techniques.

  • 코드 다운로드 : Our code and datacan be fond online at:http://www.cs.toronto.edu/deepLowLevelVision.

2.1 tune the hyper-parameters

  • 초반기 학습 기반 방식들은 initially computed matching cost 보정에 초점을 맞추었다. Early learning based approaches focused on correcting an initially computed matching cost [16, 17].
[16] D. Kong and H. Tao. A method for learning matching errors for stereo computation. In BMVC, 2004. 
[17] D. Kong and H. Tao. Stereo matching via learning multiple experts behaviors. In BMVC, 2006
  • 이후 파라미터 보정을 위한 학습도 진행 되었되다. Learning has been also utilized to tune the hyper-parameters of the energy-minimization task.
    • Among the first to train these hyper-parameters were [31, 21, 19], which investigated different forms of probabilistic graphical models.

2.2 Slanted plane models

  • Slanted plane models model groups of pixels with slanted 3D planes.

  • 강건함이 주 목적이어서 자율주행차에 많이 사용된 They are very competitive in autonomous driving scenarios, where robustness is key.

  • They have a long history, dating back to [2] and were shown to be very successful on the Middleburry benchmark [22, 15,3, 24] as well as on KITTI [25, 26, 27].

[2] S. Birchfield and C. Tomasi. Multiway cut for stereo and motion with slanted surfaces. In CVPR, 1999. 
[25] K. Yamaguchi, T. Hazan, D. McAllester, and R. Urtasun. Continuous markov random fields for robust stereo estimation. In ECCV, 2012. 
[26] K. Yamaguchi, D. McAllester, and R. Urtasun. Robust monocular epipolar flow estimation. In CVPR, 2013. 2, 
[27] K. Yamaguchi, D. McAllester, and R. Urtasun. Efficient joint segmentation, occlusion labeling, stereo and flow estimation. In ECCV. 2014.

2.3 Holistic models

  • Holistic models which solve jointly many tasks have also been explored.

  • 장점 : many tasks in low-level and high level-vision are related, and thus one can benefit from solving them together.

  • For example[5-2010, 6-2011, 4-2012, 18-2014, 13-2015] jointly solved for stereo and semantic segmentation.

  • Guney and Geiger [12] investigated the utility of high-level vision tasks such as object recognition and semantic segmentation for stereo matching.

[12] F. Guney and A. Geiger. Displets: Resolving stereo ambiguities using object knowledge. In CVPR, 2015.

2.4 Estimating the confidence

  • 각 match의 신뢰도를 측정하는것은 중요한 요소이다. Estimating the confidence of each match is key when employing stereo estimates as a part of a pipeline.

  • 학습 기반 신뢰도 측정 방법 Learning methods were successfully applied to this task, e.g.,

    • by combining several confidence measures via a random forest classifier [14],
    • by incorporating random forest predictions into a Markov random field [23].
[14] R. Haeusler, R. Nair, and D. Kondermann. Ensemble learning for confidence measures in stereo vision. In CVPR, 2013
[23] A. Spyropoulos, N. Komodakis, and P. Mordohai. Learning to detect ground control points for improving the accuracy of stereo matching. In CVPR, 2014.

2.5 Zbontar2015

  • CNN은 high-level에 좋은 성능을 보였으며 최근에는 low-level에도 좋은 성능 보임 Convolutional neural networks(CNN) have been shown to perform very well on
    • high-level vision tasks such as image classification, object detection and semantic segmentation
    • low-level vision tasks such as optical flow prediction [10].
[10] P. Fischer, A. Dosovitskiy, E. Ilg, P. Hausser, C. Hazirbas ¨ and V. Golkov. FlowNet: Learning Optical Flow with Convolutional Networks. In ICCV, 2015.
  • 기존 연구 : In the context of stereo estimation, [29-Zbontar2015] utilize CNN to compute the matching cost between two image patches.

    • they used a siamese network which takes the same sized left and right image patches with a few fully-connected layers on top to predict the matching cost.
    • They trained the model to minimize a binary cross-entropy loss.
  • 유사 연구 In similar spirit to [29],

    • [28] investigated different CNN based architectures for comparing image patches.
    • They found concatenating left and right image patches as different channels works best, at the cost of being very slow.
[29-Zbontar2015] J. Zbontar and Y. LeCun. Computing the stereo matching cost with a convolutional neural network. In CVPR, 2015
[28] S. Zagoruyko and N. Komodakis. Learning to compare image patches via convolutional neural networks. In CVPR, 2015.

2.6 제안 방식과의 다른점

Our work is most similar to [29, 28] with two main differences.

[First]
  • we propose to learn a probability distribution over all disparity values using a smooth target distribution.

  • As a consequence we are able to capture correlations between the different disparities implicitly.

  • This contrasts a [29] which performs independent binary predictions on image patches.

[Second]
  • on top of the convolution layers we use a simple dot-product layer to join the two branches of the network.

  • This allows us to do a orders of magnitude faster computation.

저자의 다른 연구 [30, 7]도 dot-product layer를 사용 We note that in concurrent work unpublished at the time of submission of our paper [30, 7] also introduced a dot-product layer.

[30] J. Zbontar and Y. LeCun. Stereo matching by training a convolutional neural network to compare image patches. arXiv preprint arXiv:1510.05970, 2015
[7] Z. Chen, X. Sun, L. Wang, Y. Yu, and C. Huang. A deep visual correspondence embedding model for stereo matching costs. In ICCV, 2015

results for ""

    No results matching ""