Title: Convolutional Neural Network Information Fusion based on Dempster-Shafer Theory for Urban Scene Understanding
Author (affiliation): Masha (Mikhal) Itkina (Stanford)
Venue/year: CS231n 2017, course report
Keywords:
Dataset (sensor)/model:
References:
Code:

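Dempster-Shafer evidence theory was proposed by Arthur Dempster in 1967 and developed further by Glenn Shafer in 1976. In this theory, the degree of belief in a hypothesis is expressed as an interval, and P(H) and P(¬H) do not necessarily sum to 1.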
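As a minimal sketch of the belief interval, the snippet below computes belief (Bel) and plausibility (Pl) for a mass function over the two-element frame {H, ¬H}. The mass values and the helper names `belief` and `plausibility` are illustrative assumptions. The 0.3 mass left on the whole frame (ignorance) is what allows Bel(H) + Bel(¬H) to be less than 1.

```python
# Frame of discernment with a hypothesis H and its negation (illustrative).
FRAME = frozenset({"H", "notH"})


def belief(mass, A):
    """Bel(A): total mass committed to non-empty subsets of A."""
    return sum(m for B, m in mass.items() if B and B <= A)


def plausibility(mass, A):
    """Pl(A): total mass of focal sets that intersect A (do not contradict A)."""
    return sum(m for B, m in mass.items() if B & A)


# Hypothetical evidence: 0.6 supports H, 0.1 supports notH, and 0.3 stays on
# the whole frame, i.e. "H or notH" (ignorance).
mass = {
    frozenset({"H"}): 0.6,
    frozenset({"notH"}): 0.1,
    FRAME: 0.3,
}

H = frozenset({"H"})
not_H = frozenset({"notH"})

print("Bel(H) =", belief(mass, H), "Pl(H) =", plausibility(mass, H))  # about 0.6 and 0.9
print("Bel(H) + Bel(notH) =", belief(mass, H) + belief(mass, not_H))  # about 0.7, not 1
```

For this two-element frame, the width of the interval [Bel(H), Pl(H)] equals the mass left on the whole frame, so the interval narrows as the evidence becomes more specific.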

Fusion framework: DST (Dempster-Shafer Theory)

Dempster-Shafer theory provides a sensor fusion framework that autonomously accounts for obstacle occlusion in dynamic, urban environments.

However, to discern static and moving obstacles, the Dempster-Shafer approach requires manual tuning of parameters that depend on the situation and the sensor types.

The probabilistic occupancy grid (the output of the Dempster-Shafer information fusion algorithm) was provided as input to the neural network. The network then learned an offset from the original DST result to improve semantic labeling performance.
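The offset idea can be sketched as follows, assuming (the notes do not specify this) that the network's offset is added to the DST class probabilities in log space and renormalized with a softmax over the classes. `apply_learned_offset` and the array shapes are hypothetical. With this formulation a zero offset reproduces the DST result, so the network only has to learn a correction.

```python
import numpy as np


def apply_learned_offset(p_dst, offset, eps=1e-6):
    """Refine DST class probabilities with a per-cell offset (hypothetical helper).

    p_dst:  (C, H, W) class probabilities produced by the DST fusion step.
    offset: (C, H, W) correction that the network would predict.
    The offset is added in log space and renormalised with a softmax over classes.
    """
    logits = np.log(p_dst + eps) + offset
    logits -= logits.max(axis=0, keepdims=True)  # numerical stability
    expd = np.exp(logits)
    return expd / expd.sum(axis=0, keepdims=True)


rng = np.random.default_rng(0)
# Random per-cell distributions over 3 classes on a 4x4 grid, shape (3, 4, 4).
p_dst = rng.dirichlet(np.ones(3), size=(4, 4)).transpose(2, 0, 1)

refined = apply_learned_offset(p_dst, np.zeros_like(p_dst))
print(np.allclose(refined, p_dst, atol=1e-4))  # True: a zero offset keeps the DST result
```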

1. Introduction

  • Autonomously accounting for obstacle occlusion is an open problem for self-driving cars.

  • A human driver can infer that a person standing by the road may begin moving or that a parked car may pull out onto the road. An autonomous vehicle should have the capability for similar logic and reactions.

  • Dempster-Shafer Theory (DST) provides a decision-making strategy that addresses occlusion by directly modeling both lack of information and conflicting information [14].

  • DST can combine sensor information subject to uncertainty with semantic scene information obtained from a street-level digital map, as in [14].

  • Sensor and digital map occupancy grids are fused to discern grid cells that contain potential hazards (both mobile and stationary) from cells that are navigable by the vehicle.

  • This information is stored in a holistic perception grid, which allows the perception system to anticipate areas where occluded hazards may appear.

  • However, the approach relies heavily on several parameters that require manual tuning specific to the situation in order to achieve the desired behavior in detecting static and moving obstacles [14].

[14] Kurdej, M., Moras, D., Cherfaoui, V., and Bonnifait, P. Map-aided Evidential Grids for Driving Scene Understanding. IEEE Intelligent Transportation Systems Magazine, pages 30–41, 2015.
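Dempster's rule of combination is the standard way DST merges two independent sources of evidence. The sketch below fuses one cell of a LIDAR grid with the same cell of a map grid over a simplified two-hypothesis frame {free, occupied}. The frame, the mass values, and `combine_dempster` are illustrative assumptions; [14] uses a richer set of hypotheses. Mass left on the whole frame encodes lack of information, and the product mass that falls on the empty set, K, measures the conflict between the sources.

```python
from itertools import product


def combine_dempster(m1, m2):
    """Dempster's rule: conjunctive combination renormalised by 1 - K.

    m1, m2: dicts mapping frozenset hypotheses to mass (each sums to 1).
    Returns the combined mass function and the conflict K.
    """
    joint = {}
    conflict = 0.0
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        inter = A & B
        if inter:
            joint[inter] = joint.get(inter, 0.0) + a * b
        else:
            conflict += a * b  # mass on contradictory pairs of hypotheses
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {A: v / (1.0 - conflict) for A, v in joint.items()}, conflict


FREE, OCC = frozenset({"free"}), frozenset({"occupied"})
OMEGA = FREE | OCC  # whole frame: "don't know"

# Hypothetical masses for a single grid cell.
lidar = {OCC: 0.7, OMEGA: 0.3}  # a LIDAR return suggests an obstacle
gis = {FREE: 0.5, OMEGA: 0.5}   # the map says the cell is drivable road

fused, K = combine_dempster(lidar, gis)
print("K =", round(K, 3))
for hyp, m in fused.items():
    print(sorted(hyp), round(m, 3))
```

Here the LIDAR evidence for an obstacle contradicts the map's evidence for free road, giving K = 0.35. Rather than only normalizing it away, evidential grid methods commonly keep such conflict as a cue that something in the scene differs from what was expected.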
  • The proposed approach merges
    • the semantic segmentation framework in [15], which uses a fully convolutional neural network (FCN),
    • with the DST information fusion algorithm presented in [14].
  • Goal: to increase the robustness of the latter (the DST algorithm) in discerning occupancy grid cells containing static and moving objects from navigable space.

  • The inputs to the baseline DST algorithm in [14] are:

    • a LIDAR sensor grid containing LIDAR data,
    • a geographic information system (GIS) grid containing semantic map data,
    • and probabilistic occupancy grids which form the perception grid outputted by DST at the previous time-step.
  • The input to the FCN is:

    • the set of probabilistic perception grids generated by the DST algorithm at the current and previous time-steps stacked in channels.
  • The network outputs:

    • the updated perception grid for the current time-step,
    • which is a cell-by-cell classification of the local grid according to its semantic segmentation as described in Section 4.
[15] Long, J., Shelhamer, E., and Darrell, T. Fully Convolutional Networks for Semantic Segmentation. CVPR, 2015.
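The FCN input and output described above can be sketched in terms of array shapes. The snippet stacks the per-class probability grids from the previous and current time-steps along the channel axis, then takes a per-cell argmax as the cell-by-cell classification. The grid size, the number of classes `C`, the channel ordering, and the helper names are assumptions for illustration; the FCN itself is omitted.

```python
import numpy as np


def stack_perception_grids(grids):
    """Concatenate per-time-step grids of shape (C, H, W) into a (T*C, H, W) input."""
    return np.concatenate(grids, axis=0)


def cell_labels(class_scores):
    """Cell-by-cell classification: index of the highest-scoring class, shape (H, W)."""
    return np.argmax(class_scores, axis=0)


H, W, C = 64, 64, 5  # grid size and number of classes (illustrative values)
rng = np.random.default_rng(0)
prev_grid = rng.random((C, H, W))  # DST perception grid at time t-1
curr_grid = rng.random((C, H, W))  # DST perception grid at time t

x = stack_perception_grids([prev_grid, curr_grid])
print(x.shape)  # (10, 64, 64)

# Stand-in for the FCN output: here simply the current grid's class scores.
labels = cell_labels(curr_grid)
print(labels.shape)  # (64, 64)
```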

A perception framework commonly depends on an occupancy grid built in 2-D, 2.5-D, or 3-D space [14, 21, 2]. This paper focuses on approaches dealing with 2-D occupancy grids because their spatial structure is similar to that of images, which allows existing deep learning algorithms to be applied directly.

[21] Rieken, J., Matthaei, R., and Maurer, M. Toward Perception Driven Urban Environment Modeling for Automated Road Vehicles. 2015 IEEE 18th International Conference on Intelligent Transportation Systems, 2015.
[2] Azim, A. and Aycard, O. Detection, Classification and Tracking of Moving Objects in a 3D Environment. 2012 Intelligent Vehicles Symposium, 2012.
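To illustrate why 2-D occupancy grids map naturally onto image-style inputs, the sketch below rasterizes LIDAR returns (already projected onto the ground plane) into a binary grid. `points_to_occupancy`, the 0.5 m cell size, and the 20 m extent are hypothetical choices. A real probabilistic occupancy grid would also ray-cast free space and accumulate evidence over time, which this sketch omits.

```python
import numpy as np


def points_to_occupancy(points_xy, cell_size=0.5, extent=20.0):
    """Mark grid cells that contain at least one LIDAR return.

    points_xy: (N, 2) x/y coordinates in metres, vehicle at the origin.
    Returns an (n, n) uint8 grid with n = 2 * extent / cell_size; 1 = occupied.
    """
    n = int(round(2 * extent / cell_size))
    grid = np.zeros((n, n), dtype=np.uint8)
    idx = np.floor((points_xy + extent) / cell_size).astype(int)
    inside = np.all((idx >= 0) & (idx < n), axis=1)  # drop points outside the grid
    grid[idx[inside, 1], idx[inside, 0]] = 1  # row = y index, column = x index
    return grid


pts = np.array([[0.1, 0.1], [5.0, -3.0], [100.0, 0.0]])  # last point is out of range
g = points_to_occupancy(pts)
print(g.shape, g.sum())  # (80, 80) 2
```

The result is a single-channel, image-like array; several such grids (or per-class probability grids) can be stacked as channels, just like the color channels of an image.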
[Kurdej]
  • One approach to scene understanding and **sensor fusion employs DST**, as proposed in [14].

  • Kurdej et al. focus on the benefits of combining evidence in the form of

    • existing digital street-level maps
    • and sensor data
  • to naturally handle occlusion.

  • A digital map occupancy grid and a sensor occupancy grid are combined to make decisions using DST as to

    • which class a grid cell belongs to in a set of hypotheses (e.g. static, moving, infrastructure, etc.) thus forming a perception grid [14].
  • Kurdej et al. do not cluster the grid cells into objects (in contrast to some Bayesian approaches, as in [8]); instead, they facilitate perception based on classified grid cell information.

  • Drawback: the algorithm proposed in [14]

    • relies on several parameters that require manual tuning to achieve the desired behavior.
      • For instance, the discounting factor determines how quickly information is discarded.
    • The algorithm also relies on gains and increment/decrement step sizes that determine how quickly it decides whether an object is moving or static [14].
  • Manually tuning these parameters is not a robust solution, since better performance could be achieved by optimizing them algorithmically.
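Deciding which hypothesis a grid cell belongs to requires turning a mass function into a single class. One standard decision rule in DST is the pignistic transform, which splits the mass of each focal set equally among its elements. The notes do not say which rule [14] uses, so treat this as an illustrative assumption; the class names and mass values are hypothetical.

```python
def pignistic(mass):
    """Pignistic transform: BetP(x) = sum over focal sets A containing x of m(A) / |A|.

    Assumes a normalised mass function (no mass on the empty set).
    """
    betp = {}
    for A, m in mass.items():
        for x in A:
            betp[x] = betp.get(x, 0.0) + m / len(A)
    return betp


def decide(mass):
    """Pick the singleton hypothesis with the largest pignistic probability."""
    betp = pignistic(mass)
    return max(betp, key=betp.get)


# Hypothetical mass function for one cell of the perception grid.
cell = {
    frozenset({"static"}): 0.2,
    frozenset({"static", "moving"}): 0.5,  # "some obstacle", type unknown
    frozenset({"static", "moving", "infrastructure", "free"}): 0.3,  # ignorance
}
print(decide(cell))  # static
```

Here "static" receives 0.2 + 0.5/2 + 0.3/4 = 0.525, so the cell is labeled static even though most of the mass does not single it out.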

[Road Type Classification with Occupancy Grids]
[23] Seeger, C., Muller, A., Schwarz, L., and Manz, M. Towards Road Type Classification with Occupancy Grids. IEEE Intelligent Vehicles Symposium 2016 Workshop: DeepDriving - Learning Representations for Intelligent Vehicles, 2016.
  • [23] also utilizes DST to fuse information from several sensors in order to perform obstacle detection.

  • In [23], sensor information is discounted based on associations to obstacles from different sensor types, which leads to a biasing of the obstacle detections to more accurate sensor data.

  • [23] also loosens the requirement of occupancy grid cell independence compared with [14].
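Discounting appears both in [14], where the discounting factor controls how quickly information is discarded, and in [23], where sensor information is discounted to bias detections toward more accurate sensors. The sketch below uses classical (Shafer) discounting, assuming `alpha` is the fraction of each mass moved to the whole frame (total ignorance) per application; conventions differ, and the exact update in [14] or [23] may not match. Applying it repeatedly to a cell that is no longer observed shows how quickly its occupancy evidence fades.

```python
def discount(mass, alpha, frame):
    """Classical discounting: keep (1 - alpha) of every mass, move alpha to the frame."""
    out = {A: (1.0 - alpha) * m for A, m in mass.items()}
    out[frame] = out.get(frame, 0.0) + alpha
    return out


FRAME = frozenset({"free", "occupied"})
OCC = frozenset({"occupied"})

# Hypothetical cell that was seen as occupied and is no longer observed.
cell = {OCC: 0.9, FRAME: 0.1}
for step in range(3):
    cell = discount(cell, alpha=0.2, frame=FRAME)
    print(step, round(cell[OCC], 4))  # 0 0.72, then 1 0.576, then 2 0.4608
```

A larger `alpha` forgets faster, and a smaller one retains old evidence longer; fixing one value by hand trades these off for every situation at once, which is the tuning problem noted for [14].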

2.1 Sensor fusion using CNNs

Then, is DST a method of the past, like hand-crafted approaches?

Recently, several works have investigated convolutional neural networks (CNNs) as a direct means to perform sensor fusion.

[Deep Stereo Fusion]
[18] Poggi, M. and Mattoccia, S. Deep Stereo Fusion: combining multiple disparity hypotheses with deep-learning. 2016 Fourth International Conference on 3D Vision, 2016.
  • In [18], the authors fuse data from **stereo cameras** with a **6-layer FCN framework** to predict a disparity map, using the KITTI [6] dataset for training.

  • The resulting algorithm is robust to obstacle occlusion.

[Multimodal RGB-D]
[5] Eitel, A., Springenberg, J.T., Spinello, L., Riedmiller, M., and Burgard, W. Multimodal Deep Learning for Robust RGB-D Object Recognition. 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015.
  • In [5], RGB and depth information were passed separately through a two-stream CNN to successfully perform object recognition.

  • The two streams were unified with fully connected layers.
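The two-stream design in [5] amounts to late fusion: each modality is processed separately, and the resulting feature vectors are concatenated before shared fully connected layers. The NumPy sketch below shows only that data flow. Each "stream" is a hypothetical stand-in (one random linear layer with ReLU) rather than a CNN, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)


def stream_features(x, weights):
    """Stand-in for one CNN stream: flatten, one linear layer, then ReLU."""
    return np.maximum(x.reshape(x.shape[0], -1) @ weights, 0.0)


batch, side = 2, 8
rgb = rng.random((batch, 3, side, side))    # 3-channel colour input
depth = rng.random((batch, 1, side, side))  # 1-channel depth input

w_rgb = rng.standard_normal((3 * side * side, 16))
w_depth = rng.standard_normal((1 * side * side, 16))
w_fc = rng.standard_normal((32, 10))  # shared layer over 10 classes (illustrative)

# Late fusion: process each modality separately, then concatenate the features.
fused = np.concatenate(
    [stream_features(rgb, w_rgb), stream_features(depth, w_depth)], axis=1
)
logits = fused @ w_fc
print(fused.shape, logits.shape)  # (2, 32) (2, 10)
```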

[25, 16]
[25] Yao, W., Polewski, P., and Krzystek, P. Classification of Urban Aerial Data Based on Pixel Labelling with Deep Convolutional Neural Networks and Logistic Regression. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XLI-B7, 2016.
[16] Oh, S.-I. and Kang, H.-B. Object Detection and Classification by Decision-Level Fusion for Intelligent Vehicle Systems. Sensors 2017, 2017.
  • DST has previously been used in perception as a pre-processing information fusion step to a CNN to achieve both semantic image labeling as in [25] and object detection and classification as in [16].
    • [25] presents a custom, 4-layer CNN,
    • while [16] utilizes a pre-trained VGG-16 network for each sensor.

2.2 Scene segmentation

  • There has also been some recent work in scene segmentation utilizing LIDAR occupancy grids and deep learning.

  • Since LIDAR datasets have become publicly available only recently, applying deep learning techniques to LIDAR data is an active area of research.

  • LIDAR 2-D occupancy grids provide a parallel with pixel-image data, since both are a 2-D representation of spatial information that can be stacked into channels.

    • [22] investigates common CNN architectures pre-trained on the ImageNet dataset, such as AlexNet, GoogLeNet, and VGG-16, to classify cells into road types.

    • [22] determined that using networks pre-trained on images was advantageous compared with training custom architectures from scratch.

    • [7] utilizes LIDAR occupancy grids to discern hallways from rooms in a building with a 5-layer CNN architecture.

    • [3] uses a deep FCN with 12 convolutional layers to provide semantic labels for the grid cells, discerning the road from the rest of the environment.

      • This algorithm outperforms the state of the art on the KITTI dataset.
      • The advantages of FCNs are the reduced number of parameters required and the ability to maintain the spatial representation of the input throughout training.
      • [3] utilizes dilation to achieve a larger receptive field within the network, aiding in the segmentation task.
[22] Seeger, C., Manz, M., Matters, P., and Hornegger, J. Locally Adaptive Discounting in Multi Sensor Occupancy Grid Fusion. 2016 IEEE Intelligent Vehicles Symposium (IV), 2016.
[7] Goeddel, R. and Olson, E. Learning Semantic Place Labels from Occupancy Grids using CNNs. 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016.
[3] Caltagirone. Fast LIDAR-based Road Detection Using Fully Convolutional Neural Networks. CoRR, abs/1703.03613, 2017.
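The effect of dilation on the receptive field can be checked with a short calculation. For a stack of stride-1 convolutions, each layer with kernel size k and dilation d widens the receptive field by (k - 1) * d cells. The layer configurations below are hypothetical and are not the architecture of [3].

```python
def receptive_field(layers):
    """Receptive field (in cells) of a stack of stride-1 convolutions.

    layers: list of (kernel_size, dilation) pairs.
    Each layer grows the field by (kernel_size - 1) * dilation cells.
    """
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d
    return rf


plain = [(3, 1)] * 4                          # four ordinary 3x3 layers
dilated = [(3, 1), (3, 2), (3, 4), (3, 8)]    # same kernels, doubling dilation
print(receptive_field(plain), receptive_field(dilated))  # 9 31
```

With the same four 3x3 layers, and therefore the same number of weights, doubling the dilation at each layer grows the receptive field from 9 to 31 cells.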

2.3 This report

  • In this paper, the classical FCN image semantic segmentation approach proposed in [15] is merged with the information fusion algorithm presented in [14] to improve the performance of the DST algorithm in discerning occupancy grid cells containing both static and moving objects from navigable space.

3. Dataset and Features

4. Methods
