Title | Confidence Based Pedestrian Tracking in Unstructured Environments Using 3D Laser Distance Measurements |
---|---|
Authors (affiliation) | () |
Conference / Year | 2014, paper |
Citation ID / Keywords | Velodyne HDL-64E, SVM, particle filter |
Dataset (sensor) / Model | |
Related work | K. Kidono, Pedestrian recognition using high-definition LIDAR. In Proceedings, 2011 |
Notes | |
Code | ROS |
We address the problem of tracking multiple pedestrians in unstructured 3D point clouds by extracting and discarding additional candidates from vegetation and other structures.
Our approach
Tracking is performed by using a particle filter.
Why pedestrians are harder to detect than vehicles: Pedestrians are challenging obstacles due to
In this paper:
Contribution: our own contribution consists of
Camera-based approaches ([6] for an overview) mostly address the problem by detecting and tracking pedestrians in single images or in video streams.
[6] P. Dollar, C. Wojek, B. Schiele, and P. Perona. Pedestrian Detection: An Evaluation of the State of the Art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4):743–761, 2012.
In addition, several laser-based approaches using the data of 2D or 3D laser range finders (LRFs) have emerged over the past years.
Breitenstein et al. [2] present a tracking-by-detection approach for multiple persons using particle filters in color image series.
[2] M. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier, and L. Gool. Robust Tracking-by-Detection using a Detector Confidence Particle Filter. In Proceedings of the IEEE International Conference on Computer Vision, pages 1515–1522, 2009.
Petrovskaya and Thrun [16] present an approach to detect and track vehicles with a particle filter in 3D data.
[16] A. Petrovskaya and S. Thrun. Model Based Vehicle Detection and Tracking for Autonomous Urban Driving. Autonomous Robots, Special Issue: Selected papers from Robotics: Science and Systems, 26(2–3):123–139, 2009.
Scholer et al. [19] use a Velodyne HDL-64E LRF to detect and track people in 3D point clouds.
[19] F. Scholer, J. Behley, V. Steinhage, D. Schulz, and A.B. Cremers. Person Tracking in Three-Dimensional Laser Range Data with Explicit Occlusion Adaption. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 1297–1303, 2011.
Spinello et al. [21] present a combined bottom-up top-down detector for pedestrians in Velodyne HDL-64E data in urban outdoor environments.
[21] L. Spinello, M. Luber, and K. Arras. Tracking People in 3D Using a Bottom-Up Top-Down Detector. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 1304–1310, 2011.
Navarro-Serment et al. [15] use geometric and motion features to detect and track pedestrians while driving in outdoor regions.
[15] L. Navarro-Serment, C. Mertz, and M. Hebert. Pedestrian Detection and Tracking Using Three-dimensional LADAR Data. International Journal of Robotics Research, Special Issue: Seventh International Conference on Field and Service Robots, 29(12):1516–1528, 2010.
Kidono et al. [11] extend features of [15] by a slice feature and by reflection intensities of their Velodyne HDL-64E.
[11] K. Kidono, T. Miyasaka, A. Watanabe, T. Naito, and J. Miura. Pedestrian recognition using high-definition LIDAR. In Proceedings of the IEEE Intelligent Vehicles Symposium, pages 405–410, 2011.
Thornton et al. [23] present a multi-sensor approach including a 3D LRF for human detection and tracking in cluttered environments.
[23] S. Thornton, M. Hoffelder, and D. Morris. Multi-sensor Detection and Tracking of Humans for Safe Operations with Unmanned Ground Vehicles. In IEEE Workshop on Human Detection from Mobile Platforms, pages 103–112, 2008.
A probabilistic person detector on multiple layers of 2D laser range scans classified using AdaBoost [8] is presented by Mozos et al. [14].
[14] O. Mozos, R. Kurazume, and T. Hasegawa. Multi-Part People Detection Using 2D Range Data. International Journal of Social Robotics, 2(1):31–40, 2010.
Premebida et al. [17] present a laser-based pedestrian detection system and focus on information extraction from LRFs.
[17] C. Premebida, O. Ludwig, and U. Nunes. Exploiting LIDAR-based Features on Pedestrian Detection in Urban Scenarios. In Proceedings of the International IEEE Conference on Intelligent Transportation Systems, pages 405–410, 2009.
Carballo et al. [3] fuse multiple 2D LRFs on two layers to detect pedestrians in uncluttered indoor environments.
[3] A. Carballo, A. Ohya, and S. Yuta. Fusion of Double Layered Multiple Laser Range Finders for People Detection from a Mobile Robot. In Proceedings of the IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, pages 677–682, 2008.
Gidel et al. [9] present a pedestrian detection and tracking approach using an LRF system on multiple layers, too.
[9] S. Gidel, P. Checchin, C. Blanc, T. Chateau, and L. Trassoudaine. Pedestrian Detection and Tracking in an Urban Environment Using a Multilayer Laser Scanner. IEEE Transactions on Intelligent Transportation Systems, 11(3):579–588, 2010.
Sato et al. [18] use an LRF with six scanning layers to track pedestrians with a Kalman filter in urban environments.
[18] S. Sato, M. Hashimoto, M. Takita, K. Takagi, and T. Ogawa. Multilayer lidar-based pedestrian tracking in urban environments. In Intelligent Vehicles Symposium, pages 849–854, 2010.
Definition: The task of ground removal is to separate obstacle points from ground points (cf. [11], [15]) in order to reduce the computational load and to perform a first selection of candidates.
Why a height criterion fails: in unstructured environments, an assumption of a planar ground surface with a fixed ground height is likely to fail.
In our approach,
The ground removal algorithm yields groups of cells which contain obstacles.
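The notes do not spell out the ground removal algorithm beyond its output, groups of obstacle cells. Purely as an illustration, a common grid-based scheme that avoids a fixed ground height labels a cell as an obstacle when the height spread of its points exceeds a threshold and then groups neighboring obstacle cells. The cell size, the 0.3 m threshold, 8-connectivity, and the name `obstacle_cell_groups` are assumptions, not the paper's method.

```python
from collections import deque


def obstacle_cell_groups(points, cell_size=0.2, min_height_diff=0.3):
    """Hypothetical grid-based ground removal plus grouping of obstacle cells.

    points: iterable of (x, y, z) tuples in meters.
    A cell counts as an obstacle if max(z) - min(z) of its points exceeds
    min_height_diff (assumed rule; no global ground height is needed).
    Returns a list of groups, each a sorted list of (i, j) cell indices.
    """
    # Track the min/max height of the points falling into each grid cell.
    z_range = {}
    for x, y, z in points:
        key = (int(x // cell_size), int(y // cell_size))
        lo, hi = z_range.get(key, (z, z))
        z_range[key] = (min(lo, z), max(hi, z))

    obstacle = {k for k, (lo, hi) in z_range.items() if hi - lo > min_height_diff}

    # Flood-fill obstacle cells into 8-connected groups (assumed connectivity).
    groups, seen = [], set()
    for start in obstacle:
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            i, j = queue.popleft()
            group.append((i, j))
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    nb = (i + di, j + dj)
                    if nb in obstacle and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        groups.append(sorted(group))
    return groups
```

Because the test uses only the height spread inside each cell, sloped terrain produces small spreads and is treated as ground without assuming a planar surface.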
Afterwards, those groups need to be prepared for feature extraction and classification.
On the one hand, the clustering algorithm needs to group obstacle cells into clusters containing enough 3D distance measurements to represent pedestrian candidates.
On the other hand, the algorithm needs to split up large groups of 3D points in order to separate pedestrians from nearby obstacles, including trees, buildings, and other pedestrians.
Our approach consists of two steps and aims to find a high number of pedestrian candidates even if they are located close to other objects.
We explicitly choose not to discard anything bearing even the slightest similarity to a human.
Thus, our next step is to further analyze clusters that are too large to represent a single candidate, which in turn increases runtime.
In order to keep the overall runtime low, several algorithms were evaluated.
Approaches like k-means [13] exhibit fast runtimes, but the original algorithm has the drawback of requiring both the value of k and an initialization of the cluster centers.
Solution #1: The k-means++ extension [1] improves the initial distribution of cluster centers and provides faster runtimes.
Solution #2: The problem of an unknown k is solved by the DP-means algorithm [12], which we use: it starts with k = 1, increments k if a cluster grows too large, and also exhibits fast runtimes.
[1] D. Arthur and S. Vassilvitskii. K-means++: The Advantages of Careful Seeding. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035, 2007.
[12] B. Kulis and M. Jordan. Revisiting k-means: New Algorithms via Bayesian Nonparametrics. In Proceedings of the International Conference on Machine Learning, pages 513–520, 2012.
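A minimal pure-Python sketch of DP-means as described by Kulis and Jordan [12]: start with one cluster at the global mean and open a new cluster whenever a point's squared distance to every current center exceeds a penalty λ. The value of `lam` (which decides what counts as "too large" relative to a pedestrian's extent) is not given in these notes; the function name is hypothetical.

```python
def dp_means(points, lam, max_iter=100):
    """DP-means clustering. points: list of equal-length tuples.

    Returns (centers, labels). k starts at 1 and grows whenever a point's
    squared distance to every existing center exceeds lam.
    """
    dim = len(points[0])
    # k = 1: a single cluster located at the global mean.
    centers = [tuple(sum(p[d] for p in points) / len(points) for d in range(dim))]
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            best = min(range(len(centers)), key=dists.__getitem__)
            if dists[best] > lam:
                centers.append(tuple(p))  # open a new cluster centered on p
                best = len(centers) - 1
            if labels[i] != best:
                labels[i] = best
                changed = True
        # Recompute centers; drop clusters that lost all their points and relabel.
        members = {}
        for i, lab in enumerate(labels):
            members.setdefault(lab, []).append(points[i])
        keep = sorted(members)
        remap = {old: new for new, old in enumerate(keep)}
        labels = [remap[lab] for lab in labels]
        centers = [
            tuple(sum(p[d] for p in members[k]) / len(members[k]) for d in range(dim))
            for k in keep
        ]
        if not changed:
            break
    return centers, labels
```

For two well-separated groups such as `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` with `lam=4.0`, this returns two clusters with labels `[0, 0, 0, 1, 1, 1]`; no k had to be chosen in advance.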
An example of a separated cluster is shown in Fig. 1, where the replaced cluster is framed in blue and the new clusters are framed in green.
The aforementioned step results in an over-clustering of large obstacles, which are now separated with respect to the extent of pedestrians.
Hence, our next step is to re-merge nearby clusters without a gap between them.
The segment between the centers of two clusters is divided into eight histogram bins; a gap is detected if the two middle bins contain less than 80% of the average number of measurements per bin.
An example of re-merged clusters is shown in Fig. 2: here, the new, smaller clusters cannot be separated adequately, so the original group is maintained.
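The gap test can be sketched as follows, under one reading of the 80% rule: points of both clusters are projected onto the segment joining the two centers, the segment is split into eight bins, and a gap is reported when the mean count of the two middle bins is below 80% of the mean count over all bins. Ignoring points whose projection falls outside the segment, and ignoring their lateral distance from it, are assumptions; `has_gap` is a hypothetical name.

```python
def has_gap(points_a, points_b, center_a, center_b, n_bins=8, ratio=0.8):
    """Return True if the measurements between two cluster centers show a gap."""
    axis = [b - a for a, b in zip(center_a, center_b)]
    length_sq = sum(v * v for v in axis)
    if length_sq == 0.0:
        return False  # identical centers: nothing to separate
    counts = [0] * n_bins
    for p in list(points_a) + list(points_b):
        # Normalized position t of p's projection along the center-to-center segment.
        t = sum((pi - ai) * vi for pi, ai, vi in zip(p, center_a, axis)) / length_sq
        if 0.0 <= t <= 1.0:
            counts[min(int(t * n_bins), n_bins - 1)] += 1
    mean_all = sum(counts) / n_bins
    mid = n_bins // 2
    mean_middle = (counts[mid - 1] + counts[mid]) / 2
    return mean_middle < ratio * mean_all
```

Pairs of clusters for which `has_gap` returns `False` would be re-merged into one candidate; pairs with a gap stay separate.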
This strategy has the advantage of finding pedestrian candidates close to any other obstacles, at the expense of increased runtime and potentially more false positives.
Our feature vector per cluster consists of 8 different features introduced in the literature,
Those features are the 3D covariance matrix of a cluster, the normalized moment of inertia tensor, the 2D covariance matrix in different zones (cf. [15]), the normalized 2D histogram for the main plane and the normalized 2D histogram for the secondary plane.
In another approach, Kidono et al. [11] introduce two additional features.
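Two of the listed features can be computed directly from a cluster's points. The sketch below builds the 3D covariance matrix and a moment of inertia tensor about the centroid; normalizing the tensor by the number of points, and flattening each symmetric matrix to its six upper-triangular entries, are assumptions rather than details confirmed by these notes. All function names are hypothetical.

```python
def _centered(points):
    """Subtract the centroid from every (x, y, z) point."""
    n = len(points)
    mean = [sum(p[d] for p in points) / n for d in range(3)]
    return [[p[d] - mean[d] for d in range(3)] for p in points]


def covariance_3d(points):
    """Sample covariance matrix (3 x 3) of a cluster; needs at least 2 points."""
    c = _centered(points)
    n = len(c)
    return [[sum(q[r] * q[s] for q in c) / (n - 1) for s in range(3)] for r in range(3)]


def normalized_inertia_tensor(points):
    """Inertia tensor about the centroid, divided by the point count (assumed normalization)."""
    c = _centered(points)
    n = len(c)
    xx = sum(q[0] ** 2 for q in c)
    yy = sum(q[1] ** 2 for q in c)
    zz = sum(q[2] ** 2 for q in c)
    xy = sum(q[0] * q[1] for q in c)
    xz = sum(q[0] * q[2] for q in c)
    yz = sum(q[1] * q[2] for q in c)
    tensor = [[yy + zz, -xy, -xz],
              [-xy, xx + zz, -yz],
              [-xz, -yz, xx + yy]]
    return [[v / n for v in row] for row in tensor]


def upper_triangle(m):
    """The six independent entries of a symmetric 3 x 3 matrix."""
    return [m[r][s] for r in range(3) for s in range(r, 3)]


def cluster_features(points):
    """Concatenate both matrices into a 12-dimensional partial feature vector."""
    return upper_triangle(covariance_3d(points)) + upper_triangle(normalized_inertia_tensor(points))
```

The remaining features (zone-wise 2D covariances and the main/secondary plane histograms) would be appended to this vector before classification.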
Classification goal: binary classification (pedestrian or not). The task of the classifier is to perform a binary classification between pedestrian and non-pedestrian as precisely as possible.
Algorithm: SVM with RBF kernel. As proposed by Kidono et al. [11], we use an SVM with a radial basis function (RBF) kernel [4], [5] together
[4] C.-C. Chang and C.-J. Lin. LIBSVM: A library for Support Vector Machines. ACM Transactions on Intelligent Systems and Technology, 2(3):27:1–27:27, 2011.
[5] C. Cortes and V. Vapnik. Support-Vector Networks. Machine Learning, 20(3):273–297, 1995.
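For reference, the standard soft-margin SVM with an RBF kernel, as implemented in LIBSVM [4], classifies a feature vector $\mathbf{x}$ by the sign of the decision function below. Here $\mathbf{x}_i$ are the training feature vectors, $y_i \in \{+1, -1\}$ their labels (pedestrian / non-pedestrian), $\alpha_i$ the learned multipliers (non-zero only for support vectors), and $b$ the bias. The kernel width $\gamma$ and the penalty $C$ used in the paper are not given in these notes.

```latex
\begin{aligned}
K(\mathbf{x}, \mathbf{x}') &= \exp\!\left(-\gamma \,\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right), \qquad \gamma > 0,\\
f(\mathbf{x}) &= \sum_{i=1}^{N} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b, \qquad 0 \le \alpha_i \le C,\\
\hat{y} &= \operatorname{sign}\bigl(f(\mathbf{x})\bigr).
\end{aligned}
```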
Features: the features described in Section 3.3 of the paper.
Training data: For training the SVM, we annotated pedestrians in different datasets from the Koblenz campus of the University of Koblenz-Landau,
Training algorithm: For this work, we used the LIBSVM library by Chang and Lin [4], which computes a separating hyperplane that discriminates between the pedestrian and non-pedestrian classes.
Classifier output: For each class, our classifier returns a confidence value in [0, 1] indicating how likely the candidate belongs to that class.
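The notes do not say how the [0, 1] confidences are obtained. One standard option, and the one LIBSVM [4] uses when probability estimates are enabled, is Platt scaling: the SVM decision value $f$ is passed through a sigmoid $1 / (1 + \exp(A f + B))$ whose parameters $A$ and $B$ are fitted on training data. The sketch below assumes that mechanism; the function names and the `A`, `B` values in the usage are made up for illustration.

```python
import math


def platt_probability(decision_value, A, B):
    """P(pedestrian | f) = 1 / (1 + exp(A * f + B)) for an SVM decision value f."""
    f_apb = decision_value * A + B
    # Numerically stable form: never calls exp() on a large positive argument.
    if f_apb >= 0:
        return math.exp(-f_apb) / (1.0 + math.exp(-f_apb))
    return 1.0 / (1.0 + math.exp(f_apb))


def confidence_vector(decision_value, A, B):
    """[P(pedestrian), P(non-pedestrian)]: both in [0, 1] and summing to 1."""
    p = platt_probability(decision_value, A, B)
    return [p, 1.0 - p]
```

With hypothetical parameters `A=-2.0, B=0.0`, a decision value of 3.0 gives a pedestrian confidence of about 0.998, while a value of 0.0 gives exactly 0.5 for both classes.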
Our initial idea for the tracking was to use virtual 2D scans and was inspired by Petrovskaya and Thrun [16].
With an adapted measurement model, we observed that many pedestrians could be tracked successfully using a high-resolution 2D scan.
Unlike in urban scenarios, however, varying inclination angles of the terrain and frequent occlusions, e.g., caused by vegetation, are problematic in unstructured environments.
Hence, we developed a new measurement model and decided to focus on a sophisticated interaction between the detection algorithm and the particle filter.
This allows our approach to discard hypotheses early and to deal efficiently with the false detections that occur frequently in unstructured environments.
Our measurement model approximates pedestrian geometry as a cylindrical shape of non-zero depth (cf. Fig. 4).
The likelihood of range readings is modeled over three different regions, ignoring the height value.
A region of free space in form of an outer cylinder is modeled around an inner cylinder with one half facing the sensor and the other half facing away.
The majority of range readings are expected to fall in the region facing the sensor (green cylinder-half).
A minority is expected on the other side (yellow cylinder-half), since humans are solid objects and laser rays pass through a human only infrequently, e.g., during limb movement.
The region around the pedestrian is expected to contain few or no points.
Hence, the measurement model separates neighboring obstacles adequately while still accounting for pedestrians very close to other objects.
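The three-region model can be sketched in 2D (height ignored, as in the notes). A point inside the inner circle is assigned to the sensor-facing or the far half by the sign of its dot product with the center-to-sensor direction; a point in the ring between the inner and outer radius falls into free space. The radii and per-region scores are hypothetical, and writing the scores as log-likelihood ratios against an assumed uniform background is a design choice of this sketch: a hypothesis over empty space scores 0, and points in the free-space ring are penalized. It illustrates the idea only and is not the derivation of [16] that the paper follows.

```python
import math

# Hypothetical per-point scores: log-likelihood ratios against an assumed
# uniform background density, so points outside the model contribute 0.
REGION_SCORE = {
    "front": math.log(7.0),  # inner half-cylinder facing the sensor: most returns
    "back": math.log(2.0),   # inner half facing away: few returns (e.g. moving limbs)
    "free": math.log(0.1),   # free-space ring around the pedestrian: almost no returns
}


def region(point, center, r_inner, r_outer, sensor=(0.0, 0.0)):
    """Assign a 2D point (height ignored) to one region of a pedestrian hypothesis."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    dist = math.hypot(dx, dy)
    if dist <= r_inner:
        sx, sy = sensor[0] - center[0], sensor[1] - center[1]
        # Positive projection onto the center-to-sensor direction: sensor-facing half.
        return "front" if dx * sx + dy * sy > 0 else "back"
    if dist <= r_outer:
        return "free"
    return None  # outside the outer cylinder: not explained by this hypothesis


def hypothesis_score(points, center, r_inner=0.3, r_outer=0.8, sensor=(0.0, 0.0)):
    """Sum of region scores over all points, i.e. a log-likelihood ratio."""
    total = 0.0
    for p in points:
        name = region((p[0], p[1]), center, r_inner, r_outer, sensor)
        if name is not None:
            total += REGION_SCORE[name]
    return total
```

A hypothesis centered on the returns scores high, while one shifted so that the returns land in its free-space ring scores negative, which is how neighboring obstacles are kept apart.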
Tracking is performed using a Rao-Blackwellized particle filter [7] with 40 particles for each target where we estimate target extent separately for each positional hypothesis.
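A simplified sketch of one filter cycle with 40 particles per target and the constant-velocity motion model mentioned in the notes. This is a plain bootstrap particle filter, not a Rao-Blackwellized one: the state is reduced to (x, y, vx, vy), orientation and the separately estimated extent are omitted, the noise levels are hypothetical, systematic resampling is a choice of this sketch, and the measurement likelihood is passed in as a function (for instance a score in the spirit of the three-region model).

```python
import math
import random

N_PARTICLES = 40  # particles per target, as stated in the notes


def predict(particles, dt, pos_noise=0.05, vel_noise=0.1, rng=random):
    """Constant-velocity motion model with Gaussian process noise (noise levels hypothetical)."""
    moved = []
    for x, y, vx, vy in particles:
        vx += rng.gauss(0.0, vel_noise)
        vy += rng.gauss(0.0, vel_noise)
        moved.append((x + vx * dt + rng.gauss(0.0, pos_noise),
                      y + vy * dt + rng.gauss(0.0, pos_noise),
                      vx, vy))
    return moved


def update_and_resample(particles, log_likelihood, rng=random):
    """Weight each particle by exp(log_likelihood) and draw a new set by systematic resampling."""
    logs = [log_likelihood(p) for p in particles]
    top = max(logs)
    weights = [math.exp(l - top) for l in logs]  # shift by the max for numerical stability
    n = len(particles)
    step = sum(weights) / n
    u = rng.uniform(0.0, step)
    resampled, i, cumulative = [], 0, weights[0]
    for k in range(n):
        target = u + k * step
        while target > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(particles[i])
    return resampled


def estimate(particles):
    """Mean state (x, y, vx, vy) of the particle set."""
    n = len(particles)
    return tuple(sum(p[d] for p in particles) / n for d in range(4))
```

A cycle is `predict` with the time since the last scan, then `update_and_resample` with the current measurement likelihood, then `estimate` for the reported position.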
For the measurement model, we follow the derivations of [16]. Each pedestrian hypothesis consists of a 2D position, an orientation, a rotation angle with respect to the sensor, a velocity, and a circular extent. The velocity compensates for pedestrian movement by applying a constant-velocity model, since a pedestrian can move only slightly within one rotation of the LRF.
In case of a false detection, e.g., caused by a tree or a bush, a particle filter would be initiated and remain on the target until it disappears from view. Since false detections inevitably occur in the target domain, we sought a solution to handle them. Confidence-based approaches (in images [2], [22]) additionally grade system estimates to reinforce correct inferences. In our approach, a particle filter is discarded if either no re-detection occurs for a predefined time, or re-detections occur but their confidence is too low to maintain the target. The first criterion allows continuous tracking during short occlusions, and the second criterion ensures that no particle filter remains on an invalid target for a long period of time. In other words, both many low-confidence detections and a few high-confidence detections are able to maintain a target, even if the tracker yields inaccurate results.
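The two discard criteria can be sketched as follows. The notes do not say how confidences are combined, so this sketch assumes an exponentially time-decayed sum of classifier confidences: frequent low-confidence or occasional high-confidence re-detections keep the sum above a threshold, while sparse low-confidence ones (e.g., from a bush) do not. The timeout, threshold, decay constant `tau`, the short grace period for new tracks, and all names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackEvidence:
    """Hypothetical per-track bookkeeping for the two discard criteria."""
    created: float
    last_detection: float
    last_update: float
    score: float  # time-decayed sum of classifier confidences

    def decay_to(self, t, tau):
        # Exponential forgetting: older evidence loses weight with elapsed time.
        self.score *= math.exp(-(t - self.last_update) / tau)
        self.last_update = t


def new_track(t, confidence):
    return TrackEvidence(created=t, last_detection=t, last_update=t, score=confidence)


def on_detection(track, t, confidence, tau=2.0):
    """Register a re-detection with its classifier confidence in [0, 1]."""
    track.decay_to(t, tau)
    track.score += confidence
    track.last_detection = t


def should_discard(track, t, timeout=1.0, min_score=0.5, grace=0.5, tau=2.0):
    # Criterion 1: no re-detection within `timeout` seconds (tolerates short occlusions).
    if t - track.last_detection > timeout:
        return True
    # Criterion 2: re-detections occur, but their accumulated confidence is too low.
    track.decay_to(t, tau)
    return t - track.created >= grace and track.score < min_score
```

The grace period keeps a freshly created track from being dropped before it has had a chance to accumulate evidence.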
We presented an approach for pedestrian tracking in unstructured environments that aims to identify pedestrians close to vegetation by using split and merge strategies on 3D clusters.
While the algorithms separate pedestrians from other structures, the large number of newly created candidates affects precision and runtime.
The proposed tracking approach performs well in all test environments, especially considering the low recall of the SVM in cluttered environments.
The SVM is unable to cope with the divergent update frequency of the Velodyne HDL-64E, which lowers precision and recall on the publicly available datasets and creates a further challenge for the tracking system; the tracker nevertheless proved reliable under these difficult circumstances.