Title | Tracking People with a 360-Degree Lidar |
---|---|
Author (Affiliation) | John Shackleton () |
Venue/Year | 2010, paper |
Citation ID / Keywords | |
Dataset (Sensor) / Model | |
Related Work | |
Notes | |
Code | |
Procedure
Challenges: three failure modes became the most apparent.
Each of these challenges is addressed in the sections below.
Our approach takes advantage of two characteristics of lidar data.
First, each target presents a unique signature to the sensor, formed by the curvature of its surface. Second, information accumulated from a previous frame can facilitate and validate target detection in subsequent frames. Our implementation settles upon a surface matching technique called spin images.
Definition: a spin image collects a set of contour descriptions local to each vertex on an object's surface.
Why spin images suit 3D data processing: spin images are a good fit for 3D real-time point clouds because they handle occlusions very well, since they reduce the surface representation to a set of localized patches.
Spin images are also relatively efficient, especially compared to other surface matching techniques developed for applications with static scenes that are not real-time constrained.
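As a concrete illustration of the definition above, the sketch below builds a single spin image for one oriented vertex: every other point is mapped to a radial distance alpha from the vertex's normal axis and a signed height beta along the normal, and the (alpha, beta) pairs are binned into a 2D histogram. The bin size, image width, and NumPy implementation are illustrative choices, not values taken from the paper.

```python
import numpy as np

def spin_image(points, p, n, bin_size=0.05, image_width=16):
    """Accumulate an (alpha, beta) histogram of `points` around the oriented
    vertex `p` with unit surface normal `n` (illustrative parameters)."""
    d = points - p                                   # vectors from the vertex to every point
    beta = d @ n                                     # signed distance along the normal
    alpha = np.sqrt(np.maximum((d * d).sum(axis=1) - beta ** 2, 0.0))
    img = np.zeros((image_width, image_width))
    rows = (beta / bin_size + image_width / 2).astype(int)   # bins along the normal
    cols = (alpha / bin_size).astype(int)                    # bins in radial distance
    keep = (rows >= 0) & (rows < image_width) & (cols >= 0) & (cols < image_width)
    np.add.at(img, (rows[keep], cols[keep]), 1.0)
    return img
```

Building a spin image map for a candidate then amounts to repeating this for a set of sampled vertices with estimated surface normals.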
This paper uses surface matching for lidar tracking, as follows.
Procedure
Grouping based on spatial information: A group of points is initially segmented based on spatial separation using 3D occupancy grids over the scene.
Simple rule-based filtering of (human) candidates: Point groups are considered candidate human targets based on simple constraints of dimensions and orientation.
Spin image construction for candidates: For potential human targets, a spin image map is constructed.
Comparison with the previous frame's spin images: As consecutive frames are generated, spin image maps of new potential targets are compared to the spin image maps of previously identified targets, using the correlation formula described in [6] (see the matching sketch after this list).
Match on a good correlation score: When a new track has a correlation score that compares well enough with a prior target, then a match is declared.
When a match is detected the spin image map of the target is updated with the latest surface descriptions of the track, and the target is tracked for another frame.
Tracks that do not match an existing target, and meet the dimensional constraints of a human, are identified as new targets.
We could also use spin images to derive a set of templates representing various human shapes, which in turn can be used to classify new targets.
For this initial work, however, we defer template-matching classification and use simple dimensional characteristics to identify new targets.
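A hedged sketch of the matching step above: a candidate's spin image map (one spin image per sampled vertex) is scored against a previously identified target's map with a linear correlation coefficient, in the spirit of the formula from [6]. The map layout, averaging scheme, and threshold below are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def correlation(a, b):
    """Linear (Pearson) correlation coefficient between two spin images."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return float(a @ b / denom) if denom > 0 else 0.0

def match_score(candidate_map, target_map):
    """For each spin image in the candidate's map, take its best correlation
    against the target's map, then average over the candidate's map."""
    best = [max(correlation(c, t) for t in target_map) for c in candidate_map]
    return float(np.mean(best))

MATCH_THRESHOLD = 0.5   # illustrative value; the paper's threshold is not reproduced here

def is_match(candidate_map, target_map):
    # A match triggers updating the target's map with the candidate's latest surfaces.
    return match_score(candidate_map, target_map) >= MATCH_THRESHOLD
```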
It would be inefficient to perform spin image comparisons against all existing targets in a scene, potentially hundreds of targets.
Consequently, we employ an extended Kalman filter (EKF) to estimate the position of a target from one frame to the next [7].
Surface matching is then limited to the comparison of potential targets of the current frame against the spin image signatures near known tracks.
The spin image correlation score for each comparison is thus refined to include the distance of the current track position from the estimated position of the established target, as shown in Eq. (1) of the paper.
See the paper for the full equation and its explanation.
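Equation (1) itself is not reproduced in this note; the sketch below only illustrates the per-target position prediction that gates the spin image comparisons. The paper cites an EKF [7]; with a linear constant-velocity motion model in the x-y plane the predict/update steps reduce to the ordinary Kalman filter shown here, and all noise parameters are assumed values.

```python
import numpy as np

class TargetFilter:
    """Constant-velocity filter over [x, y, vx, vy] for one tracked person."""
    def __init__(self, x, y, dt=0.1):
        self.s = np.array([x, y, 0.0, 0.0])            # state estimate
        self.P = np.eye(4)                             # state covariance
        self.F = np.array([[1, 0, dt, 0],              # constant-velocity transition
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],               # we only measure x-y position
                           [0, 1, 0, 0]], dtype=float)
        self.Q = 0.01 * np.eye(4)                      # process noise (assumed)
        self.R = 0.05 * np.eye(2)                      # measurement noise (assumed)

    def predict(self):
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]                              # predicted x-y position

    def update(self, z):
        y = np.asarray(z, dtype=float) - self.H @ self.s   # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

The predicted x-y position limits which candidate objects are compared against a target's spin image map, and the candidate's distance from that prediction is what Eq. (1) folds into the correlation score.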
A key assumption of the spin image algorithm is that adjacent 3D points on an object's surface are approximately equidistant from each other.
However, lidar data by its nature does not satisfy this assumption.
Solution: Instead, we compensate by ...
[9] Hoppe, H. “Surface Reconstruction from Unorganized Points”. PhD Thesis, Dept. of Computer Science and Engineering, University of Washington, June 1994.
Most errors are due to the background: A high number of tracking errors and false positives with 360-degree lidars are related to the interaction of the people and the background scene.
For example, people who stand next to a wall are often mistaken for the wall.
Another key observation is that people cast lidar shadows, a transient, moving patch devoid of points in the scene created when a moving object passes in front of a fixed object (Figure 3).
The solution is to recognize and remove the fixed non-human objects: both of these problems are addressed by eliminating from each frame the persistent static objects that are known to be non-human, such as the ground, walls, and furniture.
In 2D image processing, this technique is known as background subtraction.
We perform 3D background subtraction by populating a separate occupancy grid with static objects.
The cells of the occupancy grid form our background mask.
Each 3D cell also includes a history buffer that records its recent occupancy, represented as a string of bits (eight bits in our implementation).
Procedure
To determine which cells represent the static background, each cell in the scene’s background mask is examined.
A cell is marked as static background if a majority of its bits are set.
A persistently empty space will have all bits in its cell mask unset.
After sampling a minimum number of periodic frames, the learned background mask is applied continuously as a filter for future frames.
A point in subsequent frames is discarded if it occupies a background cell.
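A minimal sketch of the background mask described above, assuming points already expressed in non-negative grid coordinates: each cell keeps an 8-bit occupancy history, cells whose history has a majority of bits set are marked static background, and points landing in those cells are discarded. The grid extent and cell size are assumptions, not values from the paper.

```python
import numpy as np

class BackgroundMask:
    def __init__(self, shape=(200, 200, 40), cell=0.1):
        self.history = np.zeros(shape, dtype=np.uint8)   # per-cell 8-bit occupancy history
        self.cell = cell

    def _index(self, points):
        # Map (N, 3) metric points to integer cell indices (clipped to the grid).
        idx = np.clip((points / self.cell).astype(int), 0, np.array(self.history.shape) - 1)
        return tuple(idx.T)

    def learn(self, points):
        """Record one sampled frame: shift each cell's history left and set the
        newest bit for cells occupied in this frame (oldest bit falls off)."""
        occ = np.zeros(self.history.shape, dtype=np.uint8)
        occ[self._index(points)] = 1
        self.history = (self.history << 1) | occ

    def filter(self, points):
        """Drop points that fall in cells whose history has a majority of bits set."""
        counts = np.unpackbits(self.history[..., None], axis=-1).sum(axis=-1)
        background = counts > 4                       # majority of the 8 history bits
        keep = ~background[self._index(points)]
        return points[keep]
```

After sampling enough frames with `learn`, `filter` is applied to every subsequent frame, which is what yields the large reduction in input data noted below.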
In practice, this approach reduces the lidar input data by as much as 90% (Figure 4).
This approach yields other benefits as well.
Analyzing people who are close together is a difficult problem. While surface matching can disambiguate patches of human targets, reliable surface matching comparisons within real-time lidar data cannot take place until after the human targets are first properly segmented into separate objects.
For example, if multiple tracks in close proximity are incorrectly segmented into a single object, such as two people shaking hands (Figure 5), the resulting surface description does not contain sufficient detail to identify the separate tracks within the same object.
Moreover, sampling of correlation points of a multi-track object will likely include an unordered mixture of points from each of the tracks, and therefore will not capture a surface description that separates the multiple tracks properly.
Therefore, false negatives during tracking will occur.
Thus, a segmented object should contain at most one track.
Three kinds of events can lead to an object with more than one track.
These three scenarios are handled using two clustering algorithms (K-means and DBSCAN).
Clustering is a common technique to segment 3D points in space with many algorithms and variations to choose from [10].
[10] Xu, R., Wunsch, D. “Survey of Clustering Algorithms”. IEEE Transactions on Neural Networks, Vol. 16, No. 3. pp 645-678. May 2005.
Our solution applies two common clustering algorithms to objects of a certain minimum size that may contain more than one track.
For both clustering algorithms, the 3D points of the larger object are clustered in the x-y plane, i.e., the top-down view.
The first of the three possible scenarios:
For the first case, K-means is used when known targets come together into a single larger object.
The algorithm is seeded with the number of expected clusters (tracks) and an initial guess for the centroid point of each cluster, which are the estimated x-y midpoints generated by the Kalman filter.
The K-means clustering iterates recursively over the points in the object until convergence is reached and all the points are assigned to a new cluster subdivided from the original object.
The new (smaller) objects are then treated as any other potential track in the system, ready for classification.
Clusters that do not have enough points for surface matching are discarded.
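A sketch of this seeded K-means step, using scikit-learn for brevity (the paper does not name an implementation): the merged object's points are clustered in the top-down x-y view, seeded with the Kalman-estimated x-y midpoints of the expected tracks, and clusters too small for surface matching are dropped. The minimum-point threshold is an assumed value.

```python
import numpy as np
from sklearn.cluster import KMeans

def split_merged_object(points_xyz, predicted_centroids_xy, min_points=50):
    """Subdivide a merged object into one cluster per expected track."""
    xy = points_xyz[:, :2]                                   # top-down (x-y) view
    seeds = np.asarray(predicted_centroids_xy, dtype=float)  # Kalman-predicted midpoints
    km = KMeans(n_clusters=len(seeds), init=seeds, n_init=1)
    labels = km.fit_predict(xy)
    clusters = [points_xyz[labels == k] for k in range(km.n_clusters)]
    # Clusters with too few points for surface matching are discarded.
    return [c for c in clusters if len(c) >= min_points]
```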
The second and third of the three possible scenarios:
The other two scenarios are less predictable, because we have less confidence as to the number of tracks bundled into the larger object.
Thus, we switch to DBSCAN clustering when the number of human tracks within an object is unknown.
We tried an approach that exclusively used DBSCAN, but it was not reliable enough.
For example, if two people are shoulder to shoulder, the DBSCAN approach often clusters the two tracks into a single object, while K-means is able to identify the separate tracks when their existence is established in prior frames.
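For these less predictable scenarios, a corresponding DBSCAN sketch, again with scikit-learn; the eps and min_samples values are illustrative and not taken from the paper.

```python
from sklearn.cluster import DBSCAN

def split_unknown_object(points_xyz, eps=0.3, min_samples=20):
    """Cluster the top-down (x-y) projection without a preset cluster count."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points_xyz[:, :2])
    # Label -1 marks DBSCAN noise points, which are dropped.
    return [points_xyz[labels == k] for k in sorted(set(labels)) if k != -1]
```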
Certainly many other clustering algorithms are suitable for segmenting the point cloud, each perhaps optimal in specific situations.
An extended Kalman filter estimates the frame-by-frame position of each person.
[7] Ramachandra, K.V. Kalman Filtering Techniques for Radar Tracking. CRC, 2000.
Spin images: a 3D shape descriptor, also usable for object recognition.
Procedure
See also: Spin-Images: A Representation for 3-D Surface Matching; Spin Images; [논문_2012] 각 분할 스핀영상을 사용한 3차원 얼굴 특징점 검출 방법 (a 2012 Korean paper on 3D facial feature point detection using partitioned spin images)