| Title | Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age |
|---|---|
| Authors (Affiliation) | Cesar Cadena (ETH Zurich), L. Carlone (MIT) |
| Venue / Year | arXiv, Jun 2016 – Jan 2017 (position paper) |
| Citation ID / Keywords | |
| Dataset (Sensors) / Model | |
| Related Work | |
| Notes | Homepage |
| Code | |
- robustness and scalability in long-term mapping,
- metric and semantic representations for mapping,
- theoretical performance guarantees,
- active SLAM and exploration,
- and other new frontiers.
The robot state is described by its pose (position, orientation) and possibly other quantities (velocity, sensor biases, calibration parameters). The map is a representation of aspects of interest (e.g., position of landmarks, obstacles) describing the environment in which the robot operates. The need to use a map of the environment is twofold.
First, the map is often required to support other tasks; second, the map allows limiting the error committed in estimating the state of the robot. In the absence of a map, dead-reckoning would quickly drift over time; on the other hand, using a map (e.g., a set of distinguishable landmarks), the robot can “reset” its localization error by re-visiting known areas (so-called loop closure). Therefore, SLAM finds applications in all scenarios in which a prior map is not available and needs to be built.
In some robotics applications the location of a set of landmarks is known a priori. For instance, a robot operating on a factory floor can be provided with a manually-built map of artificial beacons in the environment. Another example is the case in which the robot has access to GPS (the GPS satellites can be considered as moving beacons at known locations). In such scenarios, SLAM may not be required if localization can be done reliably with respect to the known landmarks. The popularity of the SLAM problem is connected with the emergence of indoor applications of mobile robotics.
Recommended surveys on the first 20 years of SLAM research: [7, 69]. A thorough historical review of the first 20 years of the SLAM problem is given by Durrant-Whyte and Bailey in two surveys [7, 69].
These mainly cover what we call the classical age (1986–2004); two other excellent references describe the three main SLAM formulations of the classical age. The subsequent period is what we call the algorithmic-analysis age (2004–2015), and is partially covered by Dissanayake et al. in [64].
We review the main SLAM surveys to date in Table I, observing that most recent surveys only cover specific aspects or sub-fields of SLAM. The popularity of SLAM in the last 30 years is not surprising if one thinks about the manifold aspects that SLAM involves. The present paper gives a broad overview of the current state of SLAM, and offers the perspective of part of the community on the open problems and future directions for SLAM research.
Our main focus is on metric and semantic SLAM, and we refer the reader to the recent survey by Lowry et al. [160], which provides a comprehensive review of vision-based place recognition and topological SLAM.
[160] S. Lowry, N. Sunderhauf, P. Newman, J. J. Leonard, D. Cox, P. Corke, and M. J. Milford. Visual Place Recognition: A Survey. IEEE Transactions on Robotics (TRO), 32(1):1–19, 2016.
Before delving into the paper, we first discuss two questions that often animate discussions during robotics conferences; we will revisit these questions at the end of the manuscript. Answering the first question, “Do autonomous robots really need SLAM?”, requires understanding what makes SLAM unique.
SLAM aims at building a globally consistent representation of the environment. The keyword here is “loop closure”: if we sacrifice loop closures, SLAM reduces to odometry. In early applications, odometry was obtained by integrating wheel encoders.
The pose estimate obtained from wheel odometry quickly drifts, making the estimate unusable after a few meters [128, Ch. 6]; this was one of the main thrusts behind the development of SLAM: the observation of external landmarks is useful to reduce the trajectory drift and possibly correct it [185]. However, more recent odometry algorithms are based on visual and inertial information, and have very small drift (< 0.5% of the trajectory length [82]). Hence the question becomes legitimate: do we really need SLAM? Our answer is three-fold.
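To make the drift argument concrete, here is a minimal dead-reckoning simulation (step size and noise level are made-up illustrative numbers, not taken from the paper): a small random heading error at every wheel-odometry step compounds, so the position error grows with the distance traveled and cannot be corrected without external observations.

```python
import numpy as np

rng = np.random.default_rng(0)

def dead_reckon(n_steps=1000, step=0.1, yaw_noise_std=0.01):
    """Integrate noisy odometry increments for a robot that in truth
    drives straight along x; heading noise is a random walk, so the
    position estimate drifts away from the true trajectory."""
    x = y = yaw = 0.0
    true_x = 0.0
    for _ in range(n_steps):
        yaw += rng.normal(0.0, yaw_noise_std)  # small heading error per step
        x += step * np.cos(yaw)
        y += step * np.sin(yaw)
        true_x += step
    drift = np.hypot(x - true_x, y)            # final position error
    return drift, n_steps * step

drift, length = dead_reckon()
print(f"drift after {length:.0f} m: {drift:.2f} m "
      f"({100 * drift / length:.1f}% of trajectory length)")
```

Re-running with a larger `yaw_noise_std` (cheap wheel encoders) or a longer trajectory makes the unbounded growth of the error obvious, which is exactly what a loop closure would “reset”.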
First of all, we observe that the SLAM research done over the last decade has itself produced the visual-inertial odometry algorithms that currently represent the state of the art, e.g., [163, 175]; in this sense Visual-Inertial Navigation (VIN) is SLAM: VIN can be considered a reduced SLAM system, in which the loop closure (or place recognition) module is disabled. More generally, SLAM has directly led to the study of sensor fusion under more challenging setups (i.e., no GPS, low quality sensors) than previously considered in other literature (e.g., inertial navigation in aerospace engineering).
The second answer regards the true topology of the environment.
[Fig. 1: Left:]
- map built from odometry.
- The map is homotopic to a long corridor that goes from the starting position A to the final position B.
- Points that are close in reality (e.g., B and C) may be arbitrarily far in the odometric map.
[Fig. 1: Right:]
- map built from SLAM.
- By leveraging loop closures, SLAM estimates the actual topology of the environment, and “discovers” shortcuts in the map.
A robot performing odometry and neglecting loop closures interprets the world as an “infinite corridor” (Fig. 1-left) in which the robot keeps exploring new areas indefinitely. A loop closure event informs the robot that this “corridor” keeps intersecting itself (Fig. 1-right). The advantage of loop closure now becomes clear: by finding loop closures, the robot understands the real topology of the environment, and is able to find shortcuts between locations (e.g., points B and C in the map). Therefore, if getting the right topology of the environment is one of the merits of SLAM, why not simply drop the metric information and just do place recognition?
The answer is simple: the metric information makes place recognition much simpler and more robust; the metric reconstruction informs the robot about loop closure opportunities and allows discarding spurious loop closures [150]. Therefore, while SLAM might be redundant in principle (an oracle place recognition module would suffice for topological mapping), SLAM offers a natural defense against wrong data association and perceptual aliasing, where similarly looking scenes, corresponding to distinct locations in the environment, would deceive place recognition.
In this sense, the SLAM map provides a way to predict and validate future measurements: we believe that this mechanism is key to robust operation.
The third answer is that SLAM is needed for many applications that, either implicitly or explicitly, do require a globally consistent map. For instance, in many military and civilian applications, the goal of the robot is to explore an environment and report a map to the human operator, ensuring that full coverage of the environment has been obtained. Another example is the case in which the robot has to perform structural inspection (of a building, bridge, etc.); also in this case a globally consistent 3D reconstruction is a requirement for successful operation.
The second question, “is SLAM solved?”, is often asked within the robotics community, c.f. [87]. This question is difficult to answer because SLAM has become such a broad topic that the question is well posed only for a given robot/environment/performance combination. In particular, one can evaluate the maturity of the SLAM problem once the following aspects are specified:
For instance, SLAM in the following settings can be considered mature. Mapping a 2D indoor environment with a robot equipped with wheel encoders and a laser scanner, with sufficient accuracy (< 10 cm) and sufficient robustness (say, low failure rate), can be considered largely solved (an example of an industrial system performing SLAM is the Kuka Navigation Solution [145]). Similarly, vision-based SLAM with slowly-moving robots (e.g., Mars rovers [166], domestic robots [114]), and visual-inertial odometry [94] can be considered mature research fields. On the other hand, current SLAM algorithms can be easily induced to fail when either the motion of the robot or the environment are too challenging (e.g., fast robot dynamics, highly dynamic environments). In this paper, we argue that we are entering a third era for SLAM,
the robust-perception age, which is characterized by the following key requirements:
- robust performance:
- high-level understanding:
- resource awareness:
- task-driven perception:
Paper organization.
Section II (formulation and architecture): the paper starts by presenting a standard formulation and architecture for SLAM.
Section III (robustness): tackles robustness in life-long SLAM.
Section IV (scalability): deals with scalability.
Section V: discusses how to represent the geometry of the environment.
Section VI: extends the question of the environment representation to the modeling of semantic information.
Section VII (theoretical results): provides an overview of the current accomplishments on the theoretical aspects of SLAM.
Section VIII (open problems): broadens the discussion and reviews the active SLAM problem, in which decision making is used to improve the quality of the SLAM results.
Section IX (recent trends: sensors, deep learning): provides an overview of recent trends in SLAM, including the use of unconventional sensors and deep learning.
Section X (conclusions): provides final remarks.
Throughout the paper, we provide many pointers to related work outside the robotics community.
Despite its unique traits, SLAM is related to problems addressed in computer vision, computer graphics, and control theory, and cross-fertilization among these fields is a necessary condition to enable fast progress.
For the non-expert reader, we recommend reading Durrant-Whyte and Bailey’s SLAM tutorials [7, 69] before delving into this position paper.
[7] T. Bailey and H. F. Durrant-Whyte. Simultaneous Localisation and Mapping (SLAM): Part II. IEEE Robotics and Automation Magazine, 13(3):108–117, 2006.
[69] H. F. Durrant-Whyte and T. Bailey. Simultaneous Localisation and Mapping (SLAM): Part I. IEEE Robotics and Automation Magazine, 13(2):99–110, 2006.

[Fig. 2: Front-end and back-end in a typical SLAM system.]
- The back-end can provide feedback to the front-end for loop closure detection and verification.
The front-end abstracts sensor data into models that are amenable for estimation; the back-end performs inference on the abstracted data produced by the front-end.
The current de-facto standard formulation of SLAM has its origins in the seminal paper of Lu and Milios [161], followed by the work of Gutmann and Konolige [101]. Since then, numerous approaches have improved the efficiency and robustness of the underlying optimization [63, 81, 100, 125, 192, 241]. All these approaches formulate SLAM as a maximum a posteriori (MAP) estimation problem, and often use the formalism of factor graphs [143] to reason about the interdependence among variables.
Assume that we want to estimate an unknown variable $$X$$; as mentioned before, in SLAM the variable $$X$$ typically includes the trajectory of the robot (as a discrete set of poses) and the position of landmarks in the environment. We are given a set of measurements $$Z = \{z_k : k = 1, \dots, m\}$$ such that each measurement can be expressed as a function of $$X$$, i.e. $$z_k = h_k(X_k) + \epsilon_k$$, where $$X_k$$ is the subset of variables involved in the $$k$$-th measurement, $$h_k(\cdot)$$ is a known function (the measurement model), and $$\epsilon_k$$ is random measurement noise.
In MAP estimation, we estimate $$X$$ by computing the assignment of variables $$X^\star$$ that attains the maximum of the posterior $$p(X \mid Z)$$ (the belief over $$X$$ given the measurements):

$$X^\star = \arg\max_X \; p(X \mid Z) = \arg\max_X \; p(Z \mid X)\, p(X)$$
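Under the common assumption of Gaussian measurement noise, maximizing this posterior is equivalent to minimizing a sum of weighted squared residuals. A minimal sketch with hypothetical numbers (not from the paper): four 1D poses, three odometry measurements, one loop-closure measurement, and a prior anchoring the first pose. Because the models $$h_k$$ are linear here, the MAP estimate is an ordinary least-squares solution.

```python
import numpy as np

# Poses x0..x3 on a line; each measurement z_k = h_k(X) + noise, with
# h_k linear, so MAP estimation reduces to least squares  A x ~= b.
A = np.array([
    [ 1.,  0.,  0.,  0.],  # prior:        x0      = 0
    [-1.,  1.,  0.,  0.],  # odometry:     x1 - x0 = 1.0
    [ 0., -1.,  1.,  0.],  # odometry:     x2 - x1 = 1.1
    [ 0.,  0., -1.,  1.],  # odometry:     x3 - x2 = 0.9
    [-1.,  0.,  0.,  1.],  # loop closure: x3 - x0 = 3.0
])
b = np.array([0.0, 1.0, 1.1, 0.9, 3.0])  # made-up noisy measurements

# Equal noise variances assumed, so no weighting matrix is needed here;
# in general each row would be scaled by the inverse noise covariance.
x_map, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_map)
```

In this toy case the odometry increments happen to sum exactly to the loop-closure measurement, so all residuals can be driven to zero; with inconsistent measurements the least-squares solution spreads the error over the trajectory, which is precisely what a SLAM back-end does.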
This section discusses how to model geometry in SLAM. More formally, a metric representation (or metric map) is a symbolic structure that encodes the geometry of the environment. We claim that understanding how to choose a suitable metric representation for SLAM (and extending the set of representations currently used in robotics) will impact many research areas. Two main 2D representations are:
- landmark-based maps: the former models the environment as a sparse set of landmarks;
- occupancy grid maps: the latter discretizes the environment in cells and assigns a probability of occupation to each cell.
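The occupancy-grid update is usually carried out in log-odds form, so that evidence from repeated observations simply adds up. A minimal sketch (the sensor-model probabilities 0.7/0.3 are illustrative assumptions, not values from the paper): each observation adds log-odds evidence to a cell, and cells never observed stay at p = 0.5.

```python
import numpy as np

def logodds(p):
    return np.log(p / (1.0 - p))

# A 1-D slice of an occupancy grid, all cells initially unknown:
# log-odds 0  <=>  occupation probability 0.5.
grid = np.zeros(5)

def update_cell(grid, i, hit, l_occ=logodds(0.7), l_free=logodds(0.3)):
    """Inverse-sensor-model update: add log-odds evidence to cell i."""
    grid[i] += l_occ if hit else l_free

# Three "hit" observations of cell 2, two "miss" observations of cell 0.
for _ in range(3):
    update_cell(grid, 2, hit=True)
for _ in range(2):
    update_cell(grid, 0, hit=False)

prob = 1.0 - 1.0 / (1.0 + np.exp(grid))  # log-odds back to probabilities
print(prob.round(2))
```

Cell 2 is now confidently occupied, cell 0 confidently free, and the untouched cells remain at 0.5; the additive form is why occupancy grids fuse many noisy range scans so cheaply.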
The problem of standardization of these representations in the 2D case has been tackled by the IEEE RAS Map Data Representation Working Group.
The question of 3D geometry modeling is more delicate, and the understanding of how to efficiently model 3D geometry during mapping is in its infancy. In this section we review metric representations, taking a broad perspective across robotics, computer vision, computer-aided design (CAD), and computer graphics. Our taxonomy draws inspiration from [80, 209, 221], and includes pointers to more recent work.
Recently, the limitations of purely geometric maps have been recognized, and this has spawned a significant and ongoing body of work in semantic mapping of environments. These observations have led to different approaches for semantic mapping, which vary in the number and types of semantic concepts and in the means of associating them with different parts of the environment.
[206] A. Pronobis and P. Jensfelt. Large-scale semantic mapping and reasoning with heterogeneous modalities. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 3515–3522. IEEE, 2012.
As mentioned in Section I, topological mapping drops the metric information and only leverages place recognition to build a graph in which the nodes represent distinguishable “places”, while edges denote reachability among places. We note that topological mapping is radically different from semantic mapping. While the former requires recognizing a previously seen place (disregarding whether that place is a kitchen, a corridor, etc.), the latter is interested in classifying the place according to semantic labels. A comprehensive survey on vision-based topological SLAM is presented in Lowry et al. [160], and some of its challenges are discussed in Section III.
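A place graph like this can be sketched with a plain adjacency structure (place names and edges are hypothetical, loosely following the A/B/C labels of Fig. 1): a loop-closure event adds an edge between two places that odometry alone would keep far apart, and reachability queries become ordinary graph search.

```python
from collections import deque

# Topological map: nodes are recognized places, edges mean reachability.
edges = {
    "A": ["P1"], "P1": ["A", "P2"], "P2": ["P1", "B"],
    "B": ["P2"], "C": ["P3"], "P3": ["C", "B"],
}
# A loop-closure event reveals that B and C are actually adjacent,
# adding a direct edge (the "shortcut" discovered by SLAM).
edges["B"].append("C")
edges["C"].append("B")

def hops(graph, start, goal):
    """Breadth-first search: number of edges on the shortest path."""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        node, dist = frontier.popleft()
        if node == goal:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None  # goal unreachable

print(hops(edges, "B", "C"))
```

After the loop closure, B and C are one hop apart, even though the odometric map would place them at opposite ends of a long corridor.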
In the rest of this section we focus on semantic mapping.
The unlimited number of, and relationships among, concepts for humans opens a more philosophical and task-driven decision about the level and organization of the semantic concepts.
The detail and organization depend on the context of what, and where, the robot is supposed to perform a task, and they impact the complexity of the problem at different stages.
- Level/Detail of semantic concepts:
- Organization of semantic concepts:
The first robotics researchers working on semantic mapping started with the straightforward approach of segmenting the metric map built by a classical SLAM system into semantic concepts. An early work was that of Mozos et al. [176], which builds a geometric map using a 2D laser scanner and then fuses the classified semantic places from each robot pose through an associative Markov network, in an offline manner. Similarly, Lai et al. [148] build a 3D map from RGB-D sequences and then carry out an offline object classification. An online semantic mapping system was later proposed by Pronobis et al. [206], who combine three layers of reasoning (sensory, categorical, and place) to build a semantic map of the environment using laser and camera sensors. More recently, Cadena et al. [26] use motion estimation and interconnect a coarse semantic segmentation with different object detectors to outperform the individual systems. Pillai and Leonard [201] use a monocular SLAM system to boost performance in the task of object recognition in videos.
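The fusion step in these systems can be involved (e.g., the associative Markov network of [176]); a far simpler sketch of the underlying idea, with made-up classifier outputs, fuses independent per-pose label probabilities for a single place by summing log-probabilities (naive Bayes fusion):

```python
import numpy as np

labels = ["corridor", "kitchen", "office"]

# Class probabilities for the same place from a semantic classifier
# run at three different robot poses (illustrative values only).
observations = np.array([
    [0.5, 0.3, 0.2],
    [0.6, 0.2, 0.2],
    [0.3, 0.5, 0.2],
])

# Naive Bayes fusion: multiply per-observation likelihoods, i.e. sum
# log-probabilities, assuming observations are independent given the
# true label; the fused label is the maximizer.
log_post = np.sum(np.log(observations), axis=0)
fused = labels[int(np.argmax(log_post))]
print(fused)
```

Even though one individual observation prefers “kitchen”, the accumulated evidence settles on “corridor”; real systems replace the independence assumption with spatial models such as Markov networks.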
Soon after the first semantic maps came out, another trend started by taking advantage of known semantic classes or objects. The idea is that if we can recognize objects or other elements in a map, then we can use our prior knowledge about their geometry to improve the estimation of that map.
이와 관련된 연구들이 아래와 같이 진행 되었다. First attempts were done in
Taking advantage of RGB-D sensors, Salas-Moreno et al. [217] propose a SLAM system based on the detection of known objects in the environment.
Researchers with expertise in both computer vision and robotics realized that they could perform monocular SLAM and map segmentation within a joint formulation.
The online system of Flint et al. [79] presents a model that leverages the Manhattan world assumption to segment the map into the main planes in indoor scenes.
Bao et al. [9] propose one of the first approaches to jointly estimate camera parameters, scene points, and object labels using both geometric and semantic attributes in the scene.
In their work, the authors demonstrate improved object recognition performance and robustness, at the cost of a run-time of 20 minutes per image pair, and the limited number of object categories makes the approach impractical for online robot operation.
In the same line, Häne et al. [102] solve a more specialized class-dependent optimization problem in outdoor scenarios. Although still offline, Kundu et al. [147] reduce the complexity of the problem by a late fusion of the semantic segmentation and the metric map; a similar idea was proposed earlier by Sengupta et al. [219] using stereo cameras. It should be noted that [147] and [219] focus only on the mapping part and do not refine the earlier computed poses in this late stage.
Recently, a promising online system was proposed by Vineet et al. [251] using stereo cameras and a dense map representation.
[1] The SLAM community has been largely affected by the “curse of manual tuning”, in that satisfactory operation is enabled by expert tuning of the system parameters (e.g., stopping conditions, thresholds for outlier rejection).
Indoor operation rules out the use of GPS to bound the localization error; furthermore, SLAM provides an appealing alternative to user-built maps, showing that robot operation is possible in the absence of an ad hoc localization infrastructure.