Paper | SECOND: Sparsely Embedded Convolutional Detection |
---|---|
Authors (affiliation) | Yan Yan (=traveller59), Yuxing Mao, Bo Li |
Venue/Year | Sensors, 2018 |
Citation ID / Keywords | |
Dataset (sensor)/Model | KITTI 3D Object Detection |
Related work | Builds on the Sparse ConvNet idea |
Notes | |
Code | GitHub: https://github.com/traveller59/second.pytorch |
Year | First author | Paper | Code |
---|---|---|---|
2014 | Benjamin Graham | Spatially-sparse convolutional neural networks | 2013-2015 |
2017 | Benjamin Graham | Submanifold Sparse Convolutional Networks | SparseConvNet |
2018 | Benjamin Graham | 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks | |
2018 | Bo Li | SECOND: Sparsely Embedded Convolutional Detection | SECOND |
2019 | Alex H. Lang | PointPillars: Fast Encoders for Object Detection from Point Clouds | PointPillars |
LiDAR-based or RGB-D-based object detection is used in numerous applications, ranging from autonomous driving to robot vision.
Voxel-based 3D convolutional networks have been used for some time to enhance the retention of information when processing point cloud LiDAR data.
However, problems remain, including a slow inference speed and low orientation estimation performance.
We therefore investigate an improved sparse convolution method for such networks, which significantly increases the speed of both training and inference.
We also introduce a new form of angle loss regression to improve the orientation estimation performance, as well as a new data augmentation approach that enhances convergence speed and performance.
The proposed network produces state-of-the-art results on the KITTI 3D object detection benchmarks while maintaining a fast inference speed.
State-of-the-art methods can achieve …
Many current 3D detectors use a fusion method that exploits both images and point cloud data.
[8] Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-view 3D object detection network for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; Volume 1, p. 3.
[9] Ku, J.; Mozifian, M.; Lee, J.; Harakeh, A.; Waslander, S. Joint 3D Proposal Generation and Object Detection from View Aggregation. arXiv 2017, arXiv:1712.02294.
[10] Du, X.; Ang Jr, M.H.; Karaman, S.; Rus, D. A general pipeline for 3D detection of vehicles. arXiv 2018, arXiv:1803.00387.
In [11], 2D detections from an image are first used to crop frustum-shaped regions of the point cloud, which are then processed directly to estimate 3D bounding boxes.
[11] Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum PointNets for 3D Object Detection from RGB-D Data. arXiv 2017, arXiv:1711.08488.
In other methods, such as those of [12,13,14,15], only point cloud data is used to produce detections.
[12] Wang, D.Z.; Posner, I. Voting for Voting in Online Point Cloud Object Detection. In Proceedings of Robotics: Science and Systems, Rome, Italy, 13–17 July 2015; Volume 1.
[13] Engelcke, M.; Rao, D.; Wang, D.Z.; Tong, C.H.; Posner, I. Vote3Deep: Fast object detection in 3D point clouds using efficient convolutional neural networks. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 1355–1361.
[14] Zhou, Y.; Tuzel, O. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. arXiv 2017, arXiv:1711.06396.
[15] Li, B. 3D fully convolutional network for vehicle detection in point cloud. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 1513–1518.
Recently, a new approach called VoxelNet [14] has been developed.
At present, this is a state-of-the-art approach. However, its computational cost makes it difficult to use for real-time applications.
In this paper, we present a novel approach called SECOND (Sparsely Embedded CONvolutional Detection), which addresses these challenges in 3D convolution-based detection by maximizing the use of the rich 3D information present in point cloud data.
This method incorporates several improvements to the existing convolutional network architecture.
Spatially sparse convolutional networks are introduced for LiDAR-based detection and are used to extract information from the z-axis before the 3D data are downsampled to something akin to 2D image data.
In comparison to a dense convolution network, our sparse-convolution-based detector achieves a factor-of-4 speed enhancement during training on the KITTI dataset and a factor-of-3 improvement in the speed of inference.
As a further test, we have designed a small model for real-time detection that has a run time of approximately 0.025 s on a GTX 1080 Ti GPU, with only a slight loss of performance.
At the time of submission, the larger model ran at 20 fps and the smaller model at 40 fps on the KITTI 3D detection task.
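As a rough illustration, a sparse middle extractor of this kind can be written with the spconv 1.0 library that is built later in this note. The class name, layer widths, and depth below are illustrative assumptions, not the exact SECOND architecture:

```python
# Minimal sketch of a sparse middle feature extractor (illustrative only,
# not the exact SECOND architecture). Requires spconv 1.0 (built below).
import torch
import spconv

class SparseMiddle(torch.nn.Module):
    def __init__(self, in_channels=4):
        super().__init__()
        self.net = spconv.SparseSequential(
            # Submanifold conv: output sites equal input sites, so sparsity is preserved.
            spconv.SubMConv3d(in_channels, 16, 3, indice_key="subm0"),
            torch.nn.ReLU(),
            # Strided sparse conv downsamples along z (and x/y), as described above.
            spconv.SparseConv3d(16, 32, 3, stride=2),
            torch.nn.ReLU(),
        )

    def forward(self, voxel_features, coords, spatial_shape, batch_size):
        # voxel_features: (N, C) per-voxel features; coords: (N, 4) int32 [batch, z, y, x]
        x = spconv.SparseConvTensor(voxel_features, coords, spatial_shape, batch_size)
        x = self.net(x)
        dense = x.dense()                      # (B, C, D, H, W)
        B, C, D, H, W = dense.shape
        return dense.view(B, C * D, H, W)      # collapse z into channels -> BEV-like map
```

Collapsing the remaining z slices into the channel dimension is what turns the 3D volume into "something akin to 2D image data" for a downstream 2D detection head.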
Another advantage of using point cloud data is that it is very easy to scale, rotate and move objects by applying direct transformations to specified points on those objects.
SECOND incorporates a novel form of data augmentation based on this capability.
A ground-truth database is generated that contains the attributes of objects and the associated point cloud data.
Objects sampled from this database are then introduced into the point clouds during training.
This approach can greatly increase the convergence speed and the final performance of our network.
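Conceptually, the sampling step can be sketched in a few lines of NumPy. The `boxes_overlap_bev` helper and the database record layout below are illustrative assumptions, not the repository's actual code:

```python
import numpy as np

def boxes_overlap_bev(boxes_a, boxes_b):
    # Hypothetical helper: crude axis-aligned BEV overlap test between (N, 7)
    # and (M, 7) boxes [x, y, z, w, l, h, yaw]; yaw is ignored for brevity.
    dx = np.abs(boxes_a[:, 0:1] - boxes_b[:, 0])                    # (N, M)
    dy = np.abs(boxes_a[:, 1:2] - boxes_b[:, 1])
    return (dx < (boxes_a[:, 3:4] + boxes_b[:, 3]) / 2) & \
           (dy < (boxes_a[:, 4:5] + boxes_b[:, 4]) / 2)

def sample_gt_objects(scene_points, scene_boxes, db_samples, max_new=15):
    """Paste sampled ground-truth objects into a training scene.

    db_samples: list of dicts {'points': (M, 4), 'box': (7,)} drawn from the
    database built by the create_groundtruth_database step shown below.
    """
    boxes, extra_points = scene_boxes, []
    for sample in db_samples:
        if len(extra_points) >= max_new:
            break
        box = sample['box'][None]                   # (1, 7)
        if boxes_overlap_bev(box, boxes).any():     # reject physically impossible placements
            continue
        boxes = np.concatenate([boxes, box])
        extra_points.append(sample['points'])
    points = np.concatenate([scene_points] + extra_points) if extra_points else scene_points
    return points, boxes
```

The collision check is what keeps pasted objects from overlapping existing ones, so the augmented scenes remain physically plausible.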
In addition to the above, we introduce a novel angle loss regression approach to address the large loss that arises when the orientation difference between the ground truth and the prediction equals π, even though such a prediction describes a bounding box identical to the true one.
The performance of this angle regression approach surpasses that of any current method we know about, including the orientation vector regression function available in AVOD [9].
We also introduce an auxiliary direction classifier to recognize the directions of objects.
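A minimal sketch of a sine-based angle loss of the kind described here, together with a direction target for the auxiliary classifier (function names are illustrative, and the sign-of-yaw target is an assumption):

```python
import torch
import torch.nn.functional as F

def angle_loss(pred_yaw, target_yaw):
    # sin(d) is zero both at d = 0 and d = pi, so a prediction flipped by pi
    # (which yields an identical box) no longer produces a huge loss.
    return F.smooth_l1_loss(torch.sin(pred_yaw - target_yaw),
                            torch.zeros_like(pred_yaw))

def direction_target(target_yaw):
    # Because the sine loss cannot distinguish a box from its pi-rotated twin,
    # an auxiliary binary classifier recovers the facing direction.
    return (target_yaw > 0).long()
```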
The key contributions of our work are as follows:
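* The use of sparse convolution in LiDAR-based object detection, which markedly increases the speed of both training and inference.
* A novel angle loss regression, combined with an auxiliary direction classifier, that improves orientation estimation.
* A ground-truth-sampling data augmentation method that speeds up convergence and improves final performance.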
Below, we briefly review existing works on 3D object detection based on point cloud data and images.
Methods using 2D representations of RGB-D data can be divided into two classes:
In typical image-based methods [5], 2D bounding boxes, class semantics and instance semantics are generated first, and then hand-crafted approaches are used to generate feature maps.
Another method [17] uses a CNN to estimate 3D bounding boxes from images and a specially designed discrete-continuous CNN to estimate the orientations of objects.
Methods using LiDAR [18] involve the conversion of point clouds into front-view 2D maps and the application of 2D detectors to localize objects in the front-view images.
These methods have been shown to perform poorly for both BEV detection and 3D detection compared to other methods.
MV3D [8] is the first method to convert point cloud data into a BEV representation.
ComplexYOLO [19] applies a YOLO-style detector to the BEV map and uses a complex-valued angle encoding to regress orientation.
In [21], …
A key problem with all of these approaches, however, is that many data points are dropped when generating a BEV map, resulting in a considerable loss of information on the vertical axis.
This information loss severely impacts the performance of these methods in 3D bounding box regression.
Most 3D-based methods use point cloud data directly, as in the approaches reviewed below.
In [12], a voting scheme is used to exploit the sparsity of the point cloud and accelerate sliding-window detection over a 3D grid.
Ref. [13] (Vote3Deep) extends this feature-centric voting scheme to deep networks built from sparse convolutional layers.
These methods use hand-crafted features, and while they yield satisfactory results on specific datasets, they cannot adapt to the complex environments commonly encountered in autonomous driving.
In a distinct approach, …
These methods directly process point cloud data to perform 1D convolution on k-neighborhood points, but they cannot be applied to large numbers of points; thus, image detection results are needed to filter the original data points.
Some CNN-based detectors convert point cloud data into voxels before processing.
The major problem with these methods is the high computational cost of 3D CNNs, which grows cubically with the voxel resolution.
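For example, halving the voxel edge length doubles the resolution along each of the three axes, so the number of voxels, and hence the cost of a dense 3D convolution, grows by a factor of 2³ = 8.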
In [25,26], a spatially sparse convolution is designed that increases the 3D convolution speed; however, no known method had applied sparse convolution to the detection task.
Like these approaches, our method makes use of a 3D convolutional architecture, but it incorporates several novel improvements.
Some methods combine camera images with point clouds.
For instance, the authors of [29] …
In the method presented in [8], the point cloud is converted into bird's-eye-view and front-view representations whose features are fused with image features to produce detections.
The authors of [9] (AVOD) fuse image and BEV features to generate high-quality 3D proposals and refined detections.
In [11], as noted above, 2D image detection results are used to filter the point cloud before the 3D boxes are estimated.
However, fusion-based methods typically run slowly because they need to process a significant amount of image input.
The additional requirement of a camera that is time-synchronized and calibrated with the LiDAR restricts the environments in which such methods can be used and reduces their robustness.
Our method, by contrast, achieves state-of-the-art performance using only LiDAR data.
# On the host PC
$ cd /workspace
$ git clone https://github.com/traveller59/second.pytorch.git
$ docker pull adioshun/second:latest
## KITTI data location: /media/adioshun/data/datasets
## Code location: /workspace/second.pytorch
$ docker run --runtime=nvidia -it --privileged --network=host -v /tmp/.X11-unix:/tmp/.X11-unix --volume="$HOME/.Xauthority:/root/.Xauthority:rw" -e DISPLAY -v /media/adioshun/data/datasets:/datasets --volume /workspace:/workspace --name 'second' adioshun/second:latest /bin/bash
$ docker start second
$ docker exec -it second bash
# Inside the Docker container
$ cd /workspace/second.pytorch/second
Create kitti infos: `python create_data.py create_kitti_info_file --data_path=/datasets`
Create reduced point cloud: `python create_data.py create_reduced_point_cloud --data_path=/datasets`
Create groundtruth-database infos: `python create_data.py create_groundtruth_database --data_path=/datasets`
TRAIN : `python ./pytorch/train.py train --config_path=./configs/car.fhd.config --model_dir=/path/to/model_dir`
Evaluate : `python ./pytorch/train.py evaluate --config_path=./configs/car.fhd.config --model_dir=/path/to/model_dir --measure_time=True --batch_size=1`
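Note: in the upstream second.pytorch repository, training resumes from any checkpoints already present in `model_dir`, so point it at a new or empty directory to train from scratch.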
└── KITTI_DATASET_ROOT (e.g., /datasets/)
├── training <-- 7481 train data
| ├── image_2 <-- for visualization
| ├── calib
| ├── label_2
| ├── velodyne
| └── velodyne_reduced <-- empty directory
└── testing <-- 7518 test data
├── image_2 <-- for visualization
├── calib
├── velodyne
└── velodyne_reduced <-- empty directory
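The two `velodyne_reduced` directories start out empty and are populated by the `create_reduced_point_cloud` step above.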
git clone https://github.com/nutonomy/second.pytorch.git
It is recommended to use the Anaconda package manager.
First, use Anaconda to configure as many packages as possible.
conda create -n pointpillars python=3.7 anaconda
source activate pointpillars
conda install shapely pybind11 protobuf scikit-image numba pillow
conda install pytorch torchvision -c pytorch
conda install google-sparsehash -c bioconda
Then use pip for the packages missing from Anaconda.
pip install --upgrade pip
pip install fire tensorboardX
Finally, install SparseConvNet. This is not required for PointPillars, but the general SECOND code base expects this to be correctly configured.
git clone [email protected]:facebookresearch/SparseConvNet.git
cd SparseConvNet/
bash build.sh
# NOTE: if bash build.sh fails, try bash develop.sh instead
Additionally, you may need to install Boost geometry:
sudo apt-get install libboost-all-dev
You need to add the following environment variables for Numba to ~/.bashrc:
export NUMBAPRO_CUDA_DRIVER=/usr/lib/x86_64-linux-gnu/libcuda.so
export NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so
export NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice
Add second.pytorch/ to your PYTHONPATH.
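For example: `export PYTHONPATH=$PYTHONPATH:/path/to/second.pytorch` (the same export appears in the `~/.bashrc` snippet below).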
wget https://github.com/Kitware/CMake/releases/download/v3.13.3/cmake-3.13.3.tar.gz
tar xzf cmake-3.13.3.tar.gz && cd cmake-3.13.3
./bootstrap
make
sudo make install
sudo apt-get install libboost-all-dev
cd ~
#pip3 install shapely fire pybind11 tensorboardX protobuf scikit-image numba pillow
conda install -c conda-forge shapely fire pybind11 tensorboardX protobuf scikit-image numba pillow
#pip3 install torch torchvision
conda install -c anaconda pytorch-gpu
git clone https://github.com/traveller59/spconv.git --recursive
cd spconv
python3 setup.py bdist_wheel
cd ./dist
pip3 install spconv-1.0-cp36-cp36m-linux_x86_64.whl
# Setup cuda for numba
vi ~/.bashrc
export NUMBAPRO_CUDA_DRIVER=/usr/lib/x86_64-linux-gnu/libcuda.so
export NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so
export NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda
export PYTHONPATH=$PYTHONPATH:/workspace/second.pytorch
source ~/.bashrc
cd /workspace
git clone https://github.com/traveller59/second.pytorch.git
--
/usr/lib/x86_64-linux-gnu/libcuda.so: file too short
-> Check the libcuda.so symbolic link and recreate it.
pybind11/detail/common.h:112:20: fatal error: Python.h: No such file or directory
-> apt install python3.6-dev