| Item | Details |
| --- | --- |
| Paper | Detect to Track and Track to Detect |
| Authors (affiliation) | () |
| Venue / Year | ICCV 2017, paper |
| Citation ID / Keywords | |
| Dataset (sensor) / Model | |
| Related work | |
| References | homepage, YouTube |
| Code | GitHub |
Detect to Track and Track to Detect
1. Introduction
The existing 'tracking by detection' approach is a new paradigm, but it has not overtaken frame-level detection methods.
In the case of object detection and tracking in videos,
recent approaches have mostly used detection as a first step, followed by post-processing methods such as applying a tracker to propagate detection scores over time.
Such variations on the 'tracking by detection' paradigm have seen impressive progress but are dominated by frame-level detection methods.
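To make the 'tracking by detection' recipe concrete (detect per frame first, then link detections over time and propagate scores), here is a minimal sketch in Python. The function names, the IoU threshold, and the decay factor are illustrative assumptions, not the paper's or any specific tracker's actual pipeline.

```python
# Minimal sketch of 'tracking by detection' post-processing:
# run a per-frame detector first, then link detections across frames
# by IoU overlap and propagate scores along the resulting links.
# Thresholds and the decay factor are illustrative assumptions.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def propagate_scores(frames, iou_thr=0.5, decay=0.9):
    """frames: list of per-frame detections, each a list of dicts
    {'box': (x1, y1, x2, y2), 'score': float}. Each detection's score
    is boosted by the decayed score of its best IoU match in the
    previous frame, so confident detections carry forward in time."""
    for prev, curr in zip(frames, frames[1:]):
        for det in curr:
            matches = [p['score'] for p in prev
                       if iou(p['box'], det['box']) >= iou_thr]
            if matches:
                det['score'] = max(det['score'], decay * max(matches))
    return frames
```

Because each frame is updated before serving as the previous frame for the next one, a high score can chain forward across many frames, which is the intuition behind score propagation.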
Differences between the ImageNet video object detection challenge (VID) and the ImageNet object detection (DET) challenge:
- (i) size: the sheer number of frames that video provides (VID has around 1.3M images, compared to around 400K in DET or 100K in COCO [22]),
- (ii) motion blur: due to rapid camera or object motion,
- (iii) quality: internet video clips are typically of lower quality than static photos,
- (iv) partial occlusion: due to change in objects/viewer positioning, and
- (v) pose: unconventional object-to-camera poses are frequently seen in video
To address these challenges, the algorithms that perform best on VID apply exhaustive post-processing on top of frame-level detectors.
- For example, the winner [17] of ILSVRC'15 uses two multi-stage Faster R-CNN [31] detection frameworks, context suppression, multi-scale training/testing, a ConvNet tracker [39], optical-flow based score propagation, and model ensembles (a sketch of the score-propagation idea follows below).
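One ingredient of that winning pipeline, optical-flow based score propagation, can be sketched as follows. This is only an illustration under assumptions: it uses OpenCV's Farneback dense flow to shift a box from frame t into frame t+1, after which the warped box could be matched (e.g., by IoU, as in the earlier sketch) against frame t+1's detections to transfer the score. The helper name and the mean-flow warping rule are assumptions, not the ILSVRC'15 winner's code.

```python
import cv2
import numpy as np

def warp_box_by_flow(prev_gray, next_gray, box):
    """Shift a detection box (x1, y1, x2, y2) from frame t to frame t+1
    by the mean dense optical flow inside the box region.

    prev_gray, next_gray: uint8 grayscale frames of identical shape.
    Assumes the box lies inside the image; a real pipeline would clip it.
    """
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        0.5, 3, 15, 3, 5, 1.2, 0)          # standard Farneback parameters
    x1, y1, x2, y2 = (int(v) for v in box)
    region = flow[y1:y2, x1:x2]            # (h, w, 2) flow vectors in the box
    if region.size == 0:                   # degenerate box: leave it in place
        return box
    dx = float(region[..., 0].mean())      # average horizontal displacement
    dy = float(region[..., 1].mean())      # average vertical displacement
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)
```

In use, a strong detection in frame t is warped into frame t+1 with this helper, and any overlapping, lower-scored detection of the same class there inherits (part of) its score, which is how flow-based propagation suppresses flicker in per-frame detections.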