논문명/저자/소속	YOLO9000: Better, Faster, Stronger
저자(소속)	Joseph Redmon, Ali Farhadi ()
학회/년도	CVPR 2017, 논문
키워드	Joseph2016b
데이터셋/모델	PASCAL VOC 2007, 2012, COCO 2015
참고	홈페이지v2, 번역, TFKR_PR, Youtube 정리 , 모연
코드	Darknet, pyTorch#1, pyTorch#2

YOLO2

1. 개요

YOLOv2는 네트워크의 크기를 조절하여 FPS(Frames Per Second)와 MaP(Mean Average Precision) 를 균형 있게 조절할 수 있다.

성능과 속도를 모두 개선시켜서 SSD(Single Shot MultiBox Detector) 보다 뛰어나다.

2. 특징(YOLO 1 VS. YOLO2 )

Batch Normalization - 모든 컨볼루션 레이어에 배치 정규화를 추가
High Resolution Classifier - ImageNet 데이터로 classfication network를 먼저 학습시켜서 고해상도 이미지에도 잘 동작하게 함
Convolutional - FCL(Fully Conneted Layer)를 Convolution Layer로 대체
Anchor Boxes - 경계 박스를 처음부터 직접 예측 -> 앵커 박스를 초기값으로 사용하여 예측
new network - Darknet-19 를 특징 추출기로 사용
Dimension Clusters - 실제 경계 박스들을 클러스터링하여 최적의 앵커박스를 찾음
Direct location prediction - 경계 박스의 위치x,y 는 직접 예측(이전 버전과 동일)
passthrough - 26x26 크기의 중간 특징맵을 skip 하여 13x13레이어에 붙임(concatenate)
Multi-Scale Training - 학습데이터의 크기를 320x320, 352x352, ..., 608x608 로 resize 하면서 다양한 스케일로 학습시킴
Fine-Grained Features - 최종 특징맵의 크기를 7x7에서 13x13으로 키움

[출처] YOLOv2 와 YOLO9000|작성자 sogangori

2.1 Batch Normalization

http://blog.naver.com/PostView.nhn?blogId=sogangori&logNo=221011203855&parentCategoryNo=&categoryNo=6&viewDate=&isShowPopularPosts=false&from=postView

2.2 High Resolution Classifier

3. 구조

YOLO v1,2 의 최종 특징맵 차이

n_boxes = 5 n_classes = 20 YOLO : 7 x 7 x (5 + 5 + n_classes)
YOLOv2 : 13 x 13 x (5 x (5 + n_classes))

그래서 최종 출력은 13x13 크기의 125 채널 특징맵이 된다. (13x13x(5x(5+20)) = 13x13x5x25 = 13 x 13 x 125

YOLO 9000

Better (YOLOv2)
- Batch normalization
- High resolution classifier
- Convolution with anchor boxes
- Dimension clusters
- Direct location prediction
- Fine-grained features
- Multi-scale training
Faster (YOLOv2)
- Darknet-19
- Training for classification
- Training for detection
Stronger (YOLO9000 - 9000개 클래스)
- Hierarchical classification
- Dataset combination with Word-tree
- Joint classification and detection

Dataset combination with WordTree