Boostcamp AI Tech 1st cohort [T1209 최보미]/U stage

Day33 Study Notes - CV3

B1001101 2021. 3. 10. 23:56

Lecture review

1. Object detection

1) Object detection

  • Semantic segmentation: can only distinguish classes
    Instance segmentation, Panoptic segmentation: can distinguish individual instances
  • Object detection: Classification + Box localization
  • Examples: Autonomous driving, Optical Character Recognition (OCR)

2) Two-stage detector(R-CNN family)

  • Gradient-based detector (e.g. HOG)
  • Selective search(Box proposal)
  • R-CNN: Regions with CNN features, Directly leverage image classification networks for object detection
  • Fast R-CNN: Recycle a pre-computed feature for multiple object detection
  • Faster R-CNN: End-to-end object detection by neural region proposal
    • IoU (Intersection over Union): A metric commonly used in object detection
    • Anchor box
    • Region Proposal Network(RPN)
    • Non-Maximum Suppression (NMS)
      • Step 1: Select the box with the highest objectness score
      • Step 2: Compare IoU of this box with other boxes
      • Step 3: Remove the bounding boxes with IoU ≥ 50%
      • Step 4: Move to the next highest objectness score
      • Step 5: Repeat steps 2-4
    • Reference: curt-park.github.io/2017-03-17/faster-rcnn/
  • Summary of the R-CNN family
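The IoU metric and the NMS steps listed above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the implementation used in any particular detector: the `(x1, y1, x2, y2)` box format, the function names `iou` and `nms`, and the sample boxes are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width/height of the overlap, clamped to 0 when the boxes do not intersect
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that survive, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # Step 1 / Step 4: highest remaining score
        keep.append(best)
        # Steps 2-3: drop every remaining box whose IoU with `best` is >= threshold
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


# Hypothetical boxes: the first two overlap heavily, the third is far away
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

In this toy input the second box has an IoU of 81/119 with the first, so it is suppressed, while the third box does not overlap and is kept.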

3) Single-stage detector

  • YOLO (You only look once)
  • Single Shot MultiBox Detector(SSD)

4) Single-stage detector vs two-stage detector

  • Focal loss: Addresses the class imbalance problem (negative >> positive)
    • Most samples are easy negatives that are simple to classify, and they contribute almost nothing to learning, so training on them is inefficient
  • RetinaNet: one-stage network (Feature Pyramid Networks(FPN) + class/box prediction branches)
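To see how focal loss down-weights easy negatives, here is a small sketch of the binary form FL(p_t) = -α_t (1 - p_t)^γ log(p_t). The defaults γ = 2 and α = 0.25 follow the commonly cited RetinaNet settings; the function names and the sample probabilities are assumptions made for illustration, and the code assumes 0 < p < 1.

```python
import math


def cross_entropy(p, y):
    """Binary cross entropy for predicted foreground probability p and label y."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: cross entropy scaled by alpha_t * (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical negatives (y = 0): one easy, one hard
easy_neg = focal_loss(0.01, 0)  # confidently and correctly predicted as background
hard_neg = focal_loss(0.90, 0)  # wrongly predicted as an object
print(easy_neg, hard_neg)
```

The easy negative's loss is scaled by (1 - 0.99)^2, so it becomes several orders of magnitude smaller than its cross entropy, while the hard negative keeps most of its loss.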

5) Detection with Transformer (DETR)

  • Object query: Learned positional encodings for querying

6) Further reading

  • Detecting objects as points
[Figures: CornerNet, CenterNet]

2. CNN visualization

1) Visualizing CNN

  • ZFNet
  • Filter visualization

2) Analysis of model behaviors

  • Embedding feature analysis
    • Nearest neighbors in a feature space
    • Dimensionality reduction
  • Activation investigation
    • Layer activation
    • Maximally activating patches
      1. Pick a channel in a certain layer
      2. Feed a chunk of images and record each activation value (of the chosen channel)
      3. Crop image patches around maximum activation values
    • Class visualization - Gradient ascent
      1. Get a prediction score (of a target class) of a dummy image (blank or random initial)
      2. Backpropagate the gradient maximizing the target class score w.r.t. the input image
      3. Update the current image
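The three gradient-ascent steps for class visualization can be tried out on a toy model. The sketch below assumes a hypothetical linear classifier (score = w · x), so the gradient can be written by hand instead of using backpropagation; a real CNN would need an autograd framework. The L2 penalty, the step size, and the weights are also assumptions chosen for the example.

```python
def class_score(weights, image, target):
    """Score of the target class for a linear classifier: w_target . x."""
    return sum(w * x for w, x in zip(weights[target], image))


def visualize_class(weights, target, steps=100, lr=0.1, l2=0.1):
    """Gradient ascent on the input image to maximize score - l2 * ||x||^2."""
    image = [0.0] * len(weights[target])  # Step 1: start from a blank dummy image
    for _ in range(steps):
        # Step 2: gradient w.r.t. the input; for a linear model it is w - 2 * l2 * x
        grad = [w - 2.0 * l2 * x for w, x in zip(weights[target], image)]
        # Step 3: update the current image in the direction of the gradient
        image = [x + lr * g for x, g in zip(image, grad)]
    return image


# Hypothetical 2-class classifier over a 4-pixel "image"
weights = [[1.0, -1.0, 0.5, 0.0],
           [-0.5, 2.0, 0.0, 1.0]]
image = visualize_class(weights, target=1)
print(image, class_score(weights, image, 1))
```

The resulting image is brightest at the pixel with the largest class-1 weight, which is the linear-model analogue of the class patterns seen when this is done on a CNN.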
[Figures: Layer activation, Maximally activating patches, Class visualization]

3) Model decision explanation

  • Saliency test
    • Occlusion map
    • via Backpropagation
      1. Get a class score of the target source image
      2. Backpropagate the gradient of the class score w.r.t. the input domain
      3. Visualize the obtained gradient magnitude map (optional)
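The occlusion-map idea can be illustrated without a neural network: slide a patch over the image, mask it out, and record how much the class score drops. In this sketch `score_fn` is a hypothetical stand-in for a model's class score, and the patch size, fill value, and 4x4 image are assumptions for the example.

```python
def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Score drop for every patch position; larger drop = more important region."""
    h, w = len(image), len(image[0])
    base = score_fn(image)
    heat = []
    for top in range(h - patch + 1):
        row = []
        for left in range(w - patch + 1):
            occluded = [r[:] for r in image]  # copy so the original stays intact
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    occluded[y][x] = fill     # mask out the patch
            row.append(base - score_fn(occluded))
        heat.append(row)
    return heat


def toy_score(img):
    """Hypothetical 'model' that only looks at the top-left 2x2 region."""
    return img[0][0] + img[0][1] + img[1][0] + img[1][1]


image = [[1.0] * 4 for _ in range(4)]
heat = occlusion_map(image, toy_score)
for row in heat:
    print(row)
```

The largest drop appears when the patch covers the top-left region the toy score depends on, and the drop is zero for patches that do not touch it.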
[Figures: Occlusion map, via Backpropagation]

  • Backpropagate features
    • Rectified unit (backward pass)
    • Guided backpropagation
    • Class activation mapping(CAM): Global average pooling(GAP) layer instead of the FC layer
    • Grad-CAM
    • SCOUTER
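For the CAM bullet above, the map for a class is the weighted sum of the last convolutional feature maps, using the weights that connect the GAP outputs to that class. The sketch below uses nested Python lists and made-up feature maps and weights, so the names and numbers are assumptions. It also checks that the class score obtained through GAP equals the spatial mean of the CAM.

```python
def global_average_pool(fmap):
    """Mean over all spatial positions of one feature map."""
    return sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))


def class_activation_map(feature_maps, class_weights):
    """CAM_c(y, x) = sum_k w_k^c * f_k(y, x)."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(wk * fm[y][x] for wk, fm in zip(class_weights, feature_maps))
             for x in range(w)]
            for y in range(h)]


# Hypothetical last-layer feature maps (2 channels, 2x2) and weights for one class
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
class_weights = [0.5, 1.5]

cam = class_activation_map(feature_maps, class_weights)
# Class score as computed by GAP followed by the linear layer
score = sum(wk * global_average_pool(fm)
            for wk, fm in zip(class_weights, feature_maps))
print(cam, score)
```

Because GAP and the final linear layer are both linear, the order of the weighted sum and the spatial averaging can be swapped, which is why the score decomposes over spatial positions.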
[Figures: Rectified unit, Guided backpropagation, CAM, Grad-CAM, SCOUTER]


Comments

Today's lecture was interesting but difficult. I don't think it will really sink in unless I try it out myself.