Day34 Study Notes - CV4

1. Instance/Panoptic segmentation


1) Instance segmentation

  • Instance segmentation = Semantic segmentation + distinguishing instances
  • Mask R-CNN
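    • Faster R-CNN + Mask branch
    • RoIAlign: computes RoI features precisely at fractional (sub-pixel) positions using bilinear interpolation instead of rounding
  • YOLACT (You Only Look At CoefficienTs)
    • Prototypes: basis maps ("ingredients") from which masks can be composed
    • Generates diverse masks as linear combinations of a small number of prototypes
  • YolactEdge: extension of YOLACT to video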
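A minimal pure-Python sketch of two ideas from this list, assuming tiny 2D lists stand in for feature maps. `bilinear_sample` reads a feature map at a fractional position the way RoIAlign does at each sampling point (real RoIAlign averages several such points per output bin, which is omitted here). `assemble_mask` builds a YOLACT-style mask as the sigmoid of a coefficient-weighted sum of prototypes (the crop to the predicted box is also omitted). Both function names are hypothetical.

```python
import math


def bilinear_sample(feature, y, x):
    """Read a 2D feature map (list of rows) at a fractional (y, x) position.

    Assumes 0 <= y <= H-1 and 0 <= x <= W-1. RoIAlign keeps RoI coordinates
    as floats and interpolates like this instead of rounding to a pixel.
    """
    h, w = len(feature), len(feature[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)  # clamp at the border
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def assemble_mask(prototypes, coefficients):
    """YOLACT-style mask: sigmoid(sum_k c_k * P_k) over K prototype maps."""
    h, w = len(prototypes[0]), len(prototypes[0][0])
    logits = [[0.0] * w for _ in range(h)]
    for proto, c in zip(prototypes, coefficients):
        for i in range(h):
            for j in range(w):
                logits[i][j] += c * proto[i][j]
    # Squash each linear combination into a per-pixel foreground probability.
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in logits]


feature = [[0.0, 1.0], [2.0, 3.0]]
print(bilinear_sample(feature, 0.5, 0.5))  # 1.5

top_half = [[1.0, 1.0], [0.0, 0.0]]
bottom_half = [[0.0, 0.0], [1.0, 1.0]]
print(assemble_mask([top_half, bottom_half], [4.0, -4.0]))
```

Sampling at (0.5, 0.5) blends all four neighbours into 1.5, a value that no integer-rounded lookup could return; this is exactly the quantization error RoIAlign avoids.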

2) Panoptic segmentation

  • Semantic segmentation: Stuff (background regions such as sky or road) + Things (countable objects, i.e. non-background)
    Panoptic segmentation: Stuff + instances of Things
  • UPSNet
    • Semantic & Instance head → Panoptic head → Panoptic logits
  • VPSNet: For Video
    1. Align reference features onto the target feature map (fusion at pixel level)
    2. A track module associates different object instances (tracking at instance level)
    3. The fused-and-tracked modules are trained to synergize with each other
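A simplified sketch of the idea behind a panoptic head, assuming stuff logits come from the semantic head and one logit map per detected instance comes from the instance head: every pixel takes the argmax over all of these channels, so it ends up either as a stuff label or as one specific instance. UPSNet's actual head also adds the semantic logits of each instance's class and an extra "unknown" channel; those details are left out, and `panoptic_merge` is a hypothetical name.

```python
def panoptic_merge(stuff_logits, instance_logits):
    """Assign every pixel to one stuff class or one thing instance.

    stuff_logits: dict {stuff label: H x W logit map} (from the semantic head)
    instance_logits: list of H x W logit maps, one per detected instance
    Returns an H x W grid whose entries are a stuff label or ("thing", k).
    """
    channels = list(stuff_logits.items())
    channels += [(("thing", k), m) for k, m in enumerate(instance_logits)]
    h, w = len(channels[0][1]), len(channels[0][1][0])
    panoptic = []
    for i in range(h):
        row = []
        for j in range(w):
            # Per-pixel argmax over all stuff and instance channels.
            label, _ = max(channels, key=lambda ch: ch[1][i][j])
            row.append(label)
        panoptic.append(row)
    return panoptic


stuff = {"sky": [[5.0, 5.0], [0.0, 0.0]], "road": [[0.0, 0.0], [5.0, 0.0]]}
car = [[0.0, 0.0], [0.0, 9.0]]
print(panoptic_merge(stuff, [car]))
# [['sky', 'sky'], ['road', ('thing', 0)]]
```

Because instances get their own channels instead of sharing one "car" channel, two overlapping cars stay separate, which is the difference between panoptic and plain semantic output.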

3) Landmark localization

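  • Define and track the important characteristic points of an object (keypoint estimation)
  • Coordinate regression: for N landmarks, directly predicts the x, y position of each point (2N values); inaccurate and biased
    Heatmap classification: represents a per-pixel probability map for each keypoint; more accurate but computationally expensive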
  • Gaussian heatmap
  • Hourglass network
  • DensePose
    • Faster R-CNN + 3D surface regression branch
    • UV map: represents the 3D surface mesh in 2D (u, v) coordinates
  • RetinaFace: FPN + Multi-task branches
    • Extension pattern: FPN + Target-task branches
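A small sketch of heatmap-based landmark localization, assuming one keypoint per heatmap: `gaussian_heatmap` builds a training target with a Gaussian peak centred on the keypoint, and `decode_argmax` recovers the integer position from a (predicted) map. The σ value and both function names are illustrative assumptions. Coordinate regression would output the two numbers directly instead of a full H×W map per keypoint, which is where the extra computation of the heatmap approach comes from.

```python
import math


def gaussian_heatmap(height, width, cx, cy, sigma=1.5):
    """Training target: a 2D Gaussian peak (value 1.0) centred on keypoint (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


def decode_argmax(heatmap):
    """Return the (x, y) cell with the highest score in a heatmap."""
    _, x, y = max((v, x, y)
                  for y, row in enumerate(heatmap)
                  for x, v in enumerate(row))
    return x, y


target = gaussian_heatmap(8, 8, cx=5, cy=2)
print(decode_argmax(target))  # (5, 2)
```

The Gaussian (rather than a single hot pixel) gives nearby pixels partial credit, so small localization errors are penalized less than large ones during training.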

4) Detecting objects as keypoints

  • CornerNet
    • Bounding box: {Top-left, Bottom-right} corners
  • CenterNet
    1. Bounding box: {Top-left, Bottom-right, Center} points
    2. Bounding box: {Width, Height, Center point}
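A sketch of decoding boxes in the second CenterNet form ({Width, Height, Center}), assuming the network outputs a center heatmap plus a per-cell (width, height) map: peaks are cells at least as large as everything in their 3×3 neighbourhood (mimicking max-pool peak extraction), and each peak is turned into {Top-left, Bottom-right} corners. The real CenterNet also predicts a sub-pixel offset, which is omitted; all function names and the `sizes` layout are hypothetical.

```python
def corners_to_center(x1, y1, x2, y2):
    """{Top-left, Bottom-right} corners -> (center x, center y, width, height)."""
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1


def center_to_corners(cx, cy, w, h):
    """(center, width, height) -> (x1, y1, x2, y2) corners."""
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def find_peaks(heatmap, threshold=0.3):
    """Cells above threshold that are >= every neighbour in their 3x3 window."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v <= threshold:
                continue
            window = [heatmap[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            if v >= max(window):
                peaks.append((x, y))
    return peaks


def decode_boxes(center_heatmap, sizes, threshold=0.3):
    """sizes[y][x] = predicted (width, height) at that cell (hypothetical layout)."""
    return [center_to_corners(x, y, *sizes[y][x])
            for x, y in find_peaks(center_heatmap, threshold)]


heatmap = [[0.0] * 5 for _ in range(5)]
heatmap[2][3] = 0.9   # object center
heatmap[2][2] = 0.5   # weaker neighbour, suppressed by the 3x3 check
sizes = [[(2, 4)] * 5 for _ in range(5)]
print(decode_boxes(heatmap, sizes))  # [(2.0, 0.0, 4.0, 4.0)]
```

The 3×3 local-maximum check plays the role of NMS here: each object yields one peak, so no box-overlap post-processing is needed.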

2. Conditional generative model


1) Conditional generative model

  • Conditional generative model: generates random samples under a given condition
    • e.g., audio super resolution, machine translation, article generation from a given title
  • Conditional GAN and image translation
    • Image-to-image translation: e.g., style transfer, super resolution, colorization
  • Example of Conditional GAN: Super resolution
    • Input: low resolution image
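      Output: high resolution image
    • MAE/MSE: measure pixel intensity differences; when many plausible high-resolution patches exist, a pixel-wise loss favors their average, which looks blurry
      GAN loss: judges whether the output looks real or not
    • When the real images contain only two colors, black and white:
      L1 loss: produces a gray output
      GAN loss: produces a black or white output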
    • SRGAN (GAN loss for super resolution)
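A toy check of the black/white example, assuming a single constant output is chosen to minimize the mean loss over real pixels that are half black (0) and half white (1). With MSE the optimum is exactly 0.5, i.e. gray. With L1 the minimizer is not unique for a 50/50 split: gray costs exactly as much as pure black, so the loss gives no push toward either real color. A GAN loss cannot be shown this simply, since it needs a trained discriminator. `best_constant` and `mean_loss` are hypothetical helpers.

```python
def mean_loss(targets, loss, pred):
    """Average loss of one constant prediction over all target pixels."""
    return sum(loss(pred, t) for t in targets) / len(targets)


def best_constant(targets, loss, steps=100):
    """Grid-search the constant output in [0, 1] with the lowest mean loss."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda c: mean_loss(targets, loss, c))


def squared_error(pred, target):
    return (pred - target) ** 2


def absolute_error(pred, target):
    return abs(pred - target)


targets = [0.0, 0.0, 1.0, 1.0]  # real pixels: half black (0), half white (1)
print(best_constant(targets, squared_error))  # 0.5 -> gray
# L1 gives no preference: gray costs the same as pure black.
print(mean_loss(targets, absolute_error, 0.5), mean_loss(targets, absolute_error, 0.0))  # 0.5 0.5
```

A discriminator, by contrast, would easily reject gray because it never appears in the real data, which is why the GAN loss pushes the output to black or white.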

2) Image translation GANs

  • Pix2Pix: Translating an image to a corresponding image in another domain (e.g. style)
  • CycleGAN: enables translation between domains with unpaired datasets
    • GAN loss: translation (X→Y, Y→X)
    • Cycle-consistency loss: preserve contents (self-supervision)
    • CycleGAN loss = GAN loss (in both directions) + Cycle-consistency loss
  • Perceptual loss
    • Easy to implement and train (simple forward & backward computation)
    • Requires a pre-trained network
    • Image Transform Net: outputs the transformed image
    • Loss Network: computes the style & feature losses; usually a pre-trained VGG model
    • Feature reconstruction loss: L2 loss between the feature maps of the output and target images
      Style reconstruction loss: L2 loss between the Gram matrices computed from those feature maps
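A pure-Python sketch of the losses above, assuming tiny hand-made "feature maps" (C channels, each flattened to H*W values) in place of real VGG activations. `gram_matrix` normalizes by C*H*W, `style_reconstruction_loss` is the squared Frobenius distance between Gram matrices, and `feature_reconstruction_loss` is the mean squared difference of the raw features. `cycle_consistency_loss` uses an L1 distance between x and F(G(x)), with toy functions standing in for the two generators. All names are hypothetical.

```python
def gram_matrix(features):
    """features: C channels, each a flat list of H*W activations.
    Returns the C x C matrix of channel correlations, normalised by C*H*W."""
    c, n = len(features), len(features[0])
    return [[sum(a * b for a, b in zip(features[i], features[j])) / (c * n)
             for j in range(c)]
            for i in range(c)]


def feature_reconstruction_loss(out_feat, target_feat):
    """Mean squared difference between two feature maps (content loss)."""
    diffs = [(a - b) ** 2
             for fo, ft in zip(out_feat, target_feat)
             for a, b in zip(fo, ft)]
    return sum(diffs) / len(diffs)


def style_reconstruction_loss(out_feat, style_feat):
    """Squared Frobenius distance between the two Gram matrices."""
    g_out, g_style = gram_matrix(out_feat), gram_matrix(style_feat)
    return sum((a - b) ** 2
               for row_o, row_s in zip(g_out, g_style)
               for a, b in zip(row_o, row_s))


def cycle_consistency_loss(x, g, f):
    """Mean L1 distance between x and F(G(x)); g: X->Y, f: Y->X."""
    recon = f(g(x))
    return sum(abs(a - b) for a, b in zip(x, recon)) / len(x)


feat = [[1.0, 2.0], [3.0, 4.0]]      # 2 channels, 2 spatial positions
shuffled = [[2.0, 1.0], [4.0, 3.0]]  # same values, spatial positions swapped
print(style_reconstruction_loss(feat, shuffled))    # 0.0
print(feature_reconstruction_loss(feat, shuffled))  # 1.0
print(cycle_consistency_loss([1.0, 3.0],
                             lambda v: [2 * a for a in v],   # toy G
                             lambda v: [a / 2 for a in v]))  # toy F -> 0.0
```

Because the Gram matrix only records channel correlations, swapping spatial positions leaves the style loss at 0 while the feature loss grows, which matches the split between "style" and "content" in the losses above.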

3) Various GAN applications

  • Deepfake
  • Face de-identification
  • Video translation


Today I learned about various segmentation methods and generative models. Digging deep into each one seemed too hard, so I just went over them to get a sense of what exists.

