Lecture Review

1. Self-supervised Pre-training Models

1) GPT-1
- Uses special tokens such as $ (delimiter) for effective transfer learning

2) BERT
- Masked language modeling task
- Large-scale data & large-scale model

Pre-training Tasks

Masked Language Model (MLM)
- Mask some percentage of the input tokens at random, then predict those masked tokens.
- 15% of the tokens are chosen for prediction; of those (see the sketch after this list):
  - 80% of the time, replace with [MASK]
  - 10% of the time, replace with a random word
  - 10% of the time, keep the token unchanged
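A minimal sketch of the 80/10/10 masking rule above. `mask_tokens`, `MASK_TOKEN`, and the toy vocabulary are illustrative names, not from the lecture; a real implementation would operate on tokenizer IDs rather than word strings.

```python
import random

MASK_TOKEN = "[MASK]"  # placeholder; real code would use the tokenizer's mask id


def mask_tokens(tokens, vocab, mask_prob=0.15):
    """BERT-style MLM masking: select ~15% of positions, then apply 80/10/10."""
    tokens = list(tokens)
    labels = [None] * len(tokens)  # None = position is not predicted
    for i in range(len(tokens)):
        if random.random() < mask_prob:
            labels[i] = tokens[i]  # the model must recover this original token
            r = random.random()
            if r < 0.8:
                tokens[i] = MASK_TOKEN            # 80%: replace with [MASK]
            elif r < 0.9:
                tokens[i] = random.choice(vocab)  # 10%: replace with a random word
            # else: 10%: keep the original token unchanged
    return tokens, labels


# Toy usage: masked input sequence and the corresponding prediction targets
vocab = ["the", "cat", "sat", "on", "mat", "dog"]
masked, labels = mask_tokens(["the", "cat", "sat", "on", "the", "mat"], vocab)
print(masked, labels)
```

Keeping 10% of the selected tokens unchanged (and replacing 10% with random words) prevents a mismatch between pre-training and fine-tuning, where [MASK] never appears.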