Masked Language Model
BERT masks a fraction (15%) of the input tokens and trains the model to predict the original tokens from their bidirectional context; of the selected tokens, 80% are replaced with [MASK], 10% with a random token, and 10% are kept unchanged.
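As a concrete illustration, here is a minimal PyTorch sketch of that 80/10/10 masking scheme. The function name, the `-100` ignore index, and the arguments are assumptions for the example, not BERT's actual implementation:

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
    """BERT-style MLM masking: select ~15% of positions, then
    replace 80% with [MASK], 10% with a random token, keep 10%."""
    labels = input_ids.clone()
    # Choose which positions the model must predict.
    masked = torch.bernoulli(torch.full(input_ids.shape, mlm_prob)).bool()
    labels[~masked] = -100  # common convention: ignore these in the loss

    # 80% of the selected positions -> [MASK]
    replace = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked
    input_ids[replace] = mask_token_id

    # Half of the remaining 20% (i.e., 10% of selected) -> random token
    rand = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & masked & ~replace
    input_ids[rand] = torch.randint(vocab_size, input_ids.shape)[rand]

    # The last 10% keep their original token.
    return input_ids, labels
```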
Next Sentence Prediction
Next Sentence Prediction (NSP) teaches the model the sentence-pair relationships required by downstream tasks such as Question Answering and Natural Language Inference. NSP and bidirectionality were empirically shown to improve BERT's performance. The model predicts is/isn't-the-next-sentence logits from the [CLS] representation via a fully connected layer with a non-linear activation.
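A minimal PyTorch sketch of such a head; the class and attribute names and the default hidden size are assumptions for illustration:

```python
import torch.nn as nn

class NSPHead(nn.Module):
    """Pool the [CLS] hidden state, then classify is-next vs. not-next."""
    def __init__(self, hidden_size=768):
        super().__init__()
        self.pooler = nn.Linear(hidden_size, hidden_size)  # fully connected layer
        self.activation = nn.Tanh()                        # non-linear activation
        self.classifier = nn.Linear(hidden_size, 2)        # is / isn't the next sentence

    def forward(self, sequence_output):
        cls_state = sequence_output[:, 0]           # [CLS] is the first token
        pooled = self.activation(self.pooler(cls_state))
        return self.classifier(pooled)              # 2-way logits
```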
Bidirectional Encoder Representations from Transformers
Released in October 2018; won a Best Paper award at NAACL 2019.
From GPT-1 onward, and decisively with BERT, the way pre-trained language models are used shifted from the feature-based approach to the fine-tuning-based approach (contrasted below, with a sketch after the list):
- Fine-tuning: update all parameters, including the pre-trained embeddings/encoder.
- Feature-based: freeze the pre-trained embeddings and update only the layers added on top.
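A minimal sketch of the two regimes using the Hugging Face transformers library; the binary task head and parameter grouping are illustrative assumptions:

```python
import torch.nn as nn
from transformers import BertModel

bert = BertModel.from_pretrained("bert-base-uncased")
classifier = nn.Linear(bert.config.hidden_size, 2)  # hypothetical task head

# Feature-based: freeze the pre-trained encoder, train only the new head.
for param in bert.parameters():
    param.requires_grad = False
feature_based_params = list(classifier.parameters())

# Fine-tuning: update everything, encoder and head alike.
for param in bert.parameters():
    param.requires_grad = True
fine_tuning_params = list(bert.parameters()) + list(classifier.parameters())
```

Either parameter list would then be handed to an optimizer; the only difference between the two regimes is which parameters receive gradient updates.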
BERT Notion
BERT Usages