Masked Language Model
Next sentence prediction (NSP) gives the model the sentence-level relationships required for tasks such as Question Answering and Natural Language Inference. NSP and bidirectionality empirically improved BERT's performance. It predicts is/isn't-the-next-sentence logits via a fully connected layer with a non-linear activation, as sketched below.
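A minimal sketch of such an NSP head in PyTorch; the hidden size of 768 and the tanh pooler follow BERT-base, and the class name `NSPHead` is illustrative rather than the library's API:

```python
import torch
import torch.nn as nn

class NSPHead(nn.Module):
    """Next-sentence-prediction head: a dense pooler with tanh over the
    [CLS] token, followed by a binary classifier."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.pooler = nn.Linear(hidden_size, hidden_size)  # fully connected layer
        self.activation = nn.Tanh()                        # non-linear activation
        self.classifier = nn.Linear(hidden_size, 2)        # is / isn't the next sentence

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        cls_vector = hidden_states[:, 0]                   # [CLS] token representation
        pooled = self.activation(self.pooler(cls_vector))
        return self.classifier(pooled)                     # logits over {IsNext, NotNext}

# Usage on dummy encoder output of shape (batch, seq_len, hidden)
logits = NSPHead()(torch.randn(2, 128, 768))
```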
Bidirectional Encoder Representations from Transformers
Released in October 2018; NAACL 2019 Best Paper.
Moving from the feature-based approach around GPT-1 to BERT, the way of handling LLMs shifted to the fine-tuning-based approach.
- Fine-tuning: update all parameters, including the embeddings.
- Feature-based: freeze the embeddings and update only the layers above (see the sketch after this list).
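A rough illustration of the two regimes with HuggingFace `transformers`; the checkpoint name and the choice to freeze only the embedding layer are assumptions for the sketch:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")

# Fine-tuning: every parameter, embeddings included, stays trainable (the default).
for param in model.parameters():
    param.requires_grad = True

# Feature-based: freeze the embeddings (or the whole encoder) and train only
# the task-specific layers stacked on top of the fixed representations.
for param in model.embeddings.parameters():
    param.requires_grad = False
```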

BERT Notion
BERT Usages
pbelcak/UltraFastBERT-1x11-long · Hugging Face
https://huggingface.co/pbelcak/UltraFastBERT-1x11-long
BERT: Pre-training of Deep Bidirectional Transformers for Language...
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is...
https://arxiv.org/abs/1810.04805

Getting started with the built-in BERT algorithm | AI Platform Training | Google Cloud
https://cloud.google.com/ai-platform/training/docs/algorithms/bert-start?hl=ko

Seonglae Cho