Emergent ability of the attention mechanism arising from induction heads
In modern language models, tokens later in the context are easier to predict than tokens earlier in the context. As the context gets longer, loss goes down. In some sense this is just what a sequence model is designed to do (use earlier elements in the sequence to predict later ones), but as the ability to predict later tokens from earlier ones gets better, it can increasingly be used in interesting ways (such as specifying tasks, giving instructions, or asking the model to match a pattern) that suggest it can usefully be thought of as a phenomenon of its own. When thought of in this way, it is usually referred to as in-context learning.
Emergent in-context learning was noted in GPT-2 and gained significant attention in GPT-3. Simply by adjusting a “prompt”, transformers can be adapted to do many useful things without re-training, such as translation, question-answering, arithmetic, and many other tasks. Using “prompt engineering” to leverage in-context learning became a popular topic of study and discussion. - Anthropic AI
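A minimal sketch of the "adjust the prompt, no re-training" idea described above, assuming the Hugging Face `transformers` library and the public GPT-2 checkpoint (neither is named in this note): the antonym "task" is specified purely through a few examples in the context, and the model is asked to continue the pattern.

```python
# Sketch of few-shot in-context learning with a small causal LM (no fine-tuning).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# The task (produce an antonym) is defined only by the in-context examples.
prompt = (
    "hot -> cold\n"
    "big -> small\n"
    "fast -> slow\n"
    "tall ->"
)

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=3,
        do_sample=False,  # greedy decoding for a deterministic demo
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated continuation: the model's in-context "answer".
completion = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:])
print(completion)  # ideally something like " short"
```

Swapping the examples in `prompt` switches the task (translation, Q&A, simple arithmetic, ...) with no change to the model weights, which is what "prompt engineering" leverages.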
In-context learning Notion
OpenAI paper
Few-shot PEFT is more cost-efficient than in-context learning for specific tasks
Korean-language overview
Let's look into In-context Learning (feat. tips for reading papers) - AI Language Model Local channel
In-Context Learning (aka few-shot learning): large generative language models have many genuinely surprising properties. Initially called causal models, or autoregressive models, these…
https://arca.live/b/alpaca/75432756


Seonglae Cho