YSU NLP Final

Multiple choice and short answer, seminar material excluded

Last

One simple question from AI Tool Calling

BERT cannot generate sequences; it converts the input into a single fixed-length vector via the [CLS] token (used for classification)
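A minimal sketch of the [CLS] idea, assuming toy hidden states instead of a real BERT: the encoder output at position 0 is taken as the single fixed-length vector and fed to a linear classification layer. The numbers, sizes, and the names `cls_vector` and `classify` are illustrative assumptions, not BERT's actual code.

```python
# Toy illustration of [CLS] pooling plus a linear classifier (no real BERT weights).

def cls_vector(hidden_states):
    """Return the hidden state at position 0, where the [CLS] token sits."""
    return hidden_states[0]

def classify(vec, weights, bias):
    """Linear layer: one score per class, then pick the highest-scoring class."""
    scores = [sum(w * x for w, x in zip(row, vec)) + b
              for row, b in zip(weights, bias)]
    return scores.index(max(scores)), scores

# Hypothetical encoder output: 4 tokens ([CLS], w1, w2, [SEP]), hidden size 3.
hidden_states = [
    [0.5, -1.0, 2.0],   # [CLS]
    [0.1, 0.3, -0.2],
    [0.9, 0.0, 0.4],
    [0.0, 0.0, 0.1],    # [SEP]
]
weights = [[1.0, 0.0, 0.0],   # class 0
           [0.0, 0.0, 1.0]]   # class 1
bias = [0.0, 0.0]

pooled = cls_vector(hidden_states)          # same length for any input length
label, scores = classify(pooled, weights, bias)
```

However long the input is, `pooled` always has the hidden size as its length, which is why this setup suits classification rather than generation.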

Unlike GPT-2 pre-training, the approach was feature-based, so the embedding weights were left as they were and only the added layers were trained

The professor re-asks questions that many students got wrong on the midterm, so make the concepts clear

Expected exam content

Probably one open-ended question asking for an idea that uses LLMs

GPT2

Paradigm shift

Word Vectors + Task specific architectures → Multi layer RNN → Pre-trained transformers + Fine-tuning

Limitations of Pre-training ➔ Fine-Tuning: one fine-tuned model per task, so you end up with many “copies” of the same model

The model only overfits to the training distribution and does not work properly on out-of-distribution samples

Even with high benchmark performance, the model has solved that dataset, not the task (spurious correlation)

Scaling up
Scaling Law
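For reference, a scaling law is usually written as a power law. The notes do not give a formula, so the form below is added as an assumption, following the commonly cited parameter-count version: $L$ is test loss, $N$ is the number of model parameters, and $N_c$, $\alpha_N$ are fitted constants.

```latex
L(N) = \left( \frac{N_c}{N} \right)^{\alpha_N}
```

Since $\alpha_N > 0$, loss falls smoothly as $N$ grows, which is the motivation for scaling up.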

In-context Learning
Meta Learning (in-context learning is in charge of the inner loop while SGD is responsible for the outer loop)

Larger Models Learn Better In-Context

In-context learning based on few-shot examples

Unlike fine-tuning, the model is only trained once for all downstream tasks.
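A minimal sketch of few-shot in-context learning: the "training" examples are simply placed in the prompt, and no weights are updated. The helper name `build_few_shot_prompt` and the `=>` separator are illustrative assumptions. The resulting string would be sent to whatever language model is in use.

```python
def build_few_shot_prompt(task_description, examples, query):
    """Concatenate a task description, K demonstrations, and the unanswered query."""
    lines = [task_description]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")          # the model is expected to continue from here
    return "\n".join(lines)

task = "Translate English to French:"
examples = [("sea otter", "loutre de mer"), ("cheese", "fromage")]

few_shot = build_few_shot_prompt(task, examples, "peppermint")   # K = 2
zero_shot = build_few_shot_prompt(task, [], "peppermint")        # K = 0
```

The same frozen model serves every task. Only the prompt changes, which is the contrast with keeping one fine-tuned copy per task.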

Difference between In-context Learning (Recognition) and the earlier Adaptation approach

(Pre-training and Fine-tuning): Adaptation

Datasets and metrics for GPT-3

Perplexity (Language Modeling)
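A minimal sketch of how perplexity is computed, assuming the per-token natural-log probabilities from a language model are already available (here they are made up): perplexity is the exponential of the average negative log-likelihood per token.

```python
import math

def perplexity(token_log_probs):
    """exp(average negative log-likelihood per token), natural log assumed."""
    n = len(token_log_probs)
    if n == 0:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / n)

# Hypothetical model that assigns probability 0.25 to every token.
uniform = [math.log(0.25)] * 8
ppl = perplexity(uniform)
```

A model that is always choosing uniformly among 4 options has perplexity 4, so lower perplexity means the model is less "surprised" by the text.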

LAMBADA (Predict last word)

HellaSwag (choose the best ending)

StoryCloze (choose the correct ending)

Natural Questions, WebQuestions, TriviaQA

Translation Task (translating into English works better than translating from English)

Winograd-Style Tasks: reading comprehension test of which word a pronoun refers to

Common Sense Reasoning: OpenBookQA, PIQA, ARC

Reading Comprehension (CoQA, QuAC, DROP, RACE, SQuADv2): GPT-3 performs poorly

SuperGLUE

Natural Language Inference: ability to understand the relationship between two sentences (GPT-3 performs poorly)

Because of the huge dataset ➔ GPT-3 doesn't overfit on test data it has seen before. The performance drop when seen samples are removed from the test set is small
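A minimal sketch of the "remove seen samples" check, assuming contamination is detected by word n-gram overlap between a test example and the training text. The function names, the whitespace tokenization, and n=3 are toy assumptions chosen for illustration. The model would then be scored on all examples and on the clean subset only, and the two scores compared.

```python
def ngrams(text, n):
    """Set of lowercase word n-grams in a string."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def split_clean_dirty(test_examples, training_text, n=3):
    """Mark a test example 'dirty' if it shares any n-gram with the training text."""
    train_grams = ngrams(training_text, n)
    clean, dirty = [], []
    for example in test_examples:
        if ngrams(example, n) & train_grams:
            dirty.append(example)
        else:
            clean.append(example)
    return clean, dirty

training_text = "the quick brown fox jumps over the lazy dog"
test_examples = [
    "a quick brown fox appeared",      # shares "quick brown fox"
    "cats sleep most of the day",      # no shared 3-gram
]
clean, dirty = split_clean_dirty(test_examples, training_text)
```

If accuracy on `clean` is close to accuracy on the full set, memorized test data is not what drives the result.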

Arithmetic

Is GPT-3 Just Memorizing Tables? No!

Word Manipulation

Cycle letters in word (CL)

Anagrams of all but the first and last k letters (A1, A2 for k=1,2)

Random insertion in word (RI)

Reversed words (RW)
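A minimal sketch of how the four word-manipulation inputs could be generated, so it is clear what the model is asked to undo. Function names, the symbol set for RI, and the seeded `random.Random` are assumptions for illustration. Words are assumed non-empty.

```python
import random

def cycle_letters(word, shift):
    """CL: rotate the letters; the task is to recover the original word."""
    shift %= len(word)
    return word[shift:] + word[:shift]

def anagram_keep_ends(word, k, rng):
    """A1/A2: shuffle all letters except the first k and last k."""
    if len(word) <= 2 * k:
        return word
    middle = list(word[k:len(word) - k])
    rng.shuffle(middle)
    return word[:k] + "".join(middle) + word[len(word) - k:]

def random_insertion(word, rng, symbols=".,!?-/ "):
    """RI: put one random punctuation or space character between each pair of letters."""
    return "".join(ch + rng.choice(symbols) for ch in word[:-1]) + word[-1]

def reverse_word(word):
    """RW: spell the word backwards."""
    return word[::-1]

rng = random.Random(0)
cl = cycle_letters("inevitably", 3)
a1 = anagram_keep_ends("corruption", 1, rng)
a2 = anagram_keep_ends("opponent", 2, rng)
ri = random_insertion("succession", rng)
rw = reverse_word("objects")
```

Each function produces the scrambled input, and the original word is the expected answer.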

Qualitative Tasks

News article generation

Limitation

Limited common sense

Poor one-shot and zero-shot performance

Lack of grounding

Commonsense Reasoning

Knowledge Graph, Knowledge Base

ConceptNet semantic relations + ATOMIC if-then = ATOMIC 2020

COMET (Commonsense Transformers), VisualCOMET

Benchmark

WinoGrande (WG), based on the Winograd Schema Challenge

Choice of Plausible Alternatives (COPA): Commonsense causality, Visual COPA

CosmosQA: commonsense machine comprehension that also requires reasoning with background knowledge

CommonsenseQA(CSQA)

Social Intelligence QA (SocialIQA): questions about social events drawn from ATOMIC

Symbolic Knowledge Distillation (Symbolic Distillation)

From General Language Models to Commonsense Models

Machine-to-corpus-to-machine pipeline does not require human-authored knowledge

Loose teacher with critic model

Naive knowledge distillation trains the student model to match the teacher's probabilities over the whole output space; for text generation that space is every possible sequence, which makes it intractable.

Distill a symbolic knowledge graph

Distill only a selective aspect of the teacher model
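A minimal sketch of the machine-to-corpus-to-machine idea, assuming a teacher that proposes candidate knowledge as text and a critic that scores each candidate. `toy_teacher`, `toy_critic`, the threshold, and the example event are hypothetical stand-ins, not the actual models or data. The filtered corpus is what the student would then be trained on.

```python
def symbolic_distillation(teacher_generate, critic_score, prompts, threshold=0.5):
    """Generate candidates with the (loose) teacher, keep only what the critic accepts."""
    corpus = []
    for prompt in prompts:
        for candidate in teacher_generate(prompt):
            if critic_score(prompt, candidate) >= threshold:
                corpus.append((prompt, candidate))
    return corpus   # symbolic knowledge (text pairs), not teacher probabilities

# Hypothetical stand-in for a large general language model used as teacher.
def toy_teacher(event):
    generations = {
        "X pays for Y's coffee": ["X is seen as generous", "X grows wings"],
    }
    return generations.get(event, [])

# Hypothetical stand-in for a trained critic that rejects implausible inferences.
def toy_critic(event, inference):
    return 0.1 if "wings" in inference else 0.9

corpus = symbolic_distillation(toy_teacher, toy_critic, ["X pays for Y's coffee"])
```

Because the student learns from generated text rather than from full output distributions, only the selected aspect of the teacher (here, commonsense inferences) is distilled.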

LLM as Clinical Reasoner

Medical Chain-of-Thought Distillation

Dialogue Systems

Task-oriented Dialogue System, Open-domain Dialogue System

Persona-Grounded Dialogue

EmpatheticDialogues (benchmark)

Long Term Conversation

Multi Session Chat

BlenderBot (search)

Multi-modal Chatbot