GPT-2

- Small (default): 117M parameters, 12 layers, 12 attention heads per layer, 768 hidden dim
- Training data: ~40GB of WebText (text scraped from outbound Reddit links)
- Surpasses the existing task-specific SOTA models without any fine-tuning per task
- Impact: suggests that a single well-trained LLM might be able to handle every task

Links:
- Language Models are Unsupervised Multitask Learners (GPT-2 paper): https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
- Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20 (karpathy/llm.c, Discussion #481): https://github.com/karpathy/llm.c/discussions/481
- ALGPT-2 Part 2: How I (almost) replicated OpenAI's GPT-2 (124M): https://bkkaggle.github.io/blog/algpt2/2020/07/17/ALGPT2-part-2.html#replicating-gpt-2
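The layer/head/hidden-dim numbers above fully determine the model size. A minimal sketch of that arithmetic, assuming the usual GPT-2 choices (50,257-token vocabulary, 1,024-token context, tied input/output embeddings, biases on every linear layer), none of which are stated in this note itself:

```python
# Rough parameter count for GPT-2 Small from the architecture numbers above.
# Assumed (not stated in the note): 50,257-token vocab, 1,024-token context,
# tied input/output embeddings, biases on every linear layer.
n_layer, n_head, d_model = 12, 12, 768
vocab_size, n_ctx = 50257, 1024

embeddings = vocab_size * d_model + n_ctx * d_model     # token + positional embeddings

per_block = (
    4 * d_model * d_model + 4 * d_model                 # attention: QKV + output projection (+ biases)
    + 8 * d_model * d_model + 5 * d_model               # MLP: 768 -> 3072 -> 768 (+ biases)
    + 4 * d_model                                       # two LayerNorms (scale + bias each)
)

total = embeddings + n_layer * per_block + 2 * d_model  # plus the final LayerNorm
print(f"{total / 1e6:.1f}M parameters")                 # ~124.4M
```

This comes out to roughly 124M, which is why the llm.c discussion linked above calls the same checkpoint "GPT-2 (124M)", while the paper's table reports 117M for the Small model.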