Transformer Model

Creator: Seonglae Cho
Created: 2020 Aug 17 14:4
Edited: 2024 Nov 10 12:51

Self-Attention is the core feature

The Transformer is the first architecture that actually scales. Before the Transformer, RNNs such as LSTM did not scale cleanly when stacked.
The Transformer gains a wider perspective and can attend to multiple interaction levels within the input sentence.
Unlike CNNs and RNNs, its major significance lies in improving long-term dependencies between distant tokens. The Transformer is not only proficient at language modeling but is also a versatile token-sequence model with broad applications across domains.
Because it receives and processes all tokens simultaneously, parallel computation is possible; and unlike earlier attention mechanisms, in the paper every vector is itself used as a weight vector.
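A minimal sketch of this idea, scaled dot-product self-attention in numpy (illustrative only; variable names and shapes are my assumptions, not from the paper's notation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Every query attends to every key in parallel: the attention
    weights are computed from the token vectors themselves."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq, seq) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # weighted sum of value vectors

# Three 4-dimensional token vectors processed at once, with no recurrence.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(X, X, X)          # self-attention: Q = K = V = X
print(out.shape)  # (3, 4)
```

Because all pairwise scores are computed in one matrix product, the whole sequence is handled in parallel rather than token by token as in an RNN.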
Since this paper, mainstream changes include moving the placement of
Layer Normalization
(pre-norm), replacing it with RMS Normalization, and adopting GLU variants as the FFN activation.
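These two post-paper changes can be sketched briefly; this is a hedged illustration of RMSNorm and a SwiGLU-style FFN (the shapes, gain vector `g`, and weight names are assumptions for the example, not a reference implementation):

```python
import numpy as np

def rms_norm(x, g, eps=1e-6):
    # RMSNorm: rescale by root-mean-square only; unlike LayerNorm there is
    # no mean subtraction and no bias term.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * g

def swiglu_ffn(x, W_gate, W_up, W_down):
    # GLU-style FFN: a SiLU-activated gate branch multiplies a linear
    # branch elementwise before the down projection.
    gate = x @ W_gate
    silu = gate / (1.0 + np.exp(-gate))   # SiLU(z) = z * sigmoid(z)
    return (silu * (x @ W_up)) @ W_down

d, hidden = 4, 8
rng = np.random.default_rng(1)
x = rng.normal(size=(3, d))
h = rms_norm(x, np.ones(d))               # pre-norm: normalize before the sublayer
y = x + swiglu_ffn(h, rng.normal(size=(d, hidden)),
                   rng.normal(size=(d, hidden)),
                   rng.normal(size=(hidden, d)))  # residual connection
print(y.shape)  # (3, 4)
```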
Transformer Model Notion
Transformer Models

Visualization

Architecture

Pseudo Source code
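A minimal pre-norm transformer block, as a runnable pseudocode sketch (single-head, no masking or dropout; all weight names and sizes are assumptions for illustration):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def transformer_block(x, Wq, Wk, Wv, Wo, W1, W2):
    # Pre-norm: normalize, self-attend, add residual; then normalize,
    # apply the FFN, add residual.
    h = layer_norm(x)
    attn = softmax((h @ Wq) @ (h @ Wk).T / np.sqrt(Wq.shape[1])) @ (h @ Wv)
    x = x + attn @ Wo
    h = layer_norm(x)
    x = x + np.maximum(0.0, h @ W1) @ W2   # two-layer ReLU FFN
    return x

d, dk, hidden = 4, 4, 8
rng = np.random.default_rng(2)
x = rng.normal(size=(3, d))
W = lambda a, b: rng.normal(size=(a, b)) * 0.1
y = transformer_block(x, W(d, dk), W(d, dk), W(d, dk), W(dk, d),
                      W(d, hidden), W(hidden, d))
print(y.shape)  # (3, 4)
```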

But what is a GPT? Visual intro to Transformers | Deep learning, chapter 5

Recommendations