Absolute Positional Encoding

Creator

Created

2024 Mar 1 14:13

Editor

Edited

2024 Mar 2 07:06

Refs

d is the model embedding dimension

i is the index of a sine/cosine dimension pair within the embedding vector

p is the position of the token in the input text

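Overall, the encoding assigns a different frequency to each pair of embedding dimensions so that positions become distinguishable. Even and odd dimensions use different functions (sine and cosine); the resulting phase difference makes the encodings of different positions look clearly distinct.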

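Key ideas behind the formula below

It is designed so that the range of positions that can be covered grows exponentially with the embedding dimension.

High-frequency and low-frequency components are separated so that positions can be told apart by frequency.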

https://wikidocs.net/31379

In other words, in theory the encoding can distinguish a number of positions that is exponential in the model's embedding depth.
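To make the frequency separation concrete, here is a minimal sketch that lists the wavelength of each sine/cosine pair. It assumes the standard sinusoidal formulation with base 10000, in which pair i has wavelength 2π · 10000^(2i/d). The function name `wavelengths` and the toy size `d_model = 8` are illustrative choices, not something taken from these notes.

```python
import numpy as np

def wavelengths(d_model: int, base: float = 10000.0) -> np.ndarray:
    """Wavelength, in token positions, of each sine/cosine pair i = 0 .. d_model/2 - 1."""
    i = np.arange(d_model // 2)
    return 2 * np.pi * base ** (2 * i / d_model)

w = wavelengths(8)            # toy dimension, chosen only for readability
ratios = w[1:] / w[:-1]       # constant ratio => geometric progression
print(w)
print(ratios)
```

The shortest wavelength is 2π (fast-changing, high-frequency dimensions), and each following pair is longer by a constant factor, so the low-frequency dimensions only change noticeably over very long spans of positions.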

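In the end, the attention weights are fitted to this mathematical positional encoding. For a function like this, it is enough that the encodings of different positions can be told apart from one another.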
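A rough way to check the "distinguishable" requirement is to measure how far apart two encodings are. For the sine/cosine encoding, the squared Euclidean distance between positions p and q works out to Σ_i 2·(1 − cos((p − q)·ω_i)) with ω_i = 1/10000^(2i/d), so it depends only on the offset p − q. The sketch below evaluates that expression; the helper name `pe_distance`, the dimension 16, and the range of 512 positions are assumptions made for illustration.

```python
import numpy as np

def pe_distance(offset: int, d_model: int, base: float = 10000.0) -> float:
    """Euclidean distance between the sinusoidal encodings of two positions
    that are `offset` tokens apart (the distance depends only on the offset)."""
    i = np.arange(d_model // 2)
    omega = 1.0 / base ** (2 * i / d_model)   # angular frequency of each pair
    return float(np.sqrt(np.sum(2.0 * (1.0 - np.cos(offset * omega)))))

# Distances for every non-zero offset within a 512-token window.
distances = [pe_distance(k, d_model=16) for k in range(1, 512)]
all_distinct = min(distances) > 0.0
print(all_distinct)
```

If the minimum distance is strictly positive, no two positions in the window share the same encoding, which is the only property the note above asks of the function.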

Papers with Code - Absolute Position Encodings Explained

Absolute Position Encodings are a type of position embeddings for Transformer-based models where positional encodings are added to the input embeddings at the bottoms of the encoder and decoder stacks. The positional encodings have the same dimension $d_{model}$ as the embeddings, so that the two can be summed. In the original implementation, sine and cosine functions of different frequencies are used:

$$ \text{PE}\left(pos, 2i\right) = \sin\left(pos/10000^{2i/d_{model}}\right) $$

$$ \text{PE}\left(pos, 2i+1\right) = \cos\left(pos/10000^{2i/d_{model}}\right) $$

where $pos$ is the position and $i$ is the dimension. That is, each dimension of the positional encoding corresponds to a sinusoid. The wavelengths form a geometric progression from $2\pi$ to $10000 \cdot 2\pi$. This function was chosen because the authors hypothesized it would allow the model to easily learn to attend by relative positions, since for any fixed offset $k$, $\text{PE}_{pos+k}$ can be represented as a linear function of $\text{PE}_{pos}$.
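The two formulas and the fixed-offset property can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: the names `sinusoidal_pe` and `offset_matrix`, the even `d_model` requirement, and the toy sizes (64 positions, 8 dimensions, offset 5) are assumptions introduced here. The offset matrix is built from the angle-addition identities for sine and cosine, which is one way to write the "linear function" mentioned above.

```python
import numpy as np

def sinusoidal_pe(num_positions: int, d_model: int, base: float = 10000.0) -> np.ndarray:
    """Return a (num_positions, d_model) encoding table; d_model is assumed even."""
    pos = np.arange(num_positions)[:, None]        # shape (P, 1)
    i = np.arange(d_model // 2)[None, :]           # shape (1, d/2)
    angles = pos / base ** (2 * i / d_model)       # shape (P, d/2)
    pe = np.empty((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dims: PE(pos, 2i)
    pe[:, 1::2] = np.cos(angles)                   # odd dims:  PE(pos, 2i+1)
    return pe

def offset_matrix(k: int, d_model: int, base: float = 10000.0) -> np.ndarray:
    """Block-diagonal matrix M_k such that PE[pos + k] = M_k @ PE[pos] for every pos."""
    omega = 1.0 / base ** (2 * np.arange(d_model // 2) / d_model)
    m = np.zeros((d_model, d_model))
    for j, w in enumerate(omega):
        c, s = np.cos(k * w), np.sin(k * w)
        # sin(a + kw) =  cos(kw) * sin(a) + sin(kw) * cos(a)
        # cos(a + kw) = -sin(kw) * sin(a) + cos(kw) * cos(a)
        m[2 * j:2 * j + 2, 2 * j:2 * j + 2] = [[c, s], [-s, c]]
    return m

pe = sinusoidal_pe(64, 8)
m = offset_matrix(5, 8)
shifted = pe[:-5] @ m.T                 # apply M_5 to every row PE[pos]
ok = np.allclose(shifted, pe[5:])       # compare with PE[pos + 5]
print(ok)
```

Note that `offset_matrix` depends only on the offset `k`, not on `pos`, which is what makes a relative shift expressible as a single linear map. In a model, the table would simply be added to the token embeddings of matching shape.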

https://paperswithcode.com/method/absolute-position-encodings

Korean

[Paper Study Week 4-5] Attention is All You Need

https://velog.io/@stapers/논문-스터디-Week4-5-Attention-is-All-You-Need

[Deep Learning] Language Models, RNN, GRU, LSTM, Attention, Transformer, GPT, BERT: Concept Summary

A basic summary of language models

https://velog.io/@rsj9987/딥러닝-용어정리
