StreamingLLM

Created: 2023 Oct 14 11:35
Creator: Seonglae Cho
Edited: 2024 Nov 27 15:4

The context window itself remains unchanged; only the most recent tokens and the attention sinks are retained in the KV cache.

StreamingLLM is orthogonal to recent context-extension methods and can be integrated with them.
Efficient training and inference on continuous streaming data
Maintains high performance while greatly reducing memory usage
Handles inputs of 4 million tokens or more stably and efficiently
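The retention policy above can be sketched as a simple cache-eviction rule. This is a minimal illustration, not the official implementation: `evict_kv_cache`, `num_sinks`, and `window` are hypothetical names, and the real method additionally re-assigns positions relative to the cache rather than the original text, which this sketch omits.

```python
# Sketch of a StreamingLLM-style KV cache eviction policy (hypothetical
# helper): keep the first `num_sinks` entries (attention sinks) plus the
# most recent `window` entries, and evict everything in between.

def evict_kv_cache(cache, num_sinks=4, window=1020):
    """Return the retained slice of a per-layer KV cache, modeled here
    as a list of per-token entries. Retained length <= num_sinks + window."""
    if len(cache) <= num_sinks + window:
        return cache  # still fits; nothing to evict
    return cache[:num_sinks] + cache[-window:]

# Usage: token positions 0..9999 with a small window for illustration.
cache = list(range(10_000))
kept = evict_kv_cache(cache, num_sinks=4, window=100)
assert kept[:4] == [0, 1, 2, 3]                  # sinks preserved
assert kept[4:] == list(range(9_900, 10_000))    # recent window preserved
```

Because the retained size is bounded, memory stays constant no matter how long the stream runs, which is what allows multi-million-token inputs.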

SINK TOKEN

Although keeping the first token might seem semantically meaningless, it is significant. Due to the nature of the attention mechanism, the first token serves as an anchor for computing attention scores through positional embedding. So even if the token carries no meaning, the model structurally requires it.
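One hedged way to see why some token must absorb attention: softmax normalizes the scores so the weights always sum to 1, so even when no token is truly relevant, the probability mass must land somewhere. In trained models it tends to accumulate on the initial tokens, which is why evicting them degrades generation. A tiny self-contained demo of the normalization property:

```python
import math

def softmax(scores):
    """Standard softmax: exponentiate, then normalize to sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Even when every score is low (no token is relevant), the weights
# still sum to 1 -- the attention mass cannot simply vanish.
weights = softmax([-4.0, -4.0, -4.0, -4.0])
assert abs(sum(weights) - 1.0) < 1e-9
assert all(abs(w - 0.25) < 1e-9 for w in weights)  # mass spread evenly here
```

Keeping the sink tokens in the cache gives this unavoidable mass a stable place to go.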
