TreeFormer idea

Treensformer

Hierarchical structural context embedding (reusing the context vector as an input)

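LLMs have a very limited context length, so there is a lot of research on extending it. The longer the context, the more training on correspondingly long data is required, which costs substantial time and compute. Work such as RoPE has offered solutions, but clear problems remain, such as "lost in the middle". Approaches such as Ring Attention proposed innovative ways to extend context length, but they do not remove the fundamental context limit.

Ultimately, for knowledge to scale the way it does for people, incoming context should not be used verbatim as it arrives; it has to be updated. To manage context at exponential scale, it should be organized as a tree so that only the relevant parts are used, efficiently. In other words, the context should be split into large chunks, each chunk abstracted, and only the important parts decomposed into vectors that are then used as context. For this we use the transformer model's own context vector. The approach is basically similar to text embedding, except that it reuses the context vector (the final attention output) of the same transformer model, so a text of n tokens can be replaced by a single context vector.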
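The chunk, compress, and tree idea above can be sketched in Python. This is a minimal sketch under stated assumptions, not the TreeFormer implementation: per-token hidden states are given as plain lists of floats, mean pooling (an assumption) stands in for the "final attention output" context vector, each parent node's vector is the mean of its children's vectors, and retrieval greedily descends toward the child with the highest cosine similarity to a query vector. The names `pool_context_vector`, `ContextNode`, `build_tree`, and `retrieve` are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

Vector = List[float]


def pool_context_vector(states: List[Vector]) -> Vector:
    # Hypothetical stand-in for the "context vector": mean-pool n per-token
    # hidden states into one vector (assumption; the note does not specify pooling).
    if not states:
        raise ValueError("need at least one state to pool")
    dim = len(states[0])
    return [sum(s[i] for s in states) / len(states) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class ContextNode:
    vector: Vector
    text: Optional[str] = None  # only leaf chunks keep their source text
    children: List["ContextNode"] = field(default_factory=list)


def build_tree(leaves: List[ContextNode], fanout: int = 2) -> ContextNode:
    # Group nodes bottom-up; each parent vector is the mean of its children's vectors.
    level = leaves
    while len(level) > 1:
        level = [
            ContextNode(
                vector=pool_context_vector([c.vector for c in level[i:i + fanout]]),
                children=level[i:i + fanout],
            )
            for i in range(0, len(level), fanout)
        ]
    return level[0]


def retrieve(root: ContextNode, query: Vector) -> ContextNode:
    # Greedy descent: follow the child most similar to the query until reaching a leaf.
    node = root
    while node.children:
        node = max(node.children, key=lambda c: cosine(c.vector, query))
    return node


# Toy example: 2-dimensional "hidden states" for four chunks of two tokens each.
chunks = {
    "chunk A": [[1.0, 0.0], [0.8, 0.2]],
    "chunk B": [[0.9, 0.1], [1.0, 0.0]],
    "chunk C": [[0.0, 1.0], [0.1, 0.9]],
    "chunk D": [[0.2, 0.8], [0.0, 1.0]],
}
leaves = [ContextNode(vector=pool_context_vector(s), text=name) for name, s in chunks.items()]
root = build_tree(leaves, fanout=2)
print(retrieve(root, [0.0, 1.0]).text)  # prints: chunk C
```

Greedy descent only scores fanout children per level, so lookup cost grows with tree depth rather than with the total number of chunks. The trade-off is that a relevant leaf can be missed when its parent's pooled summary scores poorly; keeping the top-k children at each level instead of only the best one is one possible remedy.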

The simplest experiment: use each QA passage as an individual embedding and then compare the results (this allows the max_token budget to be used more precisely).
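One way to set up this baseline comparison: rank passages by similarity to the question and pack them into the same budget twice, once counting each passage at its raw token length and once counting it as a single context vector. The sketch assumes the passage and question vectors are already computed; `select_passages` and the example numbers are hypothetical, and charging a compressed passage exactly one slot is an assumption that follows the "n tokens to one context vector" idea.

```python
import math
from typing import List, Tuple

# (passage_id, passage vector, raw token length)
Passage = Tuple[str, List[float], int]


def cosine(a: List[float], b: List[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_passages(question: List[float], passages: List[Passage],
                    max_tokens: int, compressed: bool) -> List[str]:
    # Rank passages by similarity to the question, most similar first.
    ranked = sorted(passages, key=lambda p: cosine(p[1], question), reverse=True)
    chosen, used = [], 0
    for pid, _vec, n_tokens in ranked:
        # Compressed mode (assumption): a passage costs one slot, i.e. one context vector.
        # Raw mode: a passage costs its full token length.
        cost = 1 if compressed else n_tokens
        if used + cost > max_tokens:
            continue  # too large for the remaining budget; a later, shorter one may fit
        chosen.append(pid)
        used += cost
    return chosen


passages = [
    ("p1", [1.0, 0.0], 120),
    ("p2", [0.7, 0.7], 300),
    ("p3", [0.0, 1.0], 80),
]
question = [1.0, 0.2]
print(select_passages(question, passages, 256, compressed=False))  # ['p1', 'p3']
print(select_passages(question, passages, 256, compressed=True))   # ['p1', 'p2', 'p3']
```

With the same budget of 256, raw mode has to skip the long passage p2, while compressed mode keeps every passage in similarity order; comparing QA accuracy between the two selections is the kind of measurement this experiment would make.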

TreeFormer Notion

TreeFormer POC

Treensformer Expanding Strategy

Transformer Shrink Strategy

TreeFormer Cons

TreeFormer Implementation

