Attention Sink
with abstraction

Creator: Seonglae Cho
Created: 2023 Dec 14 3:1
Edited: 2024 Mar 14 10:2

Shorten the context by updating it through the transformer's context vectors.

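Ultimately, attention with its quadratic complexity cannot handle everything. Current sparse attention techniques only optimize attention using patterns that humans choose by hand, so they are not a fundamental solution. LLMs show sufficient intelligence when attention is applied in the right places. Attending to the entire context, however, makes the complexity too high, and at present there is no principled way to decide where to attend other than such hand-picked heuristics. The solution this research proposes is context management with streaming. As information accumulates, the cost of searching it keeps growing. The only option is therefore to keep the context length, the variable that drives that cost, continuously small (or, alternatively, to keep managing the weights through the attention_mask).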
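The cost argument above can be made concrete. Under causal attention, token i scores against all i tokens up to and including itself, so cumulative work grows as n(n+1)/2. If the visible context is capped at a fixed window w, each new token costs at most w scores. The sketch below only counts query-key scores. The function names and `window=1_024` are illustrative assumptions, not values from these notes.

```python
def full_attention_pairs(n: int) -> int:
    """Query-key scores computed when each of n tokens attends causally to all earlier tokens."""
    # Token i (1-indexed) scores against itself and the i - 1 tokens before it.
    return n * (n + 1) // 2


def windowed_attention_pairs(n: int, window: int) -> int:
    """Query-key scores when each token only sees the most recent `window` tokens."""
    # Once the window is full, every new token costs exactly `window` scores.
    return sum(min(i, window) for i in range(1, n + 1))


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        full = full_attention_pairs(n)
        capped = windowed_attention_pairs(n, window=1_024)
        print(f"n={n:>7,}  full={full:>15,}  windowed={capped:>15,}  ratio={full / capped:6.1f}x")
```

The windowed count grows linearly once the window fills, which is why keeping the context length small bounds the cost of each new token.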
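Summarization and abstraction are key human functions. Abstraction is a high-level ability that arises on top of the language-based human capacity for next-token prediction, and it is an important stepping stone toward consciousness. To implement it, we approximated a continuously conscious agent by adding functionality to the attention sink. The core idea is a fixed window: whenever tokens are generated or added (when the agent listens or speaks), it discards the tokens in its own window that have low attention scores.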
Like sparse attention, quantization is an engineering optimization rather than a fundamental solution.
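The core idea described earlier can be sketched as a small data structure. It keeps a fixed-capacity window, never evicts the first few "sink" tokens, and drops the non-sink token that has accumulated the least attention whenever a new token arrives. Permanently keeping the initial tokens as attention sinks follows StreamingLLM. The class and method names, and the default of 4 sinks, are hypothetical choices for illustration. In a real model the attention weights would come from the model's attention matrix; here they are passed in by hand.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StreamingContext:
    """Fixed-size token window that keeps attention-sink tokens and evicts low-attention ones."""

    capacity: int                 # maximum number of tokens held in the window
    num_sinks: int = 4            # leading tokens that are never evicted (illustrative default)
    tokens: List[str] = field(default_factory=list)
    scores: List[float] = field(default_factory=list)  # attention received so far, per token

    def __post_init__(self) -> None:
        if self.capacity <= self.num_sinks:
            raise ValueError("capacity must be larger than num_sinks")

    def observe_attention(self, weights: List[float]) -> None:
        """Add the attention a new query paid to each token currently in the window."""
        if len(weights) != len(self.tokens):
            raise ValueError("expected one attention weight per token in the window")
        for i, w in enumerate(weights):
            self.scores[i] += w

    def add(self, token: str) -> None:
        """Append a heard or spoken token, first evicting the least-attended non-sink token if full."""
        if len(self.tokens) >= self.capacity:
            # min() returns the first minimum, so ties evict the oldest candidate.
            victim = min(range(self.num_sinks, len(self.tokens)), key=lambda i: self.scores[i])
            del self.tokens[victim]
            del self.scores[victim]
        self.tokens.append(token)
        self.scores.append(0.0)


# Usage with made-up attention weights.
ctx = StreamingContext(capacity=6, num_sinks=2)
for tok in ["<s>", "Hi", "the", "cat", "sat", "on"]:
    ctx.add(tok)
ctx.observe_attention([0.5, 0.1, 0.05, 0.2, 0.1, 0.05])
ctx.add("mat")
print(ctx.tokens)  # ['<s>', 'Hi', 'cat', 'sat', 'on', 'mat']
```

A just-added token has had no chance to receive attention yet. A practical version would probably also protect the most recent few tokens, as StreamingLLM does with its rolling window.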

Streaming is all you need for AGI

 
 

Computational ability is already at human level; from here on, the key to AI is the memory hierarchy (RAG is one part of it).

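  • An AI that reads, runs inference, and is done, and then repeats that cycle
  • During training, as in the human brain, split out tokens that call into each cache layer below, and use them during training as well, so that the model learns to find the optimal tokens that lower perplexity
    • working memory (context)
      • long-term memory (weights)
        • RAG (external knowledge)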
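As a toy illustration of the hierarchy itself (not of the proposed training scheme with memory-calling tokens), the sketch below checks the tiers in order: working memory (context), then long-term memory (weights), then an external RAG store. Hits from slower tiers are promoted into a bounded working memory, the way a CPU cache works. All names are hypothetical. The "weights" tier is just a callable standing in for what the model already knows, and the external tier is a plain dict standing in for a retrieval index.

```python
from collections import OrderedDict
from typing import Callable, Dict, Optional, Tuple


class MemoryHierarchy:
    """Tiered lookup: working memory (context) -> long-term memory (weights) -> RAG store."""

    def __init__(self, wm_capacity: int,
                 parametric: Callable[[str], Optional[str]],
                 external: Dict[str, str]) -> None:
        self.wm_capacity = wm_capacity
        self.working: "OrderedDict[str, str]" = OrderedDict()  # context window, in LRU order
        self.parametric = parametric  # stands in for knowledge stored in the weights
        self.external = external      # stands in for a retrieval (RAG) index

    def _promote(self, key: str, value: str) -> None:
        """Place a fact in working memory, evicting the least recently used entry if full."""
        self.working[key] = value
        self.working.move_to_end(key)
        if len(self.working) > self.wm_capacity:
            self.working.popitem(last=False)

    def recall(self, key: str) -> Tuple[Optional[str], str]:
        """Return (value, tier) from the fastest tier that has the key, or (None, 'miss')."""
        if key in self.working:
            self.working.move_to_end(key)
            return self.working[key], "working"
        value = self.parametric(key)
        if value is not None:
            self._promote(key, value)
            return value, "long-term"
        if key in self.external:
            value = self.external[key]
            self._promote(key, value)
            return value, "rag"
        return None, "miss"


known = {"capital of France": "Paris"}
mem = MemoryHierarchy(wm_capacity=2, parametric=known.get,
                      external={"meeting notes": "retrieved text"})
print(mem.recall("capital of France"))  # ('Paris', 'long-term')
print(mem.recall("capital of France"))  # ('Paris', 'working')
print(mem.recall("meeting notes"))      # ('retrieved text', 'rag')
```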
 
 
for Multi-modality
 
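Beyond the output from the top-k information, our brain uses the rest as well, doesn't it? Is that being reflected? It seems hard and soft attention should both be used together...
  • 3D attention rather than 2D attention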
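One possible reading of "hard and soft together" is to keep renormalized top-k weights (hard selection) while mixing back in a fraction of the full softmax. That way, positions outside the top-k still contribute a little. The sketch below implements that blend. The function name, `k`, and the mixing weight `alpha` are assumptions for illustration; the note does not specify a rule.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hard_soft_attention(scores: List[float], k: int, alpha: float = 0.5) -> List[float]:
    """Blend hard top-k attention with soft attention over every position.

    alpha = 1.0 gives pure top-k (hard) attention; alpha = 0.0 gives a plain softmax.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    soft = softmax(scores)
    top = set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])
    top_mass = sum(soft[i] for i in top)
    # Hard branch: renormalize the softmax over the top-k positions, zero elsewhere.
    hard = [soft[i] / top_mass if i in top else 0.0 for i in range(len(scores))]
    # Both branches sum to 1, so any convex combination does too.
    return [alpha * h + (1 - alpha) * s for h, s in zip(hard, soft)]


weights = hard_soft_attention([1.0, 2.0, 3.0, 0.5], k=2, alpha=0.8)
print([round(w, 3) for w in weights])
```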
 
 

Superalignment

short description

In <=5 bullet points, summarize your application.
  • Implementing context updating through LLM streaming to simulate consciousness.
  • Emulating human working memory by streaming the outputs of working-memory models.
  • Developing the CU (Context-Updating) Transformer architecture for self-managing context.
  • Applying the CU architecture to existing Transformer-based language models for continuous, agent-like cognitive processing.
 
 

Your research project

In a half-page or less, describe the research you wish to pursue.
Please be very concrete! Include milestones and expected output.
You can include links, and/or upload a pdf with further details at the end of this form.
 

Abstract

Providing high-quality context is crucial to the performance of Large Language Models (LLMs). However, consistently supplying high-quality text remains challenging, especially when handling raw data obtained through Retrieval-Augmented Generation (RAG). One strategy preprocesses the context to compress it and improve readability; another uses sparse attention techniques to extend the context window. Despite these efforts, a fundamental problem persists: the context provided to the LLM is static. A continuously operating LLM requires the context to be updated dynamically through the attention mechanism as streaming input arrives. In response, we propose the Context-Updating (CU) Transformer, an architecture designed to manage its own context. It can be integrated into existing Transformer-based language models, enabling them to operate autonomously, like agents with an ongoing stream of consciousness.

Introduction

In the quest to enhance Large Language Models (LLMs), the static nature of context has been a significant hurdle. It limits their ability to mimic the dynamic human process of continuously updating working memory in response to new stimuli. Engineering optimizations such as quantization and sparse attention provide partial relief but do not capture the fluidity of human thought. The Context-Updating (CU) Transformer takes a different approach. It manages the context dynamically as input streams in, keeping a fixed-size window and discarding the tokens that receive the least attention, with the aim of narrowing the gap between AI and human cognitive capabilities. Beyond improving LLM performance, this approach aims to foster AI systems that interact with humans more naturally and safely and adapt readily to changing information and tasks.

Background

The human memory system is hierarchically structured, with working memory at the forefront of conscious thought, processing incoming information and interfacing with the more stable and expansive long-term memory. Human long-term memory (LTM) significantly influences working memory (WM) (Bruning & Lewis-Peacock, 2020). Similarly, the in-context learning capabilities of large language models (LLMs) are shaped by their pre-training experiences (Dai et al., 2023).
However, LLMs are limited by static context processing, which contrasts sharply with the dynamic updating of human memory. Overcoming this limitation requires new methods for dynamic context management that improve LLMs' adaptability and learning. Combining insights from cognitive science with advances in computational models could yield AI systems that interact and learn more naturally, a significant step toward replicating human-like cognition in machines.
 
 

Connection to alignment and safety of superhuman AI systems

Please briefly explain the motivation for your proposed research and how it will help with the alignment and safety of future advanced AI systems.
 
The motivation for our proposed research is to address concerns about the development of Artificial General Intelligence (AGI) and to help ensure the alignment and safety of future advanced AI systems. Much of the apprehension about AGI stems from the notion of a powerful, autonomous AI agent. Current AI systems are perceived more as dangerous tools than as imminent threats. Even so, it is imperative to prepare for more autonomous AI agents that possess continuous consciousness and can independently access information, including the internet, without direct human oversight.
Our proposed Context-Updating (CU) Transformer architecture aims to mitigate these concerns by enabling AI systems to keep updating their context indefinitely. Because the model's evolving context can be observed as it runs, this approach allows continuous assessment of incentives induced during pre-training (Wei et al., 2022), which might otherwise remain concealed within the deeper structures of the model. By emulating aspects of consciousness, we aim to build a framework for understanding and predicting the behavior of AI systems, and thereby for checking that they align with human values and safety protocols. This research is a step toward AI systems that can be trusted to act autonomously in complex environments while remaining under human control and aligned with our ethical standards.
 
 
 
 

Budget description

Briefly break down how you intend to use the funds.
In addition to your mainline budget, it can be helpful for you to give a lower and an upper bound (what are smaller/larger versions of the project we could fund?).
 
Total: an estimated $75k for the OpenAI Superalignment Fellowship
  • $15k for the OpenAI API for automated LLM evaluation
  • $40k for GPU resources & computing cost from cloud providers
  • $10k for the Hugging Face Inference API & Spaces deployment fees
 
Lower Bound Scenario
If funding is limited to a smaller version of the project, priority would go to essential computational resources and limited API access. A possible distribution is $25k for GPU resources (with a reduced scope or less frequent experiments) and $5k for limited OpenAI API usage, for a total of $30k focused on core development and testing.
Upper Bound Scenario
For a larger version of the project, additional allocations could cover expanded cloud resources and broader API access for extensive testing and evaluation. For example, an additional $25k for expanded GPU computing and API access would bring the total budget to $100k.
 
 
 
 
 
 
 

Recommendations