Test-time Regression

Creator: Seonglae Cho
Created: 2025 Jun 23 22:46
Edited: 2025 Jun 23 22:51
 
This sequence modeling framework explains Seq2Seq models, including Transformer models, from the perspective of Associative Learning rather than RNNs. Grounded in associative memory, it defines memorization as a weighted regression problem over key-value pairs, and retrieval as applying the learned regression function to a query to extract the associated values.
It derives existing techniques such as linear attention, state space models, fast weights, online learners, and softmax attention as special cases, while providing mathematical foundations for query-key normalization and opening new design directions such as local polynomial attention. In particular, when the regression is solved with linear least squares, the solution can be maintained recursively with Woodbury or LMS updates, so the model runs like an RNN; softmax attention and other non-recurrent models follow from the same principle with different regression estimators.
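To make the recurrent reading concrete, the sketch below (same assumptions as above, with hypothetical function names) maintains the least-squares solution online: the inverse Gram matrix is updated per pair via the Sherman-Morrison rank-1 case of the Woodbury identity, which is what gives the RNN-like form; a Nadaraya-Watson estimator with an exponential kernel is included as the corresponding non-recurrent, softmax-attention-style retrieval.

```python
import numpy as np

def memorize_recursive(keys, values, gammas, reg=1e-6):
    """RNN-like recurrence: update the weighted least-squares solution one pair at a time."""
    d_k, d_v = keys.shape[1], values.shape[1]
    P = np.eye(d_k) / reg          # running inverse of (K^T diag(gamma) K + reg * I)
    C = np.zeros((d_v, d_k))       # running V^T diag(gamma) K
    for k, v, g in zip(keys, values, gammas):
        Pk = P @ k
        P = P - np.outer(Pk, Pk) * (g / (1.0 + g * (k @ Pk)))  # Sherman-Morrison rank-1 update
        C = C + g * np.outer(v, k)
    return C @ P                   # matches the batch weighted least-squares fit

def kernel_retrieve(query, keys, values):
    """Nadaraya-Watson regression with an exponential kernel, i.e. softmax-attention-style retrieval."""
    w = np.exp(keys @ query)
    w = w / w.sum()
    return w @ values              # kernel-weighted average of stored values

rng = np.random.default_rng(0)
keys = rng.normal(size=(8, 4))
values = rng.normal(size=(8, 3))
memory = memorize_recursive(keys, values, np.ones(8))
print(memory @ keys[0])                        # recurrent (least-squares) retrieval
print(kernel_retrieve(keys[0], keys, values))  # non-recurrent (softmax-style) retrieval
```

Dropping the inverse update and keeping only the running sum C recovers unnormalized linear attention, which is one way the framework ties the recurrent and attention views together.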