Activation Oracle

Creator
Seonglae Cho
Created
2025 Dec 19 12:13
Editor
Edited
2025 Dec 19 12:20
Refs
A model that takes LLM activations as input and answers questions about them in natural language. Its performance improves consistently as more diverse training data (system prompt QA, classification, and self-supervised context prediction) is mixed in.
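To make the mixed training setup concrete, here is a minimal Python sketch that pools examples from the three task types into one shuffled training set. The `OracleExample` record, its fields, the task keys, and `mix_datasets` are hypothetical illustrations, not the original training code; real examples would carry activation tensors captured from the target LLM rather than toy lists.

```python
import random
from dataclasses import dataclass


@dataclass
class OracleExample:
    """One hypothetical training example for an activation-reading model."""
    task: str                        # source task, e.g. "classification"
    activations: list[list[float]]   # toy stand-in for activations captured from the target LLM
    question: str                    # natural-language question about those activations
    answer: str                      # natural-language target answer


def mix_datasets(sources: dict[str, list[OracleExample]], seed: int = 0) -> list[OracleExample]:
    """Pool every task's examples into a single training set and shuffle it (hypothetical helper)."""
    mixed = [ex for examples in sources.values() for ex in examples]
    random.Random(seed).shuffle(mixed)  # interleave tasks instead of keeping them grouped
    return mixed


sources = {
    "system_prompt_qa": [
        OracleExample("system_prompt_qa", [[0.1, 0.2]], "What persona was the model told to adopt?", "A pirate"),
    ],
    "classification": [
        OracleExample("classification", [[0.3, -0.1]], "Is the sentiment positive?", "Yes"),
    ],
    "context_prediction": [
        OracleExample("context_prediction", [[0.0, 0.5]], "Which tokens come next?", "sat on the mat"),
    ],
}
train = mix_datasets(sources)
print(len(train), sorted({ex.task for ex in train}))
# 3 ['classification', 'context_prediction', 'system_prompt_qa']
```

Per-task weighting or upsampling would be a natural extension; the note above only reports that mixing more diverse data helps, not any particular ratio.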
One limitation is that the activations are not simply inserted as token embeddings: the method modifies the residual stream (activations) at specific positions, injecting the information at placeholder tokens.
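The injection step can be sketched in plain Python. This is a minimal sketch under an assumed rule: the residual vector h at each placeholder position receives the activation v rescaled to h's length, h' = h + (||h|| / ||v||) * v. The function name `inject_at_placeholders`, this norm-matched rule, and the toy 2-dimensional vectors are assumptions for illustration; the layer and scaling actually used may differ. What it shows is the distinction drawn above: the activation is added to an existing residual-stream vector at a chosen position, not appended as a new token embedding.

```python
import math

Vector = list[float]


def l2_norm(v: Vector) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def inject_at_placeholders(
    residual: list[Vector],
    placeholder_positions: list[int],
    activations: list[Vector],
) -> list[Vector]:
    """Return a copy of the residual stream with one activation added per placeholder.

    Assumed (hypothetical) rule, norm-matched addition:
        h' = h + (||h|| / ||v||) * v
    so the injected vector has the same length as the residual vector it modifies.
    """
    if len(placeholder_positions) != len(activations):
        raise ValueError("need exactly one activation vector per placeholder position")
    out = [list(h) for h in residual]  # copy so the caller's residual stream is untouched
    for pos, v in zip(placeholder_positions, activations):
        h = out[pos]
        v_norm = l2_norm(v)
        if v_norm == 0.0:
            continue  # a zero activation carries no direction to inject
        scale = l2_norm(h) / v_norm
        out[pos] = [hi + scale * vi for hi, vi in zip(h, v)]
    return out


# Toy residual stream: 3 positions, d_model = 2; position 1 holds the placeholder token.
residual = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
patched = inject_at_placeholders(residual, [1], [[4.0, 0.0]])
print(patched)  # [[1.0, 0.0], [2.0, 2.0], [3.0, 4.0]]
```

Matching the norm keeps the injected vector on the same scale as the residual vector it lands on, so an unusually large or small activation neither swamps it nor vanishes; whether that is appropriate depends on which layer is patched.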