A model that takes LLM activations as input and answers questions about them in natural language. Performance consistently improves as diverse training data (system-prompt QA, classification, self-supervised context prediction) is mixed together.
One limitation is that it doesn't simply insert the activations as token embeddings; instead, it overwrites the residual stream (activations) at specific placeholder-token positions to inject the information.
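
A minimal sketch of that injection mechanism, assuming a GPT-2-style decoder and the Hugging Face Transformers API: a forward pre-hook overwrites the residual stream at the placeholder positions of a chosen layer. The layer index, the "?" placeholder convention, and the randomly generated `injected` activations are all illustrative assumptions, not the paper's exact setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # stand-in decoder; the actual reader model is an assumption
LAYER = 6       # hypothetical injection layer

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

# Placeholder tokens ("?") mark where target-model activations get injected.
prompt = "Activations: ? ? ? ?\nQuestion: What is the system prompt?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")

# Find the positions of the placeholder tokens (" ?" tokenizes as one token).
placeholder_id = tokenizer(" ?", add_special_tokens=False).input_ids[0]
positions = (inputs.input_ids[0] == placeholder_id).nonzero(as_tuple=True)[0]

# Activations captured from the target model would go here; faked for the sketch.
injected = torch.randn(len(positions), model.config.hidden_size)

def inject(module, args):
    # args[0] is the residual stream entering this transformer block.
    hidden = args[0]
    if hidden.shape[1] > 1:  # patch only the full prompt pass, not cached decoding steps
        hidden[:, positions, :] = injected
    return (hidden,) + args[1:]

handle = model.transformer.h[LAYER].register_forward_pre_hook(inject)
out = model.generate(**inputs, max_new_tokens=20)
handle.remove()
print(tokenizer.decode(out[0]))
```

Patching the residual stream directly (rather than the token-embedding table) is what ties the method to a specific layer and position, which is the source of the limitation noted above.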

Seonglae Cho