rather than simply putting it into the token embedding, the limitation is that it manipulates the residual stream (activation) at a specific position to inject it
Latent Interpretation Tuning
fine-tune a LLM with activation–question–answer triples
arxiv.org
https://arxiv.org/pdf/2412.08686

Seonglae Cho