arxiv.org
https://arxiv.org/pdf/2402.09221
interpreting GPT: the logit lens — AI Alignment Forum
This post relates an observation I've made in my work with GPT-2, which I have not seen made elsewhere. …
https://www.alignmentforum.org/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens

Seonglae Cho