
Logit Lens

Creator
Seonglae Cho
Created
2024 Oct 14 1:33
Editor
Seonglae Cho
Edited
2025 Feb 16 23:39
Refs
Attention Sink
arxiv.org
https://arxiv.org/pdf/2402.09221
interpreting GPT: the logit lens — AI Alignment Forum
This post relates an observation I've made in my work with GPT-2, which I have not seen made elsewhere. …
https://www.alignmentforum.org/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens

Backlinks

Patchscopes

Copyright Seonglae Cho