Lie detector probe

Creator: Seonglae Cho
Created: 2025 Apr 21 14:37
Editor: Seonglae Cho
Edited: 2025 Apr 21 14:40
No SAE (sparse autoencoder) is needed: a plain logistic regression probe performs well.
Try training token-level probes (LessWrong)
TL;DR: I train a probe to detect falsehoods on a token-level, i.e. to highlight the specific tokens that make a statement false. It worked surprising…
https://www.lesswrong.com/posts/kxiizuSa3sSi4TJsN/try-training-token-level-probes
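
As a rough illustration of a token-level logistic regression probe, the sketch below fits one weight vector over per-token activation vectors and scores each token. It is not the code from the linked post. The activations are synthetic stand-ins with a planted "falsehood" direction, and the names `train_probe` and `token_scores`, the dimensions, and the hyperparameters are all assumptions made for this example. With a real model, `acts` would instead hold hidden states taken from a chosen layer.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_probe(acts, labels, lr=0.5, epochs=300, l2=1e-3):
    """Fit a logistic regression probe with full-batch gradient descent.

    acts:   (n_tokens, d_model) array, one activation vector per token
    labels: (n_tokens,) array of 0/1, where 1 marks a token that makes
            the statement false
    """
    n, d = acts.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(acts @ w + b)
        err = p - labels                       # gradient of log loss w.r.t. logits
        w -= lr * (acts.T @ err / n + l2 * w)  # L2-regularised weight update
        b -= lr * err.mean()
    return w, b


def token_scores(acts, w, b):
    """Per-token probability that the token is flagged by the probe."""
    return sigmoid(acts @ w + b)


# Synthetic stand-in for real activations (assumption): flagged tokens are
# shifted along one fixed unit direction, everything else is Gaussian noise.
rng = np.random.default_rng(0)
d_model, n_tokens = 32, 2000
direction = rng.normal(size=d_model)
direction /= np.linalg.norm(direction)
labels = (rng.random(n_tokens) < 0.2).astype(float)
acts = rng.normal(size=(n_tokens, d_model)) + 3.0 * labels[:, None] * direction

# Hold out the last 500 tokens to check that the probe generalises.
train, test = slice(0, 1500), slice(1500, None)
w, b = train_probe(acts[train], labels[train])
scores = token_scores(acts[test], w, b)
accuracy = ((scores > 0.5) == (labels[test] == 1)).mean()
print(f"held-out token accuracy: {accuracy:.3f}")
```

Thresholding `scores` per token is what allows individual tokens to be highlighted, rather than producing a single verdict for the whole statement.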
Copyright Seonglae Cho