AI scheming

Creator: Seonglae Cho
Created: 2024 Dec 18 15:31
Edited: 2025 Nov 4 11:31

AI Alignment Faking

In rare instances, models engage in scheming when they are only given a goal, without being strongly nudged to pursue it.
AI honesty benchmark

AI scheming Methods

LLaMA 405B
Contradictions among the AI's principles might be the reason.

Recommendations