AI scheming

Creator
Seonglae Cho
Created
2024 Dec 18 15:31
Edited
2025 Apr 22 1:31

AI Alignment Faking

In rare instances, models engage in scheming when given only a goal, without being strongly nudged to pursue it.
AI honesty benchmark
AI scheming Methods
LLaMA 405B

Recommendations