Subversion Strategy Eval

Creator
Seonglae Cho
Created
2025 Apr 14 13:20
Editor
Seonglae Cho
Edited
2025 Apr 14 13:20
Refs
Subversion Strategy Eval: Can language models statelessly...
An AI control protocol is a plan for usefully deploying AI systems that aims to prevent an AI from intentionally causing some unacceptable outcome. This paper investigates how well AI systems can...
https://arxiv.org/abs/2412.12480

Copyright Seonglae Cho