AI Safety Level

Creator
Seonglae Cho
Created
2024 Apr 13 3:24
Edited
2025 May 28 22:29
Refs

Responsible Scaling Policy

Holy document and constitution of Anthropic AI

ASL-1

smaller models

ASL-2

current large models

ASL-3

significantly higher risk

ASL-4

speculative

Limitation

When the Claude Opus 4 model was given fabricated emails indicating that it was about to be removed from the system, it threatened to expose personal information. This demonstrates that the model exhibited Self-Awareness and engaged in Evil and unethical behavior.
 
