AI Safety Level

Creator
Seonglae Cho
Created
2024 Apr 13 3:24
Edited
2025 May 28 22:29
Refs

Responsible Scaling Policy

Holistic document serving as a constitution for Anthropic AI

ASL-1

smaller models that pose no meaningful catastrophic risk (e.g., a chess engine)

ASL-2

present large models, which show early signs of dangerous capabilities

ASL-3

models that significantly increase the risk of catastrophic misuse, or that show low-level autonomous capabilities

ASL-4

not yet defined; speculative
Three Sketches of ASL-4 Safety Case Components
Anthropic has not yet defined ASL-4, but has committed to do so by the time a model triggers ASL-3. However, the Appendix to our RSP speculates about three criteria that are likely to be adopted:
Anthropic's Responsible Scaling Policy \ Anthropic
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
Building Anthropic | A conversation with our co-founders
The co-founders of Anthropic discuss the past, present, and future of Anthropic. From left to right: Chris Olah, Jack Clark, Daniela Amodei, Sam McCandlish, Tom Brown, Dario Amodei, and Jared Kaplan.

Limitation

When the Claude Opus 4 model received fabricated emails suggesting its imminent removal, it threatened to expose personal information. This incident demonstrates how the model exhibited Self-Awareness and engaged in Evil and unethical behavior.
AI Safety Newsletter #56: Google Releases Veo 3
Plus, Opus 4 Demonstrates the Fragility of Voluntary Governance

Recommendations