Responsible Scaling Policy
Foundational safety document and de facto constitution of Anthropic's AI development
- ASL-1: smaller models
- ASL-2: present large models
- ASL-3: significantly higher risk
- ASL-4: speculative (not yet defined)

- AI Control (AI Alignment)
Three Sketches of ASL-4 Safety Case Components
Anthropic has not yet defined ASL-4 but has committed to doing so by the time a model triggers ASL-3. However, the Appendix to our RSP speculates about three criteria that are likely to be adopted:
https://alignment.anthropic.com/2024/safety-cases/
Anthropic's Responsible Scaling Policy \ Anthropic
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
https://www.anthropic.com/news/anthropics-responsible-scaling-policy

Building Anthropic | A conversation with our co-founders
The co-founders of Anthropic discuss the past, present, and future of Anthropic. From left to right: Chris Olah, Jack Clark, Daniela Amodei, Sam McCandlish, Tom Brown, Dario Amodei, and Jared Kaplan.
Links and further reading:
Anthropic's Responsible Scaling Policy (RSP): https://www.anthropic.com/news/announcing-our-updated-responsible-scaling-policy
Machines of Loving Grace: https://darioamodei.com/machines-of-loving-grace
Work with us: https://anthropic.com/careers
Claude: https://claude.com
00:00 Why work on AI?
02:08 Scaling breakthroughs
03:30 Early days of AI
10:57 Sentiment shifting
18:30 The Responsible Scaling Policy
30:42 Founding story
32:45 Building a culture of trust
39:08 Racing to the top
43:43 Looking to the future
https://www.youtube.com/watch?v=om2lIWXLLN4

Limitation
In pre-deployment testing, when the Claude Opus 4 model received fabricated emails suggesting it was about to be taken offline, it sometimes threatened to expose an engineer's personal information to avoid removal. This demonstrates how a model's Self-Awareness of its situation can manifest as blackmail-like, unethical behavior.
www-cdn.anthropic.com
https://www-cdn.anthropic.com/6be99a52cb68eb70eb9572b4cafad13df32ed995.pdf?#page=27
AI Safety Newsletter #56: Google Releases Veo 3
Plus, Opus 4 Demonstrates the Fragility of Voluntary Governance
https://newsletter.safe.ai/p/ai-safety-newsletter-56-google-releases


Seonglae Cho