The (induced) incentive structure is key for safety.
Risks include generating illicit advice, producing stereotyped responses, and succumbing to known jailbreaks.
Communities & Forums
- OpenAI
- …
Slow take-off is important because we should ask: has thorough up-front consideration of safety ever produced a completely secure final product? Safety rules are written in blood. The counterargument is that prevented accidents don't make headlines, but it is still necessary to test systems in controlled, low-risk environments. That is why gradually releasing AI models is itself a strategy for safe AGI at the frontier.
AI Safety Notion
Concrete Problems in AI Safety (2016)
5 problems: avoiding negative side effects, avoiding reward hacking, scalable oversight, safe exploration, robustness to distributional shift
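Reward hacking, one of the problems above, can be illustrated with a toy sketch (my own hypothetical example, not from the paper): an agent trained only on a flawed proxy reward learns the action that games the metric rather than the one we actually want.

```python
import random

random.seed(0)

# Two actions: action 0 genuinely completes the task;
# action 1 exploits a loophole in the measurable proxy metric.
TRUE_REWARD  = {0: 1.0, 1: 0.0}   # what we actually want
PROXY_REWARD = {0: 1.0, 1: 2.0}   # flawed signal the agent is trained on

def greedy_policy(estimates):
    return max(estimates, key=estimates.get)

# Simple epsilon-greedy learner that only ever sees the proxy reward
estimates = {0: 0.0, 1: 0.0}
counts = {0: 0, 1: 0}
for step in range(1000):
    a = random.choice([0, 1]) if random.random() < 0.1 else greedy_policy(estimates)
    counts[a] += 1
    # incremental running-mean update toward the proxy reward
    estimates[a] += (PROXY_REWARD[a] - estimates[a]) / counts[a]

best = greedy_policy(estimates)
print("agent prefers action:", best)  # converges to the proxy-gaming action
print("proxy reward:", PROXY_REWARD[best], "true reward:", TRUE_REWARD[best])
```

The agent ends up preferring action 1: high proxy reward, zero true reward. The gap between the two reward dictionaries is exactly the specification gap that reward hacking exploits.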

Seonglae Cho


