AI Safety

Creator
Creator
Seonglae ChoSeonglae Cho
Created
Created
2023 Jun 13 11:43
Editor
Edited
Edited
2025 Dec 17 14:49

(induced) incentive is key for safety

Risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks

Communities & Forums

  • OpenAI
Slow take-off
is important because we need to ask: has there ever been a case where thorough consideration of safety resulted in a completely secure final product? Safety rules are written in blood. The counterargument is that prevented accidents don't make headlines, but it's still necessary to test systems with minimal risk in controlled environments. That's why gradually releasing AI models is also a strategy for safe AGI at the frontier.
AI Safety Notion
 
 
 
 

Concrete Problems in AI Safety (2016)

5 risks: Side effects,
AI Reward Hacking
, Non-scalable supervision, Non-safe exploration,
Distribution Shift

Problem statements

 
 

 

Recommendations