AI Safety

Creator
Seonglae Cho
Created
2023 Jun 13 11:43
Edited
2025 Jun 25 10:5

(Induced) incentives are key to safety.

Risks include generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks.

Communities & Forums

  • OpenAI
AI Safety Notion
 
 
 
 

Concrete Problems in AI Safety (2016)

5 risks: Side effects, AI Reward Hacking, Non-scalable supervision, Non-safe exploration, Distribution Shift

Problem statements

 
 

 

Recommendations