(induced) incentive is key for safety
Risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks
Communities & Forums
- OpenAI
- …
AI Safety Notion
Concrete Problems in AI Safety (2016)
5 risks: Negative side effects, Reward hacking, Non-scalable oversight, Unsafe exploration, Distributional shift