AI Safety

Creator
Creator
Seonglae ChoSeonglae Cho
Created
Created
2023 Jun 13 11:43
Editor
Edited
Edited
2025 Dec 17 14:49

(induced) incentive is key for safety

Risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks

Communities & Forums

  • OpenAI
Slow take-off
is important because we need to ask: has there ever been a case where thorough consideration of safety resulted in a completely secure final product? Safety rules are written in blood. The counterargument is that prevented accidents don't make headlines, but it's still necessary to test systems with minimal risk in controlled environments. That's why gradually releasing AI models is also a strategy for safe AGI at the frontier.
AI Safety Notion
 
 
 
 

Concrete Problems in AI Safety (2016)

5 risks: Side effects,
AI Reward Hacking
, Non-scalable supervision, Non-safe exploration,
Distribution Shift
arxiv.org

Problem statements

arxiv.org
George Hotz vs Eliezer Yudkowsky AI Safety Debate
George Hotz and Eliezer Yudkowsky will hash out their positions on AI safety, acceleration, and related topics. You can watch live on Twitter as well: https://twitter.com/i/broadcasts/1nAJErpDYgRxL
George Hotz vs Eliezer Yudkowsky AI Safety Debate
OpenAI, DeepMind and Anthropic to give UK early access to foundational models for AI safety research
UK prime minister Rishi Sunak has kicked off London Tech Week by telling conference goers that OpenAI, Google DeepMind and Anthropic have committed to provide "early or priority access" to their AI models to support safety research.
OpenAI, DeepMind and Anthropic to give UK early access to foundational models for AI safety research
 
 

 

Recommendations