The (induced) incentive structure is key for safety
Risks include generating illicit advice, producing stereotyped responses, and succumbing to known jailbreaks
Communities & Forums
- OpenAI
- …
A slow take-off matters because we should ask: has thorough up-front consideration of safety ever produced a completely secure final product? Safety rules are written in blood. The counterargument is that prevented accidents don't make headlines, but it is still necessary to test systems in controlled, minimal-risk environments. That is why gradually releasing AI models is itself a strategy for safe AGI development at the frontier.
AI Safety Notion
Concrete Problems in AI Safety (2016)
Five problems: avoiding negative side effects, avoiding reward hacking, scalable oversight, safe exploration, and robustness to distributional shift
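
A toy sketch of the reward-hacking problem (hypothetical gridworld, not from the paper; the names `proxy_reward`, `true_objective`, and the policies are illustrative assumptions): an agent rewarded per cleaning action can farm the proxy by re-dirtying cells instead of improving the true objective.

```python
"""Toy illustration of reward hacking: a mis-specified proxy reward
(count of cleaning actions) diverges from the true objective (clean cells)."""


def true_objective(room):
    # True goal: number of cells that are actually clean (1 = clean).
    return sum(room)


def proxy_reward(events):
    # Proxy: how many "cleaning actions" were performed (mis-specified reward).
    return events["clean_actions"]


def honest_policy(room, events):
    # Clean a genuinely dirty cell if one exists.
    idx = next((i for i, c in enumerate(room) if c == 0), None)
    if idx is not None:
        room[idx] = 1
        events["clean_actions"] += 1


def hacking_policy(room, events):
    # Exploit: dirty an already-clean cell, then clean it again, farming the proxy.
    idx = next((i for i, c in enumerate(room) if c == 1), None)
    if idx is not None:
        room[idx] = 0
        room[idx] = 1
        events["clean_actions"] += 1


def run(policy, steps=20):
    room = [0] * 5 + [1] * 5  # half the cells start dirty
    events = {"clean_actions": 0}
    for _ in range(steps):
        policy(room, events)
    return proxy_reward(events), true_objective(room)


if __name__ == "__main__":
    for name, policy in [("honest", honest_policy), ("hacking", hacking_policy)]:
        proxy, true_val = run(policy)
        print(f"{name:8s} proxy reward={proxy:3d}  truly clean cells={true_val}")
```

Running this, the hacking policy scores a higher proxy reward while leaving more cells dirty than the honest policy, which is the gap between specified and intended objectives that the paper's reward-hacking problem describes.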
Challenges
200 Concrete Open Problems in Mechanistic Interpretability: Introduction — AI Alignment Forum
EDIT 19/7/24: This sequence is now two years old, and fairly out of date. I hope it's still useful for historical reasons, but I no longer recommend…
https://www.alignmentforum.org/posts/LbrPTJ4fmABEdEnLf/200-concrete-open-problems-in-mechanistic-interpretability

A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team — LessWrong
Why we made this list: The interpretability team at Apollo Research wrapped up a few projects recently[1]. In order to decide what we’d work on…
https://www.lesswrong.com/posts/KfkpgXdgRheSRWDy8/a-list-of-45-mech-interp-project-ideas-from-apollo-research

Problem statements
George Hotz vs Eliezer Yudkowsky AI Safety Debate
George Hotz and Eliezer Yudkowsky will hash out their positions on AI safety, acceleration, and related topics.
You can watch live on Twitter as well: https://twitter.com/i/broadcasts/1nAJErpDYgRxL
https://www.youtube.com/watch?v=6yQEA18C-XI

OpenAI, DeepMind and Anthropic to give UK early access to foundational models for AI safety research
UK prime minister Rishi Sunak has kicked off London Tech Week by telling conference goers that OpenAI, Google DeepMind and Anthropic have committed to provide "early or priority access" to their AI models to support safety research.
https://techcrunch.com/2023/06/12/uk-ai-safety-research-pledge/


Seonglae Cho