AI Control Notion
AI Control Benchmarks
Agent control
storage.googleapis.com
https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/securing-the-future-of-ai-agents/gdm-ai-control-roadmap.pdf
Securing internal systems against increasingly capable and imperfectly aligned AI
Discover our AI Control Roadmap: a defense-in-depth system to securely manage advanced, potentially misaligned AI agents.
https://deepmind.google/blog/securing-the-future-of-ai-agents/
arxiv.org
https://arxiv.org/pdf/2312.06942
Concordance AI Control
Token injection like Activation Steering https://github.com/concordance-co/quote
Token Injection as a Steering Mechanism for Large Language Models
Lightweight steering of LLMs through token injection at inference time
https://www.concordance.co/blog/token-injection-steering-llms


Seonglae Cho