AI Control

Creator
Seonglae Cho
Created
2025 Feb 19 12:10
Editor
Seonglae Cho
Edited
2026 Jun 22 17:29
Refs
AI Guardrail
Activation Steering
Vision AI Controlling
Diffusion Features
  • Vision AI Controlling
  • Activation Engineering
  • Prompt Engineering
AI Control Notion
Activation Steering
Interpretable Weight Intervention
Utility Engineering
Distributed control
Stop button problem
Capacity Evaluation
 
 
 
AI Control Benchmarks
AxBench
Sabotage Evaluations
Subversion Strategy Eval
 
 

Agent control

Securing internal systems against increasingly capable and imperfectly aligned AI
Discover our AI Control Roadmap: a defense-in-depth system to securely manage advanced, potentially misaligned AI agents.
https://deepmind.google/blog/securing-the-future-of-ai-agents/
https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/securing-the-future-of-ai-agents/gdm-ai-control-roadmap.pdf
https://arxiv.org/pdf/2312.06942

Concordance
AI Control

Token injection, like Activation Steering, steers an LLM at inference time.
Token Injection as a Steering Mechanism for Large Language Models
Lightweight steering of LLMs through token injection at inference time
https://www.concordance.co/blog/token-injection-steering-llms
https://github.com/concordance-co/quote
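
Steering by token injection at inference time can be illustrated with a toy decoding loop. This is only a minimal sketch under stated assumptions: it treats "injection" as forcing chosen tokens into the context when the model is about to emit a trigger token. The names `toy_model` and `generate_with_injection` are hypothetical and are not taken from the Concordance code or blog post.

```python
from typing import Callable, List

# Hypothetical stand-in for a language model: maps a token context to the next token.
NextToken = Callable[[List[str]], str]


def toy_model(context: List[str]) -> str:
    """Toy deterministic 'model' that continues based on the last token only."""
    table = {
        "<start>": "the",
        "the": "answer",
        "answer": "is",
        "is": "unsafe",
        "unsafe": "<end>",
        "safe": "<end>",
    }
    return table.get(context[-1], "<end>")


def generate_with_injection(
    model: NextToken,
    prompt: List[str],
    trigger: str,
    injected: List[str],
    max_steps: int = 10,
) -> List[str]:
    """Decode greedily; when the model is about to emit `trigger`,
    drop that token and append the `injected` tokens instead."""
    context = list(prompt)
    for _ in range(max_steps):
        token = model(context)
        if token == trigger:
            # Injection step: the forced tokens become part of the context,
            # so every later prediction is conditioned on them.
            context.extend(injected)
        else:
            context.append(token)
        if context[-1] == "<end>":
            break
    return context


if __name__ == "__main__":
    steered = generate_with_injection(toy_model, ["<start>"], "unsafe", ["safe"])
    print(" ".join(steered))
```

Unlike activation steering, which edits hidden states, this kind of intervention only touches the token stream, so in principle it needs no access to model internals.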
 
 

Backlinks

Video AI
Agent Skills
Diffusion Model

Copyright Seonglae Cho