
AI Control

Creator: Seonglae Cho
Created: 2025 Feb 19 12:10
Editor: Seonglae Cho
Edited: 2026 Feb 10 16:42
Refs
AI Guardrail
Steering Vector
Vision AI Controlling
  • Vision AI Controlling
  • Activation Engineering
  • Prompt Engineering
AI Control Notion
Steering Vector
Interpretable Weight Intervention
Utility Engineering
Distributed control
Stop button problem
Capacity Evaluation
 
 
 
AI Control Benchmarks
AxBench
Sabotage Evaluations
Subversion Strategy Eval
 
 
 
arxiv.org
https://arxiv.org/pdf/2312.06942

Concordance
AI Control

Token injection, similar to Steering Vector
https://github.com/concordance-co/quote
Token Injection as a Steering Mechanism for Large Language Models
Lightweight steering of LLMs through token injection at inference time
https://www.concordance.co/blog/token-injection-steering-llms
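A minimal sketch of what token injection at inference time could look like, assuming a plain autoregressive decoding loop. This is not the Concordance implementation and does not use the API of the linked repository. The names `toy_next_token`, `generate_with_injection`, and `inject_after_model` are hypothetical, and the "model" is a toy stand-in that cycles through a fixed vocabulary. The idea shown is only that a hook can force tokens into the context mid-generation, so that later decoding steps are conditioned on them.

```python
from typing import Callable, List, Optional

Token = str


def toy_next_token(context: List[Token]) -> Token:
    """Hypothetical stand-in for an LLM: picks a token from the context length."""
    vocab = ["the", "model", "answers", "<eos>"]
    return vocab[len(context) % len(vocab)]


def generate_with_injection(
    prompt: List[Token],
    next_token: Callable[[List[Token]], Token],
    inject: Callable[[List[Token]], Optional[List[Token]]],
    max_new_tokens: int = 8,
) -> List[Token]:
    """Greedy decoding loop with an injection hook.

    Before each decoding step the hook inspects the context. If it returns
    tokens, they are appended as if the model had produced them, and the
    model continues from the modified context.
    """
    context = list(prompt)
    produced = 0
    while produced < max_new_tokens:
        forced = inject(context)
        if forced:
            # Injected tokens count toward the budget, so the loop terminates.
            context.extend(forced)
            produced += len(forced)
            continue
        token = next_token(context)
        context.append(token)
        produced += 1
        if token == "<eos>":
            break
    return context


def inject_after_model(context: List[Token]) -> Optional[List[Token]]:
    """Hypothetical policy: inject "carefully" once, right after "model"."""
    if context and context[-1] == "model" and "carefully" not in context:
        return ["carefully"]
    return None


steered = generate_with_injection(["<bos>"], toy_next_token, inject_after_model)
baseline = generate_with_injection(["<bos>"], toy_next_token, lambda ctx: None)
print(steered)
print(baseline)
```

Unlike a steering vector, which edits hidden activations, this kind of hook only touches the token sequence, so it needs no access to model internals.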
 

Copyright Seonglae Cho