
Conditional Vector Steering

Creator: Seonglae Cho
Created: 2025 Jan 29 11:8
Editor: Seonglae Cho
Edited: 2025 Apr 6 17:35
Refs

CAST (Conditional Activation Steering)
activation-steering
IBM • Updated 2025 May 3 15:22

  • Conditional SAE clamping
  • Conditional SAE steering
  • Constant SAE clamping
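The three variants above are only named on this page, not defined. As a rough sketch of one plausible reading, assuming a sparse autoencoder (SAE) whose feature activations are available as a plain list: "constant clamping" always overwrites a feature, "conditional clamping" overwrites it only when it fires above a threshold, and "conditional steering" adds a scaled decoder direction to the residual stream only when the feature fires. All function names, parameters, and the threshold rule are hypothetical and not taken from the referenced work.

```python
from typing import List

Vector = List[float]


def constant_clamp(features: Vector, idx: int, value: float) -> Vector:
    """Always overwrite SAE feature `idx` with `value` (assumed meaning)."""
    out = list(features)
    out[idx] = value
    return out


def conditional_clamp(features: Vector, idx: int, value: float,
                      threshold: float) -> Vector:
    """Overwrite feature `idx` only if it fires above `threshold`."""
    out = list(features)
    if out[idx] > threshold:
        out[idx] = value
    return out


def conditional_steer(residual: Vector, features: Vector, idx: int,
                      direction: Vector, alpha: float,
                      threshold: float) -> Vector:
    """Add `alpha * direction` to the residual only if feature `idx` fires.

    `direction` stands in for the feature's decoder vector (an assumption).
    """
    if features[idx] <= threshold:
        return list(residual)  # condition not met: leave activations untouched
    return [r + alpha * d for r, d in zip(residual, direction)]
```

The conditional variants leave activations untouched whenever the feature stays below the threshold, which is the property that distinguishes them from constant clamping in this sketch.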
 
 
 
 

Conditional refusal steering

arxiv.org
https://arxiv.org/pdf/2409.05907
arxiv.org
https://arxiv.org/pdf/2411.11296v1

Sieve (2024.12)

Targets code generation, specifically steering the model away from using regex (a very simple and naive task).
Sieve: SAEs Beat Baselines on a Real-World Task (A Code Generation Case Study) | Tilde
Our methods achieve Pareto dominance on the axis of task success rate vs task constraint satisfaction vs general model performance.
https://www.tilderesearch.com/blog/sieve
tilde-research/sieve_coding · Hugging Face
https://huggingface.co/tilde-research/sieve_coding
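The Sieve summary above claims Pareto dominance across three metrics. As a small illustration of what that term means, assuming each method is scored by a tuple where higher is better on every axis, one method dominates another if it is at least as good everywhere and strictly better somewhere. The function name and the example scores in the usage note are made up for illustration and are not results from the blog post.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` Pareto-dominates `b` (higher is better on every metric)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better
```

For example, with hypothetical scores ordered as (task success, constraint satisfaction, general performance), `dominates((0.9, 0.8, 0.7), (0.9, 0.6, 0.7))` holds, while two methods that each win on a different axis do not dominate one another.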
Compare: Alpaca Dataset / Sorry Bench
  • AI Condition Vector (extracted from the prompt)
  • Refusal vector (applied to the response)
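The two-vector idea in the list above can be sketched as a gate: the prompt's hidden state is compared against a condition vector, and the refusal vector is added to the response activations only when the comparison passes. This is a simplified sketch that assumes a plain cosine-similarity test against a fixed threshold; the actual CAST method may use a different similarity or projection, and all names, parameters, and numbers here are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def condition_met(prompt_hidden: Vector, condition_vector: Vector,
                  threshold: float) -> bool:
    """Gate: does the prompt's hidden state align with the condition vector?"""
    return cosine(prompt_hidden, condition_vector) > threshold


def steer_response(response_hidden: List[Vector], refusal_vector: Vector,
                   strength: float, active: bool) -> List[Vector]:
    """Add `strength * refusal_vector` to each response position if gated on."""
    if not active:
        return [list(h) for h in response_hidden]  # unchanged copy
    return [[x + strength * r for x, r in zip(h, refusal_vector)]
            for h in response_hidden]
```

A typical call would compute `active = condition_met(prompt_hidden, condition_vector, threshold)` once per prompt and pass it to `steer_response`, so prompts that do not match the condition are generated without any intervention.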
Copyright Seonglae Cho