Steering on correlated SAE features improves benchmarks, not only probing
- CRL (Control model training with RL)
- CTRL (Control model Training with RL)
- SPOT (Sparse Policy Optimization for Circuit control)
- OSCAR (Optimizing Sparse Circuits via Autoencoder Reinforcement)
Declaration file after acknowledgement
Control RL Papers
Seonglae Cho