RL-based SAE activation control
It improves an LLM's ability to interact with an environment by using PPO-like RL techniques to control SAE feature activations, without fine-tuning the model. My guess is that this kind of automatic policy learning could help an LLM adapt to the Sweet Spot of Feature Steering.
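To make the idea concrete, here is a minimal toy sketch of learning a steering coefficient with a policy gradient while the model itself stays frozen. It is not the sae-rl implementation: it uses plain REINFORCE with a running baseline rather than PPO, and `steer`, `toy_reward`, `train_steering_policy`, and the sweet-spot value of 4.0 are all hypothetical names and numbers chosen for illustration. In a real setup the reward would come from the environment, not from a closed-form function.

```python
import random


def steer(hidden, direction, alpha):
    """Add alpha times an SAE decoder direction to a hidden state (toy vectors)."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def toy_reward(alpha, sweet_spot=4.0):
    """Hypothetical stand-in for an environment reward.

    It peaks at an assumed 'sweet spot' coefficient and falls off on both
    sides, mimicking under-steering and over-steering.
    """
    return -(alpha - sweet_spot) ** 2


def train_steering_policy(steps=2000, lr=0.01, sigma=0.5, seed=0):
    """Learn the mean of a Gaussian policy over the steering coefficient.

    Uses REINFORCE with a running-average baseline (an assumption made for
    brevity, not PPO). No model weights are updated; only the policy mean is.
    """
    rng = random.Random(seed)
    mu = 0.0                    # policy mean: the current steering strength
    baseline = toy_reward(mu)   # initial estimate of the expected reward
    for _ in range(steps):
        alpha = rng.gauss(mu, sigma)            # sample an action
        reward = toy_reward(alpha)
        advantage = reward - baseline
        # Gradient of log N(alpha; mu, sigma) with respect to mu.
        grad_log_prob = (alpha - mu) / sigma ** 2
        mu += lr * advantage * grad_log_prob
        baseline += 0.05 * (reward - baseline)  # track the average reward
    return mu


if __name__ == "__main__":
    learned_alpha = train_steering_policy()
    print("learned steering coefficient:", learned_alpha)
    print("steered state:", steer([1.0, 2.0], [0.5, -1.0], learned_alpha))
```

The policy only outputs how strongly to push along a feature direction, so the search for a good steering strength is automated by the reward signal instead of being tuned by hand.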
sae-rl
Jazhyc • Updated 2025 Jun 24 9:2