RL-based SAE activation control
It improves an LLM's ability to interact with an environment by using PPO-like RL techniques to control SAE feature activations, without fine-tuning the model. I guess this kind of automatic policy learning could help an LLM find the sweet spot of feature steering.
sae-rl
Jazhyc • Updated 2024 Dec 8 20:40
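
To make the "sweet spot" idea concrete, here is a minimal toy sketch of learning a steering strength with RL while the model itself stays frozen. It is not taken from the sae-rl repository. Everything in it is an assumption for illustration: the three-dimensional `hidden` vector stands in for a model activation, `direction` stands in for one SAE decoder direction, `toy_reward` is a made-up environment signal, and a plain policy-gradient (REINFORCE) update with batch-normalized advantages is used in place of PPO to keep the example short.

```python
import math
import random


def steer(hidden, direction, alpha):
    """Add alpha * direction to a hidden state; no model weights are changed."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def toy_reward(hidden, direction, target=3.0):
    """Hypothetical reward: highest when the projection onto the feature hits `target`."""
    proj = sum(h * d for h, d in zip(hidden, direction))
    return -(proj - target) ** 2


def train_steering_policy(iters=300, batch=32, lr=0.05, sigma=0.5, seed=0):
    """Learn the mean of a Gaussian policy over the steering strength alpha."""
    rng = random.Random(seed)
    hidden = [0.5, -1.0, 2.0]    # assumed stand-in for a frozen model activation
    direction = [1.0, 0.0, 0.0]  # assumed stand-in for one SAE decoder direction
    mu = 0.0                     # policy mean, the only trainable parameter
    for _ in range(iters):
        # Sample a batch of steering strengths from the current policy.
        alphas = [rng.gauss(mu, sigma) for _ in range(batch)]
        rewards = [toy_reward(steer(hidden, direction, a), direction) for a in alphas]
        # Normalize rewards within the batch to get bounded advantages.
        mean_r = sum(rewards) / batch
        std_r = math.sqrt(sum((r - mean_r) ** 2 for r in rewards) / batch)
        advantages = [(r - mean_r) / (std_r + 1e-8) for r in rewards]
        # REINFORCE: d/dmu log N(alpha; mu, sigma) = (alpha - mu) / sigma^2.
        grad = sum(adv * (a - mu) / sigma ** 2
                   for adv, a in zip(advantages, alphas)) / batch
        mu += lr * grad  # gradient ascent on expected reward
    return mu


if __name__ == "__main__":
    print(f"learned steering strength: {train_steering_policy():.2f}")
```

In this toy setup the best steering strength is 2.5 by construction (the unsteered projection is 0.5 and the target is 3.0), so the learned mean should drift toward that value. Too little steering leaves the feature under-activated and too much overshoots, which is the trade-off the policy is left to discover on its own.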