Model Interpretability Methods
LoRA-Models-for-SAEs
matchten • Updated 2025 Feb 3 19:47 (LoRA on LLMs, not SAEs)
For fine-tuning, we used approximately 15M tokens randomly sampled from The Pile; for evaluation, we used a separate 1M-token validation set. We trained the LoRA adapters with a KL-divergence loss on the logits, which significantly reduced the divergence between the LoRA model's logits and the target logits.
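As a minimal sketch of what a KL-divergence loss over logits looks like (an illustration in NumPy, not this repository's training code, which presumably operates on PyTorch tensors in the training loop):

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax along the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kl_div_logits(target_logits, lora_logits):
    # KL(target || LoRA) computed from raw logits via log-softmax,
    # averaged over all positions in the batch.
    log_p = log_softmax(target_logits)
    log_q = log_softmax(lora_logits)
    p = np.exp(log_p)
    return float((p * (log_p - log_q)).sum(axis=-1).mean())

# Identical logits give zero divergence; mismatched logits give a positive loss.
t = np.array([[2.0, 0.5, -1.0]])
s = np.array([[1.0, 1.0, -0.5]])
print(kl_div_logits(t, t))
print(kl_div_logits(t, s) > 0.0)
```

Minimizing this quantity drives the adapted model's output distribution toward the target model's, which is the sense in which training "reduces the distance between logits."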