Self-Adapting LLMs

Creator: Seonglae Cho
Created: 2025 Jun 27 14:44
Edited: 2025 Oct 14 22:31

SEAL

SEAL is a framework that trains a model to generate 'self-edit' instructions and synthetic training data for updating its own weights. Given an input context, the model rewrites training data or proposes hyperparameters, performs lightweight fine-tuning with LoRA, and then improves its self-edit generation policy through ReSTEM, using downstream performance as the reward.
For example, instead of fine-tuning directly on SQuAD passages, the model fine-tunes on implication sentences it generates itself, raising QA accuracy without context from 33.5% to 47.0%. On the ARC reasoning task, it automatically selects appropriate data augmentation and optimization settings, raising the success rate from 0% to 72.5%.
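As a rough illustration, one SEAL iteration can be sketched as below. This is a minimal sketch, not the paper's code: the callables `propose_self_edit`, `inner_update`, and `evaluate` are hypothetical placeholders for the components described above.

```python
from typing import Any, Callable

def seal_iteration(
    model: Any,
    contexts: list[Any],
    propose_self_edit: Callable[[Any, Any], Any],  # (model, context) -> self-edit
    inner_update: Callable[[Any, Any], Any],       # (model, self-edit) -> adapted model
    evaluate: Callable[[Any, Any], float],         # (adapted model, context) -> reward
) -> list[tuple[Any, Any, float]]:
    """Collect (context, self-edit, reward) rollouts for the outer RL update."""
    rollouts = []
    for ctx in contexts:
        self_edit = propose_self_edit(model, ctx)  # synthetic data and/or hyperparameters
        adapted = inner_update(model, self_edit)   # lightweight LoRA fine-tune (inner loop)
        reward = evaluate(adapted, ctx)            # downstream performance is the reward
        rollouts.append((ctx, self_edit, reward))
    return rollouts  # consumed by the ReSTEM-style outer loop
```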

Outer RL loop

Optimizes how self-edits are generated using reinforcement learning (ReSTEM), with the adapted model's downstream performance as the reward.
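A minimal sketch of this ReSTEM-style update, under the assumption that it amounts to rejection sampling plus SFT on the surviving self-edits; `sft_step` is a hypothetical stand-in for whatever supervised fine-tuning routine updates the generator.

```python
from typing import Callable, Iterable

def restem_outer_update(
    samples: Iterable[tuple[str, str, float]],             # (context, self_edit, reward)
    sft_step: Callable[[list[tuple[str, str]]], None],     # SFT on (context -> self_edit) pairs
    reward_threshold: float = 0.0,
) -> int:
    """Keep only self-edits whose adapted model scored well, then SFT the generator on them."""
    accepted = [(ctx, edit) for ctx, edit, r in samples if r > reward_threshold]
    if accepted:          # E-step: filter rollouts by reward
        sft_step(accepted)  # M-step: supervised fine-tuning on the accepted self-edits
    return len(accepted)
```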

Inner loop

Updates the actual model parameters with SFT on the generated self-edit.
Long-term adaptation: weights are permanently modified through actual fine-tuning (LoRA-based SFT).
Each self-edit's quality is measured by the adapted model's performance and used as the reward for ReSTEM optimization in the outer loop. This step is relatively slow, since it performs real fine-tuning (tens of seconds per self-edit).
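A minimal sketch of the inner update using a Hugging Face causal LM with a PEFT LoRA adapter; the model name, hyperparameters, and step count are illustrative assumptions, not the paper's settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

def apply_self_edit(model_name: str, self_edit_texts: list[str], steps: int = 30):
    """Fine-tune a LoRA adapter on the self-edit text and return the adapted model."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Lightweight adapter: only small low-rank matrices are trained,
    # which keeps each self-edit update cheap.
    lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
    model = get_peft_model(model, lora)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    model.train()
    for step in range(steps):
        text = self_edit_texts[step % len(self_edit_texts)]
        batch = tok(text, return_tensors="pt", truncation=True, max_length=512)
        out = model(**batch, labels=batch["input_ids"])  # causal-LM loss on the self-edit
        out.loss.backward()
        opt.step()
        opt.zero_grad()
    return model  # evaluated next to produce the outer-loop reward
```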
