SDFT

Creator: Seonglae Cho
Created: 2026 Feb 18 17:09
Edited: 2026 Feb 18 17:11

Self-Distillation Enables Continual Learning

The model uses itself as a teacher for on-policy learning: it generates its own outputs (its own trajectories) → learns from them again → so training covers the states its own mistakes lead to (on-policy) → which both reduces forgetting and improves learning of new tasks. By distilling on-policy from its own outputs, continual learning becomes possible without RL.
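A minimal, self-contained sketch of the on-policy idea above (not the paper's implementation): a toy single-step "policy" over three actions distills toward a frozen snapshot of itself, with samples drawn from the *student* so every update happens on states the student actually visits. All names here (`self_distill_step`, the toy teacher/student distributions) are illustrative.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pa * math.log(pa / qa) for pa, qa in zip(p, q) if pa > 0)

def self_distill_step(student_logits, teacher_probs, lr, n_samples, rng):
    """One on-policy distillation step: sample actions from the *student*
    (its own outputs), then move the student's logits to shrink
    KL(student || teacher) on exactly those sampled actions."""
    probs = softmax(student_logits)
    grad = [0.0] * len(student_logits)
    for _ in range(n_samples):
        a = rng.choices(range(len(probs)), weights=probs)[0]
        # Score-function estimator: E_{a~student}[(1[i=a] - p_i) * (log p_a - log t_a)]
        # equals the exact gradient of KL(student || teacher) w.r.t. logit i.
        w = math.log(probs[a]) - math.log(teacher_probs[a])
        for i in range(len(grad)):
            indicator = 1.0 if i == a else 0.0
            grad[i] += (indicator - probs[i]) * w / n_samples
    return [l - lr * g for l, g in zip(student_logits, grad)]

rng = random.Random(0)
teacher = [0.7, 0.2, 0.1]   # frozen teacher snapshot (toy stand-in for the model's own outputs)
student = [0.0, 0.0, 0.0]   # student starts uniform
before = kl(softmax(student), teacher)
for _ in range(300):
    student = self_distill_step(student, teacher, lr=0.3, n_samples=512, rng=rng)
after = kl(softmax(student), teacher)
print(before, after)        # KL shrinks while every sample stayed on-policy
```

The key design choice mirrored here is that the sampling distribution is always the current student, never a fixed dataset, so the loss is evaluated on the states the model itself reaches — the property the note credits with reducing forgetting.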
SDFT: Self-Distillation Enables Continual Learning
Idan Shenfeld, Mehul Damani, Jonas Hübotter, Pulkit Agrawal
