Self-Distillation Enables Continual Learning
The model uses itself as a teacher for on-policy learning: it generates its own outputs (its own trajectories) → distills from them → training covers its own mistake states (on-policy) → less forgetting and better learning of new tasks. By distilling on-policy from its own outputs, continual learning becomes possible without RL.
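A minimal sketch of this loop in PyTorch, assuming a HuggingFace-style causal LM. It is one plausible reading of the summary above, not the paper's exact recipe (see the project page below for that): the model samples its own trajectory from the bare prompt, then the same model, given a task demonstration in context, acts as a gradient-free teacher whose token distribution the student is trained to match on that trajectory. The `gpt2` placeholder, the `self_distill_step` helper, the prompt/demo strings, and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

prompt = "Translate to French: Good morning.\n"     # new-task input (illustrative)
demo = "Example: 'Good night.' -> 'Bonne nuit.'\n"  # task demonstration (illustrative)

def self_distill_step(prompt: str, demo: str) -> float:
    # 1) On-policy data: the model samples its own trajectory from the bare
    #    prompt, so training later covers its own (possibly wrong) states.
    p_ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        traj = model.generate(p_ids, max_new_tokens=32, do_sample=True,
                              pad_token_id=tok.eos_token_id)
    gen = traj[:, p_ids.shape[1]:]  # the model's own generated tokens
    G = gen.shape[1]

    # 2) Teacher pass: the SAME model, but with the demonstration in context,
    #    scores the sampled trajectory (gradient-free).
    d_ids = tok(demo + prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        t_logits = model(torch.cat([d_ids, gen], 1)).logits[:, -G - 1:-1]

    # 3) Student pass: same model, bare prompt, gradients on.
    s_logits = model(torch.cat([p_ids, gen], 1)).logits[:, -G - 1:-1]

    # 4) Distillation loss on the model's own outputs:
    #    KL(teacher || student), averaged per generated token.
    loss = F.kl_div(F.log_softmax(s_logits, -1).flatten(0, 1),
                    F.softmax(t_logits, -1).flatten(0, 1),
                    reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

print(self_distill_step(prompt, demo))
```

Because the KL target is the model's own distribution evaluated on its own samples, each update stays close to the current policy; under this reading, that on-policy closeness is what reduces forgetting relative to off-policy SFT on external demonstrations.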
SDFT: Self-Distillation Enables Continual Learning
Idan Shenfeld, Mehul Damani, Jonas Hübotter, Pulkit Agrawal
https://self-distillation.github.io/SDFT.html

Seonglae Cho