SDFT

Creator: Seonglae Cho
Created: 2026 Feb 18 17:09
Edited: 2026 Feb 18 17:11

Self-Distillation Enables Continual Learning

The model uses itself as a teacher for on-policy learning: it generates its own outputs (its own trajectories) → learns from them again → so training covers the states its own mistakes lead to (on-policy) → which both reduces forgetting and improves learning of new tasks. By distilling on-policy from its own outputs, continual learning becomes possible without RL.
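A minimal, self-contained sketch of the on-policy idea above (not the paper's implementation): a toy single-step "policy" over three actions distills toward a frozen snapshot of itself, with samples drawn from the *student* so every update happens on states the student actually visits. All names here (`self_distill_step`, the toy teacher/student distributions) are illustrative.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pa * math.log(pa / qa) for pa, qa in zip(p, q) if pa > 0)

def self_distill_step(student_logits, teacher_probs, lr, n_samples, rng):
    """One on-policy distillation step: sample actions from the *student*
    (its own outputs), then move the student's logits to shrink
    KL(student || teacher) on exactly those sampled actions."""
    probs = softmax(student_logits)
    grad = [0.0] * len(student_logits)
    for _ in range(n_samples):
        a = rng.choices(range(len(probs)), weights=probs)[0]
        # Score-function estimator: E_{a~student}[(1[i=a] - p_i) * (log p_a - log t_a)]
        # equals the exact gradient of KL(student || teacher) w.r.t. logit i.
        w = math.log(probs[a]) - math.log(teacher_probs[a])
        for i in range(len(grad)):
            indicator = 1.0 if i == a else 0.0
            grad[i] += (indicator - probs[i]) * w / n_samples
    return [l - lr * g for l, g in zip(student_logits, grad)]

rng = random.Random(0)
teacher = [0.7, 0.2, 0.1]   # frozen teacher snapshot (toy stand-in for the model's own outputs)
student = [0.0, 0.0, 0.0]   # student starts uniform
before = kl(softmax(student), teacher)
for _ in range(300):
    student = self_distill_step(student, teacher, lr=0.3, n_samples=512, rng=rng)
after = kl(softmax(student), teacher)
print(before, after)        # KL shrinks while every sample stayed on-policy
```

The key design choice mirrored here is that the sampling distribution is always the current student, never a fixed dataset, so the loss is evaluated on the states the model itself reaches — the property the note credits with reducing forgetting.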
SDFT: Self-Distillation Enables Continual Learning
Idan Shenfeld, Mehul Damani, Jonas Hübotter, Pulkit Agrawal
