SFT

Supervised Fine-Tuning

Training data for AI comes in three types

Background information: Pretraining

Problems with solutions: SFT

Practice problems: Reinforcement Learning
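
To make the three data types above concrete, here is a minimal sketch of what one record of each might look like. The class names, field names, and the `exact_match_reward` helper are hypothetical, chosen only for illustration; real pipelines differ by library, and a real reward is often a learned model or a task-specific verifier rather than a string match.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PretrainingExample:
    """Background information: raw text, learned via next-token prediction."""
    text: str


@dataclass
class SFTExample:
    """Problem with solution: a prompt paired with a reference answer to imitate."""
    prompt: str
    completion: str


@dataclass
class RLExample:
    """Practice problem: a prompt plus a way to score the model's own attempt."""
    prompt: str
    reward_fn: Callable[[str], float]


def exact_match_reward(expected: str) -> Callable[[str], float]:
    """Hypothetical reward: 1.0 if the attempt matches the expected answer, else 0.0."""
    def reward(attempt: str) -> float:
        return 1.0 if attempt.strip() == expected else 0.0
    return reward


# Made-up sample records, one per data type.
pretrain = PretrainingExample(text="The sum of two and two is four.")
sft = SFTExample(prompt="What is 2 + 2?", completion="4")
rl = RLExample(prompt="What is 3 + 5?", reward_fn=exact_match_reward("8"))
```

The structural difference is the point: an SFT record carries the answer to copy, while an RL record carries no answer at all, only a way to grade whatever the model produces.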

SFT Memorizes, RL Generalizes (AI Memory, Model Generalization, OOD)

While the provocative title is not strictly accurate, it offers useful insight that carries over even to multimodal models.

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

https://tianzhechu.com/SFTvsRL/

SFT rarely alters the underlying model capabilities, which means practitioners can unintentionally remove a model’s safety wrapper merely by fine-tuning it on a superficially unrelated task.

https://openreview.net/pdf?id=FbFyO1r7rV

OOD generalization is crucial given the wide range of real-world scenarios in which these models are being used, while output diversity refers to the model’s ability to generate varied outputs and is important for a variety of use cases.

RLHF generalizes better than SFT to new inputs, particularly as the distribution shift between train and test becomes larger. However, RLHF significantly reduces output diversity compared to SFT across a variety of measures, implying a tradeoff in current LLM fine-tuning methods between generalization and diversity.
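
Output diversity can be measured in several ways. As one simple stand-in (not necessarily the exact metric used in the paper), the sketch below computes distinct-n: the fraction of n-grams across a set of sampled outputs that are unique. The function name, whitespace tokenization, and sample strings are assumptions made for illustration.

```python
from typing import List


def distinct_n(outputs: List[str], n: int = 2) -> float:
    """Share of unique n-grams across all outputs (higher means more diverse)."""
    ngrams = []
    for text in outputs:
        tokens = text.lower().split()  # naive whitespace tokenization
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


# Made-up samples: one varied set, one that has collapsed onto near-identical outputs.
varied = ["the cat sat quietly", "a dog ran home", "birds fly south today"]
collapsed = ["the cat sat quietly", "the cat sat quietly", "the cat sat still"]

varied_score = distinct_n(varied)
collapsed_score = distinct_n(collapsed)
```

Here the varied set scores higher than the collapsed one. A drop of this kind between an SFT model and an RLHF model, sampled on the same prompts, would be one signal of the reduced diversity described above.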

https://arxiv.org/pdf/2310.06452

Supervised Fine-tuning Trainer

https://huggingface.co/docs/trl/sft_trainer
