Not reinforcement learning itself, but included in TRL as a wrapper
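max_seq_length: defaults to min(tokenizer.model_max_length, 1024)
packing: dataset packing (concatenates several short examples into fixed-length sequences)
formatting_func: function that turns each dataset example into the text used for training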
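
As a rough illustration of how formatting_func, packing, and max_seq_length fit together, here is a minimal pure-Python sketch. It is not TRL's implementation: the prompt template, the whitespace tokenizer `toy_tokenize`, the helper `pack`, and `EOS_ID` are all hypothetical stand-ins chosen to show the idea only.

```python
from typing import Callable, Dict, List

EOS_ID = 0  # hypothetical end-of-sequence token id used to separate examples


def formatting_func(example: Dict[str, str]) -> str:
    """Turn one raw record into a single training string (hypothetical template)."""
    return f"### Question: {example['question']}\n### Answer: {example['answer']}"


def toy_tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Whitespace tokenizer standing in for a real tokenizer; ids start at 1."""
    return [vocab.setdefault(tok, len(vocab) + 1) for tok in text.split()]


def pack(
    examples: List[Dict[str, str]],
    max_seq_length: int,
    tokenize: Callable[[str], List[int]],
) -> List[List[int]]:
    """Concatenate formatted examples into one token stream, then cut it
    into blocks of exactly max_seq_length tokens (the remainder is dropped)."""
    stream: List[int] = []
    for ex in examples:
        stream.extend(tokenize(formatting_func(ex)))
        stream.append(EOS_ID)  # mark the boundary between examples
    n_full = len(stream) // max_seq_length
    return [
        stream[i * max_seq_length:(i + 1) * max_seq_length]
        for i in range(n_full)
    ]


if __name__ == "__main__":
    vocab: Dict[str, int] = {}
    data = [
        {"question": "What is SFT?", "answer": "Supervised fine-tuning."},
        {"question": "Is it RL?", "answer": "No."},
    ]
    blocks = pack(data, max_seq_length=8, tokenize=lambda t: toy_tokenize(t, vocab))
    for block in blocks:
        print(len(block), block)
```

Without packing, each short example would be padded up to max_seq_length on its own. Packing trades that padding for sequences that may contain pieces of several examples.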
Supervised Fine-tuning Trainer
https://huggingface.co/docs/trl/sft_trainer

Seonglae Cho