initialization
Use a small learning rate at the start of training, then raise it once training has stabilized.
warmup_steps
Number of steps used for a linear warmup from 0 to learning_rate.
warmup_ratio
Ratio of total training steps used for a linear warmup from 0 to learning_rate.
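The two parameters above describe the same linear ramp, sized either as an absolute step count or as a fraction of the run. Below is a minimal sketch of that ramp in plain Python. The helper names `resolve_warmup_steps` and `warmup_lr` are hypothetical. Letting an explicit `warmup_steps` take precedence over `warmup_ratio`, rounding the ratio up, and holding the rate constant after warmup are all assumptions made for illustration, not something these notes state.

```python
import math


def resolve_warmup_steps(total_steps, warmup_steps=0, warmup_ratio=0.0):
    """Return the warmup length in steps.

    Assumption: a positive warmup_steps wins over warmup_ratio;
    otherwise the ratio is applied to total_steps and rounded up.
    """
    if warmup_steps > 0:
        return warmup_steps
    return math.ceil(total_steps * warmup_ratio)


def warmup_lr(step, learning_rate, num_warmup_steps):
    """Learning rate at `step`: linear ramp from 0 to learning_rate.

    After the warmup window the rate is held at learning_rate
    (assumption: no decay schedule is modeled here).
    """
    if num_warmup_steps > 0 and step < num_warmup_steps:
        return learning_rate * step / num_warmup_steps
    return learning_rate


if __name__ == "__main__":
    n_warmup = resolve_warmup_steps(total_steps=1000, warmup_ratio=0.25)
    for step in (0, 125, 250, 500):
        print(step, warmup_lr(step, learning_rate=1e-3, num_warmup_steps=n_warmup))
```

In this sketch, a `warmup_ratio` of 0.25 over 1000 total steps gives a 250-step ramp. The rate is 0 at step 0 and reaches the full `learning_rate` at step 250.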