Learning Rate

Creator

Seonglae Cho

Created

2023 Jun 1 4:37

Editor

Seonglae Cho

Edited

2024 Dec 4 15:8

Refs

pytorch-lr-finder

davidtvs • Updated 2024 Feb 29 17:14

Model Optimizer

As high as possible, as low as necessary for convergence

A typical value is around 1e-4. Smaller models can often tolerate a higher learning rate, such as 1e-3 or 1e-2.
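
To make the "as high as possible, as low as necessary" rule concrete, here is a minimal sketch, assuming a toy objective f(x) = x^2 instead of a real model. The helper name `final_loss` and the specific learning rates are hypothetical and chosen only for illustration. On this objective, plain gradient descent converges for learning rates between 0 and 1 and diverges above 1.

```python
def final_loss(lr, steps=50, x0=1.0):
    """Run plain gradient descent on f(x) = x**2 and return the final loss."""
    x = x0
    for _ in range(steps):
        grad = 2.0 * x      # derivative of x**2
        x -= lr * grad      # gradient descent update
    return x * x


# Compare a cautious, a well-chosen, and a too-large learning rate.
for lr in (0.01, 0.4, 1.1):
    print(f"lr={lr}: final loss = {final_loss(lr):.3e}")
```

The small rate makes slow progress, the larger stable rate converges much faster, and the rate past the stability limit makes the loss grow. Real networks have no such clean threshold, which is why tools like pytorch-lr-finder search for a workable range empirically.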

Model Regularization tradeoff

Andrej Karpathy

Learning Rate Usages

Learning Rate Warmup

Learning Rate Decay

Learning Rate Scheduler

Loss Oscillation
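
Warmup and decay from the list above are often combined into a single schedule. The following is a minimal sketch, assuming linear warmup followed by cosine decay, which is one common choice and not something this page prescribes. The function name `lr_at` and all default values (`base_lr`, `warmup_steps`, `total_steps`, `min_lr`) are hypothetical.

```python
import math


def lr_at(step, base_lr=1e-3, warmup_steps=100, total_steps=1000, min_lr=0.0):
    """Return the learning rate for a given zero-based training step."""
    if step < warmup_steps:
        # Linear warmup: ramp from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay: fraction of the post-warmup phase completed, capped at 1.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Inspect the schedule at a few points in training.
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step))
```

In a training loop, the returned value would be written into the optimizer's learning rate before each update. Frameworks usually ship equivalent schedulers, so a hand-written function like this is mainly useful for understanding or plotting the shape.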

MiniCPM: Unveiling the Potential of End-side Large Language Models | Notion

Authors: Shengding Hu, Yuge Tu, Xu Han*, Ganqu Cui, Chaoqun He, Weilin Zhao, Xiang Long, Zhi Zheng, Yewei Fang, Kaihuo Zhang, Yuxiang Huang, Zhenning Dai, Baitao Gong, Chongyi Wang, Yuan Yao, Jie Zhou, Jie Cai, Xinrong Zhang, Zhongwu Zhai, Ning Ding, Chao Jia, Guoyang Zeng, Dahai Li, Zhiyuan Liu*, Maosong Sun

https://shengdinghu.notion.site/MiniCPM-Unveiling-the-Potential-of-End-side-Large-Language-Models-d4d3a8c426424654a4e80e42a711cb20
