
Learning Rate

Creator: Seonglae Cho
Created: 2023 Jun 1 4:37
Editor: Seonglae Cho
Edited: 2024 Dec 4 15:8
Refs: Model Optimizer

As high as possible, as low as necessary for convergence

A typical value is around 1e-4. As the model gets smaller, you can use a higher learning rate, such as 1e-3 or 1e-2.
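To make "as high as possible, as low as necessary" concrete, here is a minimal sketch on a toy problem. It assumes plain gradient descent on f(x) = x**2, which is not from the original note; the function name `gradient_descent` and the learning rates tried are illustrative only. The thresholds on this toy have nothing to do with the 1e-4 to 1e-2 range used for real models.

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Minimize f(x) = x**2 with plain gradient descent.

    Returns the loss before each step plus the final loss.
    Toy setup for illustration only (hypothetical helper).
    """
    x = x0
    losses = []
    for _ in range(steps):
        losses.append(x * x)
        grad = 2.0 * x      # derivative of x**2
        x = x - lr * grad   # standard gradient descent update
    losses.append(x * x)
    return losses


if __name__ == "__main__":
    # Too low: converges slowly. Moderate: converges fast. Too high: diverges.
    for lr in (0.01, 0.4, 1.1):
        final_loss = gradient_descent(lr)[-1]
        print(f"lr={lr}: final loss {final_loss:.3e}")
```

On this toy, each update multiplies x by (1 - 2 * lr), so the loss shrinks only while that factor has magnitude below 1. A rate that is too low wastes steps, and one that is too high makes the iterate overshoot and grow.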
Model Regularization
tradeoff
Andrej Karpathy
Learning Rate Usages
Learning rate Warmup
Learning Rate Decay
Learning Rate Scheduler
Loss Oscillation
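The warmup, decay, and scheduler topics listed above can be sketched together as one function that maps a training step to a learning rate. This is a minimal sketch, assuming linear warmup followed by cosine decay; that particular shape, the function name `lr_at_step`, and all default values are assumptions for illustration, not something the note or its references prescribe.

```python
import math


def lr_at_step(step, base_lr=1e-4, warmup_steps=100, total_steps=1000, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay.

    Hypothetical helper; every default here is an illustrative assumption.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Decay: progress goes from 0 to 1 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


if __name__ == "__main__":
    for step in (0, 50, 100, 550, 1000):
        print(step, lr_at_step(step))
```

A small learning rate at the start avoids large early updates, and the decay at the end lets the loss settle instead of oscillating around a minimum.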
MiniCPM: Unveiling the Potential of End-side Large Language Models | Notion
Authors: Shengding Hu, Yuge Tu, Xu Han*, Ganqu Cui, Chaoqun He, Weilin Zhao, Xiang Long, Zhi Zheng, Yewei Fang, Kaihuo Zhang, Yuxiang Huang, Zhenning Dai, Baitao Gong, Chongyi Wang, Yuan Yao, Jie Zhou, Jie Cai, Xinrong Zhang, Zhongwu Zhai, Ning Ding, Chao Jia, Guoyang Zeng, Dahai Li, Zhiyuan Liu*, Maosong Sun
https://shengdinghu.notion.site/MiniCPM-Unveiling-the-Potential-of-End-side-Large-Language-Models-d4d3a8c426424654a4e80e42a711cb20
Copyright Seonglae Cho