AI Modeling

Creator: Seonglae Cho
Created: 2024 Jul 25 2:36
Edited: 2024 Aug 21 4:25
Refs:

Transformer Modeling

The DeepNarrow strategy suggests increasing layer depth before width, which aligns with the perspective of Induction head formation. However, if the number of layers grows excessively, beyond roughly 100, Vanishing Gradient can occur even with Residual Connection. In addition, as the number of layers increases, the efficiency of Tensor Parallelism decreases.
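
As a rough illustration of the depth-versus-width trade-off, the sketch below compares a deep-narrow and a wide-shallow Transformer stack at a similar parameter budget, using the common approximation of about 12 · n_layers · d_model² parameters for the block weights. The layer counts and widths are hypothetical, chosen only so the budgets match; this is not a recipe from the note.

```python
# Minimal sketch (hypothetical configs): depth-first vs width-first stacks
# at a roughly equal parameter budget, using the common approximation
# params ≈ 12 * n_layers * d_model ** 2 for the Transformer block weights.

def approx_block_params(n_layers: int, d_model: int) -> int:
    """Approximate non-embedding parameter count of a Transformer stack."""
    return 12 * n_layers * d_model ** 2

deep_narrow = approx_block_params(n_layers=48, d_model=1024)   # depth first (DeepNarrow)
wide_shallow = approx_block_params(n_layers=12, d_model=2048)  # same budget, wider and shallower

print(f"deep-narrow  (48 layers x 1024 dims): {deep_narrow / 1e6:.0f}M params")
print(f"wide-shallow (12 layers x 2048 dims): {wide_shallow / 1e6:.0f}M params")
```

At the same budget, the deep-narrow variant has four times as many layers, which gives circuits such as Induction heads more depth to form in, but it also lengthens the gradient path and shrinks the per-layer matrices that Tensor Parallelism shards, which is where the drawbacks above come from.
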
AI Modeling Notion

LLM

Recommendations