Sparse Gated MoE

Creator
Seonglae Cho
Created
2024 Jan 16 3:54
Editor
Edited
2025 Jan 28 13:11
Refs
FFNN
A recursive MoE hierarchy is also possible, where an expert is itself another MoE.
MoEs replace dense feed-forward network layers with sparse MoE layers consisting of a number of "experts", each itself a neural network (typically an FFN). Because only a few experts are activated per token, this gives faster inference than a dense model with the same total parameter count.
MoEs also enable more compute-efficient pretraining than dense models, allowing the model or dataset size to be scaled up within the same compute budget.
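A minimal sketch of such a layer in PyTorch, assuming top-k gating; the names (SparseMoE, d_model, top_k, etc.) are illustrative, not from any specific library. A linear gate scores every expert per token, only the top-k experts run, and their outputs are mixed with the renormalized gate weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Sparsely gated MoE layer: a router sends each token to its top-k experts."""
    def __init__(self, d_model, d_hidden, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small FFN, i.e. the dense feed-forward block it replaces.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])
        # The gate (router) produces one score per expert for every token.
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x):                          # x: (batch, seq, d_model)
        logits = self.gate(x)                      # (batch, seq, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., slot] == e     # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

With num_experts=8 and top_k=2, each token passes through only 2 of the 8 expert FFNs, which is where the compute savings over an equally large dense layer come from.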

Sparsely Gated MoE

