Sparse Gated MoE

Creator: Seonglae Cho
Created: 2024 Jan 16 3:54
Edited: 2024 Apr 3 5:16
Refs
A recursive hierarchy of MoEs (experts that are themselves MoE layers) is possible.
MoEs replace dense feed-forward network layers with sparse MoE layers made up of a number of "experts", each itself a small neural network. Because only a few experts are active for any given token, this enables more efficient pre-training and faster inference than dense models.
MoEs make pre-training more compute-efficient than dense models, allowing the model or dataset size to be scaled up within the same compute budget.
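A minimal sketch of such a layer in PyTorch, assuming a simple top-k gate over per-expert feed-forward networks; the noisy gating and load-balancing auxiliary loss of the original Sparsely-Gated MoE paper are omitted, and names such as SparseMoELayer and d_hidden are illustrative, not from any specific codebase:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoELayer(nn.Module):
    """Sparsely gated MoE layer: a router sends each token to its top-k experts."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small feed-forward network (the piece that replaces the dense FFN).
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])
        # The gate scores every expert for every token.
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for routing.
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.gate(tokens)                          # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)                # renormalise over the chosen experts

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # (token, slot) positions routed to expert e.
            token_idx, slot_idx = (indices == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue  # this expert received no tokens -- the sparsity that saves compute
            out[token_idx] += weights[token_idx, slot_idx].unsqueeze(-1) * expert(tokens[token_idx])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = SparseMoELayer(d_model=64, d_hidden=256, num_experts=8, top_k=2)
    y = layer(torch.randn(2, 16, 64))
    print(y.shape)  # torch.Size([2, 16, 64])
```

The design point is that only top_k experts run per token, so the compute per token stays roughly constant while total parameter count grows with num_experts.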

Sparsely Gated MoE

