MoE Routing

Creator
Seonglae Cho
Created
2024 Jan 16 4:18
Edited
2025 Nov 14 0:43

AI Load Balancing

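Routing is usually applied per layer to the MLP (feed-forward) block, since the attention weights are shared across experts. Each token is routed by its Affinity Score with every expert: the top-k experts are selected, and their outputs are combined as a weighted sum using those scores.
Typically, each MoE layer has its own small single-layer router that produces one logit per expert; softmax is applied to the logits and the top-k experts are selected.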
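The routing steps described above can be sketched for a single token in plain Python. This is a minimal illustration, not any particular model's implementation: the names `route_token` and `moe_forward`, the bias-free router, the toy scaling "experts", and the renormalisation of the kept scores are all assumptions made for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(token, router_weights, k=2):
    """Return [(expert_index, gate_weight), ...] for one token.

    router_weights holds one weight vector per expert, i.e. a single
    linear layer without bias (hypothetical layout for this sketch).
    """
    # Affinity logit per expert: dot product of the token with that expert's row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    scores = softmax(logits)
    # Keep the k experts with the highest affinity score.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Assumption: renormalise the kept scores so the gate weights sum to 1.
    norm = sum(scores[i] for i in top)
    return [(i, scores[i] / norm) for i in top]

def moe_forward(token, router_weights, experts, k=2):
    """Weighted sum of the selected experts' outputs."""
    out = [0.0] * len(token)
    for i, gate in route_token(token, router_weights, k):
        y = experts[i](token)
        out = [o + gate * v for o, v in zip(out, y)]
    return out

# Toy usage: three "experts" that only scale the input (stand-ins for expert MLPs).
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
token = [1.0, 2.0]

selected = route_token(token, router_weights, k=2)
output = moe_forward(token, router_weights, experts, k=2)
```

With these toy weights the router logits are 1, 2 and 3, so experts 2 and 1 are selected, in that order. Only the selected experts are evaluated, which is where the compute saving of sparse routing comes from.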
MoE Routing Notion
 
 
 

DeepSeek

Load Balancing loss (Parallel Training)

Octopus

 
 
