AI Load Balancing
Usually MLP routing is done per layer (since attention weights are shared) and routing is based on Affinity Score, then top-k is selected and weighted sum is performed based on the scores.
MoE Routing Notion
 Seonglae Cho
Seonglae Cho Seonglae Cho
Seonglae Cho