AI Load Balancing
Usually MLP routing is done per layer (since attention weights are shared) and routing is based on Affinity Score, then top-k is selected and weighted sum is performed based on the scores.
Typically, there is a small single layer per layer, then softmax logits are applied and top-k is selected
MoE Routing Notion

Seonglae Cho