Mixture-of-Experts
Mixture-of-Experts (MoE) models improve efficiency by activating only a small subset of model weights for a given input, decoupling total parameter count from inference cost.
MoEs have seen great success in LLMs. In a nutshell, compared with dense models of similar quality, MoEs pretrain faster and run inference faster, but they require more memory and are harder to fine-tune. A minimal sketch of the sparse-activation idea follows.
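To make the sparse-activation idea concrete, here is a minimal sketch of a sparsely gated MoE layer with top-k routing, written in PyTorch. The class name SimpleMoE, the expert feed-forward design, and the hyperparameters are illustrative assumptions, not taken from any particular model or library.

```python
# Minimal sketch of a sparsely gated Mixture-of-Experts layer (top-k routing).
# Assumes PyTorch; all names and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router: scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts)
        # Experts: independent feed-forward networks; only a few run per token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten tokens for routing.
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                        # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)                # normalize over the chosen experts

        out = torch.zeros_like(tokens)
        for expert_id, expert in enumerate(self.experts):
            # Only tokens routed to this expert are processed: this sparse
            # activation is what decouples parameter count from compute.
            token_idx, slot = (indices == expert_id).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(tokens[token_idx])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = SimpleMoE(d_model=64, d_hidden=256)
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

With 8 experts and top-2 routing, every token is processed by only a quarter of the expert parameters, even though all 8 experts must be kept in memory, which is why MoEs trade compute for memory.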
MoE Notion and Structure
The notion and structure of MoEs date back to 1991, with the paper Adaptive Mixtures of Local Experts.