DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
In the era of large language models, Mixture-of-Experts (MoE) is a promising architecture for managing computational costs when scaling up model parameters. However, conventional MoE architectures like GShard, which activate the top-K out of N experts, face challenges in ensuring expert specialization, i.e., that each expert acquires non-overlapping and focused knowledge.
https://arxiv.org/abs/2401.06066v1
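For context, the "activate the top-K out of N experts" routing that the abstract refers to can be sketched in a few lines of PyTorch. This is a minimal illustration of conventional top-K MoE routing under assumed shapes and names (TopKMoELayer, d_model, d_hidden, etc.), not the paper's DeepSeekMoE architecture.

```python
# Minimal sketch of conventional top-K expert routing (GShard-style), for
# illustration only. All class/parameter names here are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        # One feed-forward network per expert.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # Router that scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)              # (tokens, experts)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)   # keep K experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, k] == e                        # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += top_scores[mask, k:k + 1] * expert(x[mask])
        return out


# Example: route 8 tokens, each through 2 of 8 experts.
layer = TopKMoELayer(d_model=16, d_hidden=64, num_experts=8, top_k=2)
tokens = torch.randn(8, 16)
print(layer(tokens).shape)  # torch.Size([8, 16])
```

Because each token only passes through K of the N expert FFNs, compute per token stays roughly constant as N grows, which is the cost-management property the abstract highlights; DeepSeekMoE's contribution (per the paper) is improving how specialized those experts become.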