Training Experts to Coordinate
The architecture anchors all experts to a single Public FFN that they share.
- Without sharing any data, each organization independently trains its own model module (expert) on its own data
- Each expert is trained in a pair with the shared Public FFN to enable coordination, so that when all experts are later combined, they do not conflict with each other
- The router weights are initialized from router embeddings, each created by averaging sample document embeddings from that expert's domain
- Simply concatenating the router embeddings from all experts completes the MoE router; because it starts out encoding input-domain similarity, it needs only minimal fine-tuning
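The router construction above can be sketched as follows. This is a minimal NumPy illustration, not FlexOlmo's actual code: the domain names, toy dimensions, and random "document embeddings" are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16  # toy hidden size

# Hypothetical per-domain sample document embeddings, shape (num_docs, d_model).
domain_docs = {
    "public": rng.normal(size=(8, d_model)),
    "math":   rng.normal(size=(8, d_model)),
    "code":   rng.normal(size=(8, d_model)),
}

# Each expert's router embedding is the mean of its domain's document embeddings.
router_embeddings = {name: docs.mean(axis=0) for name, docs in domain_docs.items()}

# Concatenating (stacking) the per-expert embeddings yields the router weight
# matrix: one row per expert, so routing logits = W_router @ hidden_state.
expert_names = ["public", "math", "code"]
W_router = np.stack([router_embeddings[n] for n in expert_names])
assert W_router.shape == (len(expert_names), d_model)

def route(hidden, W):
    """Softmax routing probabilities over experts for one token's hidden state."""
    logits = W @ hidden
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()

probs = route(rng.normal(size=d_model), W_router)
assert probs.shape == (len(expert_names),)
```

Because each row of `W_router` points toward its domain's average embedding, a token whose hidden state resembles one domain already routes there before any fine-tuning.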
Results
- Removing a specific expert module completely eliminates that data's influence, enabling opt-in/opt-out of specific data at inference time
- Access can therefore be adjusted to satisfy license, copyright, and access-permission requirements
FlexOlmo (allenai, updated 2025 Nov 4)

Seonglae Cho