Training Experts to Coordinate
The model anchors on a single Public FFN that all experts share.
- Without sharing any data, each organization independently trains its own expert module on its local data
- Each expert is trained in a pair with the shared Public FFN, which teaches it to coordinate with that anchor so the experts do not conflict with one another when they are all combined later (see the training sketch after this list)
- Each expert's router weight is initialized from a router embedding obtained by averaging the embeddings of sample documents from that expert's domain
- Simply concatenating the router embeddings of the individual experts then completes the MoE router; because it starts out encoding input-domain similarity, only minimal fine-tuning is needed (see the merge sketch after this list)
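
A minimal PyTorch sketch of the pair-training step, under the assumption that the Public FFN is kept frozen while the new domain expert trains. This is an illustration, not the FlexOlmo release code; the names `FFN` and `TwoExpertMoE` and the precomputed router-embedding vectors are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FFN(nn.Module):
    """Standard transformer feed-forward block."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x):
        return self.net(x)


class TwoExpertMoE(nn.Module):
    """Shared Public FFN (frozen here) + one new domain expert, mixed by a tiny router."""
    def __init__(self, public_ffn: FFN, d_model: int, d_hidden: int,
                 public_emb: torch.Tensor, domain_emb: torch.Tensor):
        super().__init__()
        self.public = public_ffn
        for p in self.public.parameters():    # assumption: the shared anchor stays fixed
            p.requires_grad_(False)
        self.expert = FFN(d_model, d_hidden)  # trained only on the data owner's corpus
        # Router rows come from precomputed domain embeddings (mean of sample-document
        # embeddings), as described in the bullets above.
        self.router = nn.Parameter(torch.stack([public_emb, domain_emb]))  # (2, d_model)

    def forward(self, x):                     # x: (batch, seq, d_model)
        logits = x @ self.router.t()          # (batch, seq, 2)
        gates = F.softmax(logits, dim=-1)
        return gates[..., 0:1] * self.public(x) + gates[..., 1:2] * self.expert(x)
```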
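
Continuing the same sketch, the merge step: each contributor ships its expert FFN plus its router embedding, and the full router is simply the row-wise stack of those embeddings, so no joint training on the pooled data is required. `MergedMoE` and the top-k routing here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MergedMoE(nn.Module):
    """Combine independently trained experts by stacking their router embeddings."""
    def __init__(self, experts: dict, router_embs: dict, top_k: int = 2):
        super().__init__()
        self.names = list(experts)            # e.g. ["public", "math", "code", ...]
        self.experts = nn.ModuleList(experts[n] for n in self.names)
        # "Completing the router" = concatenating the per-expert embeddings row-wise.
        self.router = nn.Parameter(
            torch.stack([router_embs[n].detach() for n in self.names])
        )
        self.top_k = top_k

    def forward(self, x):                     # x: (batch, seq, d_model)
        logits = x @ self.router.t()          # (batch, seq, n_experts)
        top_vals, top_idx = logits.topk(self.top_k, dim=-1)
        gates = F.softmax(top_vals, dim=-1)
        out = torch.zeros_like(x)
        # Dense-but-simple mixing: every expert runs, only routed tokens contribute.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (top_idx[..., k] == e).unsqueeze(-1)
                out = out + mask * gates[..., k:k + 1] * expert(x)
        return out
```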
Results
- Removing a specific expert module completely eliminates that data's influence, enabling opt-in/opt-out control over specific data at inference time (see the sketch below)
- Participation can therefore be adjusted to meet license, copyright, and access-permission requirements
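
As a small illustration, reusing the hypothetical `MergedMoE` sketch above: opting a contributor out just means rebuilding the merged model without that expert's FFN and its router-embedding row.

```python
def opt_out(experts: dict, router_embs: dict, excluded: set) -> "MergedMoE":
    """Drop excluded experts (and their router rows) before inference."""
    kept = {n: m for n, m in experts.items() if n not in excluded}
    kept_embs = {n: e for n, e in router_embs.items() if n not in excluded}
    return MergedMoE(kept, kept_embs)  # router is rebuilt from the remaining rows only
```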
FlexOlmo paper: https://arxiv.org/pdf/2507.07024
allenai/FlexOlmo-7x7B-1T · Hugging Face: https://huggingface.co/allenai/FlexOlmo-7x7B-1T
Introducing FlexOlmo: a new paradigm for language model training and data collaboration | Ai2: https://allenai.org/blog/flexolmo

Seonglae Cho