Lightweight text embedding model: 308M parameters, runs on-device in under 200MB of RAM when quantized. Achieves top performance among sub-500M-parameter models on MTEB (multilingual, English, and code).
- Gemma 3-based encoder-decoder initialization: Gemma 3 is a decoder-only LLM; it is first further trained as an encoder-decoder, and then only the encoder is extracted and used as the initial weights of the embedding model (sketch below).
- Large-model distillation from Gemini Embedding (sketch below).
- Spread-Out Regularization: a constraint that prevents embeddings from collapsing to a single point or direction (sketch below).
- Model Souping: averaging the weights of multiple models trained on different data mixes, i.e. "merging a brain good at retrieval and a brain good at classification 50/50" (sketch below).
- MRL (Matryoshka Representation Learning / Matryoshka Embedding): the embedding can be truncated to smaller dimensions, so one model supports variable output sizes (sketch below).
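A minimal PyTorch sketch of the encoder-only initialization idea: an encoder-decoder model is trained first, then only its encoder (plus token embeddings) seeds the embedding model. The toy module names and sizes are illustrative stand-ins, not the actual Gemma 3 / EmbeddingGemma code.

```python
# Sketch: "train an encoder-decoder, keep only the encoder" initialization.
# All modules here are toy stand-ins for the real Gemma 3-derived architecture.
import torch
import torch.nn as nn


class ToyEncoderDecoder(nn.Module):
    def __init__(self, vocab=32000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2,
        )


class ToyEmbedder(nn.Module):
    """Embedding model initialized from the encoder of the model above."""

    def __init__(self, enc_dec: ToyEncoderDecoder):
        super().__init__()
        self.embed = enc_dec.embed        # reuse token embeddings
        self.encoder = enc_dec.encoder    # reuse encoder weights only
        # The decoder is discarded; only the encoder stack is kept.

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(self.embed(token_ids))
        # Mean-pool token states into one vector, then L2-normalize.
        return nn.functional.normalize(hidden.mean(dim=1), dim=-1)


enc_dec = ToyEncoderDecoder()       # in practice: adapted from decoder-only Gemma 3
embedder = ToyEmbedder(enc_dec)     # encoder-only initialization
vecs = embedder(torch.randint(0, 32000, (2, 16)))
print(vecs.shape)                   # (2, 256)
```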
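A generic embedding-distillation sketch: the student is trained to match a frozen teacher's embeddings with a cosine objective. The actual recipe used to distill from Gemini Embedding is described in the paper and may use a different loss.

```python
# Generic embedding distillation: push student embeddings toward the teacher's.
import torch
import torch.nn.functional as F

def distillation_loss(student_emb: torch.Tensor, teacher_emb: torch.Tensor) -> torch.Tensor:
    """1 - cosine similarity between student and frozen teacher embeddings."""
    s = F.normalize(student_emb, dim=-1)
    t = F.normalize(teacher_emb.detach(), dim=-1)   # teacher receives no gradient
    return (1.0 - (s * t).sum(dim=-1)).mean()

# Random tensors stand in for model outputs on the same batch of texts.
student = torch.randn(8, 768, requires_grad=True)
teacher = torch.randn(8, 768)                        # frozen teacher embeddings
loss = distillation_loss(student, teacher)
loss.backward()
print(float(loss))
```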
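One common way to express a spread-out constraint is to penalize pairwise similarity between the normalized embeddings of different inputs in a batch; the sketch below uses that formulation, which is an assumption rather than the exact regularizer from the paper.

```python
# Spread-out style regularizer: discourage embeddings of different inputs from
# clustering onto one point or direction by penalizing their pairwise similarity.
import torch
import torch.nn.functional as F

def spread_out_penalty(embeddings: torch.Tensor) -> torch.Tensor:
    z = F.normalize(embeddings, dim=-1)                   # (B, D), unit norm
    sims = z @ z.T                                        # pairwise cosine sims, (B, B)
    off_diag = sims - torch.eye(len(z), device=z.device)  # drop self-similarity
    b = len(z)
    # Average squared similarity between distinct pairs; minimized when the
    # embeddings are spread out (near-orthogonal) rather than collapsed.
    return (off_diag ** 2).sum() / (b * (b - 1))

emb = torch.randn(16, 768, requires_grad=True)
reg = spread_out_penalty(emb)
reg.backward()
print(float(reg))
```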
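Model souping reduces to averaging compatible weight tensors. The sketch below averages two toy checkpoints 50/50 (a uniform soup); the "retrieval" and "classification" roles are placeholders.

```python
# Model souping: element-wise average of several checkpoints' weights.
import torch
import torch.nn as nn

def soup(state_dicts):
    """Return the element-wise average of a list of compatible state dicts."""
    keys = state_dicts[0].keys()
    return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0) for k in keys}

# Toy demo: merge two models trained on different data mixes 50/50.
retrieval_model = nn.Linear(4, 4)        # stand-in for the retrieval-heavy run
classification_model = nn.Linear(4, 4)   # stand-in for the classification-heavy run
merged = nn.Linear(4, 4)
merged.load_state_dict(soup([retrieval_model.state_dict(), classification_model.state_dict()]))
```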
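With MRL, a shorter embedding is obtained by keeping a prefix of the full vector and re-normalizing it. The 768/512/256/128 sizes below follow the output dimensions listed on the EmbeddingGemma model pages.

```python
# MRL at inference time: truncate a Matryoshka-trained embedding to a prefix
# of its dimensions and L2-normalize, trading quality for storage and speed.
import torch
import torch.nn.functional as F

def truncate(embedding: torch.Tensor, dim: int) -> torch.Tensor:
    """Keep the first `dim` coordinates and re-normalize."""
    return F.normalize(embedding[..., :dim], dim=-1)

full = F.normalize(torch.randn(4, 768), dim=-1)   # full-size embeddings
for d in (512, 256, 128):
    print(d, truncate(full, d).shape)
```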
EmbeddingGemma paper (arxiv.org)
https://arxiv.org/pdf/2509.20354
EmbeddingGemma (Google DeepMind)
https://deepmind.google/models/gemma/embeddinggemma/
EmbeddingGemma - a google Collection (Hugging Face)
https://huggingface.co/collections/google/embeddinggemma
How to use: Welcome EmbeddingGemma, Google's new efficient embedding model (Hugging Face blog)
https://huggingface.co/blog/embeddinggemma
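A quick-start sketch following the "How to use" blog above, via sentence-transformers. The model id google/embeddinggemma-300m and the encode_query / encode_document helpers are taken from the Hugging Face pages (they need a recent sentence-transformers release); plain encode() also works.

```python
# Quick start with sentence-transformers, per the Hugging Face "How to use" blog.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")

query = "Which planet is known as the Red Planet?"
documents = [
    "Mars is often called the Red Planet because of its reddish appearance.",
    "Venus is the second planet from the Sun.",
]

# encode_query / encode_document apply the model's query and document prompts.
query_emb = model.encode_query(query)
doc_emb = model.encode_document(documents)
print(model.similarity(query_emb, doc_emb))   # cosine similarities, shape (1, 2)
```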

Seonglae Cho