Gemma 3: image and text input, not video
Introducing Gemma 3: The most capable model you can run on a single GPU or TPU
Today, we're introducing Gemma 3, our most capable, portable and responsible open model yet.
https://blog.google/technology/developers/gemma-3/

Gemma 3 Technical Report
We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding...
https://arxiv.org/abs/2503.19786

Gemma 3n
Multimodal, built on MatFormer (Matryoshka Transformer); released as E4B and E2B (effective 4B and 2B parameters)
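A toy sketch of the MatFormer nesting idea: a smaller model is the first `width` hidden units of each feed-forward block of the larger one, so it can be run in place or sliced out as a standalone model (how E2B can sit inside E4B). This assumes a plain ReLU FFN and prefix-of-hidden-units nesting as described in the MatFormer paper; it is not Gemma 3n's actual gated FFN, and `nested_ffn`, `extract_submodel`, and the weights are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def relu(v: float) -> float:
    return max(0.0, v)


def nested_ffn(x: List[float], w_in: Matrix, w_out: Matrix, width: int) -> List[float]:
    """Feed-forward block that uses only the first `width` hidden units.

    w_in has one row per hidden unit (length d_model); w_out has one row per
    output dimension (length d_ff). width == d_ff gives the full model.
    """
    hidden = [relu(sum(w * xi for w, xi in zip(w_in[i], x))) for i in range(width)]
    return [sum(row[i] * hidden[i] for i in range(width)) for row in w_out]


def extract_submodel(w_in: Matrix, w_out: Matrix, width: int) -> Tuple[Matrix, Matrix]:
    """Slice out a smaller standalone FFN made of the first `width` hidden units."""
    return [row[:] for row in w_in[:width]], [row[:width] for row in w_out]


# Toy weights (hypothetical): d_model = 2, d_ff = 4.
x = [1.0, 2.0]
w_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
w_out = [[1.0, 1.0, 1.0, 1.0], [1.0, 0.0, 0.0, 0.0]]

print(nested_ffn(x, w_in, w_out, 4))  # full width -> [6.0, 1.0]
print(nested_ffn(x, w_in, w_out, 2))  # half width -> [3.0, 1.0]
small_in, small_out = extract_submodel(w_in, w_out, 2)
print(nested_ffn(x, small_in, small_out, 2) == nested_ffn(x, w_in, w_out, 2))  # True
```

The half-width model needs no separate training run in this picture: its weights are literally a prefix of the full model's.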
- Per-Layer Embeddings: core transformer weights stay in accelerator memory, while the remaining per-layer embeddings are loaded and computed on the CPU, reducing accelerator memory use
- KV Cache Sharing: reusing the key-value cache of an intermediate layer in the layers above it roughly doubles prefill speed for long-context streaming input
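A simplified cost model for why KV cache sharing speeds up prefill. The assumptions are ours, not the published implementation: there is a single sharing layer `share_from`, every layer above it attends with that layer's K/V, and cost is counted as (layer, token) forward passes. Because the top layers need no K/V of their own, earlier prompt tokens can stop at the sharing layer, and only the last token continues upward. Gemma 3n shares separately for local and global attention layers; `kv_source_layers`, `prefill_layer_passes`, and the layer counts are hypothetical.

```python
from typing import List


def kv_source_layers(num_layers: int, share_from: int) -> List[int]:
    """Index of the layer whose K/V cache each layer attends with.

    Layers up to and including `share_from` compute their own K/V; every
    layer above it reuses the K/V produced at `share_from`.
    """
    return [i if i <= share_from else share_from for i in range(num_layers)]


def prefill_layer_passes(num_layers: int, share_from: int, prompt_len: int,
                         shared: bool = True) -> int:
    """Rough prefill cost, counted as (layer, token) forward passes.

    Without sharing, every prompt token runs through every layer so each
    layer can fill its own K/V cache. With sharing, earlier prompt tokens
    stop at `share_from`; only the final token runs through the top layers
    to produce the first output token.
    """
    if not shared:
        return num_layers * prompt_len
    lower = share_from + 1      # layers that still process every prompt token
    upper = num_layers - lower  # layers run only for the final token
    return lower * prompt_len + upper


if __name__ == "__main__":
    print(kv_source_layers(4, 1))  # [0, 1, 1, 1]
    baseline = prefill_layer_passes(30, 14, 1000, shared=False)
    with_sharing = prefill_layer_passes(30, 14, 1000)
    print(baseline, with_sharing, round(baseline / with_sharing, 2))
```

With sharing at the midpoint, the ratio approaches 2 as the prompt grows, which is consistent with the roughly 2x prefill figure above.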


Google for Developers Blog - News about Web, Mobile, AI and Cloud
Learn how to build with Gemma 3n, a mobile-first architecture, MatFormer technology, Per-Layer Embeddings, and new audio and vision encoders.
https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide/

Gemma 3n - a google Collection
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
https://huggingface.co/collections/google/gemma-3n-685065323f5984ef315c93f4

Seonglae Cho