Gemma 3n
Multimodal model built on the MatFormer (Matryoshka Transformer) architecture, available in E4B and E2B effective-parameter variants
- Per-Layer Embeddings: Core transformer weights stay in accelerator memory while the per-layer embedding parameters are loaded and computed on the CPU, cutting the accelerator memory footprint
- KV Cache Sharing: Key-value activations from the model's middle layers are shared with the layers above them, roughly doubling prefill speed for long-context and streaming inputs
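The KV-sharing idea above can be sketched in a toy single-head attention stack. This is an illustrative assumption of the mechanism, not Gemma's implementation: the `prefill` function and the `share_from` parameter are invented names, attention is unmasked and single-headed, and the point is only that layers past a chosen middle layer reuse that layer's K/V projections instead of computing their own.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention for one head (no masking, for brevity).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def prefill(x, layers, share_from=None):
    """Run prefill through a stack of attention layers.

    If share_from is set, every layer after index `share_from` reuses the
    K/V computed at that layer, so those layers skip their own KV
    projections -- the source of the prefill speedup.
    """
    shared_kv = None
    for i, (wq, wk, wv) in enumerate(layers):
        q = x @ wq                       # queries are always recomputed
        if shared_kv is not None and i > share_from:
            k, v = shared_kv             # reuse the middle layer's K/V
        else:
            k, v = x @ wk, x @ wv        # fresh K/V projection
            if share_from is not None and i == share_from:
                shared_kv = (k, v)       # cache K/V for the layers above
        x = attention(q, k, v)
    return x

rng = np.random.default_rng(0)
d, seq_len, n_layers = 8, 5, 4
layers = [tuple(rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
          for _ in range(n_layers)]
x = rng.normal(size=(seq_len, d))

y_shared = prefill(x, layers, share_from=1)  # layers 2-3 reuse layer 1's K/V
y_full = prefill(x, layers)                  # baseline: every layer projects K/V
```

With sharing enabled, each layer above `share_from` drops two of its three projections per token, which is where the reported ~2x prefill gain for long inputs would come from under this reading.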

