Gemma 2's efficient design allows it to run on less than half the compute required by comparable models. The 27B model is optimized to run on NVIDIA GPUs, or it can run efficiently on a single TPU host in Vertex AI.
- Sliding window attention: Interleaves local (sliding-window) and global attention layers to balance quality and efficiency (see the mask sketch after this list).
- Soft-capping: Prevents logits from growing excessively, keeping training stable (see the sketch after this list).
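
To make the interleaving concrete, here is a minimal sketch of how a local (sliding-window) mask differs from a global causal mask, assuming PyTorch-style boolean attention masks. The window size and the even/odd layer alternation are illustrative choices, not Gemma 2's exact configuration.

```python
import torch

def attention_masks(seq_len: int, window: int):
    pos = torch.arange(seq_len)
    dist = pos[:, None] - pos[None, :]           # how far behind the query each key position is
    global_mask = dist >= 0                      # causal: attend to every earlier token
    local_mask = global_mask & (dist < window)   # sliding window: only the most recent `window` tokens
    return global_mask, local_mask

# Illustrative interleaving: even layers use the local mask, odd layers the global one.
global_mask, local_mask = attention_masks(seq_len=8, window=4)
masks_per_layer = [local_mask if i % 2 == 0 else global_mask for i in range(4)]
```

Local layers keep the attention cost bounded by the window size, while the interleaved global layers preserve long-range information flow.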
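The soft-capping operation itself is small: logits are passed through a scaled tanh so they stay within (-cap, cap) without the hard truncation of clipping. The sketch below assumes PyTorch tensors, and the cap value is illustrative.

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Smoothly bound logits to (-cap, cap); unlike a hard clip, this stays differentiable everywhere.
    return cap * torch.tanh(logits / cap)

# Example: cap attention scores before the softmax (cap value is illustrative).
scores = torch.randn(2, 8, 8) * 100.0
capped = soft_cap(scores, cap=50.0)
```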