Gemma 2's efficient design allows it to fit on less than half the compute of comparable models. The 27B model is optimized to run on NVIDIA GPUs or efficiently on a single TPU host in Vertex AI.
- Sliding window attention: Interleaves local and global attention layers to balance quality and efficiency (see the mask sketch after these notes).
- Soft-Capping: Prevents logits from growing excessively, ensuring stable training.
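A minimal sketch of the soft-capping operation, assuming PyTorch; the cap values (50.0 for attention logits, 30.0 for final logits) match the published Gemma 2 configuration, and `soft_cap` is just an illustrative name:

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    """Squash logits into (-cap, cap) with a tanh.

    Gemma 2 applies this to attention logits (cap=50.0) and to the
    final output logits (cap=30.0).
    """
    return cap * torch.tanh(logits / cap)

# Large logits are squashed smoothly instead of growing without bound.
logits = torch.tensor([1.0, 10.0, 100.0, 1000.0])
print(soft_cap(logits, cap=50.0))  # values approach +/-50 asymptotically
```

Because tanh saturates, scores stay within (-cap, cap) no matter how large the raw logits get, which keeps the softmax and the loss numerically stable.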
Note: Flash Attention implementations that drop logit soft-capping are incorrect for Gemma 2 and break the model's outputs.
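A rough sketch of how the local/global interleaving can be expressed as attention masks, assuming PyTorch; the 4096-token window inside the 8192-token context follows the Gemma 2 report, while the choice that even-indexed layers are the local ones is an implementation-dependent assumption:

```python
import torch

def attention_mask(seq_len: int, layer_idx: int, window: int = 4096) -> torch.Tensor:
    """Causal attention mask; every other layer additionally restricts
    attention to a sliding window of recent tokens (assumed here to be
    the even-indexed layers)."""
    pos = torch.arange(seq_len)
    causal = pos[None, :] <= pos[:, None]        # query i may attend to keys j <= i
    if layer_idx % 2 == 0:                       # assumption: even layers are local
        local = pos[:, None] - pos[None, :] < window
        return causal & local                    # sliding-window (local) layer
    return causal                                # global layer

# Small demo: window of 4 over 8 positions on a local layer.
mask = attention_mask(seq_len=8, layer_idx=0, window=4)
print(mask.int())
```

Local layers only attend within the window, so their cost grows linearly with sequence length, while the interleaved global layers preserve long-range information flow.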
Google for Developers Blog: The Gemma family expands further with the introduction of PaliGemma, and a sneak peek into the near future with the announcement of Gemma 2.
https://developers.googleblog.com/en/gemma-family-and-toolkit-expansion-io-2024/

TheDrummer/Big-Tiger-Gemma-27B-v1 · Hugging Face
https://huggingface.co/TheDrummer/Big-Tiger-Gemma-27B-v1

Seonglae Cho