Gemma 2's efficient design allows it to run on less than half the compute required by comparable models. The 27B model is optimized to run on NVIDIA GPUs, or it can run efficiently on a single TPU host in Vertex AI.
- Sliding window attention: Interleaves local (sliding-window) and global attention layers to balance quality and efficiency (see the mask sketch after this list).
- Soft-capping: Prevents logits from growing excessively, keeping training stable (see the sketch after this list).
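
To make the interleaving concrete, here is a minimal sketch of how a local (sliding-window) mask differs from a global causal mask, assuming PyTorch-style boolean attention masks. The window size and the even/odd layer alternation are illustrative choices, not Gemma 2's exact configuration.

```python
import torch

def attention_masks(seq_len: int, window: int):
    pos = torch.arange(seq_len)
    dist = pos[:, None] - pos[None, :]           # how far behind the query each key position is
    global_mask = dist >= 0                      # causal: attend to every earlier token
    local_mask = global_mask & (dist < window)   # sliding window: only the most recent `window` tokens
    return global_mask, local_mask

# Illustrative interleaving: even layers use the local mask, odd layers the global one.
global_mask, local_mask = attention_masks(seq_len=8, window=4)
masks_per_layer = [local_mask if i % 2 == 0 else global_mask for i in range(4)]
```

Local layers keep the attention cost bounded by the window size, while the interleaved global layers preserve long-range information flow.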
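The soft-capping operation itself is small: logits are passed through a scaled tanh so they stay within (-cap, cap) without the hard truncation of clipping. The sketch below assumes PyTorch tensors, and the cap value is illustrative.

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Smoothly bound logits to (-cap, cap); unlike a hard clip, this stays differentiable everywhere.
    return cap * torch.tanh(logits / cap)

# Example: cap attention scores before the softmax (cap value is illustrative).
scores = torch.randn(2, 8, 8) * 100.0
capped = soft_cap(scores, cap=50.0)
```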