Parameters
Used by any loading function like `from_pretrained()`.

PyTorch
attn_implementation
- eager: manual implementation of the attention
- sdpa: torch.nn.functional.scaled_dot_product_attention (PyTorch's fused kernel)
- flash_attention_2: Flash Attention 2
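
A minimal sketch of selecting an attention backend at load time. The checkpoint name is an arbitrary example, and `flash_attention_2` additionally requires the `flash-attn` package and a supported GPU:

```python
import torch
from transformers import AutoModelForCausalLM

# Pick the attention backend at load time via attn_implementation
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # example checkpoint, swap in any causal LM
    torch_dtype=torch.float16,
    attn_implementation="sdpa",   # or "eager" / "flash_attention_2"
)
```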
Configuration
https://huggingface.co/docs/transformers/main_classes/configuration
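
A sketch of the configuration workflow from the docs above: load a checkpoint's config, override a field, and instantiate a model from it (the checkpoint and field here are illustrative):

```python
from transformers import AutoConfig, AutoModel

# Load a checkpoint's configuration and tweak a hyperparameter
config = AutoConfig.from_pretrained("bert-base-uncased")
config.hidden_dropout_prob = 0.2       # override a single field

# Build the architecture from the config (randomly initialized weights)
model = AutoModel.from_config(config)
```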

Seonglae Cho