used by any function like from_pretrained()
Parameters
PyTorch
attn_implementation
- eager (manual implementation of the attention)
- sdpa (uses torch.nn.functional.scaled_dot_product_attention)
- flash_attention_2 (uses Flash Attention)
PretrainedConfig
**kwargs from_pretrained()
attn_implementation
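As a rough illustration of how one of the three attn_implementation values might be selected before it is handed to from_pretrained(), here is a minimal sketch. The helper names choose_attn_implementation and is_installed are hypothetical and not part of any library. The sketch assumes that flash_attention_2 relies on the flash_attn package and that sdpa relies on a PyTorch install; it only checks that these packages are present, not that the hardware, dtype, or model architecture supports the chosen backend.

```python
import importlib.util

# The three values listed for attn_implementation.
ATTN_IMPLEMENTATIONS = ("eager", "sdpa", "flash_attention_2")


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def choose_attn_implementation(prefer_flash: bool = False) -> str:
    """Hypothetical helper: pick one of the three documented values.

    Assumptions (not taken from the text above): flash_attention_2 needs the
    `flash_attn` package, sdpa needs PyTorch, and eager is the fallback.
    Only package presence is checked, nothing about GPU or dtype support.
    """
    if prefer_flash and is_installed("flash_attn"):
        return "flash_attention_2"
    if is_installed("torch"):
        return "sdpa"
    return "eager"


attn = choose_attn_implementation()
print(attn)

# Usage with transformers (left commented out because it downloads weights;
# "gpt2" is only a placeholder checkpoint name):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained("gpt2", attn_implementation=attn)
```

Passing the value explicitly like this makes the chosen backend visible at the call site. If the selected backend turns out to be unsupported for a given model, falling back to "eager" is the conservative choice.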