The attn_implementation argument can be passed to any model-loading function, such as from_pretrained().
Parameters
PyTorch attn_implementation options:
- eager: manual implementation of the attention
- sdpa: torch.nn.functional.scaled_dot_product_attention
- flash_attention_2: Flash Attention 2
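The eager and sdpa backends compute the same attention; sdpa just dispatches to a fused kernel. A minimal sketch comparing the two on random tensors (shapes chosen only for illustration):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# (batch, heads, seq_len, head_dim)
q = torch.randn(1, 2, 4, 8)
k = torch.randn(1, 2, 4, 8)
v = torch.randn(1, 2, 4, 8)

# "eager": attention written out step by step
scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
eager_out = torch.softmax(scores, dim=-1) @ v

# "sdpa": the fused PyTorch kernel the sdpa backend uses
sdpa_out = F.scaled_dot_product_attention(q, k, v)

# Both backends produce (numerically) the same result
assert torch.allclose(eager_out, sdpa_out, atol=1e-5)
```

flash_attention_2 computes the same quantity again, but requires the flash-attn package and a supported GPU.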
Additional **kwargs are forwarded to the PretrainedConfig, so configuration fields can be overridden at load time.
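A minimal loading sketch, assuming the transformers library is installed; the tiny checkpoint name is only an illustrative choice:

```python
from transformers import AutoModelForCausalLM

# Select the attention backend at load time; extra kwargs
# (none shown here) would be forwarded to the config.
model = AutoModelForCausalLM.from_pretrained(
    "sshleifer/tiny-gpt2",  # assumption: small public checkpoint for demo
    attn_implementation="eager",
)
print(model.config._attn_implementation)
```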
Seonglae Cho