SDPA

Very large performance gains via torch.compile().

SDPA backends:
- Flash Attention
- Memory-efficient Attention
- cuDNN Fused Flash Attention (speedup over FlashAttentionV2 on H100)

PyTorch 2.5 Release Blog
We are excited to announce the release of PyTorch® 2.5 (release note)! This release features a new cuDNN backend for SDPA, enabling speedups by default for users of SDPA on H100s or newer GPUs. As well, regional compilation of torch.compile offers a way to reduce the cold start-up time for torch.compile by allowing users to compile a repeated nn.Module (e.g. a transformer layer in an LLM) without recompilations. Finally, the TorchInductor CPP backend offers solid performance speedups with numerous enhancements like FP16 support, a CPP wrapper, AOT-Inductor mode, and max-autotune mode.
https://pytorch.org/blog/pytorch2-5/

(Beta) Implementing High-Performance Transformers with Scaled Dot Product Attention (SDPA)
Author: Driss Guessous, translated by 이강희. Summary: This tutorial introduces a new function in the torch.nn.functional module that helps implement the Transformer architecture: torch.nn.functional.scaled_dot_product_attention. See the PyTorch documentation for a detailed description of the function. It is already used by torch.nn.MultiheadAttention and torch.nn.TransformerEncoderLayer...
https://tutorials.pytorch.kr/intermediate/scaled_dot_product_attention_tutorial.html
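
As a quick reference next to the tutorial link above, here is a minimal sketch of calling torch.nn.functional.scaled_dot_product_attention directly and of pinning a specific fused backend with the torch.nn.attention.sdpa_kernel context manager (available in recent PyTorch releases). The tensor shapes and the device/dtype handling are illustrative choices of mine, not taken from the tutorial.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

device = "cuda" if torch.cuda.is_available() else "cpu"
# Fused backends on GPU expect half-precision inputs; fall back to fp32 on CPU.
dtype = torch.float16 if device == "cuda" else torch.float32

# Illustrative shapes: (batch, num_heads, seq_len, head_dim)
query = torch.randn(2, 8, 1024, 64, device=device, dtype=dtype)
key = torch.randn(2, 8, 1024, 64, device=device, dtype=dtype)
value = torch.randn(2, 8, 1024, 64, device=device, dtype=dtype)

# Default call: PyTorch picks the fastest available backend
# (flash, memory-efficient, cuDNN fused, or the math fallback).
out = F.scaled_dot_product_attention(query, key, value, is_causal=True)

if device == "cuda":
    # Optionally restrict which fused backend may be used,
    # e.g. SDPBackend.FLASH_ATTENTION or SDPBackend.CUDNN_ATTENTION.
    with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
        out_flash = F.scaled_dot_product_attention(query, key, value, is_causal=True)
```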
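
Going back to the regional-compilation point in the 2.5 release blurb: the idea is to compile the repeated block itself rather than the whole model, so every instance of the block reuses one compiled region and cold-start time drops. A minimal sketch follows; the Block/ToyTransformer modules are placeholder code of mine, not from the blog.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """A toy repeated block standing in for a transformer layer."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        return x + self.mlp(self.norm2(x))

class ToyTransformer(nn.Module):
    def __init__(self, num_layers: int = 12, dim: int = 256):
        super().__init__()
        self.layers = nn.ModuleList(Block(dim) for _ in range(num_layers))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

model = ToyTransformer()

# Regional compilation: compile only the repeated block, in place.
# The identical Block instances should reuse the same compiled region,
# so compile time is paid roughly once instead of for the full model graph.
for layer in model.layers:
    layer.compile()

x = torch.randn(2, 128, 256)
out = model(x)  # first call triggers compilation of the block
```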
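
On the TorchInductor CPP backend point, a minimal sketch assuming a CPU-only run: torch.compile lowers through Inductor's C++ backend on CPU, and mode="max-autotune" additionally tunes the generated kernels. The small model here is just a placeholder.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 512))

# On CPU, torch.compile lowers through TorchInductor's C++ backend;
# mode="max-autotune" additionally searches for faster kernel variants.
compiled = torch.compile(model, mode="max-autotune")

x = torch.randn(8, 512)
out = compiled(x)
```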