SDPA

Very large performance gains via torch.compile().

SDPA backends:
- Flash Attention
- Memory-efficient Attention
- cuDNN Fused Flash Attention (speedup over FlashAttentionV2 on H100)

PyTorch 2.5 Release Blog
We are excited to announce the release of PyTorch® 2.5 (release note)! This release features a new cuDNN backend for SDPA, enabling speedups by default for users of SDPA on H100s or newer GPUs. As well, regional compilation of torch.compile offers a way to reduce the cold start-up time for torch.compile by allowing users to compile a repeated nn.Module (e.g. a transformer layer in an LLM) without recompilations. Finally, the TorchInductor CPP backend offers solid performance speedups with numerous enhancements like FP16 support, a CPP wrapper, AOT-Inductor mode, and max-autotune mode.
https://pytorch.org/blog/pytorch2-5/

(Beta) Implementing High-Performance Transformers with Scaled Dot Product Attention (SDPA)
Author: Driss Guessous, translated by 이강희. Summary: This tutorial introduces a new function in the torch.nn.functional module that helps implement the Transformer architecture: torch.nn.functional.scaled_dot_product_attention. See the PyTorch documentation for a detailed description of the function. It is already used by torch.nn.MultiheadAttention and torch.nn.TransformerEncoderLayer...
https://tutorials.pytorch.kr/intermediate/scaled_dot_product_attention_tutorial.html
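
As a quick reference next to the tutorial link above, here is a minimal sketch of calling torch.nn.functional.scaled_dot_product_attention directly and of pinning a specific fused backend with the torch.nn.attention.sdpa_kernel context manager (available in recent PyTorch releases). The tensor shapes and the device/dtype handling are illustrative choices of mine, not taken from the tutorial.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

device = "cuda" if torch.cuda.is_available() else "cpu"
# Fused backends on GPU expect half-precision inputs; fall back to fp32 on CPU.
dtype = torch.float16 if device == "cuda" else torch.float32

# Illustrative shapes: (batch, num_heads, seq_len, head_dim)
query = torch.randn(2, 8, 1024, 64, device=device, dtype=dtype)
key = torch.randn(2, 8, 1024, 64, device=device, dtype=dtype)
value = torch.randn(2, 8, 1024, 64, device=device, dtype=dtype)

# Default call: PyTorch picks the fastest available backend
# (flash, memory-efficient, cuDNN fused, or the math fallback).
out = F.scaled_dot_product_attention(query, key, value, is_causal=True)

if device == "cuda":
    # Optionally restrict which fused backend may be used,
    # e.g. SDPBackend.FLASH_ATTENTION or SDPBackend.CUDNN_ATTENTION.
    with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
        out_flash = F.scaled_dot_product_attention(query, key, value, is_causal=True)
```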
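
Going back to the regional-compilation point in the 2.5 release blurb: the idea is to compile the repeated block itself rather than the whole model, so every instance of the block reuses one compiled region and cold-start time drops. A minimal sketch follows; the Block/ToyTransformer modules are placeholder code of mine, not from the blog.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """A toy repeated block standing in for a transformer layer."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        return x + self.mlp(self.norm2(x))

class ToyTransformer(nn.Module):
    def __init__(self, num_layers: int = 12, dim: int = 256):
        super().__init__()
        self.layers = nn.ModuleList(Block(dim) for _ in range(num_layers))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

model = ToyTransformer()

# Regional compilation: compile only the repeated block, in place.
# The identical Block instances should reuse the same compiled region,
# so compile time is paid roughly once instead of for the full model graph.
for layer in model.layers:
    layer.compile()

x = torch.randn(2, 128, 256)
out = model(x)  # first call triggers compilation of the block
```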
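
On the TorchInductor CPP backend point, a minimal sketch assuming a CPU-only run: torch.compile lowers through Inductor's C++ backend on CPU, and mode="max-autotune" additionally tunes the generated kernels. The small model here is just a placeholder.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 512))

# On CPU, torch.compile lowers through TorchInductor's C++ backend;
# mode="max-autotune" additionally searches for faster kernel variants.
compiled = torch.compile(model, mode="max-autotune")

x = torch.randn(8, 512)
out = compiled(x)
```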