Score modification
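FlexAttention expresses score modifications as a user-defined score_mod callable that edits each query/key logit before softmax. A minimal sketch of the API, assuming PyTorch 2.5+ (shapes and the no-op modifier are illustrative; in practice the call would be wrapped in torch.compile for performance):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# Toy shapes for illustration: batch, heads, sequence length, head dim
B, H, S, D = 2, 4, 128, 64
q = torch.randn(B, H, S, D)
k = torch.randn(B, H, S, D)
v = torch.randn(B, H, S, D)

def identity(score, b, h, q_idx, kv_idx):
    # score is the scalar q·k logit; b, h, q_idx, kv_idx locate it
    return score  # no modification: plain scaled dot-product attention

out = flex_attention(q, k, v, score_mod=identity)
```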

ALiBi bias
Similar to Relative Positional Encoding, but uses a per-head slope that is typically precomputed; it has beneficial properties for length extrapolation at inference time.
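As a concrete instance, an ALiBi-style score_mod only needs those per-head slopes. A sketch under the same score_mod signature (the geometric slope schedule follows the ALiBi paper; variable names are mine):

```python
import torch

H = 8  # number of heads
# Per-head slopes from the ALiBi paper: 2^(-8i/H) for head i = 1..H
alibi_slopes = torch.tensor([2.0 ** (-8.0 * (i + 1) / H) for i in range(H)])

def alibi(score, b, h, q_idx, kv_idx):
    # Linear penalty proportional to query/key distance, scaled per head.
    # For past keys (kv_idx <= q_idx, the region a causal mask keeps),
    # kv_idx - q_idx <= 0, so distant keys are penalized more.
    return score + alibi_slopes[h] * (kv_idx - q_idx)
```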
Soft-capping
Bounds attention logits by scaling them into a tanh and back out (score → cap · tanh(score / cap)), preventing logits from growing excessively large; used in models such as Gemma 2.
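A soft-capping score_mod is a few lines; a minimal sketch (the cap value 20 is illustrative, models choose their own constant):

```python
import torch

softcap = 20.0  # illustrative cap value

def soft_cap(score, b, h, q_idx, kv_idx):
    # cap * tanh(score / cap): approximately identity for small scores,
    # smoothly saturating toward ±cap for large ones
    return softcap * torch.tanh(score / softcap)
```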
FlexAttention is currently available in PyTorch nightly releases; the PyTorch team plans to release it as a prototype feature in 2.5.0.
FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention
https://pytorch.org/blog/flexattention/


Seonglae Cho