Score modification
ALiBi bias
Similar to relative positional encodings, but with a per-head factor that is typically precomputed. Because the bias is a simple linear function of the query/key distance, ALiBi has beneficial length-extrapolation properties at inference time.
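As a minimal sketch, an ALiBi score_mod can close over precomputed per-head slopes and add a distance-proportional bias to each attention score. The slope formula below is the geometric sequence from the ALiBi paper; the scalar arguments are an illustrative assumption, as the real FlexAttention kernel passes index tensors to the score_mod.

```python
def alibi_slopes(num_heads):
    # Geometric sequence of per-head slopes from the ALiBi paper:
    # 2^(-8/n), 2^(-16/n), ..., 2^(-8) for n heads.
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]

slopes = alibi_slopes(8)  # precomputed once, closed over by the score_mod

def alibi(score, b, h, q_idx, kv_idx):
    # Linear penalty proportional to query/key distance; under a causal
    # mask kv_idx <= q_idx, so the added bias is zero or negative.
    return score + slopes[h] * (kv_idx - q_idx)
```

Because the slopes are captured by the closure, nothing per-call needs to be materialized beyond the bias itself.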
Soft-capping
A technique that bounds attention logits with a scaled tanh so they cannot grow without limit, used in models such as Gemma 2 and Grok-1.
FlexAttention is currently available in PyTorch nightly releases; we plan to release it as a prototype feature in PyTorch 2.5.0.