Domain-specific for now, but could it be generalized?
The core of the Transformer is dot-product attention, but in complex space the dot product becomes the Hermitian inner product, whose value is itself a complex number carrying a phase term. Softmax is only defined over real values, so applying it to these complex scores is meaningless, and the probabilistic interpretation of the Attention Weight breaks down.
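
A minimal NumPy sketch of where this breaks, using toy complex queries and keys (all names here are illustrative, not from any particular implementation): the Hermitian inner product q†k yields complex scores, so softmax cannot be applied to them directly; taking the real part or the modulus are hedged workarounds seen in complex-valued attention work, and each discards the phase information differently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy complex-valued query and keys (d-dimensional), purely illustrative.
d, n = 4, 3
q = rng.normal(size=d) + 1j * rng.normal(size=d)            # one query
K = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))  # n keys

# Hermitian inner product <q, k> = q† k = sum(conj(q_j) * k_j)
# -> one complex score per key.
scores = K @ np.conj(q)
print(scores)                              # complex: magnitude * e^{i*phase}
print(np.abs(scores), np.angle(scores))    # magnitude and phase terms

def softmax(x):
    x = x - x.max()        # shift for numerical stability
    e = np.exp(x)
    return e / e.sum()

# softmax(scores) is mathematically undefined on complex inputs, so the
# scores no longer normalize into a probability distribution. Two common
# workarounds, each of which throws away the phase differently:
w_real = softmax(scores.real)      # keep only the real part
w_abs = softmax(np.abs(scores))    # keep only the magnitude
print(w_real, w_abs)               # each sums to 1, but encodes different info
```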

Seonglae Cho