In LLM attention blocks, Q/K/V have no bias, and only the output projection (O) has bias.
The reason is that the input to Q/K/V comes right after LayerNorm and therefore has zero mean, so a bias there has almost no effect, whereas O's output is added to the residual connection, so its bias is trained effectively.
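A minimal PyTorch sketch of this layout (not from the note; the module and parameter names are illustrative): `bias=False` on the Q/K/V projections, `bias=True` only on the output projection that feeds the residual stream.

```python
# Illustrative sketch, assuming a pre-LayerNorm attention block.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Attention(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Pre-LN: the Q/K/V inputs are centered, so their biases are dropped.
        self.norm = nn.LayerNorm(d_model)
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)
        # Output projection is added to the residual stream, so its bias is kept.
        self.o_proj = nn.Linear(d_model, d_model, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        h = self.norm(x)  # zero-mean input right after LayerNorm
        q = self.q_proj(h).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(h).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(h).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(b, t, d)
        return x + self.o_proj(out)  # residual connection picks up O's bias


x = torch.randn(2, 16, 512)
print(Attention()(x).shape)  # torch.Size([2, 16, 512])
```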
Transformer Attentions

Seonglae Cho