Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling
https://arxiv.org/abs/2110.03252

While Transformer-based models have shown impressive language modeling performance, the large computation cost is often prohibitive for practical use. Attention head pruning, which removes redundant heads from multi-head attention, is a promising technique for reducing this cost.
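
To make the idea concrete, below is a minimal Python/PyTorch sketch of attention head pruning. It is not the paper's method (the paper prunes heads layer-wise in an All-attention Transformer, with dedicated training techniques to stabilize pruning); it only illustrates the basic mechanism: each head gets a binary gate, and setting a gate to zero removes that head's contribution to the output projection. The names `PrunableMultiheadAttention`, `head_gate`, and `prune_heads` are hypothetical.

```python
# Generic illustration of attention head pruning via per-head gates.
# Not the method of arXiv:2110.03252; a sketch of the basic mechanism only.
import torch
import torch.nn as nn


class PrunableMultiheadAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One gate per head; 1 = keep, 0 = pruned.
        self.register_buffer("head_gate", torch.ones(n_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape each to (batch, heads, time, d_head).
        shape = (b, t, self.n_heads, self.d_head)
        q, k, v = (z.view(shape).transpose(1, 2) for z in (q, k, v))
        att = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        ctx = att @ v  # (batch, heads, time, d_head)
        # Zero out pruned heads before the output projection; in a real
        # deployment the corresponding weight slices would be removed
        # to actually save compute and parameters.
        ctx = ctx * self.head_gate.view(1, -1, 1, 1)
        return self.out(ctx.transpose(1, 2).reshape(b, t, d))

    def prune_heads(self, head_ids):
        for h in head_ids:
            self.head_gate[h] = 0.0


# Usage: prune two of eight heads, then run a forward pass.
mha = PrunableMultiheadAttention(d_model=512, n_heads=8)
mha.prune_heads([3, 7])
y = mha(torch.randn(2, 16, 512))
print(y.shape)  # torch.Size([2, 16, 512])
```

Note that in a standard Transformer block, gating heads this way only shrinks the attention sublayer; the feed-forward sublayer is untouched, which is exactly the imbalance the paper's All-attention setting is meant to avoid.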