For each context sentence, an attribution score is learned by fitting a sparse linear model (linear regression with LASSO) to the difference in output log probability between the masked (ablated) input and the full input.
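The fitting step can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_log_prob` is a hypothetical stand-in for a real language model, its per-sentence effects are made up, and the sampling rate, sample count, and `alpha` are arbitrary assumptions. The sketch samples random ablation masks, records the log-probability difference against the full input, and fits a LASSO model by coordinate descent; the learned coefficients serve as per-sentence attribution scores.

```python
import random


def toy_log_prob(mask):
    """Hypothetical stand-in for an LLM: log probability of the response
    given only the context sentences where mask[i] == 1."""
    true_effect = [2.0, 0.0, 0.0, 1.0, 0.0]  # made-up per-sentence effects
    return -5.0 + sum(w * m for w, m in zip(true_effect, mask))


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(X, y, alpha=0.01, n_iters=200):
    """Coordinate-descent LASSO minimizing
    (1 / 2n) * ||y - b - Xw||^2 + alpha * ||w||_1."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = sum(y) / n
    for _ in range(n_iters):
        for j in range(d):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for row, target in zip(X, y):
                pred_without_j = b + sum(w[k] * row[k] for k in range(d) if k != j)
                rho += row[j] * (target - pred_without_j)
                norm += row[j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (norm / n) if norm else 0.0
        # The intercept is not penalized: set it to the mean residual.
        b = sum(t - sum(wk * xk for wk, xk in zip(w, row))
                for row, t in zip(X, y)) / n
    return w, b


def attribution_scores(log_prob, num_sentences, n_samples=64, seed=0):
    """Fit a sparse linear surrogate on random ablations and return one
    attribution score (coefficient) per context sentence."""
    rng = random.Random(seed)
    full = log_prob([1] * num_sentences)
    # Each sentence is kept independently with probability 0.5 (an assumption).
    masks = [[rng.randint(0, 1) for _ in range(num_sentences)]
             for _ in range(n_samples)]
    # Target: log-probability difference between masked and full input.
    diffs = [log_prob(mask) - full for mask in masks]
    weights, _ = fit_lasso(masks, diffs)
    return weights


scores = attribution_scores(toy_log_prob, num_sentences=5)
```

In this toy setting the largest scores land on sentences 0 and 3, the only ones that influence `toy_log_prob`, while the L1 penalty pushes the remaining coefficients to (near) zero.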
On metrics such as "top-k log probability drop" and "LDS rank correlation", this method consistently outperforms existing attribution techniques based on attention, gradients, and similarity.
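The two metrics can be sketched as follows, under one plausible reading of their names that the text above does not spell out: the top-k drop is taken as the decrease in output log probability after ablating the k highest-scored sentences, and the rank correlation as a Spearman correlation between the surrogate model's predicted outputs and the actual outputs over a set of ablations. `toy_log_prob` is again a hypothetical stand-in for a real model, and all function names are illustrative.

```python
import math


def toy_log_prob(mask):
    """Hypothetical stand-in for an LLM (made-up per-sentence effects)."""
    effect = [2.0, 0.0, 0.0, 1.0, 0.0]
    return -5.0 + sum(w * m for w, m in zip(effect, mask))


def top_k_drop(log_prob, scores, k):
    """Log-probability decrease after ablating the k highest-scored sentences."""
    n = len(scores)
    top = set(sorted(range(n), key=lambda i: scores[i], reverse=True)[:k])
    ablated = [0 if i in top else 1 for i in range(n)]
    return log_prob([1] * n) - log_prob(ablated)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = average
        i = j + 1
    return result


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    mean_a, mean_b = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return 0.0  # undefined for constant inputs; 0.0 is a convention here
    return cov / math.sqrt(var_a * var_b)


good_scores = [1.9, 0.0, 0.0, 0.9, 0.0]  # ranks the influential sentences first
poor_scores = [0.0, 1.0, 1.0, 0.0, 1.0]  # ranks the irrelevant sentences first
good_drop = top_k_drop(toy_log_prob, good_scores, k=2)
poor_drop = top_k_drop(toy_log_prob, poor_scores, k=2)
```

A better attribution method removes more of the output's support for the same k, so `good_drop` exceeds `poor_drop` in this toy setup.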