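ContextCite assigns each context sentence an attribution score by fitting a sparse linear surrogate model (linear regression with a LASSO penalty). The surrogate takes a mask of which sentences are ablated and predicts how the output log probability of the generated response changes relative to the full, unmasked context. The weight learned for a sentence is its attribution score.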
On metrics such as "top-k log probability drop" and "LDS" (linear datamodeling score, a rank correlation between predicted and actual ablation effects), this method consistently outperforms existing attribution techniques based on attention, gradients, and similarity.
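
The surrogate fit described above can be sketched on toy data. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_log_prob` is a hypothetical stand-in for a real language model call, the LASSO is solved with a plain coordinate-descent loop rather than a library routine, and the number of ablations and the penalty `alpha` are arbitrary illustrative values.

```python
import numpy as np

def toy_log_prob(mask, rng):
    """Hypothetical stand-in for an LM call: log probability of the original
    response when only the context sentences with mask == 1 are kept."""
    true_effect = np.array([3.0, 0.0, 1.5, 0.0, 0.0])  # assumed ground truth
    return -5.0 + mask @ true_effect + rng.normal(scale=0.05)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/2n) * ||y - b - Xw||^2 + alpha * ||w||_1."""
    n, d = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centring handles the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    w = np.zeros(d)
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:             # constant column carries no signal
                continue
            # Residual with feature j's current contribution added back.
            residual = yc - Xc @ w + Xc[:, j] * w[j]
            rho = Xc[:, j] @ residual / n
            # Soft-thresholding step: small correlations are set to zero.
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
    return w, y_mean - x_mean @ w

def attribute(n_sentences, rng, n_ablations=64, alpha=0.01):
    """Sample random ablation masks, query the model, fit the sparse surrogate."""
    masks = rng.integers(0, 2, size=(n_ablations, n_sentences)).astype(float)
    targets = np.array([toy_log_prob(m, rng) for m in masks])
    return lasso_coordinate_descent(masks, targets, alpha)

def top_k_drop(scores, k, rng):
    """Log-probability drop after ablating the k highest-scoring sentences."""
    full = np.ones(len(scores))
    ablated = full.copy()
    ablated[np.argsort(-scores)[:k]] = 0.0
    return toy_log_prob(full, rng) - toy_log_prob(ablated, rng)

rng = np.random.default_rng(0)
scores, intercept = attribute(n_sentences=5, rng=rng)
ranking = np.argsort(-scores)                # most influential sentence first
drop = top_k_drop(scores, k=1, rng=rng)
print("attribution scores:", np.round(scores, 2))
print("ranking:", ranking, "top-1 drop:", round(float(drop), 2))
```

Because the surrogate is linear in the mask, each weight can be read as the estimated change in log probability from keeping that sentence, and the L1 penalty pushes the weights of irrelevant sentences to zero.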
ContextCite: Attributing Model Generation to Context
How do language models use information provided as context when generating a response? Can we infer whether a particular generated statement is actually grounded in the context, a...
https://arxiv.org/abs/2409.00729


Seonglae Cho