Bayesian Influence Function
The classical influence function (IF) is difficult to apply to DNNs: the Hessian is typically singular and very high-dimensional, so approximating its inverse is both error-prone and expensive. The Bayesian influence function (BIF) sidesteps the Hessian inverse entirely: it measures the influence of a training example on a query as the covariance of their losses under a posterior distribution, localized by a Gaussian term centered on the trained checkpoint w. For regular (non-singular) models, the first-order term of the BIF converges to the classical IF, while the full BIF retains higher-order information that the classical IF discards.
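Below is a minimal sketch of this procedure on a toy regression problem, assuming SGLD as the posterior sampler with a quadratic localization term; the model, data, and all hyperparameters (`step`, `beta`, `gamma`, `n_draws`) are illustrative placeholders, not values from any reference implementation.

```python
# Sketch: estimate BIF as a posterior covariance of losses, sampled via SGLD
# localized around a trained checkpoint. Hyperparameters are illustrative.
import torch

torch.manual_seed(0)

# --- Toy setup: small linear-regression "model" and data (placeholders) ---
n_train, dim = 64, 10
X = torch.randn(n_train, dim)
y = X @ torch.randn(dim) + 0.1 * torch.randn(n_train)
x_query, y_query = torch.randn(dim), torch.randn(())

# Trained checkpoint w* (least-squares fit stands in for SGD training).
w_star = torch.linalg.lstsq(X, y.unsqueeze(1)).solution.squeeze(1)

def loss_per_example(w):               # per-training-sample losses, shape (n_train,)
    return (X @ w - y) ** 2

def loss_query(w):                     # scalar loss on the query point
    return (x_query @ w - y_query) ** 2

# --- SGLD draws from the localized posterior
#     pi(w) ~ exp(-beta * sum_i loss_i(w) - gamma/2 * ||w - w*||^2) ---
n_draws, step, beta, gamma = 2000, 1e-4, 1.0, 100.0
w = w_star.clone()
train_losses, query_losses = [], []
for _ in range(n_draws):               # (a real run would discard burn-in draws)
    w = w.detach().requires_grad_(True)
    potential = beta * loss_per_example(w).sum() \
        + 0.5 * gamma * ((w - w_star) ** 2).sum()
    (grad,) = torch.autograd.grad(potential, w)
    with torch.no_grad():
        # Langevin step: gradient drift plus Gaussian noise.
        w = w - 0.5 * step * grad + torch.randn_like(w) * step ** 0.5
        train_losses.append(loss_per_example(w))
        query_losses.append(loss_query(w))

L_train = torch.stack(train_losses)    # (n_draws, n_train)
L_query = torch.stack(query_losses)    # (n_draws,)

# BIF estimate: covariance between each training loss and the query loss
# across posterior draws -- no Hessian inverse anywhere.
bif = ((L_train - L_train.mean(0))
       * (L_query - L_query.mean()).unsqueeze(1)).mean(0)
print(bif.topk(5).indices)             # most influential training points
```

The last line is the entire estimator: no Hessian, no inverse, just a covariance across posterior draws. The quality of the estimate depends on the number of draws and the sampler hyperparameters, which is exactly the limitation noted at the end of this section.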

In vision models, BIF behaves much like EK-FAC: training samples that are visually and semantically similar to the query receive high influence. In language models, token-level BIF captures correlations between semantically similar tokens.

BIF has clear advantages: it requires no Hessian inverse, applies to arbitrary architectures (it only needs loss evaluations at sampled weights), and scales well to large models. On the downside, the covariance estimate requires many posterior samples, which drives up compute time, and the method is sensitive to the sampler's hyperparameters.