AGOP
Average Gradient Outer Product
The uncentered covariance (second-moment) matrix of the model's gradients with respect to its inputs, averaged over the data. Each gradient shows how sensitively the model responds at an input point, and the average captures the dominant directions of these sensitivities. AGOP therefore extracts the "feature directions that the model considers important".
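As a minimal sketch of the definition above, AGOP can be written as the average over inputs of the outer product of the input gradient with itself, (1/n) Σ ∇f(x_i) ∇f(x_i)^T. The names `numerical_gradient`, `agop`, and `toy_model`, and the use of finite differences instead of automatic differentiation, are illustrative assumptions and not taken from the source.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Central-difference estimate of the gradient of a scalar function f at x."""
    grad = np.zeros_like(x, dtype=float)
    for j in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[j] = h
        grad[j] = (f(x + step) - f(x - step)) / (2.0 * h)
    return grad

def agop(f, X):
    """Average of grad f(x) grad f(x)^T over the rows of X (no mean subtraction)."""
    grads = np.stack([numerical_gradient(f, x) for x in X])  # shape (n, d)
    return grads.T @ grads / len(X)

# Hypothetical toy model that depends only on the first input coordinate.
def toy_model(x):
    return np.sin(x[0])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
M = agop(toy_model, X)

# Eigenvector of the largest eigenvalue: the direction the model is most sensitive to.
top_direction = np.linalg.eigh(M)[1][:, -1]
```

In this toy case all of the mass of `M` sits on the first coordinate, so `top_direction` aligns (up to sign) with the first axis.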
RFM
Recursive Feature Machine
An algorithm that repeatedly updates the feature space using AGOP, alternating two steps:
- Feature Update (reweighting)
- Model Refit (learning)
Less important feature directions gradually shrink, resulting in dimensionality reduction toward the task-relevant subspace.
https://arxiv.org/pdf/2401.04553
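The two alternating steps listed above can be sketched as a short loop. This is a sketch under stated assumptions, not the reference implementation: it uses kernel ridge regression with a Gaussian kernel on the Mahalanobis distance (chosen here because its gradient has a simple closed form), and the trace normalization of `M`, the function names, and all hyperparameter values are hypothetical choices made for illustration.

```python
import numpy as np

def mahalanobis_gaussian_kernel(A, B, M, bandwidth):
    """K[i, j] = exp(-(a_i - b_j)^T M (a_i - b_j) / (2 * bandwidth**2))."""
    AM = A @ M
    sq_dist = (np.sum(AM * A, axis=1)[:, None]
               + np.sum((B @ M) * B, axis=1)[None, :]
               - 2.0 * AM @ B.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))

def rfm(X, y, n_iters=5, bandwidth=2.0, ridge=1e-3):
    """Alternate model refit and AGOP feature update; return the feature matrix M."""
    n, d = X.shape
    M = np.eye(d)  # start from the unweighted feature space
    for _ in range(n_iters):
        # Model refit (learning): kernel ridge regression in the current feature space.
        K = mahalanobis_gaussian_kernel(X, X, M, bandwidth)
        alpha = np.linalg.solve(K + ridge * np.eye(n), y)

        # Feature update (reweighting): AGOP of the fitted predictor
        # f(x) = sum_i alpha_i K(x, x_i), whose gradient at x_k is
        # -(1 / bandwidth^2) * M @ sum_i alpha_i K(x_k, x_i) (x_k - x_i).
        W = K * alpha[None, :]
        weighted_diffs = W.sum(axis=1)[:, None] * X - W @ X
        grads = -(weighted_diffs @ M) / bandwidth**2  # rows are gradients (M is symmetric)
        M = grads.T @ grads / n

        # Assumed normalization: fix trace(M) = d so the bandwidth stays comparable.
        M = M * d / np.trace(M)
    return M

# Toy data: five input features, but the target depends only on the first one.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0])

M = rfm(X, y)
importance = np.diag(M) / np.trace(M)  # share of sensitivity per input coordinate
```

Inspecting `importance` after a few iterations shows how the weight on each input coordinate changes as directions the fitted model does not use are shrunk.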
For LLMs, RFM is fit on the activations of each block to extract a concept vector v per block. Adding or subtracting a scaled copy of the vector (±εv) to the activations steers model outputs toward or away from a specific concept, and the same vectors can be used to monitor whether a concept has been activated.
https://arxiv.org/pdf/2502.03708v2
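A minimal sketch of the steering and monitoring operations described above, using plain arrays as stand-ins for one block's activations. The names `steer` and `concept_score`, the array shapes, and the example concept vector are hypothetical; no real model API is used, and how the steered activations are written back into a model is left out.

```python
import numpy as np

def steer(activations, concept_vector, epsilon):
    """Shift every activation by epsilon * v; a negative epsilon steers away."""
    return activations + epsilon * concept_vector

def concept_score(activations, concept_vector):
    """Monitoring signal: projection of each activation onto the unit concept direction."""
    unit = concept_vector / np.linalg.norm(concept_vector)
    return activations @ unit

rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))  # stand-in for (tokens, hidden_dim) activations of one block
v = np.zeros(8)
v[2] = 1.0                        # hypothetical unit-norm concept vector for this block

steered = steer(hidden, v, epsilon=3.0)
before = concept_score(hidden, v)
after = concept_score(steered, v)  # each score rises by epsilon * ||v||
```

Because the shift is linear, the monitoring score moves by exactly ε‖v‖, which makes ε a direct knob for how strongly the concept is pushed.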

Seonglae Cho