AGOP
Average Gradient Outer Product
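The uncentered second-moment matrix of the model's gradients with respect to its inputs, averaged over the input points (it is a covariance matrix only if the gradients are mean-centered). Each gradient shows how sensitively the model responds at one input point, so the AGOP represents the average directionality of these sensitivities. Its top eigenvectors are therefore the "feature directions that the model considers important".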
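As a minimal sketch of the definition above, the AGOP can be computed by stacking per-point input gradients and averaging their outer products. The names `agop` and `toy_grad` are hypothetical, and the toy model f(x) = sin(x[0]) is an assumption chosen so that only the first input coordinate matters.

```python
import numpy as np

def agop(grad_fn, X):
    """Average gradient outer product: mean over rows x of g(x) g(x)^T."""
    G = np.stack([grad_fn(x) for x in X])  # (n, d): one input gradient per point
    return G.T @ G / len(X)                # (d, d), symmetric positive semidefinite

# Hypothetical toy model f(x) = sin(x[0]); its gradient is nonzero only in coordinate 0.
def toy_grad(x):
    g = np.zeros_like(x)
    g[0] = np.cos(x[0])
    return g

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
M = agop(toy_grad, X)

# eigh returns eigenvalues in ascending order, so the last column is the top direction.
eigvals, eigvecs = np.linalg.eigh(M)
top_direction = eigvecs[:, -1]
```

In this toy case all of the AGOP's mass sits in the first coordinate, so `top_direction` aligns (up to sign) with the only input the model actually uses.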
RFM
Recursive Feature Machine
An algorithm that repeatedly updates the feature space using AGOP, alternating two steps:
- Feature Update (reweighting)
- Model Refit (learning)
Less important feature directions gradually shrink, resulting in dimensionality reduction toward the task-relevant subspace.
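The two alternating steps can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: it uses kernel ridge regression with a Gaussian kernel whose distance is weighted by the feature matrix M (chosen here because its gradient is simple to write down), and the trace rescaling of M is a hypothetical stabilizer. The names `gauss_kernel`, `fit`, and `rfm`, and the parameters `ridge` and `s`, are all illustrative.

```python
import numpy as np

def gauss_kernel(A, B, M, s=1.0):
    # K[i, j] = exp(-(a_i - b_j)^T M (a_i - b_j) / (2 s^2)), assuming M is symmetric.
    AM = A @ M
    sq = (np.sum(AM * A, axis=1)[:, None]
          + np.sum((B @ M) * B, axis=1)[None, :]
          - 2.0 * AM @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * s**2))

def fit(X, y, M, ridge, s):
    # Model refit: kernel ridge regression under the current feature matrix M.
    K = gauss_kernel(X, X, M, s)
    alpha = np.linalg.solve(K + ridge * np.eye(len(X)), y)
    return alpha, K

def rfm(X, y, n_iters=5, ridge=1e-3, s=1.0):
    n, d = X.shape
    M = np.eye(d)                       # start with all directions weighted equally
    alpha, K = fit(X, y, M, ridge, s)
    for _ in range(n_iters):
        # Feature update: M <- AGOP of the fitted predictor f(x) = sum_i alpha_i K(x, x_i).
        W = K * alpha[None, :]          # W[j, i] = alpha_i * K(x_j, x_i)
        V = W.sum(axis=1)[:, None] * X - W @ X   # V[j] = sum_i W[j, i] (x_j - x_i)
        G = -(V @ M) / s**2             # row j is the gradient of f at x_j
        M = G.T @ G / n
        M = M / np.trace(M) * d         # hypothetical rescaling to keep the kernel scale stable
        # Model refit with the reweighted features.
        alpha, K = fit(X, y, M, ridge, s)
    return M, alpha

# Toy data (assumption): the target depends only on the first of four coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0]
M, alpha = rfm(X, y, n_iters=3, s=2.0)
```

On this toy problem the diagonal of `M` should concentrate on the first coordinate, which is the shrinking of less important directions described above.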
RFM extracts a concept vector v from each block. Adding or subtracting this vector from the block's activation values (±εv) steers model outputs toward or away from a specific concept, and comparing activations against it monitors whether the concept has been activated.
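Steering and monitoring can both be sketched as simple vector operations on an activation. Everything here is an assumption for illustration: `steer` and `concept_score` are hypothetical helpers, the concept vector is normalized to unit length so that ε has a consistent scale, and monitoring is taken to mean projecting the activation onto the concept direction.

```python
import numpy as np

def steer(h, v, eps):
    """Shift activation h along concept vector v; eps < 0 steers away from the concept."""
    v_unit = v / np.linalg.norm(v)   # assumption: unit-normalize so eps sets the step size
    return h + eps * v_unit

def concept_score(h, v):
    """Projection of activation h onto the concept direction (larger = more active)."""
    v_unit = v / np.linalg.norm(v)
    return h @ v_unit

# Toy activation and concept vector (made-up values).
h = np.array([1.0, 2.0, 3.0])
v = np.array([0.0, 3.0, 4.0])

score_before = concept_score(h, v)
score_plus = concept_score(steer(h, v, 0.5), v)    # steered toward the concept
score_minus = concept_score(steer(h, v, -0.5), v)  # steered away from the concept
```

Because `v` is normalized, steering by ε changes the projection score by exactly ε, which makes the two operations easy to check against each other.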