Logit changes
Input-centric methods, which describe a feature by the inputs that activate it, fail to capture the feature's effect on the model's outputs. They are also computationally expensive and depend heavily on the data used. Output-centric methods take a different route: they either project the feature vector into vocabulary space and inspect the top tokens, or amplify the feature and inspect the tokens whose output probability changes the most. An ensemble approach combines the input and output views. The output-centric methods are computationally efficient, and they can also be used to activate features that otherwise appear dead.
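The two output-centric ideas above can be sketched on toy data. This is a minimal illustration, not the method's actual implementation: the unembedding matrix `W_U`, the feature direction `feature_vec`, the hidden state `hidden`, the amplification strength `alpha`, and all dimensions are assumptions made up for the example, and a real setup would take them from a trained model and its feature dictionary.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def vocab_projection(feature_vec, W_U, k=3):
    # Project the feature direction into vocabulary space and
    # return the ids of the k tokens with the largest logits.
    logits = feature_vec @ W_U
    return np.argsort(-logits)[:k]

def token_change(hidden, feature_vec, W_U, alpha=5.0, k=3):
    # Compare output probabilities before and after amplifying the
    # feature by adding alpha * feature_vec to the hidden state.
    base = softmax(hidden @ W_U)
    steered = softmax((hidden + alpha * feature_vec) @ W_U)
    delta = steered - base
    return np.argsort(-delta)[:k], delta

# Toy setup (all values hypothetical).
rng = np.random.default_rng(0)
d_model, vocab = 8, 20
W_U = rng.normal(size=(d_model, vocab))
W_U /= np.linalg.norm(W_U, axis=0, keepdims=True)  # unit-norm columns

# A toy feature that points exactly along token 7's unembedding column.
feature_vec = W_U[:, 7].copy()
hidden = rng.normal(size=d_model)

top_proj = vocab_projection(feature_vec, W_U)
top_change, delta = token_change(hidden, feature_vec, W_U)

print("top tokens by vocabulary projection:", top_proj)
print("top tokens by probability increase:", top_change)
```

Neither function needs a dataset of activating inputs: the projection is a single matrix product, and the probability comparison is two forward passes through the output layer, which is where the efficiency claim comes from.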

Seonglae Cho