LF Steering

Creator

Seonglae Cho

Created

2025 Aug 11 9:54

Editor

Seonglae Cho

Edited

2025 Aug 11 10:2

Refs

Latent Feature Level Steering

Top-1 Influential Layer Identification: Train a classifier on the hidden states of paraphrase pairs to predict whether the model answers them consistently → Select the single layer with the highest influence on that prediction
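
The layer-identification step can be sketched roughly as below. This is a minimal illustration and not the paper's implementation: it assumes each paraphrase pair has already been reduced to one vector per layer, it uses a hand-written logistic-regression probe, and it treats held-out probe accuracy as a stand-in for "influence". The names `train_logistic_probe`, `probe_accuracy`, and `select_top1_layer`, and the synthetic data, are all hypothetical.

```python
import math
import random

def train_logistic_probe(X, y, epochs=200, lr=0.1):
    """Fit a logistic-regression probe with plain batch gradient descent."""
    dim = len(X[0])
    w = [0.0] * dim
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, label in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - label
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def probe_accuracy(w, b, X, y):
    """Fraction of examples the probe classifies correctly."""
    correct = 0
    for x, label in zip(X, y):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((z > 0) == (label == 1))
    return correct / len(X)

def select_top1_layer(pair_features_by_layer, labels, train_frac=0.7):
    """Score every layer by held-out probe accuracy (an assumed proxy for
    influence) and return the best layer together with all scores."""
    n_train = int(len(labels) * train_frac)
    scores = {}
    for layer, X in pair_features_by_layer.items():
        w, b = train_logistic_probe(X[:n_train], labels[:n_train])
        scores[layer] = probe_accuracy(w, b, X[n_train:], labels[n_train:])
    best = max(scores, key=scores.get)
    return best, scores

# Synthetic stand-in for hidden states: only layer 1 carries a consistency signal.
rng = random.Random(0)
labels = [i % 2 for i in range(60)]  # 1 = consistent pair, 0 = inconsistent pair

def toy_layer(signal):
    return [[signal * (2 * y - 1) + rng.gauss(0, 1) for _ in range(4)]
            for y in labels]

toy_features = {0: toy_layer(0.0), 1: toy_layer(2.0), 2: toy_layer(0.0)}
best_layer, layer_scores = select_top1_layer(toy_features, labels)
print(best_layer, layer_scores)
```

With real data, the per-layer vectors would come from the model's hidden states for each paraphrase pair; how the two paraphrases are combined into one vector (difference, concatenation, or something else) is left open here.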

Feature Selection: Calculate the feature differences between paraphrase pairs that give correct versus incorrect answers → Select the features whose difference exceeds a threshold
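
The feature-selection step might look like the following sketch. It assumes one particular reading of "feature differences": the per-feature mean activation over pairs with correct answers minus the mean over pairs with incorrect answers, compared in absolute value against the threshold. The function names, the threshold value, and the toy activations are hypothetical.

```python
from statistics import fmean

def mean_activation(rows):
    """Per-feature mean over a list of equal-length activation vectors."""
    return [fmean(col) for col in zip(*rows)]

def select_features(correct_acts, incorrect_acts, threshold):
    """Return {feature_index: mean difference} for every feature whose
    absolute mean difference between the two groups exceeds threshold."""
    mu_correct = mean_activation(correct_acts)
    mu_incorrect = mean_activation(incorrect_acts)
    diffs = [c - i for c, i in zip(mu_correct, mu_incorrect)]
    return {j: d for j, d in enumerate(diffs) if abs(d) > threshold}

# Toy activations: 4 features, 2 paraphrase pairs per group.
correct_acts = [[0.9, 0.1, 0.0, 0.5], [1.1, 0.0, 0.2, 0.5]]
incorrect_acts = [[0.1, 0.1, 0.9, 0.5], [0.3, 0.2, 1.1, 0.4]]

selected = select_features(correct_acts, incorrect_acts, threshold=0.5)
print(selected)
```

The sign of each retained difference shows which group the feature is more active in, which is the information a later steering step would need to decide whether to amplify or suppress it.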

https://arxiv.org/pdf/2501.11036v2
