Latent Feature-Level Steering
- Top-1 Influential Layer Identification: Train a classifier to predict consistency from the hidden states of paraphrase pairs → Select the layer with the highest influence
- Feature Selection: Compute feature differences between paraphrase pairs that give correct/incorrect answers → Select features whose difference exceeds a threshold
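
The two steps above can be sketched on synthetic data. This is a minimal illustration under several assumptions that the outline does not specify: each paraphrase pair is already summarized as one fixed-size vector per layer, a layer's "influence" is approximated by the held-out accuracy of a logistic-regression probe, and a feature's "difference" is the gap between group means. All function names, the toy data, and the threshold value are hypothetical.

```python
import numpy as np


def train_probe(X, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe with plain gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(label = 1)
        err = p - y
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def top1_layer(hidden, labels, train_frac=0.7):
    """Return the layer whose probe has the highest held-out accuracy.

    hidden: (n_layers, n_pairs, d) array, one vector per paraphrase pair.
    labels: (n_pairs,) array of 0/1 consistency labels.
    Assumption: probe accuracy stands in for "influence".
    """
    n_train = int(train_frac * len(labels))
    scores = []
    for X in hidden:
        w, b = train_probe(X[:n_train], labels[:n_train])
        pred = (X[n_train:] @ w + b > 0).astype(int)
        scores.append(float((pred == labels[n_train:]).mean()))
    return int(np.argmax(scores)), scores


def select_features(X, correct, threshold):
    """Indices of features whose group-mean difference exceeds threshold.

    Assumption: "feature difference" is mean(correct) - mean(incorrect).
    """
    diff = X[correct == 1].mean(axis=0) - X[correct == 0].mean(axis=0)
    return np.flatnonzero(np.abs(diff) > threshold), diff


# Toy data: Gaussian noise everywhere, with a signal planted in layer 2
# on features 3 and 7 (purely illustrative, not real model activations).
rng = np.random.default_rng(0)
n_layers, n_pairs, d = 4, 200, 16
hidden = rng.normal(size=(n_layers, n_pairs, d))
consistent = rng.integers(0, 2, size=n_pairs)
for j in (3, 7):
    hidden[2][consistent == 1, j] += 3.0
correct = consistent.copy()  # toy simplification: same labels play both roles

best_layer, scores = top1_layer(hidden, consistent)
selected, diff = select_features(hidden[best_layer], correct, threshold=1.5)
print(best_layer, selected.tolist())
```

With real activations, `hidden` would be built from the model's per-layer hidden states, and the threshold would need tuning; how the selected features are then used for steering is outside this sketch.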