CorrSteer Introduction

Creator
Seonglae Cho
Created
2025 Jan 16 15:16
Edited
2025 Aug 10 20:49
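Idea: it may work better to use generated text in addition to the dataset. If we correlate mean feature activations with whether each answer is right or wrong, the relevant features become easy to find. The same applies to the dataset.
NeuronEval, in retrospect, was a very good approach.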
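The idea above can be sketched in a few lines. This is only an illustration under assumptions: `acts` is taken to be a matrix of mean-pooled feature activations (one row per generated sample), `correct` a 0/1 label per sample, and the function name `top_correlated_features` is hypothetical, not part of any existing library.

```python
import numpy as np

def top_correlated_features(acts, correct, k=5):
    """Rank features by Pearson correlation with a binary correctness label.

    acts:    (n_samples, n_features) mean-pooled activations (assumed input)
    correct: (n_samples,) 1.0 if the generated answer was right, else 0.0
    """
    acts = np.asarray(acts, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Center both sides so the dot product below is a covariance (up to 1/n).
    a = acts - acts.mean(axis=0)
    c = correct - correct.mean()

    denom = np.sqrt((a ** 2).sum(axis=0) * (c ** 2).sum())
    # Features (or labels) with zero variance get correlation 0 instead of NaN.
    corr = np.divide(a.T @ c, denom, out=np.zeros(acts.shape[1]), where=denom > 0)

    # Strongest features first, by absolute correlation.
    order = np.argsort(-np.abs(corr))[:k]
    return order, corr[order]
```

The sign of each returned correlation indicates whether the feature tends to be active on correct or on incorrect generations.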
Paper motivation: selective benchmarks, like the ones Anthropic used, somehow do not work well, and they reflect a language model's generation ability less directly than generation-based benchmarks do.
  • LLM-as-a-judge suffers from inconsistent criteria, which is a common problem. The correlation method instead provides an absolute metric to measure the accuracy of mutual information.
Correlation also plays an important role in other tasks, such as circuit discovery and identifying feature splitting across SAEs with different expansion factors. Possible extension: move from Pearson correlation to rank correlation.
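To make the Pearson versus rank correlation distinction concrete, here is a minimal sketch assuming Spearman's coefficient as the rank correlation (the note does not say which one is meant). The helper names `average_ranks`, `pearson`, and `spearman` are hypothetical and written with NumPy only.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float).ravel()
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)              # last rank taken by each distinct value
    avg = last - (counts - 1) / 2.0       # average rank of each distinct value
    return avg[inverse]

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float(xc @ yc / denom) if denom > 0 else 0.0

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For a monotone but nonlinear relationship, such as `y = np.exp(x)`, `spearman` reaches 1 while `pearson` stays below 1, which is the usual reason to prefer ranks when activations are heavy-tailed. With a binary correctness label most values are tied, so the tie handling in `average_ranks` matters.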
