CorrSteer Related Limitation

Creator
Seonglae Cho
Created
2025 Jan 16 17:18
Edited
2025 Feb 9 16:0
Limited to the dataset
Limitation: it relies only on input labels, without using output gradients or changes in log prob
  • requires a well-labeled dataset
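To make the label dependence concrete, here is a minimal sketch of correlation-based feature scoring. It assumes a matrix of per-sample SAE feature activations and one label per sample; the function name `feature_label_correlation` and the toy data are hypothetical, and this is not the CorrSteer implementation. The score uses nothing but activations and labels, so a noisy label column changes the ranking directly.

```python
import numpy as np

def feature_label_correlation(acts, labels):
    """Pearson correlation of each feature column with the labels.

    acts:   (n_samples, n_features) per-sample feature activations (assumed layout)
    labels: (n_samples,) numeric labels, e.g. 0/1 correctness
    """
    acts = np.asarray(acts, dtype=float)
    labels = np.asarray(labels, dtype=float)
    a = acts - acts.mean(axis=0)   # center each feature
    y = labels - labels.mean()     # center the labels
    denom = np.sqrt((a ** 2).sum(axis=0) * (y ** 2).sum())
    # Constant features (zero variance) get correlation 0 instead of NaN.
    return np.divide(a.T @ y, denom, out=np.zeros(acts.shape[1]), where=denom > 0)

# Toy data: 4 samples, 3 features (feature 1 is constant).
acts = np.array([[0, 1, 5], [1, 1, 3], [2, 1, 4], [3, 1, 1]])
labels = np.array([0, 0, 1, 1])
corr = feature_label_correlation(acts, labels)
best = int(np.argmax(np.abs(corr)))  # index of the most label-correlated feature
```

No output gradient or log prob enters the computation, which is the limitation noted above.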
Interestingly, less correlated features required larger coefficients to prevent the stereotype, which results in lower performance, including worse perplexity.
We used max pooling for steering vector extraction, which might have contributed to non-linearity.
Low correlation requires a higher manipulation coefficient.
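A small sketch of the two mechanisms mentioned in these notes: max pooling over tokens and scaling a steering direction by a coefficient. The function names, the array shapes, and the choice to normalize the direction are assumptions for illustration; additive steering is a common formulation and is not confirmed here as the exact method used.

```python
import numpy as np

def max_pool_tokens(token_acts):
    """Collapse (n_tokens, n_features) activations to one value per feature."""
    return np.asarray(token_acts, dtype=float).max(axis=0)

def steer(hidden, direction, coeff):
    """Add coeff * unit(direction) to every token's hidden state.

    hidden:    (n_tokens, d_model) hidden states (assumed layout)
    direction: (d_model,) steering direction, e.g. an SAE decoder row
    coeff:     manipulation coefficient; normalizing the direction is an assumption
    """
    direction = np.asarray(direction, dtype=float)
    unit = direction / np.linalg.norm(direction)
    return np.asarray(hidden, dtype=float) + coeff * unit

# Max pooling is not additive: pooling a sum differs from summing pooled values.
a = np.array([[1.0], [0.0]])  # 2 tokens, 1 feature
b = np.array([[0.0], [1.0]])
pooled_sum = max_pool_tokens(a + b)                     # pools [1, 1]
sum_of_pooled = max_pool_tokens(a) + max_pool_tokens(b)
```

In the toy case `pooled_sum` and `sum_of_pooled` differ, which is one way pooling can break linearity. In `steer`, a larger `coeff` moves the hidden states further from their original values, which fits the observation that larger coefficients come with lower performance.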
 
Relies on a single SAE feature.
