Limited to the dataset
Limitation: depends only on input labels, without using output gradients or changes in log prob
- requires a well-labeled dataset
Interestingly, less correlated features required a larger coefficient to prevent the stereotype, which resulted in lower performance and poorer perplexity performance.
We used max pooling for steering vector extraction, which might have contributed to the non-linearity.
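A toy illustration of why max pooling can behave non-linearly: pooling a sum of activations by max does not generally equal the sum of the pooled activations, whereas mean pooling does. This is a minimal sketch, not the extraction code used here; `pool_activations` and the arrays are hypothetical, and the (tokens, features) layout is an assumption.

```python
import numpy as np

def pool_activations(acts: np.ndarray, mode: str = "max") -> np.ndarray:
    """Collapse a (tokens, features) activation matrix into one feature vector.

    Hypothetical helper for illustration only.
    """
    if mode == "max":
        return acts.max(axis=0)   # strongest activation per feature
    if mode == "mean":
        return acts.mean(axis=0)  # average activation per feature
    raise ValueError(f"unknown pooling mode: {mode}")

# Two toy activation matrices: 2 tokens x 2 features (made-up values).
a = np.array([[0.0, 2.0], [4.0, 0.0]])
b = np.array([[3.0, 0.0], [0.0, 1.0]])

# Mean pooling is linear: pooling the sum equals the sum of the pooled parts.
mean_is_additive = np.allclose(
    pool_activations(a + b, "mean"),
    pool_activations(a, "mean") + pool_activations(b, "mean"),
)

# Max pooling is not: the per-feature maxima can come from different tokens.
max_is_additive = np.allclose(
    pool_activations(a + b, "max"),
    pool_activations(a, "max") + pool_activations(b, "max"),
)

print(mean_is_additive, max_is_additive)
```

Comparing against mean pooling on the same prompts would be one way to check whether the pooling choice is actually the source of the effect.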
Low correlation requires a higher manipulation coefficient
Relies on a single SAE feature
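For context, single-feature steering is commonly written as adding one scaled direction to the hidden state. The sketch below assumes that additive form; `steer`, the unit-normalization step, and the toy values are hypothetical and may differ from the setup in these notes.

```python
import numpy as np

def steer(hidden: np.ndarray, direction: np.ndarray, coefficient: float) -> np.ndarray:
    """Add one scaled feature direction to every token's hidden state.

    Assumption: additive steering along a unit-normalized direction.
    """
    unit = direction / np.linalg.norm(direction)
    return hidden + coefficient * unit  # broadcasts over the token axis

# Toy hidden states: 3 tokens x 4 dimensions (made-up values).
hidden = np.zeros((3, 4))
# Stand-in for a single SAE feature's decoder direction.
direction = np.array([0.0, 3.0, 0.0, 4.0])

steered = steer(hidden, direction, coefficient=2.0)
print(steered[0])
```

Because only one direction is moved, any part of the concept encoded in other features is left untouched, which is the limitation noted above.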
Seonglae Cho