SAE Structure & Weight Initialization

Dynamic top-k using a norm-based SAE, for simplex finding
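The note does not say how the norm enters the choice of k. One possible reading, sketched below purely as an assumption, is to keep the fewest largest-magnitude activations whose squared norm reaches a fixed fraction of the total, so k varies per input. The function name `dynamic_top_k` and the `energy` parameter are hypothetical.

```python
def dynamic_top_k(acts, energy=0.9):
    """Keep the fewest largest-|a| activations whose squared norm reaches
    `energy` of the total squared norm; zero out the rest.

    Assumption: this is one way to read "dynamic top-k using norm",
    not a method confirmed by the notes. Returns (sparse_acts, k).
    """
    total = sum(a * a for a in acts)
    if total == 0.0:
        return [0.0] * len(acts), 0
    # Indices sorted by activation magnitude, largest first.
    order = sorted(range(len(acts)), key=lambda i: abs(acts[i]), reverse=True)
    kept, running = set(), 0.0
    for i in order:
        kept.add(i)
        running += acts[i] * acts[i]
        if running >= energy * total:
            break
    sparse = [a if i in kept else 0.0 for i, a in enumerate(acts)]
    return sparse, len(kept)
```

With this rule an input dominated by one activation gets k = 1, while a flatter input keeps more entries.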

All in the same direction

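This has to hold after training too, so learn only the per-dimension angles? Or how about learning the feature dimensionality, as in the superposition toy model, lol. Keep going until it reaches 1 for the whole SAE dictionary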

After that, adjust the derived weights with fine-tuning

Initialization that keeps feature dimensionality uniform, so the dimensions are used evenly
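A minimal sketch of such an initialization, assuming the feature dimensionality from the superposition toy model, D_i = ||W_i||^2 / sum_j (W_i_hat . W_j)^2, where W_i_hat is the unit vector along W_i. Placing a regular polygon of features in each 2D plane is just one illustrative layout that gives every feature the same D_i. The helper names `polygon_init` and `feature_dimensionality` are hypothetical.

```python
import math

def polygon_init(n_planes, feats_per_plane):
    """Rows are unit feature vectors in a (2 * n_planes)-dim space.
    Each 2D plane holds a regular polygon of `feats_per_plane` features.
    Assumption: an illustrative layout, not the notes' exact scheme."""
    d = 2 * n_planes
    rows = []
    for p in range(n_planes):
        for k in range(feats_per_plane):
            theta = 2 * math.pi * k / feats_per_plane
            w = [0.0] * d
            w[2 * p], w[2 * p + 1] = math.cos(theta), math.sin(theta)
            rows.append(w)
    return rows

def feature_dimensionality(rows):
    """D_i = ||W_i||^2 / sum_j (W_i_hat . W_j)^2 for every feature row."""
    dims = []
    for wi in rows:
        norm_sq = sum(x * x for x in wi)
        norm = math.sqrt(norm_sq)
        # Squared projections of every feature onto the unit vector of W_i.
        overlap = sum(
            (sum(a * b for a, b in zip(wi, wj)) / norm) ** 2 for wj in rows
        )
        dims.append(norm_sq / overlap)
    return dims

dims = feature_dimensionality(polygon_init(n_planes=3, feats_per_plane=5))
```

With three or more features per plane, each feature gets D_i = 2 / feats_per_plane, and the values sum to the embedding dimension, so every dimension is used equally.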

Exploit the feature fusion that happens with single-token features: early in training, force-match dictionary features to token embeddings one at a time

Another idea: match tokens and dictionary entries one-to-one, and warm up the weights with the most frequent tokens, as many as the dictionary size
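The token-matching warmup could be sketched as follows, assuming a token-id stream to count frequencies from and an embedding table indexed by token id. The function name `warmup_dictionary` and the choice to unit-normalize the rows are assumptions for illustration.

```python
import math
from collections import Counter

def warmup_dictionary(token_ids, embeddings, dict_size):
    """Pick the `dict_size` most frequent tokens and use their
    unit-normalized embeddings as initial dictionary rows.

    Assumption: `embeddings[tok]` returns that token's embedding vector.
    Returns (chosen_token_ids, dictionary_rows), matched one-to-one.
    """
    counts = Counter(token_ids)
    top = [tok for tok, _ in counts.most_common(dict_size)]
    rows = []
    for tok in top:
        e = embeddings[tok]
        norm = math.sqrt(sum(x * x for x in e)) or 1.0  # guard zero vectors
        rows.append([x / norm for x in e])
    return top, rows
```

Row i of the dictionary then corresponds to token `top[i]`, which gives the forced one-to-one match for early training. Later training is free to move the rows away from the embeddings.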

Core implementation

weight

per-dimension feature dimensionality
per-dimension feature angle

How features are packed into each dimension can be learned directly and dynamically, so the dictionary size is no longer fixed

Continuous Concept Changing Hypothesis

In the SAE concept space, the direction vector changes continuously during autoregressive token-by-token thinking
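One way to probe this hypothesis, sketched under the assumption that a per-token sequence of SAE concept vectors is available, is to look at the cosine similarity between consecutive steps. Values near 1 throughout would be consistent with a continuously changing direction, while sharp drops would indicate jumps. The name `step_cosines` is hypothetical.

```python
import math

def step_cosines(trajectory):
    """Cosine similarity between each consecutive pair of vectors.

    Assumption: `trajectory[t]` is the SAE concept-space vector at
    autoregressive step t. Zero vectors give a similarity of 0.0.
    """
    sims = []
    for u, v in zip(trajectory, trajectory[1:]):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        sims.append(dot / (nu * nv) if nu > 0 and nv > 0 else 0.0)
    return sims

smooth = step_cosines([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
jump = step_cosines([[1.0, 0.0], [0.0, 1.0]])
```

Here `smooth` rotates by 45 degrees per step, while `jump` switches to an orthogonal direction in a single step.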

Similar Neuron Hypothesis

Not only feature universality, but also inter-model universality of neurons, from the perspective of neuron combinations

If the similar neuron hypothesis holds, bidirectional initialization with common features becomes possible on that basis

Check the cosine similarity between encoder and decoder as a metric
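A minimal sketch of that check, assuming the encoder and decoder weights are stored so that row i of each holds the vector for feature i (if the encoder is stored transposed, transpose it first). The function name `encoder_decoder_cosine` is hypothetical.

```python
import math

def encoder_decoder_cosine(w_enc, w_dec):
    """Per-feature cosine similarity between encoder and decoder vectors.

    Assumption: w_enc[i] and w_dec[i] are same-length vectors for
    feature i. Features with a zero vector get a similarity of 0.0.
    """
    sims = []
    for e, d in zip(w_enc, w_dec):
        dot = sum(a * b for a, b in zip(e, d))
        denom = math.sqrt(sum(a * a for a in e)) * math.sqrt(sum(b * b for b in d))
        sims.append(dot / denom if denom > 0 else 0.0)
    return sims

sims = encoder_decoder_cosine([[1.0, 0.0], [1.0, 1.0]], [[2.0, 0.0], [0.0, 1.0]])
```

The mean or the histogram of `sims` can then be tracked over training as the metric.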

training

Sample efficiency and

For future studies: a feature hierarchy and subset/superset structure across model scales might emerge when training on a synthetic dataset