MoE SAE Circuit

Chat

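The core of circuit analysis, circuit tracing included, is figuring out causality between network components, and from that angle transcoders, which imitate a component's behavior, seem to be considered a better fit than SAEs, which learn the representation itself. I haven't fully digested the circuit tracing paper yet either, but my understanding is that it uses a cross-layer transcoder: unlike a crosscoder, which shares only the latent dimension, it shares the encoder and trains a separate decoder for each subsequent layer. They then prune an enormous triangular graph (half of a fully connected graph) across layers and use correlation-based importance scores and a bunch of other metrics to get the attribution graph, which honestly feels a bit technique-heavy to me. In research like our MoE work we're not trying to analyze causality between experts, so I'm not sure a transcoder is really needed. Learning the input-to-output map of the attention layer, which all experts share, might be worthwhile, though since attention is an operation across tokens, I don't think a transcoder could learn it. Because tokens aren't i.i.d., an attention transcoder could probably be approximated with BatchTopK, but that's not an ideal solution.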
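A minimal NumPy sketch of the SAE vs. transcoder distinction above, assuming a single toy MLP with random, untrained weights. `SparseCoder`, `batch_topk`, and `toy_mlp` are hypothetical names, not taken from the circuit tracing paper or any library. The two objectives differ only in the training target: an SAE reconstructs its own input, while a transcoder predicts the MLP's output from the MLP's input. `batch_topk` keeps the k × batch_size largest activations across the whole batch rather than k per token, so the sparsity budget is shared across tokens.

```python
import numpy as np

def batch_topk(pre_acts: np.ndarray, k: int) -> np.ndarray:
    """BatchTopK: keep the k * batch_size largest (post-ReLU) activations
    across the whole batch, zeroing everything else."""
    acts = np.maximum(pre_acts, 0.0)
    n_keep = k * acts.shape[0]                 # total budget for the batch
    flat = acts.ravel()
    if n_keep >= flat.size:
        return acts
    keep = np.argpartition(flat, -n_keep)[-n_keep:]  # indices of the largest entries
    mask = np.zeros_like(flat, dtype=bool)
    mask[keep] = True
    return (flat * mask).reshape(acts.shape)

class SparseCoder:
    """Hypothetical encoder/decoder pair; used as an SAE or a transcoder
    depending only on which target it is trained against."""
    def __init__(self, d_in, d_out, d_latent, k, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0, d_in ** -0.5, (d_in, d_latent))
        self.b_enc = np.zeros(d_latent)
        self.W_dec = rng.normal(0, d_latent ** -0.5, (d_latent, d_out))
        self.b_dec = np.zeros(d_out)
        self.k = k

    def encode(self, x):
        return batch_topk(x @ self.W_enc + self.b_enc, self.k)

    def decode(self, z):
        return z @ self.W_dec + self.b_dec

    def loss(self, x, target):
        return float(np.mean((self.decode(self.encode(x)) - target) ** 2))

def toy_mlp(x, W1, W2):
    # stand-in for the component whose behavior the transcoder imitates
    return np.maximum(x @ W1, 0.0) @ W2

rng = np.random.default_rng(1)
d_model, batch = 16, 32
x = rng.normal(size=(batch, d_model))
W1 = rng.normal(size=(d_model, 64)) / 4
W2 = rng.normal(size=(64, d_model)) / 8
y = toy_mlp(x, W1, W2)

sae = SparseCoder(d_model, d_model, d_latent=128, k=8)
transcoder = SparseCoder(d_model, d_model, d_latent=128, k=8)
sae_loss = sae.loss(x, x)          # SAE: reconstruct the representation itself
tc_loss = transcoder.loss(x, y)    # transcoder: imitate the MLP's input -> output map
```

Training would minimize these losses (plus any sparsity or auxiliary terms); the sketch only shows that the architectures are identical and the target is what changes.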

Circuit

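The core of circuit analysis, including circuit tracing, is identifying causality between network components; from that perspective, a transcoder, which imitates a component's behavior, is a better fit than an SAE, which learns the representation itself. The cross-layer transcoder, unlike a crosscoder that shares only the latent dimension, shares the encoder and trains a separate decoder for each subsequent layer. The attribution graph is obtained by pruning the triangular graph across layers (half of a fully connected graph), using correlation-based importance scores and various other metrics. For MoE research we are not trying to analyze causality between experts, so a transcoder may not be necessary. Learning the input-to-output map of the attention layer shared by all experts might still be worthwhile, although attention is an operation across tokens, so a transcoder probably cannot learn it. A crosscoder approach is feasible, however: train the MLP outputs of all experts into the same latent space, without modeling each expert's input-to-output map.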
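A rough NumPy sketch contrasting the two architectures in the note, under stated assumptions. `CrossLayerTranscoder` reads features with one encoder per layer and writes them to every later layer through separate decoders, which gives the triangular connectivity mentioned above. `ExpertCrosscoder` encodes every expert's MLP output into one shared latent space and decodes separately per expert. Both class names are hypothetical, the weights are random and untrained, and the crosscoder assumes every expert's output is computed for every token (dense evaluation), which a top-k router would not normally do.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

class CrossLayerTranscoder:
    """Features encoded at layer l write to every layer m >= l via their own
    decoder, so feature -> output connectivity is triangular."""
    def __init__(self, n_layers, d_model, d_latent, seed=0):
        rng = np.random.default_rng(seed)
        self.n_layers = n_layers
        self.W_enc = rng.normal(0, d_model ** -0.5, (n_layers, d_model, d_latent))
        # one decoder per (source layer l, target layer m) pair with m >= l
        self.W_dec = {(l, m): rng.normal(0, d_latent ** -0.5, (d_latent, d_model))
                      for l in range(n_layers) for m in range(l, n_layers)}

    def forward(self, resid):
        # resid: (n_layers, batch, d_model), the MLP input at each layer
        z = [relu(resid[l] @ self.W_enc[l]) for l in range(self.n_layers)]
        # predicted MLP output at layer m sums contributions from all l <= m
        outs = [sum(z[l] @ self.W_dec[(l, m)] for l in range(m + 1))
                for m in range(self.n_layers)]
        return np.stack(outs)

class ExpertCrosscoder:
    """All experts' MLP outputs share ONE latent space; a separate decoder per
    expert reconstructs that expert's output. No input -> output mapping."""
    def __init__(self, n_experts, d_model, d_latent, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0, d_model ** -0.5, (n_experts, d_model, d_latent))
        self.b_enc = np.zeros(d_latent)
        self.W_dec = rng.normal(0, d_latent ** -0.5, (n_experts, d_latent, d_model))

    def encode(self, expert_outs):
        # expert_outs: (n_experts, batch, d_model); encoder contributions summed over experts
        return relu(np.einsum("ebd,edf->bf", expert_outs, self.W_enc) + self.b_enc)

    def decode(self, z):
        # z: (batch, d_latent) -> (n_experts, batch, d_model)
        return np.einsum("bf,efd->ebd", z, self.W_dec)

E, L, B, D, F = 4, 3, 8, 16, 64
data_rng = np.random.default_rng(1)

cc = ExpertCrosscoder(E, D, F)
z_shared = cc.encode(data_rng.normal(size=(E, B, D)))
recon = cc.decode(z_shared)

clt = CrossLayerTranscoder(L, D, F)
mlp_pred = clt.forward(data_rng.normal(size=(L, B, D)))
```

With L layers the cross-layer transcoder in this sketch needs L(L+1)/2 decoder matrices, while the expert crosscoder needs one encoder and one decoder per expert.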

Recommendations