Neuronpedia Single token ratio

Creator
Seonglae Cho
Created
2025 Jul 22 21:39
Edited
2026 Feb 13 12:37
Done
Refs
Working
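Paper using only the Neuronpedia API, built on the observation that single-token features are common in early and late layers and rare in middle layers
Use direct DB access instead of the API
A short paper that tests this hypothesis
Alignment of single-token feature subspaces across different models/SAE sets, or the effect of layer-wise alignment/mismatch of feature groups on downstream tasks (steering/transfer)
Here, define single-token-ness as the "extreme of concept monosemanticity" (or as one kind of token-level concept)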

The core of this paper, and almost all of it:

How to judge whether a feature is a single-token feature or not
Have ChatGPT Deep Research do the literature review
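
One possible way to make the single-token judgment concrete is sketched below. It assumes that, for each feature, we already have the peak (max-activating) token of each top-activating example, for instance pulled from Neuronpedia. The function names, the whitespace/case normalization, and the 0.9 threshold are illustrative assumptions, not an established criterion.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def single_token_ratio(peak_tokens: List[str]) -> float:
    """Share of top-activating examples whose peak token equals the most common one.

    Assumption: tokens are compared after stripping whitespace and lowercasing,
    so " the" and "The" count as the same token.
    """
    if not peak_tokens:
        return 0.0
    normalized = [t.strip().lower() for t in peak_tokens]
    _, count = Counter(normalized).most_common(1)[0]
    return count / len(normalized)


def is_single_token(peak_tokens: List[str], threshold: float = 0.9) -> bool:
    """Flag a feature as single-token when the ratio reaches the (assumed) threshold."""
    return single_token_ratio(peak_tokens) >= threshold


def single_token_fraction_by_layer(
    features: Iterable[Tuple[int, List[str]]], threshold: float = 0.9
) -> Dict[int, float]:
    """For each layer, the fraction of features flagged as single-token.

    `features` yields (layer_index, peak_tokens) pairs, one per feature.
    """
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for layer, peak_tokens in features:
        totals[layer] += 1
        hits[layer] += int(is_single_token(peak_tokens, threshold))
    return {layer: hits[layer] / totals[layer] for layer in sorted(totals)}
```

Comparing the per-layer fractions returned by `single_token_fraction_by_layer` across early, middle, and late layers would be one way to check the layer hypothesis; the threshold should be varied to see how sensitive the result is.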
 

Visualization

  • Embed feature descriptions (using our own model?) and plot the per-layer manifold
  • Single-token manifold

Interpreting Neural Networks Through the Polysemanticity Lens

Elhage et al., 2022 (Transformer Circuits, preprint)
  • The sentence "some neurons act as near-perfect word detectors" appears verbatim
  • Defines single-token neurons as an extreme case on the polysemanticity spectrum
 
 
 
 
