The features mutually increase each other's token probabilities, creating a feature loop that, without a repetition penalty, sometimes breaks the model's capabilities (Halting Problem).
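A toy sketch of such a loop. It assumes a three-token vocabulary (`A`, `B`, `<eos>`) with hypothetical base logits, where the feature active on `A` adds a fixed boost to `B`'s logit and vice versa. Under greedy decoding the loop keeps outranking `<eos>`, so generation never stops on its own, which is one way to read the "halting" failure. A CTRL-style repetition penalty (the rule Hugging Face's `RepetitionPenaltyLogitsProcessor` applies: divide positive logits and multiply negative ones for tokens already in the context) damps the loop so `<eos>` can win. All names and numbers are illustrative, not measured SAE features.

```python
VOCAB = ["A", "B", "<eos>"]
# Hypothetical context-independent base logits.
BASE_LOGITS = {"A": 1.0, "B": 1.0, "<eos>": 1.8}
# Hypothetical two-feature loop: the feature on A boosts B, the feature on B boosts A.
LOOP_BOOST = {"A": ("B", 2.0), "B": ("A", 2.0)}


def next_logits(context: list[str]) -> dict[str, float]:
    """Base logits plus the boost from the feature active on the last token."""
    logits = dict(BASE_LOGITS)
    last = context[-1]
    if last in LOOP_BOOST:
        target, boost = LOOP_BOOST[last]
        logits[target] += boost
    return logits


def apply_repetition_penalty(
    logits: dict[str, float], context: list[str], penalty: float
) -> dict[str, float]:
    """CTRL-style penalty: push down the logit of every token already in the context."""
    out = dict(logits)
    for tok in set(context):
        if tok in out:
            out[tok] = out[tok] * penalty if out[tok] < 0 else out[tok] / penalty
    return out


def greedy_decode(prompt: list[str], penalty: float = 1.0, max_steps: int = 10) -> list[str]:
    """Greedy decoding that stops at <eos> or after max_steps new tokens."""
    context = list(prompt)
    for _ in range(max_steps):
        logits = next_logits(context)
        if penalty != 1.0:
            logits = apply_repetition_penalty(logits, context, penalty)
        token = max(VOCAB, key=lambda t: logits[t])
        context.append(token)
        if token == "<eos>":
            break
    return context


print(greedy_decode(["A"]))               # alternates A, B, ... until max_steps, no <eos>
print(greedy_decode(["A"], penalty=2.0))  # ['A', 'B', '<eos>']
```

With `penalty=2.0`, the boosted logit of `A` (3.0) is halved to 1.5, which falls below the 1.8 of `<eos>`, so decoding halts after one pass around the loop.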
SAE Feature Circuit Discoveries
Types
Single-node loop
Two-node system
- Unicode prefix and suffix predictors (Tamil, Chinese)
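One plausible reading of the prefix/suffix pair, assuming a byte-level tokenizer: Tamil and Chinese characters take three bytes in UTF-8, so they can be split into a lead (prefix) byte and continuation (suffix) bytes. A feature firing on the lead byte would predict the continuation bytes, and in running Tamil or Chinese text the feature on the completed character would predict the next lead byte, closing a two-node loop. The helper names below are hypothetical; the byte values follow directly from the UTF-8 encoding.

```python
def utf8_prefix_suffix(ch: str) -> tuple[bytes, bytes]:
    """Split a character's UTF-8 encoding into its lead byte and continuation bytes."""
    encoded = ch.encode("utf-8")
    return encoded[:1], encoded[1:]


def continuation_count(lead: int) -> int:
    """Continuation bytes implied by a UTF-8 lead byte (simplified: ignores invalid leads like 0xC0)."""
    if lead < 0x80:
        return 0  # ASCII, single byte
    if 0xC0 <= lead < 0xE0:
        return 1
    if 0xE0 <= lead < 0xF0:
        return 2
    if 0xF0 <= lead < 0xF8:
        return 3
    raise ValueError(f"0x{lead:02x} is not a UTF-8 lead byte")


for ch in ["த", "中"]:  # Tamil letter TA (U+0BA4), CJK "middle" (U+4E2D)
    prefix, suffix = utf8_prefix_suffix(ch)
    # The lead byte alone determines how many suffix bytes must follow.
    assert len(suffix) == continuation_count(prefix[0])
    print(ch, f"U+{ord(ch):04X}", prefix.hex(" "), "|", suffix.hex(" "))
```

Because the lead byte fixes how many continuation bytes follow, the prefix-to-suffix prediction is deterministic, which makes it an easy relationship for a pair of features to encode.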