SAE Feature Direction

Creator

Created

2025 Feb 17 1:46

Editor

Edited

2025 May 11 21:54

Refs

Weight Cosine Similarity

Orphan features still show high interpretability, which suggests that each seed may have found a different subset of the dictionary at the “idealized dictionary size”.

The seed matters because it sets the weight initialization →
SAE weight initialization helps to prevent this seed dependence

To match features between two SAEs, the method uses 1 - (cosine similarity matrix) as the cost matrix and applies Hungarian Matching to find the optimal 1:1 matching.
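
A minimal sketch of this matching step, assuming each SAE's feature directions are stored as rows of a NumPy array. The function name `match_features` and the toy weights are made up for illustration; `scipy.optimize.linear_sum_assignment` solves the same assignment problem that Hungarian matching addresses.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_features(w_a: np.ndarray, w_b: np.ndarray):
    """Match rows of w_a to rows of w_b (one feature direction per row).

    Hypothetical helper: returns matched row indices for each SAE and the
    cosine similarity of every matched pair.
    """
    # Normalize each direction to unit length so dot products are cosines.
    a = w_a / np.linalg.norm(w_a, axis=1, keepdims=True)
    b = w_b / np.linalg.norm(w_b, axis=1, keepdims=True)
    cos_sim = a @ b.T              # shape (n_a, n_b)
    cost = 1.0 - cos_sim           # low cost = similar directions
    # Minimum-cost 1:1 assignment over the cost matrix.
    rows, cols = linear_sum_assignment(cost)
    return rows, cols, cos_sim[rows, cols]


# Toy data (assumption): the second "SAE" is a shuffled, slightly noisy copy.
rng = np.random.default_rng(0)
w_a = rng.normal(size=(8, 16))
perm = rng.permutation(8)
w_b = w_a[perm] + 0.01 * rng.normal(size=(8, 16))

rows, cols, sims = match_features(w_a, w_b)
```

Here `cols[i]` is the feature in the second SAE matched to feature `rows[i]` in the first. Matched pairs with low similarity are candidates for features without a counterpart under the other seed; any cutoff for that is a choice this note does not specify.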