Negative Log-likelihood function
Like KL divergence, it is not symmetric in its two arguments.
Minimizing the KL divergence between the posterior and the prior is equivalent to maximizing the log-likelihood.
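A quick check of this equivalence (reading the two distributions as a target distribution $p$ and a model distribution $q_\theta$, which is an assumption about what the note means here):

$$
D_{\mathrm{KL}}(p \,\|\, q_\theta) = \mathbb{E}_{x\sim p}\!\left[\log\frac{p(x)}{q_\theta(x)}\right] = -H(p) + H(p, q_\theta).
$$

Since $H(p)$ does not depend on $\theta$, minimizing the KL divergence over $\theta$ is the same as minimizing the cross-entropy $H(p, q_\theta) = -\mathbb{E}_{p}[\log q_\theta(x)]$, i.e. maximizing the expected log-likelihood.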
Cross entropy loss verification
We can calculate the cross-entropy loss achieved by this model in a few cases:
- Suppose the neuron only fires on feature A, and correctly predicts token A when it does. The model ignores all of the other features, predicting a uniform distribution over tokens B/C/D when feature A is not present. In this case the loss is (3/4)·ln 3 ≈ 0.82 nats (assuming the four features are mutually exclusive and equally likely; see the check after this list).
- Instead suppose that the neuron fires on both features A and B, predicting a uniform distribution over the A and B tokens. When the A and B features are not present, the model predicts a uniform distribution over the C and D tokens. In this case the loss is ln 2 ≈ 0.69 nats.
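A minimal numerical check of the two cases, under the assumption (not stated explicitly above) that exactly one of the four features A–D is present per example, each with probability 1/4:

```python
import math

# Assumed data distribution: one of four features A, B, C, D per example,
# each with probability 1/4; the target token matches the active feature.
p_feature = 1 / 4

# Case 1: monosemantic neuron for feature A.
# - Feature A present: model predicts token A with probability 1 -> loss 0.
# - Otherwise: uniform over tokens B/C/D -> loss ln 3 on the correct token.
loss_mono = p_feature * 0.0 + 3 * p_feature * math.log(3)

# Case 2: polysemantic neuron for features A and B.
# - A or B present: uniform over tokens A/B -> loss ln 2.
# - C or D present: uniform over tokens C/D -> loss ln 2.
loss_poly = 2 * p_feature * math.log(2) + 2 * p_feature * math.log(2)

print(f"monosemantic loss: {loss_mono:.3f} nats")  # ~0.824
print(f"polysemantic loss: {loss_poly:.3f} nats")  # ~0.693
```

Under these assumptions the polysemantic arrangement achieves the lower cross-entropy, which is exactly the preference described next.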
Models trained on a cross-entropy loss will generally prefer to represent more features polysemantically rather than represent fewer features monosemantically, even in cases where sparsity constraints make superposition impossible. Models trained on other loss functions do not necessarily have this problem. This is why sparse autoencoders (SAEs) trained on neuron activations use an MSE reconstruction loss together with a sparsity loss to learn a monosemantic latent dictionary.
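A minimal sketch of that SAE objective (MSE reconstruction plus an L1 sparsity penalty on the latent codes). The single-layer encoder/decoder, the layer sizes, and the coefficient `l1_coeff` are illustrative assumptions, not a specific published implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Sketch of a sparse autoencoder over neuron activations."""

    def __init__(self, d_neurons: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_neurons, d_dict)   # activations -> dictionary codes
        self.decoder = nn.Linear(d_dict, d_neurons)   # dictionary codes -> reconstruction

    def forward(self, acts: torch.Tensor):
        codes = F.relu(self.encoder(acts))            # non-negative codes, encouraged to be sparse
        recon = self.decoder(codes)
        return recon, codes

def sae_loss(recon, codes, acts, l1_coeff: float = 1e-3):
    mse = F.mse_loss(recon, acts)                     # reconstruction term
    sparsity = codes.abs().sum(dim=-1).mean()         # L1 penalty pushes codes toward few active features
    return mse + l1_coeff * sparsity

# Usage with dummy activations (batch of 8, 512 neurons, 2048 dictionary entries).
sae = SparseAutoencoder(d_neurons=512, d_dict=2048)
acts = torch.randn(8, 512)
recon, codes = sae(acts)
loss = sae_loss(recon, codes, acts)
loss.backward()
```

The key design choice is that neither term is a cross-entropy: the MSE term only asks for faithful reconstruction, and the sparsity term directly rewards few active dictionary entries, so the pressure toward polysemanticity described above does not apply.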