Unfortunately, the most natural computational unit of the neural network – the neuron itself turns out not to be a natural unit for human understanding. This is because many neurons are polysemantic(Superposition Hypothesis). Superposition can arise naturally during the course of neural network training if the set of features useful to a model are sparse in the training data.
AI Neuron Activation Notion
Safety features
There are features representing more abstract properties of the input, might there also be more abstract, higher-level actions which trigger behaviors over the span of multiple tokens?
- High-Level Actions
- Planning
- Social Reasoning
- Personas