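Unlike the sign function, an activation function used with backpropagation should have a non-zero derivative. The sign function's derivative is 0 everywhere except at 0, where it is undefined, so no gradient can flow back through it.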
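A small sketch comparing the two derivatives numerically. The central finite difference and the convention sign(0) = +1 are assumptions made for illustration only.

```python
import math

def sign(x: float) -> float:
    # Hard threshold; sign(0) is taken as +1 here (an assumed convention)
    return 1.0 if x >= 0 else -1.0

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def numerical_derivative(f, x: float, h: float = 1e-5) -> float:
    # Central finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-2.0, -0.5, 0.5, 2.0):
    # sign' is 0 away from the origin; sigmoid' is strictly positive everywhere
    print(f"x={x:+.1f}  sign'={numerical_derivative(sign, x):.4f}  "
          f"sigmoid'={numerical_derivative(sigmoid, x):.4f}")
```

The analytic derivative of the sigmoid is sigmoid(x) * (1 - sigmoid(x)), which peaks at 0.25 when x = 0.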
The rows of the weight matrix before the activation function can be thought of as directions in the embedding space, so each neuron's pre-activation (the dot product of its row with the input vector) tells you how much that vector aligns with one specific direction. The columns of the weight matrix after the activation function tell you which vector is added to the result when that neuron is active, scaled by how strongly it fires.
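A toy sketch of this reading, assuming a hypothetical 3-dimensional embedding, 2 hidden neurons, ReLU as the activation, and no biases. For readability the output matrix is stored as a list of its columns.

```python
# Hypothetical weights: each row of W_in is one neuron's "direction"
W_in = [
    [1.0, 0.0, 0.0],   # neuron 0 reads the first embedding dimension
    [0.0, 1.0, -1.0],  # neuron 1 reads dim 1 minus dim 2
]
# Hypothetical output weights, stored column by column
W_out_columns = [
    [0.5, 0.5, 0.0],   # vector written back when neuron 0 fires
    [0.0, 0.0, 2.0],   # vector written back when neuron 1 fires
]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def relu(z):
    return max(0.0, z)

def mlp(x):
    # Pre-activation = alignment of x with each row's direction
    pre = [dot(row, x) for row in W_in]
    acts = [relu(z) for z in pre]
    # Output = sum of W_out columns, each scaled by its neuron's activation
    out = [0.0] * len(x)
    for a, col in zip(acts, W_out_columns):
        for i, c in enumerate(col):
            out[i] += a * c
    return acts, out

acts, out = mlp([2.0, 1.0, 3.0])
print(acts, out)
```

Here neuron 1's direction points away from the input (pre-activation -2), so ReLU silences it and only neuron 0's column contributes to the output.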
Mimics a biological neuron's activation (firing) and introduces non-linearity into the network.
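Why the non-linearity matters: without an activation, stacked linear layers collapse into a single linear map. A minimal 1-D sketch with hypothetical weights w1 and w2, using ReLU as the example activation:

```python
def relu(z: float) -> float:
    return max(0.0, z)

w1, w2 = 3.0, -2.0  # hypothetical weights of two stacked 1-D "layers"

def no_activation(x: float) -> float:
    # Equivalent to the single linear map (w2 * w1) * x
    return w2 * (w1 * x)

def with_relu(x: float) -> float:
    return w2 * relu(w1 * x)

def is_additive(f, a: float, b: float) -> bool:
    # Any linear map must satisfy f(a + b) == f(a) + f(b)
    return f(a + b) == f(a) + f(b)

print(is_additive(no_activation, 1.0, -1.0))  # True
print(is_additive(with_relu, 1.0, -1.0))      # False
```

Passing one additivity check does not prove linearity, but failing one does prove the ReLU version is non-linear.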
Activation values are also known as feature maps, especially the outputs of convolutional layers in a CNN.
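A small sketch of where a feature map comes from: a 2x2 kernel slid over a tiny image (cross-correlation with no padding and stride 1, which is what most CNN libraries call "convolution"), followed by ReLU. The image and the vertical-edge kernel are hypothetical.

```python
def conv2d_valid(image, kernel):
    # Cross-correlation, no padding, stride 1
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu_map(m):
    # Apply ReLU element-wise; the result is the layer's feature map
    return [[max(0.0, v) for v in row] for row in m]

image = [
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
]
kernel = [  # hypothetical detector for dark-to-bright vertical edges
    [-1.0, 1.0],
    [-1.0, 1.0],
]
feature_map = relu_map(conv2d_valid(image, kernel))
print(feature_map)
```

The rising edge produces a strong activation, while the falling edge gives a negative response that ReLU clips to 0.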
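- sigmoid function: squashes its input into (0, 1), so the output can be read as a probability
- sign function: outputs -1 or +1, used to predict the class directly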
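Both outputs can come from the same linear score. A sketch assuming a hypothetical linear model z = w · x + b with made-up weights, and the convention sign(0) = +1:

```python
import math

def sigmoid(z: float) -> float:
    # Maps any real score into (0, 1), read as P(class = +1)
    return 1.0 / (1.0 + math.exp(-z))

def sign(z: float) -> int:
    # Hard decision; sign(0) mapped to +1 here by assumption
    return 1 if z >= 0 else -1

# Hypothetical weights, bias, and input
w, b = [0.8, -0.4], 0.1
x = [1.0, 2.0]
z = sum(wi * xi for wi, xi in zip(w, x)) + b

prob = sigmoid(z)   # about 0.525
label = sign(z)     # +1
print(prob, label)
```

With this convention the two agree: sign(z) = +1 exactly when sigmoid(z) >= 0.5.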
Activation Functions