Logistic log loss is convex. Because the logarithm is monotonically increasing, taking the log preserves the argmax, so maximizing the likelihood function is equivalent to maximizing the log-likelihood. Since probabilities are at most 1, the log-likelihood is negative, which is why in practice we minimize the negative log-likelihood (the log loss) instead.
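As a quick sanity check on the convexity claim, here is a minimal NumPy sketch (the helper names such as `log_loss` are my own, not from the text). It evaluates the per-example logistic log loss as a function of the logit $z$ and confirms by finite differences that its second derivative is $\sigma(z)(1-\sigma(z)) \ge 0$; since $z = \theta^T x$ is affine in $\theta$, convexity in $z$ carries over to convexity in $\theta$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(z, y):
    """Per-example logistic log loss as a function of the logit z."""
    p = sigmoid(z)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Finite-difference second derivative; convexity means it is >= 0 everywhere.
z = np.linspace(-6, 6, 241)
h = 1e-4
second = (log_loss(z + h, 1) - 2 * log_loss(z, 1) + log_loss(z - h, 1)) / h**2

print(np.all(second >= 0))                                            # True
print(np.allclose(second, sigmoid(z) * (1 - sigmoid(z)), atol=1e-4))  # True: d2/dz2 = sigma(z)(1 - sigma(z))
```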
$l(\theta)$ is the log-likelihood and $-l(\theta)$ is the negative log-likelihood.
The likelihood function measures how well the parameters fit the observed data. It is written $L(\theta)$, where $\theta$ denotes the parameters of the model and $X$ and $\vec{y}$ denote the data, and it is defined as the conditional probability of the observed data given the parameters:

$$L(\theta) = L(\theta; X, \vec{y}) = p(\vec{y} \mid X; \theta)$$

Taking the log turns the product over independent examples into a sum:

$$l(\theta) = \log L(\theta) = \sum_{i=1}^n \log p(y^{(i)} \mid x^{(i)}; \theta)$$
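To make the definitions concrete, here is a small NumPy sketch under assumed toy data and a logistic model $p(y=1 \mid x; \theta) = \sigma(\theta^T x)$. It computes the likelihood as the product of per-example probabilities and the log-likelihood as the sum of their logs, and checks that $\log L(\theta) = l(\theta)$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy data: 4 examples, 2 features.
X = np.array([[0.5, 1.2], [1.5, -0.3], [-0.7, 0.8], [2.0, 1.0]])
y = np.array([1, 0, 1, 1])
theta = np.array([0.4, -0.2])

# Per-example p(y^(i) | x^(i); theta) under the logistic model.
p1 = sigmoid(X @ theta)                     # P(y = 1 | x; theta)
per_example = np.where(y == 1, p1, 1 - p1)

L = np.prod(per_example)                    # likelihood  L(theta) = p(y | X; theta)
ll = np.sum(np.log(per_example))            # log-likelihood  l(theta)

print(L, ll, np.isclose(np.log(L), ll))     # log of the product equals the sum of logs
```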
The cost function $J(\theta)$ usually refers to the negative log-likelihood.
NLL (negative log-likelihood):

$$\text{NLL}(\theta) = - \log \prod_{i=1}^n p(Y_i \mid \theta)$$

Expanding the Bernoulli probability $p(Y_i \mid \theta)$:

$$= - \log \prod_{i=1}^n \theta^{1(Y_i = 1)} (1 - \theta)^{1(Y_i = 0)} = - \sum_{i=1}^n \left[ 1(Y_i = 1) \log \theta + 1(Y_i = 0) \log(1 - \theta) \right]$$

Grouping the terms for $Y_i = 1$ and $Y_i = 0$:

$$= - \left( N_1 \log \theta + N_0 \log(1 - \theta) \right), \quad \text{where } N_j = \sum_{i=1}^n 1(Y_i = j), \quad j = 0, 1.$$
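A short NumPy sketch on a hypothetical Bernoulli sample, verifying that the per-example sum and the grouped $(N_1, N_0)$ form of the NLL agree, and that the minimizer over a grid of $\theta$ values lands near the sample mean $N_1/n$ (the MLE).

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.binomial(1, 0.3, size=200)          # hypothetical Bernoulli sample
N1, N0 = Y.sum(), (1 - Y).sum()

def nll_per_example(theta):
    """NLL(theta) = -sum_i [1(Y_i=1) log theta + 1(Y_i=0) log(1-theta)]."""
    return -np.sum(Y * np.log(theta) + (1 - Y) * np.log(1 - theta))

def nll_grouped(theta):
    """Grouped form: -(N1 log theta + N0 log(1-theta))."""
    return -(N1 * np.log(theta) + N0 * np.log(1 - theta))

thetas = np.linspace(0.01, 0.99, 99)
print(np.allclose([nll_per_example(t) for t in thetas],
                  [nll_grouped(t) for t in thetas]))      # the two forms agree

# The grid minimizer sits near the sample mean N1 / n, which is the MLE.
best = thetas[np.argmin([nll_grouped(t) for t in thetas])]
print(best, N1 / len(Y))
```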