Logistic log loss is convex in the model parameters. It is also:
- Continuous
- Differentiable
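Because log is monotonically increasing, the θ that maximizes the likelihood also maximizes the log likelihood, so the argmax is unchanged and easier to compute (products become sums).
We want to maximize the likelihood function. For discrete data each likelihood value is a probability in [0, 1], so its log is negative (or zero); negating it gives a non-negative quantity to minimize.
$\ell(\theta) = \log L(\theta)$ is the log likelihood and $-\ell(\theta)$ is the negative log likelihood.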
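A minimal sketch of why taking the log does not move the optimum, assuming a made-up sample of ten coin flips and a simple grid search over the Bernoulli parameter p. The names `flips`, `likelihood`, `log_likelihood`, and `grid` are illustrative only.

```python
import math

# Hypothetical data: 10 coin flips, 7 heads (1) and 3 tails (0).
flips = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1]

def likelihood(p, data):
    """Product of Bernoulli probabilities: p for a 1, (1 - p) for a 0."""
    out = 1.0
    for y in data:
        out *= p if y == 1 else (1.0 - p)
    return out

def log_likelihood(p, data):
    """Sum of log Bernoulli probabilities (the log turns the product into a sum)."""
    return sum(math.log(p if y == 1 else 1.0 - p) for y in data)

# Candidate parameters on an open grid; 0 and 1 are excluded so log is defined.
grid = [k / 100 for k in range(1, 100)]

# Maximizing L, maximizing log L, and minimizing -log L pick the same p.
best_by_likelihood = max(grid, key=lambda p: likelihood(p, flips))
best_by_log_likelihood = max(grid, key=lambda p: log_likelihood(p, flips))
best_by_nll = min(grid, key=lambda p: -log_likelihood(p, flips))

print(best_by_likelihood, best_by_log_likelihood, best_by_nll)
```

All three searches should land on the sample frequency of heads, which is the closed-form maximum likelihood estimate for a Bernoulli parameter.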
The likelihood function measures how well the parameters fit the observed data. It is written L(θ), where θ represents the parameters of the model, and X and y represent the data. The likelihood function is defined as the conditional probability of the observed data given the values of the parameters of the model: $L(\theta) = P(y \mid X; \theta)$.
The cost function J(θ) usually denotes the negative log likelihood.
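NLL (Negative log likelihood)
$$\mathrm{NLL}(\theta) = -\log L(\theta) = -\sum_{i=1}^{n} \log P(y_i \mid x_i; \theta)$$
(the sum assumes the samples are independent)
For example, with the Bernoulli distribution
$$P(y_i \mid x_i; \theta) = \hat{y}_i^{\,y_i} (1 - \hat{y}_i)^{1 - y_i}$$
Expanding the probability
$$\mathrm{NLL}(\theta) = -\sum_{i=1}^{n} \log\left[ \hat{y}_i^{\,y_i} (1 - \hat{y}_i)^{1 - y_i} \right]$$
Grouping terms for $y_i$ and $1 - y_i$
$$\mathrm{NLL}(\theta) = -\sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$$
where $\hat{y}_i = \sigma(\theta^\top x_i) = \frac{1}{1 + e^{-\theta^\top x_i}}$ is the predicted probability that $y_i = 1$.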
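The Bernoulli NLL can be sketched in a few lines, assuming the prediction is the logistic sigmoid of a linear score θ·x. The toy data, the helper names `sigmoid` and `bernoulli_nll`, and the two test points `a` and `b` are all made up for illustration. The last lines spot-check convexity at one midpoint; this is a sanity check, not a proof.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bernoulli_nll(theta, X, y):
    """NLL(theta) = -sum_i [y_i log(p_i) + (1 - y_i) log(1 - p_i)].

    Assumes p_i stays strictly inside (0, 1); extreme scores would need
    a more careful log computation.
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(t * x for t, x in zip(theta, x_i))  # linear score theta . x_i
        p = sigmoid(z)                              # predicted P(y_i = 1)
        total -= y_i * math.log(p) + (1 - y_i) * math.log(1.0 - p)
    return total

# Hypothetical toy data: the first feature is a constant 1 (bias term).
X = [[1.0, 0.5], [1.0, 1.5], [1.0, -1.0], [1.0, -2.0]]
y = [1, 1, 0, 0]

# At theta = 0 every prediction is 0.5, so the NLL is n * log(2).
nll_at_zero = bernoulli_nll([0.0, 0.0], X, y)

# Convexity spot check: the NLL at the midpoint of two parameter vectors
# should not exceed the average of the NLL at the endpoints.
a = [0.0, 0.0]
b = [1.0, 2.0]
mid = [(u + v) / 2 for u, v in zip(a, b)]
nll_mid = bernoulli_nll(mid, X, y)
nll_avg = (bernoulli_nll(a, X, y) + bernoulli_nll(b, X, y)) / 2
midpoint_ok = nll_mid <= nll_avg

print(nll_at_zero, nll_mid, nll_avg, midpoint_ok)
```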