Log-likelihood function

Created
2023 Mar 23 2:13
Creator
Seonglae Cho
Edited
2024 Dec 4 23:56
Refs

Logistic log loss is convex

  • Continuous
  • Differentiable

Because the log is also monotonically increasing, it preserves the argmax, which makes maximization easier.

We want to maximize the likelihood function; since the likelihood is at most 1, its logarithm is non-positive, so in practice we minimize the negative log-likelihood.
$l(\theta)$ is the log likelihood and $-l(\theta)$ is the negative log likelihood.
$L(\theta) = L(\theta; X, \vec{y}) = p(\vec{y}|X;\theta)$
$l(\theta) = \log{L(\theta)} = \sum_{i=1}^n \log p(y^{(i)} | x^{(i)};\theta)$
It measures how well the parameters fit the observed data. The likelihood function is written $L(\theta)$, where $\theta$ denotes the model parameters and $X$, $\vec{y}$ denote the data. It is defined as the conditional probability of the observed data given the parameters: $L(\theta) = p(\vec{y}|X;\theta)$.
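A quick sketch with hypothetical Bernoulli coin-flip data (the data array and grid are made up for illustration): because log is monotonic, the likelihood and the log-likelihood peak at the same $\theta$.

```python
import numpy as np

# Hypothetical observations: 1 = success, 0 = failure
y = np.array([1, 0, 1, 1, 0, 1, 1, 1])

def likelihood(theta, y):
    # L(theta) = prod_i p(y_i | theta) for a Bernoulli model
    return np.prod(theta**y * (1 - theta)**(1 - y))

def log_likelihood(theta, y):
    # l(theta) = sum_i log p(y_i | theta)
    return np.sum(y * np.log(theta) + (1 - y) * np.log(1 - theta))

thetas = np.linspace(0.01, 0.99, 99)
L = np.array([likelihood(t, y) for t in thetas])
ll = np.array([log_likelihood(t, y) for t in thetas])

# log is monotonically increasing, so both curves share the same argmax
assert np.argmax(L) == np.argmax(ll)
print(thetas[np.argmax(ll)])  # close to the sample mean 6/8 = 0.75
```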
 
 
The cost function $J(\theta)$ usually denotes the negative log likelihood.

NLL (Negative log likelihood)

For example, with the
Bernoulli Distribution
$\text{NLL}(\theta) = - \log \prod_{i=1}^n p(Y_i | \theta)$
Expanding the probability $p(Y_i | \theta)$:
$= - \log \prod_{i=1}^n \theta^{1(Y_i = 1)} (1 - \theta)^{1(Y_i = 0)}$
$= - \sum_{i=1}^n \left[ 1(Y_i = 1) \log \theta + 1(Y_i = 0) \log(1 - \theta) \right]$
Grouping terms for $Y_i = 1$ and $Y_i = 0$:
$= - \left( N_1 \log \theta + N_0 \log(1 - \theta) \right)$
where $N_j = \sum_{i=1}^n 1(Y_i = j), \quad j = 0, 1$.
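The derivation above can be checked numerically (sample data below is hypothetical). Setting the derivative of the grouped NLL to zero gives the MLE $\hat\theta = N_1/(N_1+N_0)$, and a grid search over the NLL recovers the same value:

```python
import numpy as np

# Hypothetical Bernoulli sample: 1 = success, 0 = failure
Y = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
N1 = int(np.sum(Y == 1))  # count of Y_i = 1
N0 = int(np.sum(Y == 0))  # count of Y_i = 0

def nll(theta):
    # NLL(theta) = -(N1 log theta + N0 log(1 - theta))
    return -(N1 * np.log(theta) + N0 * np.log(1 - theta))

# Closed-form MLE from d(NLL)/d(theta) = 0
theta_hat = N1 / (N1 + N0)

# Numerically confirm theta_hat minimizes the NLL over a grid
grid = np.linspace(0.01, 0.99, 99)
best = grid[np.argmin([nll(t) for t in grid])]
assert np.isclose(best, theta_hat)
print(theta_hat)  # 0.7
```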