MLE

Created: 2023 Mar 23 1:41
Creator: Seonglae Cho
Edited: 2025 Feb 4 14:09

Maximum likelihood estimation

MLE is the special case of MAP in which the prior is a Uniform Distribution (i.e., the prior probability is not taken into account)
\hat{\theta}_{mle} \in \argmax_\theta p(\mathcal{D}|\theta)
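For comparison, MAP maximizes the posterior, which by Bayes' rule is proportional to likelihood times prior; when p(\theta) is constant it drops out of the argmax and MAP reduces to MLE:
\hat{\theta}_{map} \in \argmax_\theta p(\mathcal{D}|\theta)\, p(\theta) = \argmax_\theta p(\mathcal{D}|\theta) = \hat{\theta}_{mle} \quad (\text{uniform } p(\theta))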
We usually assume that the training data are iid, hence
p(\mathcal{D}|\theta) = \prod_{i=1}^n p(Y_i|X_i, \theta)
For computational reasons, we work with the NLL \text{NLL}(\theta) = -\log \mathcal{L}(\theta) and minimize it. With the Log-likelihood function
\ell(\theta) = \log p(\mathcal{D}|\theta) = \log \mathcal{L}(\theta) = \sum_{i=1}^{n} \log p(Y_i|X_i, \theta)
we therefore search \hat{\theta}_{mle} \in \argmin_\theta \text{NLL}(\theta) = \argmin_\theta \big( -\ell(\theta) \big)
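As a minimal sketch of this recipe (assuming a Gaussian likelihood with unknown mean and scale; the synthetic data and parameter values below are illustrative, not from the source), the NLL can be minimized numerically:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=1000)  # synthetic iid Gaussian data

def nll(params):
    """NLL of iid Gaussian samples; the log turns the product into a sum."""
    mu, log_sigma = params          # parameterize log(sigma) so sigma > 0
    sigma = np.exp(log_sigma)
    logp = -0.5 * np.log(2 * np.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)
    return -logp.sum()              # summing logs also avoids underflow

res = minimize(nll, x0=np.array([0.0, 0.0]))   # BFGS by default
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)            # ~2.0 and ~1.5 (the sample mean/std)
```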

NLL (Negative log likelihood)

For example, with a Bernoulli Distribution
\text{NLL}(\theta) = -\log \prod_{i=1}^n p(Y_i | \theta)
Expanding the probability p(Y_i | \theta)
= -\log \prod_{i=1}^n \theta^{1(Y_i = 1)} (1 - \theta)^{1(Y_i = 0)}
= -\sum_{i=1}^n \left[ 1(Y_i = 1) \log \theta + 1(Y_i = 0) \log(1 - \theta) \right]
Grouping terms for Y_i = 1 and Y_i = 0
= -\left( N_1 \log \theta + N_0 \log(1 - \theta) \right)
where N_j = \sum_{i=1}^n 1(Y_i = j), \quad j = 0, 1.
Setting the derivative to zero gives the closed-form estimate \hat{\theta}_{mle} = \frac{N_1}{N_1 + N_0} = \frac{N_1}{n}, the empirical fraction of ones.
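A quick numerical check of this derivation (a minimal sketch; the true parameter 0.7 and the sample size are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.binomial(1, 0.7, size=10_000)   # synthetic Bernoulli(0.7) flips

n1 = int(y.sum())                       # N1 = number of ones
n0 = len(y) - n1                        # N0 = number of zeros
theta_hat = n1 / (n1 + n0)              # closed-form MLE: N1 / n

# NLL(theta) = -(N1 log(theta) + N0 log(1 - theta)); a grid search agrees
grid = np.linspace(0.01, 0.99, 99)
nll = -(n1 * np.log(grid) + n0 * np.log(1 - grid))
print(theta_hat, grid[nll.argmin()])    # both ~0.7
```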