Maximum likelihood estimation
MLE is the special case of MAP estimation in which the prior is a uniform distribution (i.e., the prior probability is not taken into account)
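We usually assume that the training data are i.i.d., hence the likelihood factorizes over the examples (writing $\mathcal{D} = \{y_1, \dots, y_N\}$ for the training set and $\theta$ for the parameters)
$$p(\mathcal{D} \mid \theta) = \prod_{n=1}^{N} p(y_n \mid \theta)$$
For computational reasons, we work with the logarithm of the likelihood, which turns the product into a sum
Therefore we search for the parameters that maximize the log-likelihood function
$$\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \sum_{n=1}^{N} \log p(y_n \mid \theta)$$
NLL (negative log-likelihood), which we minimize instead; this is equivalent to maximizing the log-likelihood
$$\mathrm{NLL}(\theta) = -\log p(\mathcal{D} \mid \theta) = -\sum_{n=1}^{N} \log p(y_n \mid \theta)$$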
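A minimal sketch of the computational reason for working in log space. The per-example likelihood values below are made up for illustration (every example is assigned probability 0.5); the point is only that a long product of probabilities underflows in floating point while the sum of logs stays finite.

```python
import math

# Hypothetical per-example likelihoods p(y_n | theta); 0.5 each is an assumption.
probs = [0.5] * 2000

# Direct product of 2000 probabilities: too small for a double, underflows to 0.0.
likelihood = math.prod(probs)

# Negative log-likelihood: a sum of logs, which stays well within range.
nll = -sum(math.log(p) for p in probs)

print("likelihood:", likelihood)
print("NLL:", nll)
```

Once the product has collapsed to zero, different parameter values can no longer be compared, whereas the NLL remains usable.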
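For example, with a Bernoulli distribution, where $y_n \in \{0, 1\}$ and $\theta = p(y_n = 1)$
$$\mathrm{NLL}(\theta) = -\log \prod_{n=1}^{N} p(y_n \mid \theta)$$
Expanding the probability
$$\mathrm{NLL}(\theta) = -\log \prod_{n=1}^{N} \theta^{\mathbb{I}(y_n = 1)} (1 - \theta)^{\mathbb{I}(y_n = 0)} = -\sum_{n=1}^{N} \left[ \mathbb{I}(y_n = 1) \log \theta + \mathbb{I}(y_n = 0) \log (1 - \theta) \right]$$
Grouping the terms
$$\mathrm{NLL}(\theta) = -\left[ N_1 \log \theta + N_0 \log (1 - \theta) \right]$$
where $N_1 = \sum_{n=1}^{N} \mathbb{I}(y_n = 1)$ and $N_0 = \sum_{n=1}^{N} \mathbb{I}(y_n = 0)$ are the numbers of ones and zeros in the data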
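The Bernoulli case can be checked numerically. The sketch below is an illustration under stated assumptions: the toy data set `ys`, the function name `bernoulli_nll`, and the grid search are all hypothetical choices, not part of the original notes. It evaluates the NLL from the counts of ones and zeros (`n1`, `n0`) and compares the grid minimizer with the closed-form minimizer, which for the Bernoulli NLL is the fraction of ones, `n1 / (n1 + n0)`.

```python
import math

def bernoulli_nll(theta, ys):
    """NLL of binary data ys under a Bernoulli with p(y = 1) = theta."""
    n1 = sum(1 for y in ys if y == 1)  # number of ones
    n0 = len(ys) - n1                  # number of zeros
    return -(n1 * math.log(theta) + n0 * math.log(1.0 - theta))

# Toy data (an assumption for illustration): 7 ones and 3 zeros.
ys = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

# Closed-form minimizer of the NLL: the empirical fraction of ones.
theta_closed = sum(ys) / len(ys)

# Brute-force check: minimize the NLL over a grid strictly inside (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
theta_grid = min(grid, key=lambda t: bernoulli_nll(t, ys))

print("closed form:", theta_closed)
print("grid search:", theta_grid)
```

The grid search is used only as a sanity check; in practice the closed form is all that is needed for this model.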
MLE Notion