Empirical Risk Minimization
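$$\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i;\theta),\, y_i\big) + \Omega(\theta)$$
where $f(x;\theta)$ is the model, $\ell$ is a per-example loss, and $\Omega(\theta)$ is a regularizer. This is a generalization of the Maximum Likelihood principle.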
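A minimal sketch of the objective above, assuming a linear model $f(x;\theta) = \theta^\top x$, the squared loss $\ell(p, y) = \tfrac{1}{2}(p - y)^2$, and an L2 regularizer $\Omega(\theta) = \lambda \lVert\theta\rVert^2$. These are illustrative choices, not something ERM itself prescribes, and the function names are hypothetical.

```python
import numpy as np

def squared_loss(pred, y):
    # Per-example loss l(f(x_i; theta), y_i) = 0.5 * (pred - y)^2
    return 0.5 * (pred - y) ** 2

def l2_penalty(theta, lam):
    # Omega(theta) = lam * ||theta||^2 (an assumed choice of regularizer)
    return lam * np.dot(theta, theta)

def empirical_risk(theta, X, y, loss=squared_loss, lam=0.1):
    # (1/n) * sum_i loss(f(x_i; theta), y_i) + Omega(theta), linear f
    preds = X @ theta
    return loss(preds, y).mean() + l2_penalty(theta, lam)

def minimize_erm(X, y, lam=0.1, lr=0.1, steps=500):
    # Plain gradient descent; the gradient below is specific to squared loss + L2
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        preds = X @ theta
        grad = X.T @ (preds - y) / n + 2.0 * lam * theta
        theta -= lr * grad
    return theta
```

Passing a different `loss` to `empirical_risk` evaluates a different ERM instance unchanged, but `minimize_erm` would then need the matching gradient.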
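Relative to MLE: replace the negative log-likelihood with any other loss function $\ell$. MLE is the special case $\ell(f(x_i;\theta), y_i) = -\log p(y_i \mid x_i; \theta)$ with $\Omega(\theta) = 0$.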
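To illustrate the MLE connection, here is a small sketch assuming Gaussian noise with fixed variance $\sigma^2 = 1$ and a hypothetical one-parameter model $f(x;\theta) = \theta x$. Under these assumptions the average negative log-likelihood differs from the average squared loss $\tfrac{1}{2}(y - \theta x)^2$ only by the constant $\tfrac{1}{2}\log(2\pi)$, so both objectives share the same minimizer.

```python
import numpy as np

def gaussian_nll(pred, y, sigma=1.0):
    # Per-example negative log-likelihood: -log N(y | pred, sigma^2)
    return 0.5 * np.log(2 * np.pi * sigma**2) + (y - pred) ** 2 / (2 * sigma**2)

def squared_loss(pred, y):
    # Per-example squared loss used in place of the NLL
    return 0.5 * (y - pred) ** 2

def average_loss(loss, theta, x, y):
    # (1/n) * sum_i loss(f(x_i; theta), y_i) with f(x; theta) = theta * x
    return np.mean(loss(theta * x, y))

# Synthetic 1-D data (illustrative only)
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=1.0, size=100)

thetas = np.linspace(0.0, 4.0, 401)
nll = np.array([average_loss(gaussian_nll, t, x, y) for t in thetas])
sq = np.array([average_loss(squared_loss, t, x, y) for t in thetas])

gap = nll - sq                              # constant offset 0.5 * log(2 * pi)
same_argmin = nll.argmin() == sq.argmin()   # same minimizer on the grid
```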
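The same objective is often written with an explicit regularization weight:
$$L(\theta) = \frac{1}{n} \sum_{i=1}^{n} l(y_i, \theta) + \lambda C(\theta)$$
where $l(y_i, \theta)$ is the loss on example $i$ (depending on $x_i$ through the model) and $\lambda \ge 0$ scales the regularizer $C(\theta)$.
When a loss function is computationally difficult to minimize, as with the non-convex 0-1 loss in classification, it is often replaced with a convex upper bound (a surrogate loss) such as the hinge loss or a suitably scaled logistic loss.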
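A sketch of the surrogate idea for binary classification, assuming labels $y \in \{-1, +1\}$ and writing each loss as a function of the margin $m = y \cdot f(x;\theta)$. The 0-1 loss counts mistakes but is non-convex and piecewise constant, so its gradient is zero almost everywhere. The hinge loss and the base-2 logistic loss are convex and lie on or above it everywhere. The base-2 scaling is chosen so the logistic loss equals 1 at $m = 0$; the natural-log version equals $\log 2 < 1$ there and is not an upper bound.

```python
import numpy as np

# Losses as functions of the margin m = y * f(x; theta), labels y in {-1, +1}
def zero_one(m):
    # 1 if misclassified (margin <= 0), else 0: non-convex, piecewise constant
    return (m <= 0).astype(float)

def hinge(m):
    # max(0, 1 - m): convex upper bound on the 0-1 loss
    return np.maximum(0.0, 1.0 - m)

def logistic_base2(m):
    # log2(1 + exp(-m)), computed stably via logaddexp; convex, >= 1 when m <= 0
    return np.logaddexp(0.0, -m) / np.log(2.0)

margins = np.linspace(-3.0, 3.0, 601)
hinge_bounds = np.all(hinge(margins) >= zero_one(margins))
# Small tolerance absorbs floating-point rounding at m = 0
logistic_bounds = np.all(logistic_base2(margins) >= zero_one(margins) - 1e-12)
```

Either surrogate can replace the 0-1 loss as $l$ in the objective above, which makes gradient-based minimization possible.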