Backpropagation: in assignment 3
Mixture of Factor Analyzers: very complex but not that important
- PCA
In assignment 4
- MoG (mixture of Gaussians)
Perceptron


Bias & Variance
Bias-Variance Trade-off
- Bias: can be intrinsic rather than caused by a lack of data; it arises because the model family fundamentally cannot approximate the true function
- Variance: captures the randomness of the finite dataset and the spurious patterns that come from it (the sensitivity of the model to the randomness in the dataset)
If a model is too “simple” and has very few parameters, it may have large bias (but small variance), and it typically suffers from underfitting.
If a model is too “complex” and has very many parameters, it may have large variance (but small bias), and thus suffers from overfitting.
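To see the two failure modes side by side, here is a minimal sketch in Python (standard library only). The constant predictor and the 1-nearest-neighbor regressor are stand-ins chosen for illustration: the first has one parameter (high bias), the second memorizes every training point (high variance), so it reaches zero training error but not zero test error. The data-generating function `sin(x)` and the noise level are made up.

```python
import math
import random

def make_data(n, noise_std, rng):
    """Sample (x, y) pairs from y = sin(x) + Gaussian noise, x ~ U(0, 2*pi)."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys

def fit_constant(xs, ys):
    """'Too simple' model: ignores x and always predicts the mean of y (one parameter)."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_nearest_neighbor(xs, ys):
    """'Too complex' model: memorizes every training point (1-nearest neighbor)."""
    pairs = list(zip(xs, ys))
    def predict(x):
        return min(pairs, key=lambda p: abs(p[0] - x))[1]
    return predict

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
train_x, train_y = make_data(30, 0.3, rng)
test_x, test_y = make_data(1000, 0.3, rng)

const_model = fit_constant(train_x, train_y)
nn_model = fit_nearest_neighbor(train_x, train_y)

const_train, const_test = mse(const_model, train_x, train_y), mse(const_model, test_x, test_y)
nn_train, nn_test = mse(nn_model, train_x, train_y), mse(nn_model, test_x, test_y)
print(f"constant: train={const_train:.3f} test={const_test:.3f}")
print(f"1-NN:     train={nn_train:.3f} test={nn_test:.3f}")
```

The constant model has a large error on both sets (underfitting); 1-NN fits the training set perfectly, and the gap to its test error comes from fitting the noise (overfitting).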

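The MSE of an estimator $\hat\theta$ decomposes as $\mathrm{MSE} = \mathbb{E}[(\hat\theta - \theta)^2] = \mathrm{Bias}^2 + \mathrm{Variance}$
Bias is the difference between the expected estimate and the true value, $\mathbb{E}[\hat\theta] - \theta$; variance is the variance of the estimator, $\mathbb{E}[(\hat\theta - \mathbb{E}[\hat\theta])^2]$
With noisy data ($y = f(x) + \epsilon$, $\mathrm{Var}(\epsilon) = \sigma^2$), the expected squared error gains an irreducible noise term: $\mathrm{Bias}^2 + \mathrm{Variance} + \sigma^2$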
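A quick Monte Carlo check of the decomposition, as a sketch: the estimator here is a hypothetical shrunk sample mean `SHRINK * mean(data)`, which is deliberately biased. Over many simulated datasets, the empirical MSE equals bias² plus the (population) variance of the estimates, and the error against a fresh noisy observation picks up roughly one extra σ².

```python
import random
import statistics

TRUE_MEAN = 2.0      # true parameter theta
NOISE_STD = 1.0      # std of each noisy observation (sigma)
N_SAMPLES = 10       # dataset size per trial
SHRINK = 0.8         # hypothetical shrinkage factor -> deliberately biased estimator
N_TRIALS = 20000

rng = random.Random(42)
estimates, test_errors = [], []
for _ in range(N_TRIALS):
    data = [rng.gauss(TRUE_MEAN, NOISE_STD) for _ in range(N_SAMPLES)]
    theta_hat = SHRINK * statistics.fmean(data)
    estimates.append(theta_hat)
    # A fresh noisy observation y = theta + eps, eps ~ N(0, sigma^2)
    y_new = rng.gauss(TRUE_MEAN, NOISE_STD)
    test_errors.append((theta_hat - y_new) ** 2)

mean_est = statistics.fmean(estimates)
mse = statistics.fmean([(e - TRUE_MEAN) ** 2 for e in estimates])
bias_sq = (mean_est - TRUE_MEAN) ** 2
variance = statistics.pvariance(estimates)  # population variance (divides by N)

print(f"MSE={mse:.4f}  bias^2+var={bias_sq + variance:.4f}")
print(f"error on noisy y={statistics.fmean(test_errors):.4f}  "
      f"bias^2+var+sigma^2={bias_sq + variance + NOISE_STD ** 2:.4f}")
```

The first identity holds exactly (up to rounding) for the empirical quantities; the second holds only in expectation, so it matches approximately.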

Model Generalization
A model’s capability to adapt properly to new, unseen data
Model Regularization
Regularization controls the model complexity and prevents overfitting
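Key terms: regularization parameter $\lambda$, regularizer $R(\mathbf{w})$, regularized loss $L(\mathbf{w}) + \lambda R(\mathbf{w})$
- Weight decay: a perspective on the regularized loss; with an L2 regularizer, each gradient step shrinks (decays) the weights toward zero
- Sparsity: the model has many weights that are exactly zero (e.g., encouraged by an L1 regularizer)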
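A minimal sketch of both bullets, assuming the L2 regularizer is written as $\frac{\lambda}{2}\|\mathbf{w}\|^2$ (a common convention) and using soft-thresholding, the proximal step for an L1 regularizer (as in ISTA), to show how sparsity arises. The helper names and numbers are hypothetical.

```python
import math

def l2_regularized_step(w, grad_loss, lr, lam):
    """One gradient step on the regularized loss L(w) + (lam / 2) * ||w||^2."""
    return [wi - lr * (gi + lam * wi) for wi, gi in zip(w, grad_loss)]

def weight_decay_step(w, grad_loss, lr, lam):
    """The same step read as weight decay: shrink w by (1 - lr * lam), then step on L alone."""
    return [(1.0 - lr * lam) * wi - lr * gi for wi, gi in zip(w, grad_loss)]

def soft_threshold(w, t):
    """Proximal step for t * ||w||_1: shrinks every weight by t and sets |w_i| <= t to zero."""
    return [math.copysign(max(abs(wi) - t, 0.0), wi) for wi in w]

w = [0.5, -0.05, 0.02, -1.2]   # current weights (made-up values)
g = [0.1, 0.2, -0.3, 0.05]     # gradient of the unregularized loss (made-up values)
a = l2_regularized_step(w, g, lr=0.1, lam=0.01)
b = weight_decay_step(w, g, lr=0.1, lam=0.01)
print(max(abs(x - y) for x, y in zip(a, b)))  # differs only by floating-point rounding
print(soft_threshold(w, t=0.1))               # the two small weights become zero
```

The L2 step never sets a weight exactly to zero (it only scales it), while the L1 proximal step zeroes every weight whose magnitude is below the threshold, which is where sparsity comes from.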
Data Clustering
- Soft K-means algorithm, formulated with a regularizer
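One common way to get soft K-means from a regularizer (assumed here, not confirmed by the notes) is to add an entropy term $\frac{1}{\beta}\sum_{n,k} r_{nk}\log r_{nk}$ to the K-means objective; minimizing over the responsibilities $r_{nk}$ (with $\sum_k r_{nk} = 1$) then gives a softmax of $-\beta\|x_n-\mu_k\|^2$. A standard-library sketch of the resulting algorithm, with made-up 2-D blobs and a farthest-point initialization chosen for determinism:

```python
import math
import random

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def soft_kmeans(points, k, beta, n_iters=50):
    """Soft K-means.

    E-step: r[n][j] = exp(-beta * ||x_n - mu_j||^2) / sum_l exp(-beta * ||x_n - mu_l||^2)
    M-step: mu_j = sum_n r[n][j] * x_n / sum_n r[n][j]
    """
    # Farthest-point initialization (an assumption; random init also works in principle).
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda x: min(sq_dist(x, c) for c in centers))
        centers.append(list(far))
    dim = len(points[0])
    for _ in range(n_iters):
        # E-step: soft responsibilities
        resp = []
        for x in points:
            logits = [-beta * sq_dist(x, c) for c in centers]
            top = max(logits)                      # subtract the max for numerical stability
            exps = [math.exp(v - top) for v in logits]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: responsibility-weighted means
        for j in range(k):
            weight = sum(r[j] for r in resp)
            centers[j] = [sum(r[j] * x[d] for r, x in zip(resp, points)) / weight
                          for d in range(dim)]
    return centers, resp

rng = random.Random(0)
blob_a = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(50)]
blob_b = [(rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)) for _ in range(50)]
centers, resp = soft_kmeans(blob_a + blob_b, k=2, beta=1.0)
print(sorted(centers))  # one center near (0, 0), the other near (5, 5)
```

As $\beta \to \infty$ the responsibilities become one-hot and the updates reduce to hard K-means.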

Parameter Estimation


- For the multivariate case
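The worked example from the lecture is not recoverable here. As an assumption, the sketch below shows maximum-likelihood estimation for a multivariate Gaussian, where the MLEs are the sample mean and the sample covariance divided by $N$ (not $N-1$). The function name is hypothetical.

```python
def gaussian_mle(data):
    """Maximum-likelihood estimates for a multivariate Gaussian.

    mu_hat    = (1/N) * sum_n x_n
    Sigma_hat = (1/N) * sum_n (x_n - mu_hat)(x_n - mu_hat)^T   (divides by N, not N-1)
    """
    n, d = len(data), len(data[0])
    mu = [sum(x[i] for x in data) / n for i in range(d)]
    sigma = [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in data) / n for j in range(d)]
             for i in range(d)]
    return mu, sigma

mu, sigma = gaussian_mle([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
print(mu)     # [1.0, 1.0]
print(sigma)  # [[1.0, 0.0], [0.0, 1.0]]
```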

- Kernel PCA and Isomap require explicitly specifying the distance (or similarity) between data points
- Instead, learn a mapping function to the feature space from the data itself
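To make "explicitly specifying the distance" concrete, here is a sketch of Isomap's distance step in plain Python: build a k-nearest-neighbor graph with Euclidean edge lengths, then take all-pairs shortest paths (Floyd-Warshall here; Dijkstra is also common). Isomap would then run classical MDS on these geodesic distances, which is omitted. The half-circle data and `n_neighbors=2` are illustrative choices.

```python
import math

def geodesic_distances(points, n_neighbors):
    """Isomap-style distances: k-nearest-neighbor graph with Euclidean edge
    weights, followed by all-pairs shortest paths (Floyd-Warshall)."""
    n = len(points)
    inf = float("inf")
    dist = [[inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        # Index 0 of the sorted list is the point itself (distance 0), so skip it.
        nearest = sorted(range(n), key=lambda j: math.dist(points[i], points[j]))[1:n_neighbors + 1]
        for j in nearest:
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d    # symmetrize the neighborhood graph
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Points evenly spaced on a unit half circle
n = 50
pts = [(math.cos(math.pi * m / (n - 1)), math.sin(math.pi * m / (n - 1))) for m in range(n)]
D = geodesic_distances(pts, n_neighbors=2)
geo, euc = D[0][n - 1], math.dist(pts[0], pts[-1])
print(geo, euc)  # geodesic follows the arc (about pi); straight-line distance is 2
```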
Generative models aim to estimate the probability density function of the data
The decoder of an autoencoder (AE) can be used for data generation once we know the distribution of plausible latent codes
Latent codes capture semantic and domain information
The posterior $p(z|x)$ can be approximated by an encoder network $q(z|x)$

$\mu$ and $\sigma$ are functions of $x$: $\mu = f_1(x)$, $\sigma = f_2(x)$
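A minimal sketch of such an encoder, assuming $f_1$ and $f_2$ are single linear layers with made-up weights (in practice they are neural networks, often sharing a trunk). Predicting $\log\sigma^2$ and exponentiating is a common choice that keeps $\sigma$ positive; all names and numbers are hypothetical.

```python
import math

# Hypothetical encoder parameters for a 3-D input and a 2-D latent code.
W_MU = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]
B_MU = [0.0, 0.1]
W_LOGVAR = [[0.1, 0.0, -0.1], [-0.2, 0.2, 0.0]]
B_LOGVAR = [-1.0, -1.0]

def linear(W, b, x):
    """y = W x + b for nested-list W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def encode(x):
    """Return (mu, sigma) of q(z|x) = N(mu, diag(sigma^2)).

    f_1 is a linear map; f_2 predicts log(sigma^2) linearly, then exponentiates."""
    mu = linear(W_MU, B_MU, x)
    sigma = [math.exp(0.5 * lv) for lv in linear(W_LOGVAR, B_LOGVAR, x)]
    return mu, sigma

mu, sigma = encode([1.0, 2.0, 3.0])
print(mu, sigma)
```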


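In the ELBO, $\mathbb{E}_{q(z|x)}[\log p(x|z)] - \mathrm{KL}(q(z|x)\,\|\,p(z))$, term A (the expected log-likelihood) is for reconstruction, and term B (the KL divergence) is for regularization, making the approximate posterior close to the prior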
Decoder

Reparameterization is a key trick for training a VAE
Reparametrization Trick
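Some random variables can be represented as a deterministic function of another random variable, e.g., $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$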
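A sketch of why the trick matters, assuming a 1-D Gaussian $z = \mu + \sigma\epsilon$. Because the noise $\epsilon$ no longer depends on $(\mu, \sigma)$, the gradient of each sample can be taken through $z$ directly. The target $\mathbb{E}[z^2] = \mu^2 + \sigma^2$ is chosen only because its exact gradients, $2\mu$ and $2\sigma$, are easy to check.

```python
import random

def reparameterize(mu, sigma, rng):
    """z = mu + sigma * eps with eps ~ N(0, 1): the randomness lives in eps,
    so z is a deterministic, differentiable function of (mu, sigma)."""
    eps = rng.gauss(0.0, 1.0)
    return mu + sigma * eps, eps

def pathwise_grads(mu, sigma, n_samples=100_000, seed=0):
    """Monte Carlo gradients of E[z^2] w.r.t. mu and sigma via the trick.
    Exact values: d/dmu = 2 * mu, d/dsigma = 2 * sigma."""
    rng = random.Random(seed)
    g_mu = g_sigma = 0.0
    for _ in range(n_samples):
        z, eps = reparameterize(mu, sigma, rng)
        g_mu += 2.0 * z            # d(z^2)/dmu    = 2z * dz/dmu    = 2z
        g_sigma += 2.0 * z * eps   # d(z^2)/dsigma = 2z * dz/dsigma = 2z * eps
    return g_mu / n_samples, g_sigma / n_samples

g_mu, g_sigma = pathwise_grads(mu=1.5, sigma=0.5)
print(g_mu, g_sigma)  # estimates of the exact gradients 3.0 and 1.0
```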


Then B (the KL regularization term)
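Assuming $q(z|x) = \mathcal{N}(\mu, \mathrm{diag}(\sigma^2))$ and a standard normal prior $p(z) = \mathcal{N}(0, I)$ (the usual VAE setup), B has a closed form. The sketch compares it with a Monte Carlo estimate that uses reparameterized samples; the example $\mu$ and $\sigma$ are made up.

```python
import math
import random

def kl_closed_form(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) = 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)."""
    return 0.5 * sum(m * m + s * s - 1.0 - math.log(s * s) for m, s in zip(mu, sigma))

def log_normal(z, mean, std):
    """Log density of a 1-D Gaussian N(mean, std^2) at z."""
    return -0.5 * math.log(2 * math.pi * std * std) - (z - mean) ** 2 / (2 * std * std)

def kl_monte_carlo(mu, sigma, n_samples=50_000, seed=0):
    """Estimate E_q[log q(z) - log p(z)] with reparameterized samples z = mu + sigma * eps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        for m, s in zip(mu, sigma):
            z = m + s * rng.gauss(0.0, 1.0)
            total += log_normal(z, m, s) - log_normal(z, 0.0, 1.0)
    return total / n_samples

mu_q, sigma_q = [0.5, -1.0], [0.8, 1.2]
print(kl_closed_form(mu_q, sigma_q), kl_monte_carlo(mu_q, sigma_q))
print(kl_closed_form([0.0, 0.0], [1.0, 1.0]))  # 0.0: q equals the prior
```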



GANs: instead of estimating the density, just learn how to generate samples by learning a transformation (the generator) from a simple distribution
D tries to distinguish between real and fake output
G tries to fool the discriminator by generating real-looking output
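The "transformation from a simple distribution" idea can be seen without any training: the sketch below uses a hand-derived generator (the inverse CDF of an exponential distribution) that turns uniform noise into exponential samples. In a GAN, G would instead be a neural network whose transformation is learned through the game with D; `rate` is an arbitrary choice.

```python
import math
import random

def generator(u, rate=2.0):
    """Fixed (not learned) generator: maps u ~ Uniform(0, 1) to an
    Exponential(rate) sample via the inverse CDF, G(u) = -log(1 - u) / rate."""
    return -math.log(1.0 - u) / rate

rng = random.Random(0)
samples = [generator(rng.random()) for _ in range(100_000)]  # rng.random() is in [0, 1)
print(sum(samples) / len(samples))  # should be close to 1 / rate = 0.5
```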

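Objective: $\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$
In other words, the left term trains D to correctly recognize real data,
and the right term (which G minimizes) pushes D to fail to recognize the generated data.
Maximize over the discriminator first, then train the generator, and alternate.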

Seonglae Cho