Post-midterm scope: Lectures 5-10
I strongly encourage you to work on the slides and the problem sheets from the TA sessions, and to make sure you understand all concepts and are familiar with all important derivations. Bear in mind that exams are designed to assess whether you understood what happened in the module (threshold for passing the module) -- the more fluent you are with the concepts and formulas, the higher your final mark.
Lecture 5
- Independent implies uncorrelated: True
- Independent implies correlated: False
- Dependent implies uncorrelated: False
- Dependent implies correlated: False
- Correlated implies dependent: True
- Correlated implies independent: False
- Uncorrelated implies dependent: False
- Uncorrelated implies independent: False
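The last two lines of the table can be checked with a standard counterexample. A minimal sketch, assuming $X \sim N(0, 1)$ and $Y = X^2$ (an illustrative choice, not taken from the lecture): $Y$ is completely determined by $X$, yet the two are uncorrelated because $\mathrm{Cov}(X, X^2) = E[X^3] = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)   # X ~ N(0, 1)
y = x ** 2                         # Y is a deterministic function of X, so X and Y are dependent

# Linear correlation is close to 0: Cov(X, X^2) = E[X^3] = 0 for a symmetric X.
corr_xy = np.corrcoef(x, y)[0, 1]

# The dependence shows up in a nonlinear statistic: |X| and Y are strongly correlated.
corr_absx_y = np.corrcoef(np.abs(x), y)[0, 1]

print(f"corr(X, Y)   = {corr_xy:.3f}")
print(f"corr(|X|, Y) = {corr_absx_y:.3f}")
```

Correlation only measures linear association, which is why "uncorrelated" cannot imply "independent".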
Transfer formula (Law of the Unconscious Statistician)
Marginalization
Multivariate Gaussian
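For reference, the standard textbook forms of the three items above can be written as follows. The symbols ($g$, $p$, $\mu$, $\Sigma$, and the dimension $d$) are notational assumptions, since the notes do not fix them.

```latex
% Transfer formula (LOTUS): expectation of g(X) without deriving the density of g(X)
\mathbb{E}[g(X)] = \int g(x)\, p_X(x)\, dx

% Marginalization: integrate the joint density over the variable that is not of interest
p_X(x) = \int p_{X,Y}(x, y)\, dy

% Multivariate Gaussian density in d dimensions
\mathcal{N}(x \mid \mu, \Sigma)
  = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}}
    \exp\!\left( -\tfrac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \right)
```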
Lecture 6
Lecture 7
Graphical Model
In a directed graph, a node that sends an arrow appears on the conditioning side of the factor of the node it points to, and a node that receives an arrow is conditioned on its parents. The joint distribution is a product with one factor per node. For the chain $A \to B \to C$, $A$ and $B$ are predecessors of $C$, while $A$ is not a parent of $C$ (only $B$ is). In other words, $p(C \mid A, B) = p(C \mid B)$.
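In a directed graphical model the joint factorizes as $p(x_1, \dots, x_N) = \prod_{i=1}^{N} p(x_i \mid \mathrm{pa}(x_i))$, where $N$ denotes the number of nodes in the graph and $\mathrm{pa}(x_i)$ the parents of $x_i$. For example, for the chain $A \to B \to C$, $p(A, B, C) = p(A)\, p(B \mid A)\, p(C \mid B)$.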
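In an undirected graphical model the joint factorizes as $p(x) = \frac{1}{Z} \prod_{c \in \mathcal{C}} \psi_c(x_c)$, where $\mathcal{C}$ denotes the set of cliques, and each factor $\psi_c$ is a non-negative function over the clique $x_c$. Note that $Z = \sum_x \prod_{c \in \mathcal{C}} \psi_c(x_c)$ is called the partition function. For example, for the undirected chain $A - B - C$, $p(A, B, C) = \frac{1}{Z}\, \psi_1(A, B)\, \psi_2(B, C)$.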
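Without a collider (v-structure), directed graphical models that share the same skeleton encode the same conditional independence statements. For three nodes, the chains $A \to B \to C$ and $A \leftarrow B \leftarrow C$ and the fork $A \leftarrow B \to C$ all state that $A$ and $C$ are independent given $B$; only the collider $A \to B \leftarrow C$ differs.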
Note that the direction of arrows indicates causality, not the direction of inference.
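The directed factorization can be checked numerically. A minimal sketch, assuming a hypothetical binary chain A -> B -> C with made-up probability tables: it builds the joint as one factor per node and confirms that, once B is known, A carries no further information about C.

```python
import numpy as np

# Hypothetical conditional probability tables for a binary chain A -> B -> C.
p_a = np.array([0.6, 0.4])                    # p(A)
p_b_given_a = np.array([[0.7, 0.3],           # p(B | A=0)
                        [0.2, 0.8]])          # p(B | A=1)
p_c_given_b = np.array([[0.9, 0.1],           # p(C | B=0)
                        [0.5, 0.5]])          # p(C | B=1)

# One factor per node: p(A, B, C) = p(A) p(B | A) p(C | B); axes are (A, B, C).
joint = p_a[:, None, None] * p_b_given_a[:, :, None] * p_c_given_b[None, :, :]

# p(C | A, B), recovered from the joint by normalising over C.
p_c_given_ab = joint / joint.sum(axis=2, keepdims=True)

print("joint sums to", joint.sum())
print("p(C | A=0, B) equals p(C | B):", np.allclose(p_c_given_ab[0], p_c_given_b))
print("p(C | A=1, B) equals p(C | B):", np.allclose(p_c_given_ab[1], p_c_given_b))
```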
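- The law of large numbers
If the variable is the sum rather than the average of $n$ i.i.d. samples with mean $\mu$ and variance $\sigma^2$, both the mean and the variance are multiplied by the sample count: $E[S_n] = n\mu$ and $\mathrm{Var}(S_n) = n\sigma^2$. For the average, the mean stays $\mu$ and the variance shrinks to $\sigma^2 / n$.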
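The scaling of the sum versus the average can be verified by simulation. A minimal sketch, assuming Gaussian samples with illustrative values for the mean, standard deviation, and sample count (any distribution with finite variance would behave the same way):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, trials = 2.0, 3.0, 50, 20_000   # illustrative values

# Each row is one experiment of n i.i.d. draws.
samples = rng.normal(mu, sigma, size=(trials, n))
sums = samples.sum(axis=1)
means = samples.mean(axis=1)

print(f"sum:     mean {sums.mean():8.3f} (theory {n * mu}),  var {sums.var():8.3f} (theory {n * sigma**2})")
print(f"average: mean {means.mean():8.3f} (theory {mu}),  var {means.var():8.3f} (theory {sigma**2 / n})")
```

The shrinking variance of the average is what drives the law of large numbers: the sample mean concentrates around $\mu$ as $n$ grows.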
MLE
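- Log-likelihood function of the Bernoulli distribution for $n$ coin tosses $x_1, \dots, x_n \in \{0, 1\}$: $\ell(\theta) = \sum_{i=1}^{n} \left[ x_i \log \theta + (1 - x_i) \log(1 - \theta) \right]$
- The MLE of $\theta$ is the sample mean, i.e. the fraction of heads: $\hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} x_i$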
- Give the MLE for the coin toss problem
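The coin toss MLE can be confirmed by maximizing the log-likelihood on a grid and comparing with the closed form. A minimal sketch, assuming a hypothetical record of 10 tosses; the function name `bernoulli_log_likelihood` is introduced here for illustration only.

```python
import numpy as np

def bernoulli_log_likelihood(theta, tosses):
    """Log-likelihood of heads-probability theta for 0/1 coin tosses."""
    tosses = np.asarray(tosses)
    heads = tosses.sum()
    tails = tosses.size - heads
    return heads * np.log(theta) + tails * np.log(1.0 - theta)

tosses = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])   # hypothetical data: 7 heads in 10 tosses

# Closed form: setting the derivative of the log-likelihood to zero gives the sample mean.
theta_closed_form = tosses.mean()

# Numerical check: evaluate the log-likelihood on a grid and take the maximizer.
grid = np.linspace(0.001, 0.999, 999)
theta_grid = grid[np.argmax(bernoulli_log_likelihood(grid, tosses))]

print("closed form:", theta_closed_form)
print("grid search:", round(float(theta_grid), 3))
```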
Lecture 8
Bayesian inference uses the full posterior distribution to represent uncertainty about the parameter, unlike the methods above, which only estimate a single optimal point.
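Let's start with the joint distribution of a new observation $x^*$ and the parameter $\theta$ given the data $\mathcal{D}$, $p(x^*, \theta \mid \mathcal{D})$.
The posterior predictive distribution can be represented as $p(x^* \mid \mathcal{D}) = \int p(x^* \mid \theta)\, p(\theta \mid \mathcal{D})\, d\theta$.
We can use the joint distribution above: $p(x^* \mid \mathcal{D}) = \int p(x^*, \theta \mid \mathcal{D})\, d\theta = \int p(x^* \mid \theta, \mathcal{D})\, p(\theta \mid \mathcal{D})\, d\theta$.
Assuming conditional independence of $x^*$ and $\mathcal{D}$ given $\theta$, we have $p(x^* \mid \theta, \mathcal{D}) = p(x^* \mid \theta)$.
Substituting this into the integral recovers the first form.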
Conjugacy means that a prior and likelihood pair yields a closed-form posterior in the same family as the prior; we then say that the prior is conjugate to the likelihood. This makes updating the posterior after observing new data easier, because only the parameters of the distribution change.
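A common textbook instance is the Beta prior with a Bernoulli likelihood, where the posterior is again a Beta distribution. A minimal sketch, assuming a Beta(2, 2) prior and two hypothetical batches of coin tosses; `beta_bernoulli_update` is a name introduced here for illustration.

```python
def beta_bernoulli_update(alpha, beta, tosses):
    """Posterior Beta parameters after observing 0/1 tosses under a Beta(alpha, beta) prior."""
    heads = sum(tosses)
    tails = len(tosses) - heads
    return alpha + heads, beta + tails

prior = (2.0, 2.0)             # assumed prior, Beta(2, 2)
batch_1 = [1, 1, 0, 1]         # hypothetical first batch: 3 heads, 1 tail
batch_2 = [0, 1, 1, 1, 1]      # hypothetical second batch: 4 heads, 1 tail

# Sequential updating: yesterday's posterior is today's prior.
posterior_1 = beta_bernoulli_update(*prior, batch_1)
posterior_2 = beta_bernoulli_update(*posterior_1, batch_2)

# Processing all data at once gives the same posterior.
all_at_once = beta_bernoulli_update(*prior, batch_1 + batch_2)

print(posterior_2, all_at_once)                      # (9.0, 4.0) (9.0, 4.0)
print("posterior mean:", posterior_2[0] / sum(posterior_2))
```

The update is just counting, which is the practical meaning of "easier" here. The Gaussian prior on the mean of a Gaussian likelihood with known variance is another standard conjugate pair.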
- Give a unified definition of the inference problem, and two instantiations: MLE and ERM
- Why is regularisation useful?
- What is the CDF of a random variable X?
- Explain Bayesian conjugacy in simple terms – give a couple examples
- Explain inverse transform sampling. Prove it works
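For the last question, the argument is short: if $U \sim \mathrm{Uniform}(0, 1)$ and $F$ is a continuous, strictly increasing CDF, then $P(F^{-1}(U) \le x) = P(U \le F(x)) = F(x)$, so $F^{-1}(U)$ has CDF $F$. A minimal sketch, assuming an exponential target with an illustrative rate, where $F(x) = 1 - e^{-\lambda x}$ and $F^{-1}(u) = -\log(1 - u) / \lambda$:

```python
import numpy as np

def sample_exponential(rate, size, rng):
    """Inverse transform sampling for Exponential(rate)."""
    u = rng.uniform(0.0, 1.0, size)     # U ~ Uniform[0, 1)
    return -np.log1p(-u) / rate         # F^{-1}(u) = -log(1 - u) / rate

rng = np.random.default_rng(2)
rate = 1.5                              # illustrative rate parameter
samples = sample_exponential(rate, 200_000, rng)

# Compare the empirical CDF at x = 1 with the analytic CDF.
empirical_cdf_at_1 = np.mean(samples <= 1.0)
true_cdf_at_1 = 1.0 - np.exp(-rate * 1.0)

print(f"sample mean {samples.mean():.4f} (theory {1 / rate:.4f})")
print(f"CDF at 1: empirical {empirical_cdf_at_1:.4f}, exact {true_cdf_at_1:.4f}")
```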
Lecture 9
Key idea: geometric area corresponds to probability, so the probability of a region is the area under the density over that region.
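One classic illustration of area as probability is estimating $\pi$. A minimal sketch, assuming points drawn uniformly in the square $[-1, 1]^2$ (this example is an assumption, not necessarily the one used in the lecture): the probability of landing inside the unit circle equals the ratio of areas, $\pi / 4$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000

# Uniform points in the square [-1, 1] x [-1, 1], which has area 4.
points = rng.uniform(-1.0, 1.0, size=(n, 2))

# Fraction of points inside the unit circle estimates (circle area) / (square area) = pi / 4.
inside = (points ** 2).sum(axis=1) <= 1.0
pi_estimate = 4.0 * inside.mean()

print(f"pi is approximately {pi_estimate:.4f}")
```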
Lecture 10
- What is the rejection sampling algorithm? How many steps on average are needed to produce one sample?
- State the elementary Monte Carlo identity.
- What is the variance of the Monte Carlo estimate of an integral?
- How can you sample from N(10, 5) if you only have a laptop able to sample N(−1, 1)?
- Stochasticity trick: how can you numerically evaluate the integral
- Explain what the symmetric Metropolis-Hastings algorithm is.
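For the first question, a rejection sampler can be sketched as follows. The setup is an assumption chosen for illustration: target density $p(x) = 6x(1 - x)$ on $[0, 1]$ (a Beta(2, 2)), proposal $q = \mathrm{Uniform}(0, 1)$, and envelope constant $M = 1.5 = \max_x p(x) / q(x)$. With normalized densities the acceptance probability is $1 / M$, so on average $M$ proposals are needed per accepted sample.

```python
import numpy as np

def target_pdf(x):
    """Beta(2, 2) density on [0, 1]; its maximum is 1.5 at x = 0.5."""
    return 6.0 * x * (1.0 - x)

def rejection_sample(n_samples, rng, envelope=1.5):
    """Draw n_samples from target_pdf with a Uniform(0, 1) proposal."""
    proposal_pdf = 1.0                      # density of Uniform(0, 1)
    samples, n_proposals = [], 0
    while len(samples) < n_samples:
        x = rng.uniform()                   # propose x ~ q
        u = rng.uniform()                   # uniform height for the accept test
        n_proposals += 1
        if u <= target_pdf(x) / (envelope * proposal_pdf):
            samples.append(x)               # accept with probability p(x) / (M q(x))
    return np.array(samples), n_proposals

rng = np.random.default_rng(4)
samples, n_proposals = rejection_sample(20_000, rng)
proposals_per_sample = n_proposals / len(samples)

print(f"sample mean {samples.mean():.4f} (theory 0.5)")
print(f"proposals per accepted sample {proposals_per_sample:.3f} (theory 1.5)")
```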
MAP vs MLE
There is no right answer, but Bayesians prefer MAP, since the prior lets us include prior knowledge in the estimate. The two estimates converge as the amount of data grows, because the likelihood dominates the prior.
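The convergence of the two estimates can be seen in the coin toss setting. A minimal sketch, assuming a Beta(5, 5) prior (an illustrative choice) and data with exactly 80% heads: the MAP estimate is the mode of the Beta posterior, $(h + \alpha - 1) / (n + \alpha + \beta - 2)$, valid for $\alpha, \beta > 1$.

```python
def mle(heads, n):
    """Maximum likelihood estimate of the heads probability."""
    return heads / n

def map_estimate(heads, n, alpha=5.0, beta=5.0):
    """Posterior mode under an assumed Beta(alpha, beta) prior (alpha, beta > 1)."""
    return (heads + alpha - 1.0) / (n + alpha + beta - 2.0)

gaps = {}
for n in (10, 100, 10_000):
    heads = 8 * n // 10                     # hypothetical data: exactly 80% heads
    gaps[n] = abs(map_estimate(heads, n) - mle(heads, n))
    print(f"n={n:6d}  MLE={mle(heads, n):.4f}  MAP={map_estimate(heads, n):.4f}  gap={gaps[n]:.5f}")
```

With few tosses the prior pulls the MAP estimate toward 0.5; with many tosses the pull becomes negligible.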
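Gibbs sampling is specifically designed for high-dimensional problems and is a widely used special case of the Metropolis-Hastings algorithm in which the proposal is the full conditional distribution $p(x_i \mid x_{-i})$. The acceptance probability then simplifies to 1.
- Set an initial state $x^{(0)}$
- For each iteration $t$ in $1, \dots, T$
- Set $x_i^{(t)} \sim p(x_i \mid x_{-i})$ for each coordinate $i$ in turn, conditioning on the most recent values of all other coordinates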
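The update loop above, in which each coordinate is redrawn from its full conditional and every draw is accepted, is commonly known as Gibbs sampling. A minimal sketch, assuming a standard bivariate Gaussian target with correlation `rho` (a hypothetical example, not taken from the lecture), for which the full conditionals are $x_1 \mid x_2 \sim N(\rho x_2, 1 - \rho^2)$ and symmetrically for $x_2$:

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter, rng, burn_in=1_000):
    """Gibbs sampler for a zero-mean, unit-variance bivariate Gaussian with correlation rho."""
    x1, x2 = 0.0, 0.0                          # initial state
    cond_std = np.sqrt(1.0 - rho ** 2)         # std of each full conditional
    chain = np.empty((n_iter, 2))
    for t in range(n_iter + burn_in):
        x1 = rng.normal(rho * x2, cond_std)    # draw x1 from p(x1 | x2); always accepted
        x2 = rng.normal(rho * x1, cond_std)    # draw x2 from p(x2 | x1), using the new x1
        if t >= burn_in:
            chain[t - burn_in] = (x1, x2)      # keep only post burn-in states
    return chain

rng = np.random.default_rng(6)
chain = gibbs_bivariate_normal(rho=0.8, n_iter=20_000, rng=rng)
sample_corr = np.corrcoef(chain[:, 0], chain[:, 1])[0, 1]

print("sample means:", chain.mean(axis=0))
print(f"sample correlation {sample_corr:.3f} (target 0.8)")
```

Successive states are correlated, so the effective sample size is smaller than the chain length; the burn-in length used here is an arbitrary illustrative value.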
The entropy of a probability distribution can be interpreted as a measure of uncertainty, or lack of predictability.
The cross-entropy $H(p, q) = -\sum_x p(x) \log q(x)$ is the expected number of bits needed to compress some data samples drawn from $p$ using a code based on distribution $q$.
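Both quantities are easy to compute for discrete distributions. A minimal sketch, assuming two hypothetical three-outcome distributions and base-2 logarithms so that the results are in bits:

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    nz = p > 0                                  # 0 * log 0 is treated as 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def cross_entropy(p, q):
    """Expected bits to encode samples from p with a code built for q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(q[nz]))

p = [0.5, 0.25, 0.25]        # hypothetical data distribution
q = [1 / 3, 1 / 3, 1 / 3]    # hypothetical model distribution (uniform)

h_p = entropy(p)             # 1.5 bits
h_pq = cross_entropy(p, q)   # log2(3) bits, about 1.585

print(f"H(p) = {h_p:.3f} bits, H(p, q) = {h_pq:.3f} bits")
```

The cross-entropy is never smaller than the entropy; the gap is the extra cost of coding with the wrong distribution.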
Seonglae Cho