COMP0187 Final

Scope after the midterm: Lectures 5-10

I strongly encourage you to work through the slides and the problem sheets from the TA sessions; make sure you understand all concepts and are familiar with all important derivations. Bear in mind that exams are designed to assess whether you understood what happened in the module (the threshold for passing the module); the more fluent you are with the concepts and formulas, the higher your final mark.

Lecture 5

Independent implies uncorrelated: True

Independent implies correlated: False

Dependent implies uncorrelated: False

Dependent implies correlated: False

Correlated implies dependent: True

Correlated implies independent: False

Uncorrelated implies dependent: False

Uncorrelated implies independent: False
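A small exact check of the last row, using a standard textbook example chosen here for illustration: $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. The covariance is zero, yet $Y$ is a function of $X$, so the two are dependent. Expectations are computed exactly by enumeration.

```python
from fractions import Fraction

# Illustrative example: X is uniform on {-1, 0, 1}; Y = X**2 is a deterministic function of X.
support = [-1, 0, 1]
p = Fraction(1, 3)

def expect(f):
    """Exact expectation of f(X) under the uniform distribution on `support`."""
    return sum(p * f(x) for x in support)

e_x = expect(lambda x: x)            # E[X] = 0
e_y = expect(lambda x: x ** 2)       # E[Y] = 2/3
e_xy = expect(lambda x: x ** 3)      # E[XY] = E[X^3] = 0
cov_xy = e_xy - e_x * e_y            # Cov(X, Y) = 0, so X and Y are uncorrelated

# Dependence: P(X=0, Y=0) = 1/3, but P(X=0) * P(Y=0) = 1/3 * 1/3 = 1/9.
p_joint = p                          # X = 0 forces Y = 0
p_x0 = p
p_y0 = p                             # Y = 0 only when X = 0
dependent = p_joint != p_x0 * p_y0

print(cov_xy, dependent)  # 0 True
```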

Transfer Formula
Law of the Unconscious Statistician

If $X$ has probability density function $p_X(x)$ and $g$ is any function for which the integral exists, then

$$\mathbb{E}[g(X)] = \int g(x)\, p_X(x)\, dx$$
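The transfer formula says we can compute $\mathbb{E}[g(X)]$ directly from the density of $X$, without ever deriving the distribution of $g(X)$. A minimal numerical sketch, assuming $X \sim \mathcal{N}(0, 1)$ and $g(x) = x^2$ (so the exact answer is 1); the integration bounds and step count are arbitrary choices. A sample average is included as a cross-check.

```python
import math
import random

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def lotus_expectation(g, pdf, lo=-10.0, hi=10.0, steps=100_000):
    """E[g(X)] = integral of g(x) * pdf(x) dx, approximated with the midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += g(x) * pdf(x)
    return total * h

g = lambda x: x ** 2
integral = lotus_expectation(g, normal_pdf)

# Cross-check: average g over samples of X (again, no density of g(X) is needed).
rng = random.Random(0)
sampled = sum(g(rng.gauss(0.0, 1.0)) for _ in range(100_000)) / 100_000
print(integral, sampled)
```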

Marginalization
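Marginalization removes a variable by summing (or integrating) it out of the joint: $p(x) = \sum_y p(x, y)$ in the discrete case and $p(x) = \int p(x, y)\, dy$ in the continuous case. A minimal sketch over a small joint table; the table values are hypothetical.

```python
# Hypothetical joint table p(x, y) over x in {0, 1} and y in {0, 1, 2}; entries sum to 1.
joint = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.25, (1, 1): 0.05, (1, 2): 0.30,
}

def marginal_x(joint):
    """p(x) = sum over y of p(x, y)."""
    px = {}
    for (x, _y), prob in joint.items():
        px[x] = px.get(x, 0.0) + prob
    return px

px = marginal_x(joint)
print(px)  # approximately {0: 0.4, 1: 0.6}
```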

Multivariate Gaussian
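For reference, the $D$-dimensional Gaussian density is

$$\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left(-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$$

A sketch of the $D = 2$ case with the $2 \times 2$ inverse written out by hand (so no linear-algebra library is assumed). With a diagonal covariance it factorises into a product of 1-D Gaussian densities, which is a convenient sanity check.

```python
import math

def gaussian_pdf_2d(x, mu, cov):
    """Density of a 2-D Gaussian N(x | mu, cov); assumes cov is symmetric."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:            # for a symmetric 2x2 matrix this means "not positive definite"
        raise ValueError("covariance must be positive definite")
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mu[0], x[1] - mu[1])
    # Mahalanobis term (x - mu)^T cov^{-1} (x - mu)
    quad = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    # (2 pi)^{-D/2} with D = 2 is 1 / (2 pi)
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

print(gaussian_pdf_2d((0.0, 0.0), (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))))  # 1 / (2 pi), about 0.159
```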

Lecture 6

Law of the Unconscious Statistician

Lecture 7

Graphical Model

In a directed graph, a node that sends an arrow (a parent) goes into the conditioning part of its child's factor, and the node that receives the arrow goes on the left of the conditioning bar. The joint distribution is a product with one factor per node. For example, in the chain $A \to B \to C$, both $A$ and $B$ are predecessors of $C$, but $A$ is not a parent of $C$ (only $B$ is). In other words, $p(C \mid A, B) = p(C \mid B)$.

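$$p(x_1, \dots, x_K) = \prod_{k=1}^{K} p(x_k \mid \mathrm{pa}_k)$$

where $K$ denotes the number of nodes in the graph and $\mathrm{pa}_k$ denotes the parents of $x_k$. For example, for the chain $A \to B \to C$ ($K = 3$), $p(A, B, C) = p(A)\, p(B \mid A)\, p(C \mid B)$.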
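A sketch of the directed factorisation for a binary chain $A \to B \to C$: build the joint from one conditional table per node, then check numerically that conditioning on the non-parent ancestor $A$ changes nothing once $B$ is known. All table values are hypothetical.

```python
import itertools

# Hypothetical conditional probability tables for the binary chain A -> B -> C.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # p_b_given_a[a][b]
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}   # p_c_given_b[b][c]

# One factor per node: p(A, B, C) = p(A) p(B | A) p(C | B).
joint = {
    (a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
    for a, b, c in itertools.product((0, 1), repeat=3)
}

def p_c_given_ab(c, a, b):
    """p(C=c | A=a, B=b), computed from the joint by conditioning."""
    return joint[(a, b, c)] / sum(joint[(a, b, cc)] for cc in (0, 1))

# A is an ancestor but not a parent of C, so p(C | A, B) equals p(C | B).
for a, b, c in itertools.product((0, 1), repeat=3):
    assert abs(p_c_given_ab(c, a, b) - p_c_given_b[b][c]) < 1e-12
print(sum(joint.values()))  # the product of conditionals is already normalised
```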

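For an undirected graph,

$$p(\mathbf{x}) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(\mathbf{x}_C), \qquad Z = \sum_{\mathbf{x}} \prod_{C \in \mathcal{C}} \psi_C(\mathbf{x}_C)$$

where $\mathcal{C}$ denotes the set of cliques (usually the maximal cliques), and each factor $\psi_C$ is a non-negative function over the variables $\mathbf{x}_C$ in clique $C$. Note that $Z$ is called the partition function (for continuous variables the sum becomes an integral). For example, for the chain $A - B - C$, $p(A, B, C) = \frac{1}{Z}\, \psi_{AB}(A, B)\, \psi_{BC}(B, C)$.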
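A brute-force sketch of the undirected factorisation for a binary chain $A - B - C$ with cliques $\{A, B\}$ and $\{B, C\}$: multiply the clique potentials, sum over all joint states to get the partition function $Z$, and divide. The potential values are hypothetical; brute-force enumeration is only feasible for tiny graphs.

```python
import itertools

# Hypothetical non-negative pairwise potentials (they need not sum to 1).
def psi_ab(a, b):
    return 2.0 if a == b else 1.0   # favours A and B agreeing

def psi_bc(b, c):
    return 3.0 if b == c else 1.0   # favours B and C agreeing

states = list(itertools.product((0, 1), repeat=3))
unnormalised = {s: psi_ab(s[0], s[1]) * psi_bc(s[1], s[2]) for s in states}
Z = sum(unnormalised.values())           # partition function: sum over all joint states
p = {s: v / Z for s, v in unnormalised.items()}
print(Z)  # 24.0
```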

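Without a collider, the directed graphical models on the same skeleton are equivalent: for three nodes with $B$ in the middle, the chains $A \to B \to C$ and $A \leftarrow B \leftarrow C$ and the fork $A \leftarrow B \to C$ all encode the same independence, $A \perp C \mid B$. Only the collider $A \to B \leftarrow C$ differs: there $A \perp C$ holds marginally, but conditioning on $B$ makes $A$ and $C$ dependent.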
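A small exact sketch of why a collider behaves differently ("explaining away"), using hypothetical tables: $A$ and $C$ are independent fair coins and $B = A \lor C$. Marginally $A \perp C$, but once $B = 1$ is observed, learning $C = 1$ lowers the probability of $A = 1$.

```python
import itertools
from fractions import Fraction

# Collider A -> B <- C with hypothetical CPTs: A, C are independent fair coins, B = A OR C.
half = Fraction(1, 2)
joint = {}
for a, c in itertools.product((0, 1), repeat=2):
    b = a | c
    joint[(a, b, c)] = half * half   # p(A) p(C) p(B | A, C), where p(B | A, C) is deterministic

def prob(pred):
    """Probability of the event defined by pred(a, b, c)."""
    return sum(p for (a, b, c), p in joint.items() if pred(a, b, c))

# Marginally, A and C are independent.
p_a1 = prob(lambda a, b, c: a == 1)
p_c1 = prob(lambda a, b, c: c == 1)
marg_indep = prob(lambda a, b, c: a == 1 and c == 1) == p_a1 * p_c1

# Given B = 1, they become dependent.
p_a1_given_b1 = prob(lambda a, b, c: a == 1 and b == 1) / prob(lambda a, b, c: b == 1)
p_a1_given_b1_c1 = (prob(lambda a, b, c: a == 1 and b == 1 and c == 1)
                    / prob(lambda a, b, c: b == 1 and c == 1))
print(marg_indep, p_a1_given_b1, p_a1_given_b1_c1)  # True 2/3 1/2
```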

Note that the direction of the arrows indicates causality, not the direction of inference.

The Law of Large Numbers

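For i.i.d. $X_n$ with mean $\mu$ and variance $\sigma^2$: if the variable is the sum $S_N = \sum_{n=1}^{N} X_n$ rather than the average, the mean and the variance of a single sample are both multiplied by the sample count $N$, i.e. $\mathbb{E}[S_N] = N\mu$ and $\mathrm{Var}[S_N] = N\sigma^2$ (for the average, the mean stays $\mu$ and the variance is $\sigma^2 / N$).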
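A simulation sketch comparing the sum and the average of $N$ i.i.d. draws, assuming $X_n \sim \mathrm{Uniform}(0, 1)$ (so $\mu = 1/2$, $\sigma^2 = 1/12$); $N$, the number of repetitions and the seed are arbitrary choices. The sum should have mean $N\mu$ and variance $N\sigma^2$, the average mean $\mu$ and variance $\sigma^2 / N$.

```python
import random
import statistics

rng = random.Random(0)
N, REPS = 50, 20_000          # samples per experiment, number of repeated experiments
mu, var = 0.5, 1.0 / 12.0     # mean and variance of one Uniform(0, 1) draw

sums, means = [], []
for _ in range(REPS):
    xs = [rng.random() for _ in range(N)]
    sums.append(sum(xs))
    means.append(sum(xs) / N)

# Sum: mean N*mu, variance N*var.  Average: mean mu, variance var / N.
print(statistics.fmean(sums), N * mu)
print(statistics.pvariance(sums), N * var)
print(statistics.fmean(means), mu)
print(statistics.pvariance(means), var / N)
```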

Central Limit Theorem
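For i.i.d. $X_n$ with mean $\mu$ and finite variance $\sigma^2$, the standardised average $\sqrt{N}(\bar{X}_N - \mu)/\sigma$ converges in distribution to $\mathcal{N}(0, 1)$. A simulation sketch, assuming $\mathrm{Uniform}(0, 1)$ draws and $N = 30$ (arbitrary choices): about 95% of the standardised averages should fall within $\pm 1.96$, as for a standard normal.

```python
import math
import random

rng = random.Random(1)
N, REPS = 30, 20_000
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)   # Uniform(0, 1) mean and standard deviation

def standardised_mean(n):
    """sqrt(n) * (mean - mu) / sigma for n Uniform(0, 1) draws; approximately N(0, 1) for large n."""
    xbar = sum(rng.random() for _ in range(n)) / n
    return math.sqrt(n) * (xbar - mu) / sigma

zs = [standardised_mean(N) for _ in range(REPS)]
coverage = sum(abs(z) < 1.96 for z in zs) / REPS   # close to 0.95 if zs are roughly N(0, 1)
print(coverage)
```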

MLE

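Bernoulli distribution of a single coin toss $x \in \{0, 1\}$ with probability of heads $\mu$:

$$p(x \mid \mu) = \mu^{x} (1 - \mu)^{1 - x}$$

Log-likelihood function for $N$ i.i.d. coin tosses $\mathcal{D} = \{x_1, \dots, x_N\}$ with $m$ heads:

$$\log p(\mathcal{D} \mid \mu) = \sum_{n=1}^{N} \left[ x_n \log \mu + (1 - x_n) \log(1 - \mu) \right] = m \log \mu + (N - m) \log(1 - \mu)$$

Setting the derivative with respect to $\mu$ to zero gives the MLE of $\mu$, that is, the fraction of heads:

$$\mu_{\text{ML}} = \frac{m}{N}$$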
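A sketch checking the closed-form coin-toss MLE $m/N$ against a brute-force grid search over the log-likelihood. The toss data are hypothetical (7 heads in 10 tosses); the grid resolution is an arbitrary choice.

```python
import math

def log_likelihood(mu, xs):
    """Bernoulli log-likelihood: sum_n [x_n log mu + (1 - x_n) log(1 - mu)]."""
    return sum(x * math.log(mu) + (1 - x) * math.log(1 - mu) for x in xs)

def bernoulli_mle(xs):
    """Closed-form MLE: fraction of heads, m / N."""
    return sum(xs) / len(xs)

tosses = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # hypothetical data: 7 heads in 10 tosses
mu_ml = bernoulli_mle(tosses)

# Cross-check with a grid search over mu in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
mu_grid = max(grid, key=lambda m: log_likelihood(m, tosses))
print(mu_ml, mu_grid)  # 0.7 0.7
```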

Give the MLE for the coin toss problem

Lecture 8

Bayesian inference uses the posterior distribution to represent uncertainty about the parameters, unlike the approaches above, which only estimate a single optimal point.

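Let's start with the posterior predictive distribution, which can be represented as

$$p(x^* \mid \mathcal{D}) = \int p(x^* \mid \theta)\, p(\theta \mid \mathcal{D})\, d\theta$$

To derive it, we can marginalise the joint distribution of $x^*$ and $\theta$ given $\mathcal{D}$:

$$p(x^* \mid \mathcal{D}) = \int p(x^*, \theta \mid \mathcal{D})\, d\theta = \int p(x^* \mid \theta, \mathcal{D})\, p(\theta \mid \mathcal{D})\, d\theta$$

Assuming conditional independence of $x^*$ and $\mathcal{D}$ given $\theta$, i.e. $p(x^* \mid \theta, \mathcal{D}) = p(x^* \mid \theta)$, we then get the first form.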
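A sketch of the posterior predictive for a coin, assuming (as an example choice) a $\mathrm{Beta}(a, b)$ prior on the heads probability $\mu$ and a Bernoulli likelihood. The posterior is $\mathrm{Beta}(a + m, b + N - m)$, and the integral gives $p(x^* = 1 \mid \mathcal{D}) = \mathbb{E}[\mu \mid \mathcal{D}] = (a + m)/(a + b + N)$. A Monte Carlo version averages $p(x^* = 1 \mid \mu) = \mu$ over posterior samples drawn with Python's `random.betavariate`. The data and prior parameters are hypothetical.

```python
import random

def posterior_predictive_heads(a, b, xs):
    """p(x* = 1 | D) = E[mu | D] for a Beta(a, b) prior and Bernoulli likelihood."""
    m, n = sum(xs), len(xs)
    return (a + m) / (a + b + n)

def posterior_predictive_mc(a, b, xs, samples=100_000, seed=0):
    """Monte Carlo version of the integral: average p(x* = 1 | mu) = mu over mu ~ p(mu | D)."""
    rng = random.Random(seed)
    m, n = sum(xs), len(xs)
    return sum(rng.betavariate(a + m, b + n - m) for _ in range(samples)) / samples

tosses = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # hypothetical data: 7 heads in 10 tosses
print(posterior_predictive_heads(2, 2, tosses))   # (2 + 7) / (2 + 2 + 10) = 9/14
print(posterior_predictive_mc(2, 2, tosses))      # should be close to 9/14
```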

Conjugacy: a prior is conjugate to a likelihood when the resulting posterior belongs to the same family as the prior, so the posterior has a closed form. This makes updating the posterior after observing new data easy: the current posterior simply serves as the prior for the next batch of data.
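A sketch using the Beta-Bernoulli pair, a standard conjugate example: a $\mathrm{Beta}(a, b)$ prior with Bernoulli observations gives a $\mathrm{Beta}(a + \text{heads}, b + \text{tails})$ posterior. Because the posterior stays in the Beta family, updating one toss at a time gives the same result as updating on all data at once. The data and the uniform $\mathrm{Beta}(1, 1)$ starting prior are hypothetical choices.

```python
def update_beta(a, b, xs):
    """Beta(a, b) prior + Bernoulli observations xs -> Beta(a + heads, b + tails) posterior."""
    heads = sum(xs)
    return a + heads, b + len(xs) - heads

data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # hypothetical coin tosses

batch = update_beta(1, 1, data)         # all data at once, from a uniform Beta(1, 1) prior

# Sequential: each posterior becomes the prior for the next observation.
a, b = 1, 1
for x in data:
    a, b = update_beta(a, b, [x])

print(batch, (a, b))  # (8, 4) (8, 4)
```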

Give a unified definition of the inference problem, and two instantiations: MLE and ERM

Why is regularisation useful?

What is the CDF of a random variable X?

Explain Bayesian conjugacy in simple terms – give a couple examples

Explain inverse transform sampling. Prove that it works.
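A sketch of inverse transform sampling, assuming an exponential target with rate $\lambda$ as the example: $F(x) = 1 - e^{-\lambda x}$, so $F^{-1}(u) = -\ln(1 - u)/\lambda$. The proof idea in one line: for $U \sim \mathrm{Uniform}(0, 1)$, $P(F^{-1}(U) \le x) = P(U \le F(x)) = F(x)$. The code checks the sample mean ($1/\lambda$) and the empirical CDF at $x = 1$.

```python
import math
import random

def sample_exponential(lam, rng):
    """Inverse transform sampling: draw U ~ Uniform(0, 1) and return F^{-1}(U) = -ln(1 - U) / lam."""
    u = rng.random()                  # u is in [0, 1), so 1 - u is in (0, 1] and the log is finite
    return -math.log(1.0 - u) / lam

rng = random.Random(0)
lam = 2.0
xs = [sample_exponential(lam, rng) for _ in range(100_000)]
mean = sum(xs) / len(xs)                             # should be close to 1 / lam = 0.5
frac_below_1 = sum(x <= 1.0 for x in xs) / len(xs)   # should be close to F(1) = 1 - exp(-2)
print(mean, frac_below_1, 1 - math.exp(-lam))
```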

Lecture 9

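Leveraging the fact that geometric area corresponds to probability: for a point drawn uniformly from a region, the probability of landing in a sub-region equals the ratio of their areas.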
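One classic illustration of the area-probability link (chosen here as an example; the lecture's own example may differ): a point uniform in the unit square lands in the quarter disc $x^2 + y^2 \le 1$ with probability equal to its area, $\pi/4$, so the hit fraction times 4 estimates $\pi$.

```python
import random

def estimate_pi(n, seed=0):
    """P(uniform point in the unit square lands in the quarter disc) = area = pi / 4."""
    rng = random.Random(seed)
    inside = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0 for _ in range(n))
    return 4.0 * inside / n

print(estimate_pi(200_000))  # close to 3.14; the error shrinks like 1 / sqrt(n)
```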

Lecture 10

What is the rejection sampling algorithm? How many steps on average are needed to produce one sample?
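A rejection sampling sketch, assuming a $\mathrm{Beta}(2, 2)$ target $p(x) = 6x(1 - x)$ and a $\mathrm{Uniform}(0, 1)$ proposal $q$, with envelope constant $M = 1.5$ so that $p(x) \le M q(x)$. Propose $x \sim q$ and accept with probability $p(x)/(M q(x))$. When $p$ and $q$ are both normalised, the overall acceptance rate is $1/M$, so the number of proposals per accepted sample is geometric with mean $M$ (here 1.5).

```python
import random

def target_pdf(x):
    """Beta(2, 2) density 6x(1 - x) on [0, 1]; its maximum is 1.5 at x = 0.5."""
    return 6.0 * x * (1.0 - x)

def proposal_pdf(x):
    """Uniform(0, 1) proposal density."""
    return 1.0

M = 1.5   # envelope constant: target_pdf(x) <= M * proposal_pdf(x) on [0, 1]

def rejection_sample(rng):
    """Draw one target sample; also return how many proposals it took."""
    tries = 0
    while True:
        tries += 1
        x = rng.random()                                            # propose x ~ q
        if rng.random() <= target_pdf(x) / (M * proposal_pdf(x)):   # accept w.p. p(x) / (M q(x))
            return x, tries

rng = random.Random(0)
draws = [rejection_sample(rng) for _ in range(50_000)]
avg_tries = sum(t for _, t in draws) / len(draws)   # should be close to M = 1.5
mean = sum(x for x, _ in draws) / len(draws)        # Beta(2, 2) mean is 0.5
print(avg_tries, mean)
```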

State the elementary Monte Carlo identity.

What is the variance of the Monte Carlo estimate of an integral?
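For reference, the elementary Monte Carlo identity is $\int f(x)\, p(x)\, dx = \mathbb{E}_p[f(X)] \approx \frac{1}{N} \sum_{n=1}^{N} f(x_n)$ with $x_n \sim p$, and the estimator's variance is $\mathrm{Var}_p[f(X)]/N$. A sketch assuming $f(x) = x^2$ and $p = \mathrm{Uniform}(0, 1)$: the integral is $1/3$ and $\mathrm{Var}[X^2] = 1/5 - 1/9 = 4/45$, so repeated estimates with $N = 100$ should have variance close to $4/4500$.

```python
import random
import statistics

def mc_estimate(f, sampler, n, rng):
    """Elementary Monte Carlo: E_p[f(X)] approximated by the average of f over n draws from p."""
    return sum(f(sampler(rng)) for _ in range(n)) / n

f = lambda x: x * x                 # integral of x^2 over [0, 1] is 1/3
uniform = lambda rng: rng.random()  # p = Uniform(0, 1)

rng = random.Random(0)
n, reps = 100, 5_000
estimates = [mc_estimate(f, uniform, n, rng) for _ in range(reps)]

# Theory: Var[estimate] = Var_p[f(X)] / n = (1/5 - 1/9) / n = 4 / (45 n).
print(statistics.fmean(estimates), 1 / 3)
print(statistics.pvariance(estimates), 4 / (45 * n))
```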

How can you sample from N(10, 5) if you only have a laptop able to sample N(−1, 1)?
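A sketch of the shift-and-scale answer, assuming the second argument of $\mathcal{N}(\cdot, \cdot)$ is the variance: if $Z \sim \mathcal{N}(-1, 1)$ then $Z + 1 \sim \mathcal{N}(0, 1)$, and $10 + \sqrt{5}\,(Z + 1) \sim \mathcal{N}(10, 5)$. If the convention is standard deviation instead, replace $\sqrt{5}$ with 5.

```python
import math
import random
import statistics

def sample_target(rng, mean=10.0, variance=5.0):
    """Turn a N(-1, 1) draw into a N(mean, variance) draw by shifting and scaling."""
    z = rng.gauss(-1.0, 1.0)          # the only sampler we are allowed: N(-1, 1)
    std_normal = z + 1.0              # shift to N(0, 1)
    return mean + math.sqrt(variance) * std_normal

rng = random.Random(0)
xs = [sample_target(rng) for _ in range(100_000)]
print(statistics.fmean(xs), statistics.pvariance(xs))   # close to 10 and 5
```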

Stochasticity trick: how can you numerically evaluate the integral

Explain what the symmetric Metropolis-Hastings algorithm is.

MAP vs MLE

There is no right answer, but Bayesians prefer MAP, since the prior lets us include prior knowledge in the estimate. The two estimates coincide as the amount of data grows, because the likelihood then dominates the prior.
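A sketch of MAP versus MLE for a coin, assuming a $\mathrm{Beta}(a, b)$ prior with $a = b = 5$ (a hypothetical choice). The MAP estimate is the posterior mode $(m + a - 1)/(N + a + b - 2)$, while the MLE is $m/N$. With hypothetical data at 70% heads, the prior pulls MAP toward 0.5 for small $N$, and the gap vanishes as $N$ grows.

```python
def mle(m, n):
    """Bernoulli MLE: m heads out of n tosses."""
    return m / n

def map_estimate(m, n, a=5.0, b=5.0):
    """Mode of the Beta(a + m, b + n - m) posterior (valid for a, b > 1)."""
    return (m + a - 1) / (n + a + b - 2)

for n in (10, 100, 10_000):
    m = 7 * n // 10                  # hypothetical data: 70% heads
    print(n, mle(m, n), round(map_estimate(m, n), 4))
```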

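Symmetric Metropolis-Hastings (the Metropolis algorithm): suited to sampling in high dimensions, it is a widely used special case of Metropolis-Hastings when the proposal distribution is symmetric, $q(x' \mid x) = q(x \mid x')$. The proposal terms then cancel, and the acceptance probability simplifies to

$$A(x' \mid x) = \min\left(1, \frac{\tilde{p}(x')}{\tilde{p}(x)}\right)$$

where $\tilde{p}$ is the (possibly unnormalised) target density.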
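A random-walk Metropolis sketch with a symmetric Gaussian proposal, assuming an unnormalised standard normal target $\tilde{p}(x) = e^{-x^2/2}$ and step size 1 (both illustrative choices). Only the ratio $\tilde{p}(x')/\tilde{p}(x)$ is needed, so the normalising constant never appears; on rejection the current state is repeated. The first 1000 states are discarded as burn-in (an arbitrary choice).

```python
import math
import random

def log_p_tilde(x):
    """Unnormalised log target: standard normal up to an additive constant."""
    return -0.5 * x * x

def metropolis(n_steps, step_size=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis: symmetric Gaussian proposal, accept with min(1, p~(x') / p~(x))."""
    rng = random.Random(seed)
    x, chain = x0, []
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, step_size)            # q(x' | x) = q(x | x'), so q cancels
        log_ratio = log_p_tilde(x_new) - log_p_tilde(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):  # accept with probability min(1, ratio)
            x = x_new
        chain.append(x)                                  # on rejection the old state is repeated
    return chain

samples = metropolis(200_000)[1_000:]                    # drop burn-in
mean = sum(samples) / len(samples)
second_moment = sum(s * s for s in samples) / len(samples)
print(mean, second_moment)                               # close to 0 and 1
```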

For each in

The entropy of a probability distribution can be interpreted as a measure of uncertainty, or lack of predictability.
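For reference, the entropy is $H(p) = -\sum_x p(x) \log_2 p(x)$ (in bits), with $0 \log 0 = 0$ by convention. A minimal sketch showing that a certain outcome has zero entropy and a fair coin has one bit.

```python
import math

def entropy(p):
    """H(p) = -sum_x p(x) log2 p(x) in bits; terms with p(x) = 0 contribute 0 by convention."""
    return sum(-px * math.log2(px) for px in p if px > 0)

print(entropy([0.5, 0.5]))    # 1 bit: a fair coin is maximally unpredictable
print(entropy([1.0, 0.0]))    # 0 bits: the outcome is certain
print(entropy([0.25] * 4))    # 2 bits: four equally likely outcomes
```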

Cross-entropy

$$H(p, q) = -\sum_x p(x) \log_2 q(x)$$

This is the expected number of bits needed to compress data samples drawn from $p$ using a code based on distribution $q$.
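A sketch of cross-entropy with hypothetical distributions $p$ (the true source) and $q$ (the coding distribution). Using the wrong code costs extra bits: $H(p, q) \ge H(p, p) = H(p)$, and the gap $H(p, q) - H(p)$ is the KL divergence $\mathrm{KL}(p \,\|\, q)$. The function assumes $q(x) > 0$ wherever $p(x) > 0$ (otherwise `math.log2` raises an error, matching the infinite cross-entropy).

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log2 q(x): expected bits for data from p coded with a code built for q."""
    return sum(-px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.25]     # hypothetical true source distribution
q = [1/3, 1/3, 1/3]       # hypothetical mismatched coding distribution

h_p = cross_entropy(p, p)     # H(p, p) = H(p) = 1.5 bits
h_pq = cross_entropy(p, q)    # log2(3), about 1.585 bits
print(h_p, h_pq, h_pq - h_p)  # the gap is KL(p || q) >= 0
```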

KL Divergence