Multivariate Normal Distribution, Joint Normal Distribution
A multivariate normal is a single distribution, whereas a Gaussian mixture model (GMM) combines multiple Gaussian components.
Each random variable is normally distributed on its own, and at the same time the variables are jointly normally distributed.
$$
f_{\mu, \Sigma}(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right), \quad X \sim \mathcal{N}(\mu, \Sigma)
$$

$$
\begin{pmatrix}
X \\
Y
\end{pmatrix}
\sim \mathcal{N} \left(
\begin{pmatrix}
\mu_X \\
\mu_Y
\end{pmatrix},
\begin{pmatrix}
\sigma_X^2 & \text{Cov}(X,Y) \\
\text{Cov}(X,Y) & \sigma_Y^2
\end{pmatrix}
\right)
$$

Marginals
Partition $X = (X_1, X_2)$ with mean $\mu = (\mu_1, \mu_2)$ and covariance blocks $\Sigma_{11}, \Sigma_{12}, \Sigma_{21}, \Sigma_{22}$. Each marginal keeps its own mean and covariance block:

$$
p(X_1) = \mathcal{N}(X_1 \mid \mu_1, \Sigma_{11})
\\
p(X_2) = \mathcal{N}(X_2 \mid \mu_2, \Sigma_{22})
$$

Posterior
$$
p(X_1 \mid X_2) = \mathcal{N}(X_1 \mid \mu_{1|2}, \Sigma_{1|2})
$$

where the conditional mean and conditional variance are as follows.

Conditional Mean
How the mean of $X_1$ shifts when $X_2$ is given:

$$
\mu_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (X_2 - \mu_2)
$$

Generalized form with "all except $i$" notation, where $-i$ denotes every index other than $i$:

$$
\mu_{i|-i} = \mu_i + \Sigma_{i, -i} \Sigma_{-i, -i}^{-1} (x_{-i} - \mu_{-i})
$$

Generalized form with range notation:

$$
\mu_{i|1{:}i-1, i+1{:}n} = \mu_i + \Sigma_{i, 1{:}i-1, i+1{:}n} \Sigma_{1{:}i-1, i+1{:}n, 1{:}i-1, i+1{:}n}^{-1} (x_{1{:}i-1, i+1{:}n} - \mu_{1{:}i-1, i+1{:}n})
$$

Multi-to-multi form with dimension analysis, choosing $k$ variables for the left side ($A$) and conditioning on $m$ variables ($B$):

$$
\mu_{A|B} = \mu_A + \Sigma_{A, B} \Sigma_{B, B}^{-1} (x_B - \mu_B)
$$

$$
\mu_A : k \times 1, \quad \Sigma_{A, B} : k \times m, \quad \Sigma_{B, B}^{-1} : m \times m, \quad (x_B - \mu_B) : m \times 1, \quad \mu_{A|B} : k \times 1
$$

Generalized version with set notation, where $S^c$ is the complement of the index set $S$:

$$
\mu_{S \mid S^c} = \mu_S + \Sigma_{S, S^c} \Sigma_{S^c, S^c}^{-1} (X_{S^c} - \mu_{S^c})
$$

Conditional Variance

$$
\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}
$$

Generalized form with "all except $i$" notation:
$$
\Sigma_{i|-i} = \Sigma_{ii} - \Sigma_{i, -i} \Sigma_{-i, -i}^{-1} \Sigma_{-i, i}
$$

Generalized form with range notation:

$$
\Sigma_{i|1{:}i-1, i+1{:}n} = \Sigma_{ii} - \Sigma_{i, 1{:}i-1, i+1{:}n} \Sigma_{1{:}i-1, i+1{:}n, 1{:}i-1, i+1{:}n}^{-1} \Sigma_{1{:}i-1, i+1{:}n, i}
$$

Multi-to-multi form with dimension analysis, choosing $k$ variables for the left side ($A$) and conditioning on $m$ variables ($B$):

$$
\Sigma_{A|B} = \Sigma_{A, A} - \Sigma_{A, B} \Sigma_{B, B}^{-1} \Sigma_{A, B}^\top
$$

$$
\Sigma_{A, A} : k \times k, \quad \Sigma_{A, B} : k \times m, \quad \Sigma_{B, B}^{-1} : m \times m, \quad \Sigma_{A|B} : k \times k
$$

Generalized version with set notation:

$$
\Sigma_{S \mid S^c} = \Sigma_{S, S} - \Sigma_{S, S^c} \Sigma_{S^c, S^c}^{-1} \Sigma_{S^c, S}
$$

Inverse matrix
The formulas above require the covariance block being inverted to be non-singular. If the covariance matrix is singular, at least one linear combination of the variables has zero variance, so some variable is fully determined by the others instead of varying freely. For example, if $X_3 = X_1 + X_2$, then the distribution of $(X_1, X_2, X_3)$ collapses onto a plane and is not a proper 3D Gaussian distribution with a density.
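The conditional mean and variance formulas above can be sketched in NumPy roughly as follows. This is a minimal illustration rather than a reference implementation: the function name `condition_gaussian`, its argument names, and the example numbers are all assumptions made for this sketch. It uses `np.linalg.solve` instead of forming $\Sigma_{B,B}^{-1}$ explicitly, and ends with a rank check on the $X_3 = X_1 + X_2$ example, assuming $X_1$ and $X_2$ are independent with unit variance.

```python
import numpy as np

def condition_gaussian(mu, Sigma, idx_a, idx_b, x_b):
    """Mean and covariance of X_A | X_B = x_b for X ~ N(mu, Sigma).

    Hypothetical helper for illustration; idx_a and idx_b are lists of indices.
    """
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    x_b = np.asarray(x_b, dtype=float)

    S_aa = Sigma[np.ix_(idx_a, idx_a)]  # k x k
    S_ab = Sigma[np.ix_(idx_a, idx_b)]  # k x m
    S_bb = Sigma[np.ix_(idx_b, idx_b)]  # m x m, must be non-singular

    # W = S_ab @ inv(S_bb), computed via a linear solve (S_bb is symmetric).
    W = np.linalg.solve(S_bb, S_ab.T).T  # k x m

    mu_cond = mu[idx_a] + W @ (x_b - mu[idx_b])  # k entries
    Sigma_cond = S_aa - W @ S_ab.T               # k x k
    return mu_cond, Sigma_cond

# Example values (assumed): condition X_1 on X_2 = 1.5 and X_3 = 1.0.
mu = [0.0, 1.0, 2.0]
Sigma = [[2.0, 0.6, 0.4],
         [0.6, 1.0, 0.3],
         [0.4, 0.3, 1.5]]
mu_cond, Sigma_cond = condition_gaussian(mu, Sigma, [0], [1, 2], [1.5, 1.0])
print(mu_cond, Sigma_cond)

# Covariance of (X_1, X_2, X_1 + X_2) with independent unit-variance X_1, X_2.
Sigma_sing = np.array([[1.0, 0.0, 1.0],
                       [0.0, 1.0, 1.0],
                       [1.0, 1.0, 2.0]])
print(np.linalg.matrix_rank(Sigma_sing))  # rank is below 3: the matrix is singular
```

Conditioning on a subset whose covariance block is singular would make the linear solve fail or become numerically unstable, which is the practical face of the degenerate case described above.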
2-variables
With correlation $\rho = \text{Cov}(X_1, X_2) / (\sigma_1 \sigma_2)$:

$$
p(X_1 \mid X_2 = x_2) = \mathcal{N}\left( \mu_1 + \rho \frac{\sigma_1}{\sigma_2}(x_2 - \mu_2),\ (1-\rho^2)\sigma_1^2 \right)
\\
p(X_2 \mid X_1 = x_1) = \mathcal{N}\left( \mu_2 + \rho \frac{\sigma_2}{\sigma_1}(x_1 - \mu_1),\ (1-\rho^2)\sigma_2^2 \right)
$$

3-variables
With $\alpha_{ij} = \text{Cov}(X_i, X_j)$:

$$
p(X_1 \mid X_2=x_2, X_3=x_3) = \mathcal{N}\left(
\mu_1 +
\begin{pmatrix}
\frac{\alpha_{12} \sigma_3^2 - \alpha_{13} \alpha_{23}}{\sigma_2^2 \sigma_3^2 - \alpha_{23}^2} \\
\frac{\alpha_{13} \sigma_2^2 - \alpha_{12} \alpha_{23}}{\sigma_2^2 \sigma_3^2 - \alpha_{23}^2}
\end{pmatrix}^\top
\begin{pmatrix}
x_2 - \mu_2 \\
x_3 - \mu_3
\end{pmatrix}
,\
\sigma_1^2 - \frac{\alpha_{12}^2 \sigma_3^2 - 2 \alpha_{12} \alpha_{13} \alpha_{23} + \alpha_{13}^2 \sigma_2^2}{\sigma_2^2 \sigma_3^2 - \alpha_{23}^2}
\right)
$$

$$
p(X_2 \mid X_1=x_1, X_3=x_3) = \mathcal{N}\left(
\mu_2 +
\begin{pmatrix}
\frac{\alpha_{12} \sigma_3^2 - \alpha_{23} \alpha_{13}}{\sigma_1^2 \sigma_3^2 - \alpha_{13}^2} \\
\frac{\alpha_{23} \sigma_1^2 - \alpha_{12} \alpha_{13}}{\sigma_1^2 \sigma_3^2 - \alpha_{13}^2}
\end{pmatrix}^\top
\begin{pmatrix}
x_1 - \mu_1 \\
x_3 - \mu_3
\end{pmatrix}
,\
\sigma_2^2 - \frac{\alpha_{12}^2 \sigma_3^2 - 2 \alpha_{12} \alpha_{23} \alpha_{13} + \alpha_{23}^2 \sigma_1^2}{\sigma_1^2 \sigma_3^2 - \alpha_{13}^2}
\right)
$$

$$
p(X_3 \mid X_1=x_1, X_2=x_2) = \mathcal{N}\left(
\mu_3 +
\begin{pmatrix}
\frac{\alpha_{13} \sigma_2^2 - \alpha_{12} \alpha_{23}}{\sigma_1^2 \sigma_2^2 - \alpha_{12}^2} \\
\frac{\alpha_{23} \sigma_1^2 - \alpha_{12} \alpha_{13}}{\sigma_1^2 \sigma_2^2 - \alpha_{12}^2}
\end{pmatrix}^\top
\begin{pmatrix}
x_1 - \mu_1 \\
x_2 - \mu_2
\end{pmatrix}
,\
\sigma_3^2 - \frac{\alpha_{13}^2 \sigma_2^2 - 2 \alpha_{13} \alpha_{23} \alpha_{12} + \alpha_{23}^2 \sigma_1^2}{\sigma_1^2 \sigma_2^2 - \alpha_{12}^2}
\right)
$$
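As a sanity check, the closed-form 2-variable and 3-variable expressions above can be compared numerically against the block-matrix formulas for the conditional mean and variance. The sketch below is only illustrative: the function names (`conditional_2var`, `conditional_3var_x1`, `conditional_matrix`) and the example mean and covariance are assumptions chosen for this check, and only the $X_1$ case of the 3-variable formulas is covered.

```python
import numpy as np

def conditional_2var(mu1, mu2, sigma1, sigma2, rho, x2):
    """Closed-form mean and variance of X1 | X2 = x2 (bivariate case)."""
    mean = mu1 + rho * sigma1 / sigma2 * (x2 - mu2)
    var = (1.0 - rho**2) * sigma1**2
    return mean, var

def conditional_3var_x1(mu, Sigma, x2, x3):
    """Closed-form mean and variance of X1 | X2 = x2, X3 = x3."""
    v1, v2, v3 = Sigma[0, 0], Sigma[1, 1], Sigma[2, 2]
    a12, a13, a23 = Sigma[0, 1], Sigma[0, 2], Sigma[1, 2]
    det = v2 * v3 - a23**2  # determinant of the (X2, X3) covariance block
    weights = np.array([a12 * v3 - a13 * a23, a13 * v2 - a12 * a23]) / det
    mean = mu[0] + weights @ np.array([x2 - mu[1], x3 - mu[2]])
    var = v1 - (a12**2 * v3 - 2 * a12 * a13 * a23 + a13**2 * v2) / det
    return mean, var

def conditional_matrix(mu, Sigma, a, b, x_b):
    """Block-matrix mean and covariance of X_a | X_b = x_b."""
    S_ab = Sigma[np.ix_(a, b)]
    S_bb = Sigma[np.ix_(b, b)]
    gain = S_ab @ np.linalg.inv(S_bb)
    return mu[a] + gain @ (x_b - mu[b]), Sigma[np.ix_(a, a)] - gain @ S_ab.T

# Example parameters (assumed for illustration).
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.6, 0.4],
                  [0.6, 1.0, 0.3],
                  [0.4, 0.3, 1.5]])

# 3-variable case: X1 given X2 = 1.5 and X3 = 1.0.
mean3, var3 = conditional_3var_x1(mu, Sigma, 1.5, 1.0)
mean3_m, cov3_m = conditional_matrix(mu, Sigma, [0], [1, 2], np.array([1.5, 1.0]))
print(np.isclose(mean3, mean3_m[0]), np.isclose(var3, cov3_m[0, 0]))

# 2-variable case: use only the first two coordinates.
s1, s2 = np.sqrt(Sigma[0, 0]), np.sqrt(Sigma[1, 1])
rho = Sigma[0, 1] / (s1 * s2)
mean2, var2 = conditional_2var(mu[0], mu[1], s1, s2, rho, 1.5)
mean2_m, cov2_m = conditional_matrix(mu[:2], Sigma[:2, :2], [0], [1], np.array([1.5]))
print(np.isclose(mean2, mean2_m[0]), np.isclose(var2, cov2_m[0, 0]))
```

The explicit inverse is used here only because the blocks are tiny; for larger conditioning sets a linear solve is usually the safer choice.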