Diffusion Model

Creator: Seonglae Cho
Created: 2022 Aug 24 14:49
Edited: 2025 Dec 16 19:19

Diffusion Probabilistic Model (DPM), Variational Diffusion Model

Diffusion models progressively add Gaussian noise to data (forward process) and learn to reverse this process step-by-step (reverse process). At high noise levels, the model recovers coarse, low-frequency structure; as noise decreases, it progressively refines the sample by adding higher-frequency details.
Since white noise has energy across all frequencies while natural images concentrate most of their energy in low frequencies, high-frequency components lose signal-to-noise ratio (SNR) first as noise is added, making them harder to recover at the early, high-noise denoising steps.
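As a rough numerical illustration of this point, the sketch below assumes a 1/f² power spectrum as a stand-in for natural-image statistics (the spectrum and noise levels are illustrative, not taken from any dataset): white noise has a flat spectrum, so the SNR of high frequencies collapses first as the noise level grows.

```python
import numpy as np

# Illustration only: assume "natural image" power falls off roughly as 1/f^2.
# White Gaussian noise has a flat spectrum, so high frequencies lose SNR first.
n = 256
freqs = np.fft.rfftfreq(n, d=1.0)[1:]          # positive frequencies, DC skipped
signal_power = 1.0 / freqs**2                  # low frequencies carry most of the energy

for sigma in (0.1, 0.5, 2.0):                  # increasing forward-process noise level
    noise_power = sigma**2 * np.ones_like(freqs)
    snr_db = 10 * np.log10(signal_power / noise_power)
    print(f"sigma={sigma}: SNR {snr_db[0]:.1f} dB at the lowest freq, "
          f"{snr_db[-1]:.1f} dB at the highest")
```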
Noise is added gradually as a Markov Chain to model the increasing-noise process, and generation starts by sampling from Gaussian noise. The model then learns to reverse the noise back into an image through an iterative denoising process that models the noise distribution. Because the likelihood is modeled explicitly, this addresses the drawback of GAN training, which tends to cover less of the generation space.

Marginalization

Forward process

$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big), \qquad q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,I\big)$$

which means

$$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I), \qquad \bar{\alpha}_t = \textstyle\prod_{s=1}^{t}(1-\beta_s)$$

Just as a Fourier Transform expresses a signal as a weighted sum of components, $x_t$ can be interpreted as a Linear Combination of $x_0$ and Gaussian Noise.
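A minimal sketch of sampling from the marginal $q(x_t \mid x_0)$, assuming the standard linear β schedule (the schedule values and names here are illustrative):

```python
import numpy as np

# Marginal forward process q(x_t | x_0) under a linear beta schedule.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, rng=np.random.default_rng(0)):
    """x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps  -- a linear combination of x0 and noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps, eps

x0 = np.zeros((32, 32))            # stand-in for an image
xt, eps = q_sample(x0, t=500)      # one marginal sample at step t = 500
```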

Reverse process with variational posterior

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big)$$

The reverse KL Divergence is mode-seeking, which pushes the model toward generating more expressive data. The model learns the reverse transition by matching the variational posterior $q(x_{t-1} \mid x_t, x_0)$; $p_\theta$ is the neural network we are training.
When it is a normal distribution, the posterior has a closed form:

$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\big(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t I\big)$$
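For reference, a sketch of the closed-form posterior mean and variance that the reverse network is trained to match, using the standard DDPM expressions (the schedule and names are illustrative, and $t \ge 1$):

```python
import numpy as np

# Gaussian posterior q(x_{t-1} | x_t, x_0) matched by p_theta(x_{t-1} | x_t).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alphas_bar = np.cumprod(alphas)

def posterior_mean_var(x0, xt, t):
    abar_t, abar_prev = alphas_bar[t], alphas_bar[t - 1]
    coef_x0 = np.sqrt(abar_prev) * betas[t] / (1.0 - abar_t)            # weight on the clean image
    coef_xt = np.sqrt(alphas[t]) * (1.0 - abar_prev) / (1.0 - abar_t)   # weight on the noisy image
    mean = coef_x0 * x0 + coef_xt * xt                                  # mu_tilde_t(x_t, x_0)
    var = betas[t] * (1.0 - abar_prev) / (1.0 - abar_t)                 # beta_tilde_t
    return mean, var
```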

Reparameterization trick

We compute the KL divergence between the forward-process posterior and the reverse process to obtain the loss. To define it, the problem is reparameterized to predict the noise at step $t$ rather than the clean structure, which has empirically been shown to improve performance. We also leverage the reparameterized forward-process sample $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$ to keep the variational posterior term differentiable (a minimal training sketch follows the list below).
  • the per-step loss weighting is often fixed at 1 regardless of the step (the simplified objective)
  • $\epsilon_\theta(x_t, t)$ is a neural network that predicts the noise instead of predicting the mean $\mu_\theta$ directly
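A minimal training-step sketch under these choices; `model` stands in for the noise-prediction network $\epsilon_\theta$ and `alphas_bar` for the cumulative-alpha schedule (both are placeholders, not defined in this note):

```python
import torch
import torch.nn.functional as F

# Simplified objective: eps_theta predicts the injected noise, per-step weighting fixed at 1.
def simple_loss(model, x0, alphas_bar):
    b = x0.shape[0]
    t = torch.randint(0, alphas_bar.shape[0], (b,), device=x0.device)   # random step per sample
    eps = torch.randn_like(x0)
    abar = alphas_bar[t].view(b, *([1] * (x0.dim() - 1)))
    xt = abar.sqrt() * x0 + (1 - abar).sqrt() * eps          # reparameterized forward sample
    return F.mse_loss(model(xt, t), eps)                     # ||eps - eps_theta(x_t, t)||^2
```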

Network Architecture

A UNet-like CNN image-to-image model is used as the reverse denoiser, often with an Attention Mechanism in the deeper compression/decompression stages, such as Cross-Attention over conditioning and self-attention over image patches. Diffusion uses a Positional Embedding for each time step, which, as in transformers, does not extrapolate effectively beyond the trained range.
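A sketch of such a sinusoidal timestep embedding, used to condition the UNet denoiser on $t$ (the function name and dimension are illustrative; `dim` is assumed even):

```python
import math
import torch

# Sinusoidal timestep embedding, same construction as transformer positional embeddings.
def timestep_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]                    # (batch, half)
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)  # (batch, dim)

emb = timestep_embedding(torch.tensor([0, 500, 999]), dim=128)
```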

Integral

The statement "denoising = integration" refers to the continuous-time perspective, where diffusion/flow-based generation is formalized as following the solution of a differential equation that continuously moves the state from noise to data. Rather than "removing noise all at once," it involves accumulating "small denoising actions" tens to thousands of times (through integral approximation) to create the final image.
Classic DDPM is a discrete-time Markov Chain that repeats denoising steps (step-by-step updates rather than explicit integration), but from a continuous-time perspective it ultimately connects to numerically solving the reverse SDE. DDIM, probability-flow ODE, flow matching, and rectified flow approaches explicitly take the form of directly integrating an ODE from the start.
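A sketch of deterministic DDIM sampling as repeated small denoising steps from pure noise toward data; `model` is again a placeholder $\epsilon$-predictor, `alphas_bar` a cumulative-alpha schedule, and the step count is arbitrary:

```python
import torch

# Deterministic DDIM sampling: accumulate many small denoising steps from noise to data.
@torch.no_grad()
def ddim_sample(model, shape, alphas_bar, steps=50):
    ts = torch.linspace(len(alphas_bar) - 1, 0, steps).long()    # descending step schedule
    x = torch.randn(shape)                                       # start from pure Gaussian noise
    for t, t_prev in zip(ts[:-1], ts[1:]):
        abar, abar_prev = alphas_bar[t], alphas_bar[t_prev]
        eps = model(x, t.repeat(shape[0]))
        x0_pred = (x - (1 - abar).sqrt() * eps) / abar.sqrt()            # current estimate of the data
        x = abar_prev.sqrt() * x0_pred + (1 - abar_prev).sqrt() * eps    # one small deterministic step
    return x
```

With more steps, this trajectory approaches the solution of the continuous-time probability-flow ODE.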
Diffusion Model Notion

Diffusion Model Usages

Diffusion Models

Tutorial

smalldiffusion (yuanchenyang), updated 2025 Dec 15 19:36

Through noise prediction, we can show mathematically that the denoiser can be viewed as an approximate projection onto the data manifold, equivalent to the gradient of a smoothed distance function (Moreau envelope). In other words, the gradient of a smoothed distance function to the manifold equals the denoiser output; as a metaphor, the trained denoiser produces force vectors that gradually bend trajectories toward the data manifold.
DDIM can then be interpreted as gradient descent, and combining momentum with DDPM techniques improves convergence speed and image quality.
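A sketch of that gradient-descent reading in the $\sigma$ parameterization $x_\sigma = x_0 + \sigma\,\epsilon$, where the predicted noise plays the role of the smoothed-distance gradient; `model` and the decreasing `sigmas` schedule are placeholders, and this is only an interpretation of the deterministic sampler, not the note's exact formulation:

```python
import torch

# DDIM step read as gradient descent on a smoothed distance to the data manifold.
@torch.no_grad()
def ddim_as_gradient_descent(model, x, sigmas):
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        grad = model(x, sigma)                       # ~ gradient of the smoothed distance function
        x = x - (sigma - sigma_next) * grad          # step size (sigma - sigma_next) toward the manifold
    return x
```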
 
 
