Diffusion Model Theory

Creator
Seonglae Cho
Created
2026 Jan 7 14:07
Edited
2026 Jan 15 17:52
Refs

Marginalization

Forward process

which means x_t can be sampled in one shot: x_t = √(ᾱ_t) x_0 + √(1 − ᾱ_t) ε, with ε ~ N(0, I).
When we consider it as a form of
Fourier Transform
, we can interpret x_t as a
Linear Combination
of x_0 and
Gaussian Noise

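A minimal NumPy sketch of this closed-form marginal, assuming the standard DDPM linear β schedule (all names here are illustrative, not from the note):

```python
import numpy as np

def forward_sample(x0, t, betas, rng):
    """Sample x_t directly from x_0 via the closed-form marginal:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I)."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]           # product of alphas up to step t
    eps = rng.standard_normal(x0.shape)         # the injected Gaussian noise
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return xt, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)           # common DDPM linear schedule
x0 = rng.standard_normal(16)
xt, eps = forward_sample(x0, 999, betas, rng)   # at large t, xt is nearly pure noise
```

At t = 999 the signal coefficient √(ᾱ_t) is tiny, so x_t is dominated by the noise term, matching the linear-combination reading above.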
Reverse process with

variational posterior
Reverse
KL Divergence
is mode-seeking, which drives the model to generate more expressive data. The model learns the reverse transition by matching the variational posterior.
p_θ(x_{t-1} | x_t) is the neural network we are training.
When it is a normal distribution

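In the Gaussian case the variational posterior has the standard DDPM closed form (a reconstruction of the missing equation, using the usual β_t, α_t = 1 − β_t, ᾱ_t = ∏ α notation):

```latex
q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\!\left(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t I\right)

\tilde{\mu}_t(x_t, x_0) = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\,x_0
  + \frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\,x_t,
\qquad
\tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t
```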
Reparameterization trick

We compute the KL divergence between the forward and reverse processes to obtain the loss. To define the loss, the problem is reparameterized to predict the noise ε at step t rather than the clean sample, which was empirically shown to improve performance. We also use the reparameterized sample from the forward process so that the variational posterior stays differentiable.
  • The loss weighting term is often fixed at 1 regardless of the step
  • ε_θ is a neural network that predicts the noise instead of predicting the mean
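The resulting simplified objective can be sketched as below; the zero predictor is a toy stand-in for a real ε_θ network, and all names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

def ddpm_simple_loss(eps_theta, x0, t, rng):
    """L_simple: unweighted MSE between true and predicted noise (weight fixed at 1)."""
    eps = rng.standard_normal(x0.shape)
    # reparameterized forward sample keeps the loss differentiable w.r.t. theta
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return np.mean((eps - eps_theta(xt, t)) ** 2)

# toy stand-in for the noise-prediction network eps_theta
loss = ddpm_simple_loss(lambda xt, t: np.zeros_like(xt), rng.standard_normal(8), 500, rng)
```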

ODE Solver

SDE Solver

Flow Matching
with
ODE
→ "following a map and driving in a consistent direction" while
Diffusion Model
with
SDE
→ "following a map, but random wind blows at each segment". Solver: the rule that determines how often and how precisely to apply steering in the reverse direction (deterministic for an ODE, stochastic for an SDE). Note that diffusion models can also be sampled with ODE solvers (e.g., the probability flow ODE).
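The map-vs-wind analogy can be made concrete by comparing one Euler ODE step with one Euler–Maruyama SDE step; the drift field here is a toy assumption, not a trained model:

```python
import numpy as np

def euler_step(x, t, dt, drift):
    """Deterministic ODE step: follow the velocity field exactly (the 'map')."""
    return x + drift(x, t) * dt

def euler_maruyama_step(x, t, dt, drift, sigma, rng):
    """Stochastic SDE step: same drift, plus Gaussian 'wind' scaled by sqrt(dt)."""
    noise = rng.standard_normal(x.shape)
    return x + drift(x, t) * dt + sigma * np.sqrt(dt) * noise

drift = lambda x, t: -x          # toy field pulling toward the origin (assumption)
rng = np.random.default_rng(0)
x = np.ones(4)
x_ode = euler_step(x, 0.0, 0.1, drift)                      # always the same result
x_sde = euler_maruyama_step(x, 0.0, 0.1, drift, 0.5, rng)   # differs run to run
```

Both steps share the drift; only the SDE step adds the per-segment randomness, which is exactly what a stochastic solver must account for.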
 
 
 

Flow-GRPO, NeurIPS 2025

Flow Matching
is ODE-based, making it deterministic (lacking sample diversity), while RL requires stochastic exploration. Additionally, RL data collection is expensive (many denoising steps), making it inefficient. By replacing the
ODE
sampler with an
SDE
that maintains the same marginal distribution, noise is injected (enabling exploration). This makes the policy at each step Gaussian, allowing its log-probability to be calculated in closed form.
During RL training, even with significantly reduced denoising steps (e.g., T=10), the reward signal is sufficient for effective learning. At inference time, the original number of steps (e.g., T=40) is restored to maintain final quality → reducing sampling cost by 4×+. SD3.5-M improved significantly on GenEval from 63% → 95%, and text rendering accuracy from 59% → 92%. Adding KL regularization suppresses reward hacking (quality/diversity collapse) while maintaining performance gains.
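A sketch of why the Gaussian per-step policy matters: its log-density, which the RL likelihood ratio needs, is available in closed form. Function and variable names are assumptions for illustration:

```python
import numpy as np

def sde_step_logprob(x_next, mean, std):
    """Closed-form log N(x_next; mean, std^2 I): the per-step policy log-probability
    that a GRPO-style likelihood ratio requires."""
    d = x_next - mean
    dim = x_next.size
    return -0.5 * np.sum(d * d) / std**2 - dim * (np.log(std) + 0.5 * np.log(2 * np.pi))

# example: score a sampled SDE step against the Gaussian policy that produced it
rng = np.random.default_rng(0)
mean = np.zeros(3)                     # denoiser-predicted step mean (toy)
x_next = mean + 0.1 * rng.standard_normal(3)
lp = sde_step_logprob(x_next, mean, 0.1)
```

With an ODE sampler the step is a delta function and no such ratio exists, which is the motivation for switching to the marginal-preserving SDE.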
 
 
 
 
