This approach views the training process itself as a pushforward operation that moves distributions, defining the "drift" direction in which samples should move in order to train a 1-step generator. Traditional diffusion/flow models instead use many inference steps to gradually move the distribution toward the data (multi-step NFE, number of function evaluations).
Drifting Models: during training, the sample changes induced by model parameter updates are viewed as "drift," and the model is designed so that when the drift becomes 0, the generated distribution q matches the data distribution p → inference is a single step (1 NFE).
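The NFE contrast above can be made concrete with a minimal sketch; the function names `one_step_generate`, `multi_step_generate`, and the toy `denoise(x, t)` interface are illustrative assumptions, not the paper's API:

```python
import numpy as np

def one_step_generate(f, eps):
    """Drifting-model inference: a single forward pass x = f(eps), i.e. 1 NFE."""
    return f(eps)

def multi_step_generate(denoise, eps, n_steps=50):
    """Diffusion/flow-style inference: n_steps network evaluations (n NFE),
    gradually moving the sample from noise toward the data distribution."""
    x = eps
    for t in np.linspace(1.0, 0.0, n_steps, endpoint=False):
        x = denoise(x, t)  # each call is one network evaluation
    return x
```

A drifting model pays its entire cost at training time so that inference collapses to the single call in `one_step_generate`.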
- For generated samples x = f(ϵ), a vector field V_{p,q}(x) is constructed that simultaneously attracts x toward data samples (positives) y⁺ ∼ p and repels it from generated samples (negatives) y⁻ ∼ q.
- V_{p,q} is designed to be anti-symmetric (V_{p,q} = -V_{q,p}), so that when p = q the field automatically vanishes (equilibrium). Ablations show that breaking this anti-symmetry degrades performance.
- Training takes the form of a stop-gradient fixed-point regression that matches the current sample x to the drifted target sg(x + V), without requiring explicit derivatives of the distributions.
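The attract/repel field and the stop-grad regression above can be sketched as follows. The Gaussian-kernel weighting is an assumed illustrative construction (the paper may use a different field), but it has the two properties the notes describe: swapping p and q flips the sign, so V vanishes at p = q:

```python
import numpy as np

def drift_field(x, y_pos, y_neg, sigma=1.0):
    """Illustrative anti-symmetric drift field V_{p,q}(x): attraction toward
    data samples y_pos ~ p minus the same pull toward generated samples
    y_neg ~ q. Kernel form is an assumption for illustration."""
    def pull(x, ys):
        # kernel-weighted average of directions from each x toward the ys
        d = ys[None, :, :] - x[:, None, :]                    # (n, m, dim)
        w = np.exp(-np.sum(d ** 2, axis=-1) / (2 * sigma**2))  # (n, m)
        w = w / (w.sum(axis=1, keepdims=True) + 1e-8)
        return (w[:, :, None] * d).sum(axis=1)                # (n, dim)
    # anti-symmetric by construction: drift_field(x, q, p) = -drift_field(x, p, q)
    return pull(x, y_pos) - pull(x, y_neg)

def fixed_point_loss(x, y_pos, y_neg):
    """Stop-grad fixed-point regression ||x - sg(x + V)||^2. The target is a
    constant here (NumPy has no autograd), so the gradient w.r.t. x is
    x - target = -V: gradient descent moves samples along the drift V."""
    target = x + drift_field(x, y_pos, y_neg)  # sg(x + V), treated as constant
    return np.mean((x - target) ** 2)          # zero exactly when V = 0
```

At the fixed point V = 0, attraction and repulsion balance, which is exactly the q = p equilibrium the notes describe.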
"Instead of denoising over many steps at inference, use fixed-point training so that the drift (attraction to data, repulsion from generated samples) converges to 0, enabling one-step high-quality generation."
Generative Modeling via Drifting
arxiv.org
https://arxiv.org/pdf/2602.04770

Seonglae Cho