Classifier-free guidance
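A sampling-time technique for conditional diffusion models in which a guidance scale controls how closely the denoising process follows the prompt. Raising the scale generally improves text-prompt fidelity and perceived generation quality, typically at the cost of variation in the results.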
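As a minimal sketch of how the guidance scale enters each denoising step, the snippet below blends an unconditional and a conditional noise prediction. The function name `cfg_noise` and the toy arrays are hypothetical, and the parameterization (scale 0 gives the unconditional prediction, scale 1 the conditional one, larger values extrapolate past it) is one common convention assumed here; some papers write the same rule with a shifted weight.

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, guidance_scale):
    """Blend unconditional and conditional noise predictions (assumed convention)."""
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    eps_cond = np.asarray(eps_cond, dtype=float)
    # Move from the unconditional prediction toward (and past) the conditional one.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy stand-ins for the two outputs of a noise predictor at one sampling step.
eps_u = np.array([0.0, 1.0])
eps_c = np.array([1.0, 3.0])

for scale in (0.0, 1.0, 2.0):
    print(scale, cfg_noise(eps_u, eps_c, scale))
```

In a real sampler the two predictions would come from the same network evaluated with and without the prompt, and the blended result would replace the plain prediction in the update step.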
Guiding a Diffusion Model with a Bad Version of Itself
The primary axes of interest in image-generating diffusion models are image quality, the amount of variation in the results, and how well the results align with a given condition, e.g., a class...
https://arxiv.org/abs/2406.02507

The paper reinterprets the classifier-free guidance (CFG) mechanism as an RL-style policy-improvement operator, so that the degree of policy optimality can be controlled at test time; the resulting method is called CFGRL. By revealing a direct connection between CFG and policy improvement in RL, it provides a theoretical and practical bridge that “adds RL-style optimization benefits for free” to existing supervised generative models.
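If we define an improved policy $\pi(a|s)$ as the prior policy $\hat{\pi}(a|s)$ multiplied by a monotonically increasing function $f$ of the advantage $A(s,a)$, i.e. $\pi(a|s) \propto \hat{\pi}(a|s)\, f(A(s,a))$, then we can sample from progressively more improved policies by tuning the guidance weight $w$. Concretely, $\tilde{\epsilon}(a_t, s) = \epsilon_\theta(a_t, s, \varnothing) + w\,[\epsilon_\theta(a_t, s, o) - \epsilon_\theta(a_t, s, \varnothing)]$, where $\epsilon_\theta$ is the noise predictor, $\varnothing$ denotes the unconditional condition, $o$ denotes the optimal condition, and $w$ is a scalar controlling the improvement strength. This can be applied to goal-conditioned behavioral cloning (GCBC) without learning a value function, immediately improving the policy toward higher goal-achievement probability.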
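To see why the guidance weight acts as an improvement knob, here is a toy sketch in score space. It assumes, purely for illustration, that the prior policy and the optimality-conditioned policy are 1-D Gaussians with the same standard deviation; this is not the paper's setup, where the scores come from a learned model. All function names and numbers are hypothetical. Under that assumption the guided score is again the score of a Gaussian whose mean is shifted from the prior mean toward (and, for weights above 1, beyond) the optimality-conditioned mean.

```python
import numpy as np

def gaussian_score(a, mean, std):
    """Gradient of log N(a; mean, std^2) with respect to the action a."""
    return -(a - mean) / std**2

def guided_score(a, prior_mean, optimal_mean, std, w):
    """CFG-style mix of the prior score and the optimality-conditioned score."""
    s_prior = gaussian_score(a, prior_mean, std)
    s_opt = gaussian_score(a, optimal_mean, std)
    return s_prior + w * (s_opt - s_prior)

def guided_mean(prior_mean, optimal_mean, w):
    """Closed-form mode of the guided policy in this equal-variance toy case."""
    return prior_mean + w * (optimal_mean - prior_mean)

prior_mean, optimal_mean, std = 0.0, 1.0, 0.5  # hypothetical toy values

for w in (0.0, 1.0, 3.0):
    m = guided_mean(prior_mean, optimal_mean, w)
    # The guided score vanishes at the shifted mean, confirming it is the mode.
    residual = guided_score(m, prior_mean, optimal_mean, std, w)
    print(f"w={w}: guided mean={m}, score at mean={residual}")
```

With weight 0 the sampler stays at the prior policy, weight 1 recovers the optimality-conditioned policy, and larger weights push actions further in the direction that conditioning favors.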
Diffusion Guidance Is a Controllable Policy Improvement Operator
At the core of reinforcement learning is the idea of learning beyond the performance in the data. However, scaling such systems has proven notoriously tricky. In contrast, techniques from...
https://arxiv.org/abs/2505.23458


Seonglae Cho