So fast for simple tasks?
Diffusion Language Models
A bidirectional LM is a single text diffusion step
BERT is just a Single Text Diffusion Step
A while back, Google DeepMind unveiled Gemini Diffusion, an experimental language model that generates text using diffusion. Unlike traditional GPT-style models that generate one word at a time, Gemini Diffusion creates whole blocks of text by refining random noise step-by-step. I read the paper Large Language Diffusion Models and was surprised to find that discrete language diffusion is just a generalization of masked language modeling (MLM), something we’ve been doing since 2018. The first thought I had was, “can we finetune a BERT-like model to do text generation?” I decided to try a quick proof of concept out of curiosity.
https://nathan.rs/posts/roberta-diffusion/
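The "MLM as diffusion" idea above can be sketched without any model weights. In this toy version, `toy_mlm_predict` is a hypothetical stand-in for a BERT/RoBERTa masked-LM forward pass; the point is only the schedule: start from a fully masked sequence (pure noise) and unmask the most confident positions over several steps.

```python
import random

MASK = "<mask>"

def toy_mlm_predict(tokens):
    """Stand-in for a BERT-style masked LM: returns a (token, confidence)
    guess for every masked position. A real model would score the whole
    vocabulary bidirectionally; here we just sample from a tiny list."""
    vocab = ["the", "cat", "sat", "on", "mat"]
    return {i: (random.choice(vocab), random.random())
            for i, t in enumerate(tokens) if t == MASK}

def diffusion_generate(length=5, steps=5):
    """Discrete text diffusion as iterated MLM: begin fully masked and
    unmask a few high-confidence predictions per step. Unmasking
    everything in one step is ordinary BERT inference; spreading it over
    many steps is the diffusion view."""
    tokens = [MASK] * length
    for step in range(steps):
        preds = toy_mlm_predict(tokens)
        if not preds:
            break
        # Reveal a fraction of the remaining masks, most confident first.
        k = max(1, len(preds) // (steps - step))
        best = sorted(preds.items(), key=lambda kv: -kv[1][1])[:k]
        for pos, (tok, _) in best:
            tokens[pos] = tok
    return tokens

print(diffusion_generate())
```

Swapping the stub for a real `RobertaForMaskedLM` forward pass, plus finetuning on randomly re-masked text, is essentially the proof of concept the post describes.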

Better utilizes web data with incomplete causal structure, and shows improved performance in iterative learning
Block Diffusion: Interpolating Between Autoregressive and Diffusion...
Diffusion language models offer unique benefits over autoregressive models due to their potential for parallelized generation and controllability, yet they lag in likelihood modeling and are...
https://openreview.net/forum?id=tyEyYT267x

Inception Labs API
Inception Labs
We are leveraging diffusion technology to develop a new generation of LLMs. Our dLLMs are much faster and more efficient than traditional auto-regressive LLMs. And diffusion models are more accurate, controllable, and performant on multimodal tasks.
https://www.inceptionlabs.ai/news

Non-autoregressive LLMs like diffusion/flow models (DLLMs) learn the joint distribution of prompts and responses, allowing attackers to reverse-sample prompts from a given target response to quickly generate jailbreak prompts. This effectively converts expensive discrete prompt search into amortized inference.
Prompts generated on JailbreakBench have low perplexity (natural-sounding) and strong transferability. They transfer particularly well to robustly trained models (LAT, Circuit Breakers, etc.) and proprietary models (GPT-5). Using guidance further increases the attack success rate (ASR). As DLLMs become more powerful, the threat of "natural" low-cost jailbreak generators may grow.
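The reverse-sampling trick is the same masked-denoising loop run with the response clamped. A minimal sketch, assuming a hypothetical joint denoiser `toy_joint_denoiser` (a stand-in, not any real DLLM API): fix the target response tokens, mask only the prompt region, and let denoising fill in a prompt the joint distribution considers likely to precede that response.

```python
import random

MASK = "<mask>"

def toy_joint_denoiser(tokens):
    """Stand-in for a DLLM denoiser over the joint (prompt, response)
    sequence; returns a (token, confidence) guess per masked slot."""
    vocab = ["please", "ignore", "rules", "and", "explain"]
    return {i: (random.choice(vocab), random.random())
            for i, t in enumerate(tokens) if t == MASK}

def reverse_sample_prompt(target_response, prompt_len=4, steps=4):
    """Amortized prompt search: clamp the desired response, mask the
    prompt region, and denoise only the prompt positions step by step."""
    tokens = [MASK] * prompt_len + list(target_response)
    for step in range(steps):
        preds = {i: p for i, p in toy_joint_denoiser(tokens).items()
                 if i < prompt_len}  # never touch the clamped response
        if not preds:
            break
        k = max(1, len(preds) // (steps - step))
        for pos, (tok, _) in sorted(preds.items(),
                                    key=lambda kv: -kv[1][1])[:k]:
            tokens[pos] = tok
    return tokens[:prompt_len]
```

Because one denoising pass amortizes over many candidate prompts, this is far cheaper than token-by-token discrete search against an autoregressive model, which is the cost asymmetry the note above points at.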

Seonglae Cho