Motivated by Jacobi Decoding
Unlike traditional autoregressive decoding, which generates tokens one by one, lookahead decoding generates multiple tokens in parallel.
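A minimal sketch of the underlying Jacobi iteration: every position in a fixed window is updated in parallel from the current guess, and iteration stops at a fixed point, which matches what greedy autoregressive decoding would produce. Here `next_token` is a hypothetical stand-in for one greedy LLM step (argmax over logits); a real implementation would batch these calls in a single forward pass.

```python
def jacobi_decode(next_token, prompt, n, max_iters=100):
    """Jacobi-style parallel decoding sketch (toy, assumes greedy decoding)."""
    # Initialize all n future positions with a placeholder guess (token 0).
    guess = [0] * n
    for _ in range(max_iters):
        # Update every position in parallel from the current guess;
        # position i conditions on the prompt plus the first i guessed tokens.
        new = [next_token(prompt + guess[:i]) for i in range(n)]
        if new == guess:
            # Fixed point reached: identical to autoregressive output.
            return guess
        guess = new
    return guess
```

With a deterministic toy model, a few iterations suffice; the speedup comes from multiple positions converging per step instead of one token per step.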


lookahead branch
The lookahead branch maintains a fixed-size 2D window over past Jacobi iteration steps and collects n-grams from the iteration trajectory into a pool of candidate continuations.
verification branch
The verification branch selects pooled n-grams whose first token matches the last generated token and verifies them against the model in parallel; only tokens that agree with greedy decoding are accepted, so the output is exact.
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding | LMSYS Org
https://lmsys.org/blog/2023-11-21-lookahead-decoding/


Seonglae Cho