Multi Token Prediction

Creator: Seonglae Cho
Created: 2024 Jul 8 2:57
Edited: 2025 May 5 23:57

MTP

Multiple head
https://arxiv.org/pdf/2412.19437v1
 

MTP Module

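The residual-stream output of the main model is passed to a small single-layer transformer block before the output head, and one such block is added for each additional future token to predict (the second token ahead, then the third, and so on). The advantage over independent parallel prediction heads is that, alongside regular next-token training, the blocks are chained, so the backpropagated gradients reflect the relationships between all of the predicted tokens rather than treating each one separately.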
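To make the chaining concrete, here is a minimal sketch of how inputs and targets line up at each prediction depth. It follows my reading of the linked DeepSeek-V3 report, where the module at depth k combines the previous depth's hidden state at position i with the embedding of token i+k and is trained to predict token i+k+1. The function name `mtp_alignment` and the tuple layout are hypothetical, chosen only for illustration. No model is involved, only the index bookkeeping.

```python
def mtp_alignment(tokens, depth):
    """Map each prediction depth k (0..depth) to a list of
    (position, extra_input_token, target_token) triples.

    k = 0 is ordinary next-token prediction, with no extra input token.
    For k >= 1 (assumed layout), the module at position i also receives
    the embedding of tokens[i + k] and is trained to predict tokens[i + k + 1].
    """
    plan = {}
    n = len(tokens)
    for k in range(depth + 1):
        rows = []
        # Positions near the end of the sequence have no target k + 1 steps ahead.
        for i in range(n - k - 1):
            extra = tokens[i + k] if k > 0 else None
            rows.append((i, extra, tokens[i + k + 1]))
        plan[k] = rows
    return plan


tokens = ["the", "cat", "sat", "on", "mat"]
plan = mtp_alignment(tokens, depth=2)
```

Each extra depth loses one usable position at the end of the sequence, and every depth-k target is conditioned on the tokens before it, which is what keeps the causal chain intact across depths.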
 
 
 
 
facebook/multi-token-prediction · Hugging Face
