MTA

Creator: Seonglae Cho
Created: 2025 Apr 10 20:2
Edited: 2025 Apr 10 20:8

Multi-Token Attention

Attention-native Convolutional Layer
MTA is good at tasks that require searching for information within long contexts, since standard single-token attention bottlenecks the amount of context information that can be used.

How it works: a 3D convolution over the attention weights, mixing scores across the head, query, and key dimensions.

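A minimal PyTorch sketch of the idea, assuming the convolution is applied to the pre-softmax attention logits: a single `nn.Conv3d` mixes each score with its neighbours along the head, query, and key dimensions. The class name `MultiTokenAttentionSketch`, the kernel sizes, and the simplified causal masking are illustrative assumptions, not the paper's exact configuration (my understanding is that the actual method uses separate key-query and head-mixing convolutions with more careful masking and normalization).

```python
# Minimal sketch: multi-token attention via a 3D convolution over the
# pre-softmax attention logits (heads x queries x keys). Kernel sizes,
# masking, and normalization here are illustrative assumptions.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenAttentionSketch(nn.Module):
    def __init__(self, dim: int, n_heads: int, kernel=(3, 3, 3)):
        super().__init__()
        assert dim % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)
        # One single-channel 3D convolution that pools neighbouring
        # logits across heads, query positions, and key positions.
        self.mix = nn.Conv3d(1, 1, kernel_size=kernel, padding="same")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)

        # Standard single-token logits: (batch, heads, queries, keys)
        logits = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        causal = torch.tril(torch.ones(t, t, dtype=torch.bool, device=x.device))

        # Zero out future keys so the kernel pools only over valid logits,
        # convolve, then re-apply the causal mask before the softmax.
        # Note: a fully causal variant must also stop the kernel from
        # reaching logits of future query rows; omitted for simplicity.
        logits = logits.masked_fill(~causal, 0.0)
        logits = self.mix(logits.unsqueeze(1)).squeeze(1)
        logits = logits.masked_fill(~causal, float("-inf"))

        attn = F.softmax(logits, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.out(y)

# Example: same input/output shape as ordinary self-attention.
# y = MultiTokenAttentionSketch(dim=64, n_heads=4)(torch.randn(2, 16, 64))
```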
Multi-head Latent Attention
Unlike MLA, however, MTA is not more efficient, so without significant performance improvements at scale it has no market advantage.