MatTransformer

Creator
Seonglae Cho
Created
2025 Jul 4 22:46
Edited
2025 Jul 4 22:50
MatTransformer introduces a matryoshka-style nested structure in the Feed-Forward Network (FFN), packing multiple sub-models of different sizes inside a single large model: each smaller FFN is a prefix slice of the largest one's weights. During training, a different FFN size (e.g., 0.5×, 1×, 2×, 4×) is randomly sampled at each step, so all granularities are optimized jointly. With Mix'n'Match, sizes can be chosen per layer and combined, yielding hundreds of extractable sub-models.
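A minimal sketch of the nested-FFN idea, assuming NumPy only. The function name, dimensions, and granularity list are illustrative, not from the paper; the point is that every smaller FFN is a prefix slice of the same weight matrices, and training samples one width per step.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff_max = 8, 64                       # full FFN hidden size (the 4x tier)
W_in = rng.standard_normal((d_model, d_ff_max)) * 0.02
W_out = rng.standard_normal((d_ff_max, d_model)) * 0.02

def nested_ffn(x, d_ff):
    """Forward pass using only the first d_ff hidden units.
    Smaller sub-models are prefix slices of the same shared weights."""
    h = np.maximum(x @ W_in[:, :d_ff], 0.0)     # ReLU on the sliced projection
    return h @ W_out[:d_ff, :]

# One training step: randomly sample a granularity (0.5x, 1x, 2x, 4x of base 16)
granularities = [8, 16, 32, 64]
x = rng.standard_normal((4, d_model))
d_ff = int(rng.choice(granularities))
y = nested_ffn(x, d_ff)                         # compute the loss on this output as usual
```

Because all widths share the same prefix weights, gradient updates from any sampled size also improve the others.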
This enables Elastic Inference: selecting the optimal model size on the fly under latency and cost constraints. Because all sub-models share weights, their predictions stay consistent, which reduces divergence between small (draft) and large (verifier) models and speeds up Speculative Decoding. Without separate compression or teacher models, it provides "multiple optimal models from a single training session", enabling flexible deployment from mobile devices to large clusters.
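A hypothetical sketch of Mix'n'Match extraction: choosing an FFN width per layer independently turns L layers × G granularities into G^L candidate sub-models, from which one can pick the largest that fits a latency budget. The cost model here is a deliberately crude stand-in (summed hidden widths), not the paper's.

```python
from itertools import product

granularities = [8, 16, 32, 64]   # assumed per-layer FFN widths (prefix slices)
num_layers = 4

# Every per-layer combination is a valid sub-model: 4**4 = 256 configurations
configs = list(product(granularities, repeat=num_layers))

def latency_proxy(config):
    # Crude cost model: FFN FLOPs scale with the summed hidden widths
    return sum(config)

budget = 100
feasible = [c for c in configs if latency_proxy(c) <= budget]
best = max(feasible, key=latency_proxy)   # largest sub-model within the budget
```

In practice one would calibrate the cost model on real hardware and pick among the feasible configurations by validation accuracy, but the selection loop has the same shape.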