Free Transformer

Creator
Seonglae Cho
Created
2025 Oct 31 0:39
Editor
Edited
2025 Oct 31 0:48
Refs
CVAE
During training, the middle layer's KV is converted into a variational latent and injected back into the model, so that the first half of the layers acts as an encoder and the second half acts as a decoder. The latent sits in the K, V pathway, which determines "what to attend to." Because the latent is added to K and V, the latent decision can change the attention pattern itself, which lets the model select different reasoning branches or decision paths depending on the latent. (free)
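The effect of a latent on the attention pattern can be illustrated with a toy example. This is only a sketch under simplifying assumptions: a single attention head, an identity key projection, hand-picked numbers, and a hypothetical helper `inject`; none of these come from the original model. Values would be computed from the same shifted input and are omitted here.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

def inject(hidden, latent_proj):
    """Hypothetical helper: add the projected latent to each position's key input."""
    return [[h + z for h, z in zip(row, proj)]
            for row, proj in zip(hidden, latent_proj)]

# Toy middle-layer hidden states for 3 positions, model width 2 (assumed values).
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]

# Two different latent decisions, already projected to the model width.
latent_a = [[2.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # boosts position 0
latent_b = [[0.0, 0.0], [2.0, 0.0], [0.0, 0.0]]  # boosts position 1

weights_a = attention_weights(query, inject(hidden, latent_a))
weights_b = attention_weights(query, inject(hidden, latent_b))

# The same query attends mostly to a different position under each latent.
top_a = weights_a.index(max(weights_a))
top_b = weights_b.index(max(weights_b))
print(top_a, top_b)
```

With identical hidden states and an identical query, only the latent differs between the two runs, yet the most-attended position changes. That is the sense in which a latent in the K, V pathway can steer which branch the later layers follow.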
Z_t is added to the keys and values through a projection. The VAE uses a discrete categorical latent: Z_t is a single one-hot vector of dimension C = 2^H = 65,536 (that is, H = 16), which pushes the representation to encode high-level decisions first.
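The size of the latent and the cost of projecting it can be checked with a short sketch. It assumes, as one plausible reading of C = 2^H, that the category index is formed from H binary decisions; the bit sampling, the width `D`, and the random matrix `proj` are illustrative assumptions, not details from the source.

```python
import random

H = 16
C = 2 ** H  # number of categories: 65,536

def bits_to_index(bits):
    """Interpret H binary decisions as one category index in [0, 2^H)."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index

def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec

rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(H)]  # assumed: uniform random bits
z_index = bits_to_index(bits)
z = one_hot(z_index, C)

# Assumed projection: a C x D matrix mapping the one-hot latent to width D.
D = 4
proj = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(C)]

# Multiplying a one-hot vector by the matrix selects a single row,
# so the projection reduces to a lookup.
projected = proj[z_index]
dense = [sum(z[c] * proj[c][d] for c in range(C)) for d in range(D)]
```

Because Z_t is one-hot, the projection never needs a full matrix product: the row lookup `proj[z_index]` gives the same vector as the dense multiplication, which keeps a 65,536-way latent cheap to inject.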
