DiT
Diffusion loss is applied to image tokens.
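A minimal sketch of that objective (assuming a DiT-like model(x_t, t) that predicts the added noise for each latent patch token; the DDPM-style schedule, names, and shapes are illustrative, not any particular implementation):

```python
import torch
import torch.nn.functional as F

def dit_diffusion_loss(model, x0, t, alphas_cumprod):
    """Noise-prediction loss computed over every image token.

    x0:             clean latent patch tokens, shape (B, N, D)
    t:              integer timesteps, shape (B,)
    alphas_cumprod: cumulative product of the noise schedule, shape (T,)
    """
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1)                # broadcast over all tokens
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise  # forward process q(x_t | x_0)
    pred = model(x_t, t)                                    # per-token noise prediction
    return F.mse_loss(pred, noise)                          # averaged over all image tokens
```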
BERT's MLM is essentially a special case of a text diffusion model: add a masking-ratio schedule and iterative denoising, and it becomes a complete generative language model.
BERT (RoBERTa in particular) is originally an encoder-only model trained for MLM (masked-token recovery). Reinterpreting masking as a discrete text diffusion process, with the masking ratio playing the role of the timestep, turns it into a full generative model: train with masking ratios varied from 0% to 100%, then generate by iteratively denoising a fully masked sequence. In the experiment below, adding only variable masking and iterative denoising to RoBERTa and fine-tuning on WikiText already produced fairly natural sentence generation.
BERT is just a Single Text Diffusion Step
A while back, Google DeepMind unveiled Gemini Diffusion, an experimental language model that generates text using diffusion. Unlike traditional GPT-style models that generate one word at a time, Gemini Diffusion creates whole blocks of text by refining random noise step-by-step. I read the paper Large Language Diffusion Models and was surprised to find that discrete language diffusion is just a generalization of masked language modeling (MLM), something we’ve been doing since 2018. The first thought I had was, “can we finetune a BERT-like model to do text generation?” I decided to try a quick proof of concept out of curiosity.
https://nathan.rs/posts/roberta-diffusion/
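A minimal sketch of the iterative-denoising generation loop (assuming roberta-base stands in for a checkpoint already fine-tuned with masking ratios sampled from 0% to 100%; the confidence-based unmasking schedule below is one common choice, not necessarily the post's exact procedure):

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base").eval()

@torch.no_grad()
def diffusion_generate(seq_len=32, steps=8):
    """Start fully masked, then iteratively denoise: each step commits the most
    confident predictions for a growing fraction of positions."""
    mask_id = tokenizer.mask_token_id
    ids = torch.full((1, seq_len), mask_id, dtype=torch.long)
    ids[0, 0], ids[0, -1] = tokenizer.bos_token_id, tokenizer.eos_token_id
    for step in range(steps):
        logits = model(input_ids=ids).logits             # (1, seq_len, vocab)
        conf, pred = logits.softmax(-1).max(-1)          # per-position confidence / argmax token
        masked = ids == mask_id
        # cumulative schedule: roughly (step+1)/steps of positions revealed by this step
        target = int(seq_len * (step + 1) / steps)
        n_new = target - int((~masked).sum())
        if n_new <= 0:
            continue
        conf = conf.masked_fill(~masked, float("-inf"))  # only fill currently masked slots
        idx = conf[0].topk(n_new).indices
        ids[0, idx] = pred[0, idx]
    return tokenizer.decode(ids[0], skip_special_tokens=True)

print(diffusion_generate())
```

On the training side, the only change from standard MLM fine-tuning is sampling the masking ratio per batch (anywhere from 0% to 100%) instead of fixing it at 15%.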

TREAD
DiT applies the diffusion loss to every image token. Dropping a subset of tokens to save compute (as in masked training) means those tokens receive no loss, which conflicts with diffusion's assumption that the whole image is gradually denoised; the loss should still cover all image tokens. TREAD therefore does not discard the selected tokens but temporarily routes them along a different path past the intermediate blocks (routing) and reinserts them later, so all tokens still receive the diffusion loss.
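A rough PyTorch sketch of the routing idea (my own illustrative module, not the TREAD implementation): at a chosen depth a random subset of tokens is gathered out, only that subset runs through the middle blocks, and the bypassed tokens are scattered back in before the final blocks, so the output, and hence the diffusion loss, still covers every token.

```python
import torch
import torch.nn as nn

class TreadStyleBackbone(nn.Module):
    """Illustrative TREAD-style token routing: during training, part of the token
    set bypasses the middle blocks and rejoins later, saving compute while every
    token still reaches the output and receives the diffusion loss."""

    def __init__(self, dim=256, depth=12, heads=8, route_start=2, route_end=10, keep_ratio=0.5):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True) for _ in range(depth)
        )
        self.route_start, self.route_end, self.keep_ratio = route_start, route_end, keep_ratio

    def forward(self, x):                                    # x: (B, N, dim) noisy latent tokens
        B, N, D = x.shape
        route = self.training
        for i, blk in enumerate(self.blocks):
            if route and i == self.route_start:
                # pick a random subset of tokens to keep inside the middle blocks
                perm = torch.rand(B, N, device=x.device).argsort(dim=1)
                keep_idx = perm[:, : int(N * self.keep_ratio)]              # (B, K)
                bypass = x                                                  # saved full token set
                x = x.gather(1, keep_idx.unsqueeze(-1).expand(-1, -1, D))   # (B, K, D)
            x = blk(x)
            if route and i == self.route_end - 1:
                # scatter processed tokens back; bypassed tokens rejoin unchanged
                out = bypass.clone()
                out.scatter_(1, keep_idx.unsqueeze(-1).expand(-1, -1, D), x)
                x = out
        return x                                             # (B, N, dim): all tokens get the loss
```

Routing is only active during training; at inference every block sees every token.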
サメQCU on Twitter / X
"bros, DiT is wrong. it's mathematically wrong. it's formally wrong. there is something wrong with it" (pic.twitter.com/OQZ8IcQfnA) — サメQCU (@sameQCU), August 17, 2025
https://x.com/sameQCU/status/1957223774094585872

Traditional DiT (Diffusion Transformer) relies on outdated VAE encoders → low-dimensional latents (4 channels), a complex pipeline, weak expressiveness. Representation Autoencoder (RAE): replace the VAE with a frozen pre-trained representation encoder (DINO, MAE, SigLIP, etc.) plus a lightweight decoder trained with an L1 + GAN + LPIPS loss.
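A minimal sketch of that decoder objective (assuming user-provided encoder, decoder, and disc modules; the loss weights and the use of the lpips package are illustrative):

```python
import torch
import torch.nn.functional as F
import lpips  # perceptual (LPIPS) loss, pip install lpips

# Assumed components (hypothetical names): a frozen pretrained representation
# encoder (e.g. DINO / MAE / SigLIP features), a lightweight trainable decoder
# mapping those features back to pixels, and a patch discriminator for the GAN term.
lpips_fn = lpips.LPIPS(net="vgg")

def rae_decoder_loss(encoder, decoder, disc, images, gan_weight=0.1, lpips_weight=1.0):
    """Reconstruction objective for the RAE decoder: L1 + LPIPS + GAN.
    The encoder stays frozen; only the decoder is updated here (the discriminator
    is trained separately with its own loss).  images assumed scaled to [-1, 1]."""
    with torch.no_grad():
        z = encoder(images)                 # high-dimensional representation tokens
    recon = decoder(z)                      # lightweight decoder back to pixel space
    l1 = F.l1_loss(recon, images)
    perceptual = lpips_fn(recon, images).mean()
    adv = -disc(recon).mean()               # simple hinge/WGAN-style generator term (illustrative)
    return l1 + lpips_weight * perceptual + gan_weight * adv
```

The diffusion transformer then operates directly on the frozen encoder's high-dimensional tokens rather than on 4-channel VAE latents.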

Seonglae Cho