Language Modeling in a Sentence Representation Space
Because the LCM operates in the SONAR embedding space, with the SONAR encoder kept frozen, it is nominally independent of language and modality: SONAR is a multimodal encoder, so the model itself only ever sees sentence embeddings. The backbone is still a Transformer, but the diffusion-based LCM variant uses a diffusion model to learn the conditional probability distribution of the next sentence embedding given the preceding ones.
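To make the idea of "diffusion over the next sentence embedding" concrete, here is a minimal toy sketch in NumPy. It is not the LCM implementation: the cosine noise schedule, the deterministic DDIM-style sampler, and the names add_noise, toy_denoiser, and sample_next_embedding are all assumptions chosen for illustration. Random vectors stand in for SONAR embeddings, and a trivial function stands in for the Transformer denoiser.

```python
import numpy as np


def cosine_alpha_bar(t):
    """Cumulative signal level for t in [0, 1] (cosine schedule; an assumption)."""
    return np.cos(0.5 * np.pi * t) ** 2


def add_noise(x0, t, rng):
    """Forward process: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    a = cosine_alpha_bar(t)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps


def toy_denoiser(x_t, t, context):
    """Hypothetical stand-in for the Transformer denoiser.

    A real model would be trained to predict the clean next-sentence
    embedding from the noisy one, the timestep, and the context.
    This placeholder just returns the mean of the context embeddings.
    """
    return context.mean(axis=0)


def sample_next_embedding(context, denoiser, steps, rng):
    """Deterministic DDIM-style sampling of the next sentence embedding."""
    dim = context.shape[1]
    x = rng.standard_normal(dim)  # start from pure noise at t = 1
    ts = np.linspace(1.0, 0.0, steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        a_cur, a_next = cosine_alpha_bar(t_cur), cosine_alpha_bar(t_next)
        x0_hat = denoiser(x, t_cur, context)  # predicted clean embedding
        # Noise implied by the current sample and the predicted clean embedding.
        eps_hat = (x - np.sqrt(a_cur) * x0_hat) / np.sqrt(max(1.0 - a_cur, 1e-8))
        # Move to the next (less noisy) timestep.
        x = np.sqrt(a_next) * x0_hat + np.sqrt(1.0 - a_next) * eps_hat
    return x


rng = np.random.default_rng(0)
context = rng.standard_normal((3, 8))  # 3 preceding "sentences", toy dimension 8
next_embedding = sample_next_embedding(context, toy_denoiser, steps=10, rng=rng)
print(next_embedding.shape)
```

With the placeholder denoiser the sampler simply converges to the context mean. In a trained model, the denoiser would carry all of the learned conditional distribution, and the resulting embedding would be passed to the SONAR decoder to produce text.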
Limitations
Because the space of possible next sentences is vast, predicting a sentence embedding is hard: generating an appropriate next sentence requires more training data and more sophisticated modeling. This suggests that the fixed sentence-level unit is itself a limitation, and that the approach needs to be extended to units both smaller and larger than a sentence. The model also inherits SONAR's limitations.