Dia

Creator: Seonglae Cho
Created: 2025 Apr 26 16:06
Edited: 2025 May 30 22:19
nari-labs/dia: no training script, just the TTS model.
Non-verbal tags: (laughs), (clears throat), (sighs), (gasps), (coughs), (singing), (sings), (mumbles), (beep), (groans), (sniffs), (claps), (screams), (inhales), (exhales), (applause), (burps), (humming), (sneezes), (chuckle), (whistles)
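A script can be checked against this list before generation, for example to catch a typo in a cue. The helper below is a hypothetical sketch, not part of Dia: `KNOWN_TAGS` and `unknown_tags` are illustrative names, and the code only flags parenthesized cues that are not among the tags listed above.

```python
import re
from typing import List

# Non-verbal tags listed above, copied from this page (illustrative set, not a Dia API).
KNOWN_TAGS = {
    "laughs", "clears throat", "sighs", "gasps", "coughs", "singing", "sings",
    "mumbles", "beep", "groans", "sniffs", "claps", "screams", "inhales",
    "exhales", "applause", "burps", "humming", "sneezes", "chuckle", "whistles",
}

# Matches one parenthesized cue such as "(laughs)"; nested parentheses are not handled.
_TAG_RE = re.compile(r"\(([^()]+)\)")


def unknown_tags(script: str) -> List[str]:
    """Return parenthesized cues in `script` that are not in KNOWN_TAGS, in order of appearance."""
    found = []
    for match in _TAG_RE.finditer(script):
        # Compare case-insensitively and ignore surrounding spaces inside the parentheses.
        tag = match.group(1).strip().lower()
        if tag not in KNOWN_TAGS:
            found.append(match.group(1))
    return found


if __name__ == "__main__":
    print(unknown_tags("Hello. (laughs) Nice to meet you. (giggles)"))  # prints ['giggles']
```

Matching is case-insensitive, so "(Clears Throat)" counts as a listed tag.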

Delay Pattern

A time-offset scheme that, like a Positional Embedding, is used to distinguish individual channels. Audio is represented and generated as multiple channels (9 by default) of code sequences rather than as a single stream. Each channel is shifted in time by its own offset: if the first channel generates the code for frame t at a given step, the second channel generates the code for frame t - delay[1] at that same step. For multi-channel audio codes, this fixes which time frame each channel refers to when generating its next code, so a channel with a larger delay generates a frame only after channels with smaller delays have already produced their codes for that frame.
 
Documentation

Nari Labs
