Delay Pattern

An RVQ (Residual Vector Quantization) tokenizer encodes each audio frame as N codebook stages, and these stages have to be predicted sequentially.

A type of Positional Embedding used to distinguish individual channels. Audio is represented and generated as multiple channels (default 9) of code sequences rather than a single stream. When handling multi-channel audio codes, the delay pattern defines how each channel references temporal information to generate its next code. For example, if the first channel generates the code for time t at a given step, the second channel generates the code for time t - delay[1] at that same step.
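The shifting rule can be illustrated with a minimal sketch in plain Python. It assumes one integer delay per channel and a padding id for steps where a channel has no code yet. The names apply_delay, undo_delay, and PAD are hypothetical and do not come from any specific library, and the 3-channel example with delays 0, 1, 2 is chosen only for readability.

```python
PAD = -1  # hypothetical padding id for steps where a channel has no code yet


def apply_delay(codes, delays, pad=PAD):
    """Shift channel k right by delays[k] steps.

    codes:  list of channels, each a list of codes of equal length
    delays: one non-negative integer delay per channel
    At generation step s, channel k holds the code for frame s - delays[k].
    """
    num_frames = len(codes[0])
    total_steps = num_frames + max(delays)
    delayed = []
    for channel, delay in zip(codes, delays):
        row = [pad] * total_steps
        for step in range(total_steps):
            frame = step - delay
            if 0 <= frame < num_frames:
                row[step] = channel[frame]
        delayed.append(row)
    return delayed


def undo_delay(delayed, delays, num_frames):
    """Realign the channels so that index t again means frame t in every channel."""
    return [row[d:d + num_frames] for row, d in zip(delayed, delays)]


# Example: 3 channels, 4 frames, channel k delayed by k steps (assumed values).
codes = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
]
delays = [0, 1, 2]
delayed = apply_delay(codes, delays)
for row in delayed:
    print(row)
```

After generation, undo_delay(delayed, delays, 4) restores the original alignment so the codes of all channels for one frame can be decoded together.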
