Sesame AI

Creator: Seonglae Cho
Created: 2025 Mar 4 1:22
Edited: 2025 Jun 15 22:54
Refs

CSM

csm (SesameAILabs, updated 2025 Jun 15 18:11)
But there is no training code; in practice it is just a well-made TTS.

Compute amortization

To mitigate the limitations of the Delay Pattern, CSM introduces Compute Amortization. The backbone predicts the zeroth codebook (the basic semantic information) for every frame, while the decoder learns to predict the remaining N-1 codebooks on only a random 1/16 subset of frames. This amortization enables fast training with significantly reduced memory and compute cost, without loss of voice quality. The approach is analogous to how RNN limitations were addressed by recasting them as Autoregressive Models trained with Next Token Prediction.
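As a rough illustration, a minimal PyTorch sketch of such an amortized training step is below. The modules, shapes, and vocabulary size are hypothetical stand-ins rather than CSM's actual architecture; the point is only that the zeroth-codebook loss covers all frames while the decoder loss covers a random 1/16 of them.

```python
import torch

B, T, N = 4, 128, 32   # batch, frames, codebooks per frame (placeholder sizes)
H, V = 512, 1024       # hidden size, codebook vocabulary size (placeholders)

backbone = torch.nn.GRU(H, H, batch_first=True)   # stand-in for the backbone
head0 = torch.nn.Linear(H, V)                      # zeroth-codebook head
decoder = torch.nn.Linear(H, (N - 1) * V)          # stand-in for the codebook decoder

frames = torch.randn(B, T, H)                      # frame embeddings (placeholder)
targets = torch.randint(0, V, (B, T, N))           # codebook indices (placeholder)

hidden, _ = backbone(frames)

# Zeroth codebook: predicted and supervised on every frame.
loss0 = torch.nn.functional.cross_entropy(
    head0(hidden).reshape(-1, V), targets[..., 0].reshape(-1)
)

# Remaining N-1 codebooks: only a random 1/16 of frames are sampled,
# amortizing the decoder's memory and compute cost.
idx = torch.randperm(T)[: T // 16]
logits = decoder(hidden[:, idx]).reshape(B, len(idx), N - 1, V)
loss_rest = torch.nn.functional.cross_entropy(
    logits.reshape(-1, V), targets[:, idx, 1:].reshape(-1)
)

loss = loss0 + loss_rest
loss.backward()
```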
 
 
 
 
Crossing the uncanny valley of conversational voice
At Sesame, our goal is to achieve “voice presence”—the magical quality that makes spoken interactions feel real, understood, and valued.
sesame/csm-1b · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
 
 
