Conformer

Creator: Seonglae Cho
Created: 2025 Jul 4 22:41
Editor: Seonglae Cho
Edited: 2026 Jan 9 1:44

Convolution-augmented Transformer for Speech Recognition (STT)
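The Conformer paper describes each block as a macaron-style sandwich: a half-step feed-forward module, multi-head self-attention (global context), a convolution module (local patterns), and a second half-step feed-forward module, each with a residual connection, followed by a final LayerNorm. Below is a minimal pure-Python sketch of that wiring only. It assumes the four sublayers are caller-supplied functions over a list of frames. The names `conformer_block`, `ffn1`, `mhsa`, `conv` and `ffn2` are hypothetical, and learned LayerNorm parameters, dropout and the internals of each module are omitted.

```python
import math
from typing import Callable, List

Frames = List[List[float]]           # T frames, each a D-dim feature vector
Sublayer = Callable[[Frames], Frames]  # hypothetical: any T x D -> T x D function


def add(a: Frames, b: Frames, scale: float = 1.0) -> Frames:
    """Residual connection a + scale * b, applied frame by frame."""
    return [[x + scale * y for x, y in zip(fa, fb)] for fa, fb in zip(a, b)]


def layer_norm(x: Frames, eps: float = 1e-5) -> Frames:
    """Per-frame layer normalization (learned gain/bias omitted in this sketch)."""
    out = []
    for frame in x:
        mean = sum(frame) / len(frame)
        var = sum((v - mean) ** 2 for v in frame) / len(frame)
        out.append([(v - mean) / math.sqrt(var + eps) for v in frame])
    return out


def conformer_block(x: Frames, ffn1: Sublayer, mhsa: Sublayer,
                    conv: Sublayer, ffn2: Sublayer) -> Frames:
    """Macaron-style block: half-step FFN, self-attention, convolution
    module, half-step FFN, then a final LayerNorm."""
    x = add(x, ffn1(x), 0.5)  # first feed-forward module, half-step residual
    x = add(x, mhsa(x))       # multi-head self-attention: global context
    x = add(x, conv(x))       # convolution module: local patterns
    x = add(x, ffn2(x), 0.5)  # second feed-forward module, half-step residual
    return layer_norm(x)
```

Passing identity functions for all four sublayers scales the input by 1.5, 3, 6 and finally 9 before normalization, which is a quick way to check the residual wiring without any trained weights.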

ASR
Scaling Real-Time Voice Agents with Cache-Aware Streaming ASR (a blog post by NVIDIA on Hugging Face)
https://huggingface.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents
arxiv.org
https://arxiv.org/pdf/2005.08100

Fast Conformer

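Fast Conformer redesigns the subsampling front end to downsample the input 8× early (10 ms → 80 ms frames, versus 4× or 40 ms in the original Conformer). The subsampling uses depthwise separable convolutions with a reduced channel count and kernel size, and the shorter sequence lowers the cost of the attention layers that follow.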
arxiv.org
https://arxiv.org/pdf/2305.05084
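The effect of earlier, more aggressive downsampling can be checked with a little arithmetic. The sketch below assumes each subsampling layer is a stride-2 convolution with kernel 3 and padding 1 along time, so 4× takes two such layers and 8× takes three. It uses 256 channels as an illustrative value. These settings are assumptions for illustration, not figures from the paper. It also compares the weight count of a standard 3×3 convolution with a depthwise separable one (depthwise 3×3 plus pointwise 1×1).

```python
def conv_out_len(length: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output length along time of a convolution (assumed k=3, s=2, p=1)."""
    return (length + 2 * padding - kernel) // stride + 1


def subsample(num_frames: int, num_stride2_layers: int) -> int:
    """Apply several stride-2 convolutions; each roughly halves the frame count."""
    for _ in range(num_stride2_layers):
        num_frames = conv_out_len(num_frames)
    return num_frames


def conv2d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k conv (one filter per input channel) plus 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


frames = 1000                         # 10 s of audio at a 10 ms hop
t4 = subsample(frames, 2)             # 4x -> 250 frames (40 ms)
t8 = subsample(frames, 3)             # 8x -> 125 frames (80 ms)
print(t4, t4 * t4)                    # attention scores per head at 4x
print(t8, t8 * t8)                    # attention scores per head at 8x
print(conv2d_params(256, 256, 3))     # illustrative 256 channels (assumption)
print(depthwise_separable_params(256, 256, 3))
```

For 10 s of audio, 8× subsampling leaves 125 frames instead of 250, so each self-attention head scores 15,625 pairs instead of 62,500: because attention cost grows quadratically with sequence length, halving the frame count cuts that term by about 4×. With the assumed 256 channels, the depthwise separable layer needs 67,840 weights versus 589,824 for a standard 3×3 convolution.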
nvidia/nemotron-speech-streaming-en-0.6b (Hugging Face)
https://huggingface.co/nvidia/nemotron-speech-streaming-en-0.6b

Copyright Seonglae Cho