Convolution-augmented Transformer for Speech Recognition (STT)
Scaling Real-Time Voice Agents with Cache-Aware Streaming ASR
A Blog post by NVIDIA on Hugging Face
https://huggingface.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents
Fast Conformer
A redesigned Conformer that performs 8× downsampling (10 ms → 80 ms frame shift) at the start of the encoder. Its subsampling module uses depthwise separable convolutions, with a reduced channel count and a smaller kernel size
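Below is a minimal sketch of the arithmetic behind this design. It shows how stacking stride-2 layers produces 8× time downsampling, and why depthwise separable convolution needs far fewer weights than a regular convolution. The configuration is an illustrative assumption, not a value taken from the model card: three stride-2 layers, kernel size 3, padding 1, and 256 channels. All helper names are hypothetical.

```python
def conv_out_len(n: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output length along the time axis of one convolution layer."""
    return (n + 2 * padding - kernel) // stride + 1


def subsample(num_frames: int, num_layers: int = 3) -> int:
    """Apply `num_layers` stride-2 convolutions (assumed); 3 layers give 2**3 = 8x."""
    for _ in range(num_layers):
        num_frames = conv_out_len(num_frames)
    return num_frames


def frame_shift_ms(input_shift_ms: int = 10, num_layers: int = 3) -> int:
    """Each stride-2 layer doubles the time step between output frames."""
    return input_shift_ms * 2 ** num_layers


def conv2d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a regular k x k 2-D convolution (bias omitted)."""
    return c_in * c_out * k * k


def dw_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k filter per input channel plus a 1x1 pointwise conv (bias omitted)."""
    return c_in * k * k + c_in * c_out


if __name__ == "__main__":
    print(subsample(100))                    # 13 output frames from 100 input frames
    print(frame_shift_ms())                  # 80 ms between output frames
    print(conv2d_params(256, 256, 3))        # 589824 weights
    print(dw_separable_params(256, 256, 3))  # 67840 weights
```

Take 100 input frames (1 s at a 10 ms hop). Under these assumptions they shrink to 13 encoder frames, about one every 80 ms. With 256 channels and a 3×3 kernel, a regular convolution needs 589,824 weights. The depthwise separable version needs 67,840, roughly 8.7× fewer.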
nvidia/nemotron-speech-streaming-en-0.6b · Hugging Face
https://huggingface.co/nvidia/nemotron-speech-streaming-en-0.6b

Seonglae Cho