Conformer

Convolution-augmented Transformer for Speech Recognition (STT)

https://huggingface.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents

Scaling Real-Time Voice Agents with Cache-Aware Streaming ASR

https://arxiv.org/pdf/2005.08100

Fast Conformer

A redesign of the Conformer encoder that performs 8× downsampling at the start of the network, taking the frame step from 10 ms to 80 ms. The subsampling module uses depthwise separable convolutions with a reduced channel count and a smaller kernel size.
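A minimal sketch of the arithmetic behind those two points, not code from the paper or from NeMo. It assumes the 8× factor comes from three stride-2 stages with padding that rounds frame counts up, and it counts weights without biases; the function names, the 3×3 kernel, and the 256-channel figure are illustrative only.

```python
import math


def frames_after_subsampling(num_input_frames, stride=2, num_stages=3):
    """Frame count after repeated strided subsampling (assumes ceil rounding)."""
    frames = num_input_frames
    for _ in range(num_stages):
        frames = math.ceil(frames / stride)
    return frames


def output_frame_ms(input_frame_ms=10.0, stride=2, num_stages=3):
    """Frame step after subsampling: 10 ms * 2**3 = 80 ms."""
    return input_frame_ms * stride ** num_stages


def standard_conv2d_params(kernel, c_in, c_out):
    """Weights in a regular 2D convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise conv (one k*k filter per input channel)
    followed by a 1x1 pointwise conv (no bias)."""
    return kernel * kernel * c_in + c_in * c_out


if __name__ == "__main__":
    # 10 s of audio at a 10 ms frame step is 1000 frames.
    print(frames_after_subsampling(1000))           # 125
    print(output_frame_ms())                        # 80.0
    print(standard_conv2d_params(3, 256, 256))      # 589824
    print(depthwise_separable_params(3, 256, 256))  # 67840
```

Because self-attention cost grows with the square of the sequence length, feeding the encoder 125 frames instead of 1000 is where most of the savings come from; the separable convolution mainly cuts the cost of the subsampling module itself.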

https://arxiv.org/pdf/2305.05084

nvidia/nemotron-speech-streaming-en-0.6b · Hugging Face

https://huggingface.co/nvidia/nemotron-speech-streaming-en-0.6b
