Convolution-augmented Transformer for Speech Recognition (STT)
Fast Conformer
Redesigns the Conformer to perform 8× downsampling early in the input (10 ms → 80 ms frame shift). Subsampling uses depthwise separable convolutions, with a reduced channel count and a smaller kernel size.
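The arithmetic behind the 8× figure can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: it assumes the 8× factor comes from three stacked stride-2 convolutions, and the kernel size (3), padding (1), and channel count (256) are illustrative values chosen for the example. The helper names are hypothetical.

```python
def conv_out_len(n_frames, kernel=3, stride=2, padding=1):
    """Output length of one strided convolution along the time axis."""
    return (n_frames + 2 * padding - kernel) // stride + 1


def subsampled_len(n_frames, n_layers=3, kernel=3, stride=2, padding=1):
    """Length after stacking n_layers strided convolutions (assumed: 3 layers)."""
    for _ in range(n_layers):
        n_frames = conv_out_len(n_frames, kernel, stride, padding)
    return n_frames


def frame_shift_ms(input_shift_ms=10, stride=2, n_layers=3):
    """Frame shift after subsampling: each stride-2 layer doubles it."""
    return input_shift_ms * stride ** n_layers


def standard_conv_weights(c_in, c_out, kernel):
    """Weight count of a standard 2D convolution (bias ignored)."""
    return c_in * c_out * kernel * kernel


def depthwise_separable_weights(c_in, c_out, kernel):
    """Depthwise (one k x k filter per channel) plus pointwise 1x1 weights."""
    return c_in * kernel * kernel + c_in * c_out


if __name__ == "__main__":
    # 1000 input frames at 10 ms each is 10 s of audio.
    print(subsampled_len(1000))                      # 125 frames
    print(frame_shift_ms())                          # 80 ms per frame
    print(standard_conv_weights(256, 256, 3))        # 589824
    print(depthwise_separable_weights(256, 256, 3))  # 67840
```

Under these assumptions, 10 seconds of audio shrinks from 1000 frames to 125 before the attention layers see it, and each depthwise separable layer carries roughly a ninth of the weights of a standard convolution with the same shape.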
Seonglae Cho