CTC

Connectionist Temporal Classification

A training and modeling approach for mapping between sequences of different lengths without frame-level alignment labels, for example a long sequence of speech frames to a much shorter text transcript.

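Strong alignment capability: Learns "which frame corresponds to which character" without explicit alignment labels by summing the probabilities of all valid alignments, i.e. all frame-level label paths (with blanks and repeated labels) that collapse to the target text.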
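The "sum over all alignments" can be sketched in a few lines of plain Python. This is a toy illustration, not a production loss: the names `collapse`, `ctc_prob_bruteforce`, and `ctc_prob_forward` are made up for this note, the blank symbol is assumed to be index 0, and the per-frame probabilities are invented numbers. The brute-force version enumerates every frame-level path; the forward recursion computes the same sum with dynamic programming.

```python
import itertools
import math

BLANK = 0  # assumption for this sketch: index 0 is the blank symbol


def collapse(path):
    """CTC collapsing rule: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return out


def ctc_prob_bruteforce(probs, target):
    """Sum the probability of every frame-level path that collapses to target."""
    num_frames, vocab = len(probs), len(probs[0])
    total = 0.0
    for path in itertools.product(range(vocab), repeat=num_frames):
        if collapse(path) == list(target):
            total += math.prod(probs[t][s] for t, s in enumerate(path))
    return total


def ctc_prob_forward(probs, target):
    """Same sum, computed with the forward (dynamic programming) recursion."""
    # Extended target: a blank before, between, and after every label.
    ext = [BLANK]
    for s in target:
        ext += [s, BLANK]
    size = len(ext)

    # A path may start on the leading blank or on the first label.
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for i in range(size):
            a = alpha[i]                      # stay on the same symbol
            if i >= 1:
                a += alpha[i - 1]             # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if i >= 2 and ext[i] != BLANK and ext[i] != ext[i - 2]:
                a += alpha[i - 2]
            new[i] = a * probs[t][ext[i]]
        alpha = new

    # A path may end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


# Toy posteriors: 4 frames, vocabulary {0: blank, 1, 2} (invented values).
probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
    [0.5, 0.1, 0.4],
]
p_brute = ctc_prob_bruteforce(probs, [1, 2])
p_forward = ctc_prob_forward(probs, [1, 2])
```

The two values agree up to floating-point error; the training loss is the negative log of this probability. The brute-force sum grows exponentially with the number of frames, which is why the forward recursion is what gets used in practice.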

Stable and parallelizable training: The output layer is just a linear projection plus softmax applied to each frame of the encoder output, so all frames can be computed in parallel.
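A minimal sketch of such an output layer, in plain Python so it runs without any framework. The function name `ctc_head`, the toy sizes, and the random weights are assumptions for illustration only; a real model would use a tensor library and learned parameters.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over one frame's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def ctc_head(encoder_out, weight, bias):
    """Apply the same linear layer + softmax to every frame independently."""
    posteriors = []
    for frame in encoder_out:
        logits = [
            sum(w * x for w, x in zip(row, frame)) + b
            for row, b in zip(weight, bias)
        ]
        posteriors.append(softmax(logits))
    return posteriors


random.seed(0)
T, D, V = 5, 4, 3  # frames, encoder dimension, vocabulary size incl. blank (toy values)
encoder_out = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
weight = [[random.gauss(0, 1) for _ in range(D)] for _ in range(V)]
bias = [0.0] * V

posteriors = ctc_head(encoder_out, weight, bias)  # T rows, each a distribution over V symbols
```

No frame's output depends on another frame's output, so the loop body could run for all frames at once; in a tensor library this is a single batched matrix multiply followed by a softmax.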

Drawback: CTC treats the frame-level predictions as conditionally independent given the encoder output. The encoder does see the surrounding context, but no output depends on previously emitted labels, so linguistic context is used only weakly and CTC is less capable than encoder-decoder models at correcting spelling or word-level errors.
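The per-frame independence is easy to see in greedy decoding, sketched here in plain Python. The helper name `greedy_decode`, the blank index 0, and the posterior values are assumptions for illustration.

```python
BLANK = 0  # assumption for this sketch: index 0 is the blank symbol


def greedy_decode(posteriors):
    """Pick the best symbol per frame, then merge repeats and drop blanks."""
    # Each frame is decided on its own: no earlier output is consulted.
    best = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
    out, prev = [], None
    for s in best:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return out


# Toy posteriors: 4 frames over {0: blank, 1, 2} (invented values).
posteriors = [
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.90, 0.05, 0.05],
    [0.10, 0.20, 0.70],
]
decoded = greedy_decode(posteriors)  # per-frame argmax is [1, 1, 0, 2], which collapses to [1, 2]
```

Nothing in this procedure looks at the labels already emitted, so an implausible character sequence is not penalized unless the encoder itself avoids it. An encoder-decoder model, by contrast, conditions each output token on the previous ones.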
