| HN Mirror

You can study CTC in isolation, ignoring all the HMM background. That is how CTC was also originally introduced, by mostly ignoring any of the existing HMM literature. So e.g. look at the original CTC paper. But I think the distill.pub article (https://distill.pub/2017/ctc/) is also good.

For studying HMMs, any speech recognition lecture should cover that. We teach that at RWTH Aachen University but I don't think there are public recordings. But probably you should find some other lectures online somewhere.

You also find a lot of tutorials for Kaldi: https://kaldi-asr.org/

Maybe check this book: https://www.microsoft.com/en-us/research/publication/automat...

The relation of CTC and HMM becomes intuitively clear once you get the concept of HMMs. Often in terms of speech recognition, it is all formulated as finite state automata (FSA) (or finite state transducer (FST), or weighted FST (WFST)), and the CTC FST just looks a bit different (simpler) than the traditional HMMs, but in all cases, you can think about having states with possible transitions.

This is all mostly about the modeling. The training is more different. For CTC, you often calculate the log prob of the full sequence over all possible alignments directly, while for HMMs, people often use a fixed alignment, and calculate framewise cross entropy.

I did some research on the relation of CTC training and HMM training: https://www-i6.informatik.rwth-aachen.de/publications/downlo...

Maybe my PhD thesis gives a good overview: https://www-i6.informatik.rwth-aachen.de/publications/downlo...