by lunixbochs 1741 days ago
Interesting. Maybe instead of my proposed perplexity metric, we measure the difference in both utterance-level and per-word perplexity between the ground truth and the output, using a strong language model? Ideally that difference is low: the language model should consider each predicted word to be "about as likely in context" as the closest ground truth word.

In other words, measure LM perplexity on the ground truth words, then on the predicted words, and minimize the difference in perplexities. Ideally with a general model like GPT-2 or BERT, or something else that you aren't using anywhere in your actual ASR.
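
A rough sketch of the idea in Python, to make it concrete. Everything here is an assumption for illustration: the function names are made up, and a toy add-alpha bigram model stands in for the strong general LM (in practice you would swap `log_prob` for per-token scores from something like GPT-2). It only covers the utterance-level gap; a per-word comparison would also need an alignment between predicted and ground truth words.

```python
import math
from collections import Counter


def train_bigram_lm(corpus, alpha=1.0):
    """Toy add-alpha bigram LM. A stand-in for a strong, independent LM."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split()
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, word)] += 1
    vocab_size = len(vocab) + 1  # one extra slot for unseen words

    def log_prob(prev, word):
        # Smoothed log P(word | prev); unseen pairs get a small nonzero mass.
        count = bigrams[(prev, word)] + alpha
        total = unigrams[prev] + alpha * vocab_size
        return math.log(count / total)

    return log_prob


def word_log_probs(log_prob, text):
    """Log-probability of each word given the previous word."""
    tokens = ["<s>"] + text.lower().split()
    return [log_prob(prev, word) for prev, word in zip(tokens, tokens[1:])]


def perplexity(log_probs):
    """Utterance perplexity: exp of the mean negative log-probability."""
    if not log_probs:
        raise ValueError("cannot score an empty utterance")
    return math.exp(-sum(log_probs) / len(log_probs))


def perplexity_gap(log_prob, reference, hypothesis):
    """Absolute difference between hypothesis and reference perplexity."""
    ref_ppl = perplexity(word_log_probs(log_prob, reference))
    hyp_ppl = perplexity(word_log_probs(log_prob, hypothesis))
    return abs(hyp_ppl - ref_ppl)


# Hypothetical training text and transcripts, purely for demonstration.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat sat on a mat",
]
lm = train_bigram_lm(corpus)
reference = "the cat sat on the mat"
gap_exact = perplexity_gap(lm, reference, "the cat sat on the mat")
gap_error = perplexity_gap(lm, reference, "the cat sad on the mat")
```

Here `gap_exact` is zero because the output matches the ground truth, while `gap_error` is positive because the LM finds "sad" much less likely in that context. Note that a gap of zero does not imply the transcripts match, so this would complement WER rather than replace it.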

This may even be more tolerant of errors in the ground truth transcription than raw WER.