by blululu
1740 days ago
It feels like a lot of the counterexamples listed involve contractions and conjugation errors. Treating 'like' and 'liked' as different words is a strong interpretation. Similarly, 'I am' and 'I'm' are not really distinct words, so counting that toward the error rate is too literal. These objections could be solved by a decent parser. That said, weighting insertions and deletions equally is clearly a problem. Certain words ought to have more weight in a model. Weighting words by something like 1/log(frequency) might be a good start, since less common words tend to be more important for meaning.
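As a rough illustration of the weighting idea, here is a minimal sketch in Python. Everything in it is an assumption rather than an established metric: the tiny `CONTRACTIONS` table stands in for "a decent parser", `counts` is a hypothetical word-frequency table, the weight is 1/log(count + 2) (the +2 only keeps the log positive for unseen words), and a substitution is charged the larger of the two word weights.

```python
import math
from collections import Counter

# Hypothetical contraction table; a real normalizer would need far more entries.
CONTRACTIONS = {"i'm": "i am", "don't": "do not", "it's": "it is"}


def normalize(text):
    """Lowercase, expand the known contractions, and split into words."""
    words = []
    for token in text.lower().split():
        words.extend(CONTRACTIONS.get(token, token).split())
    return words


def word_weight(word, counts):
    """Rarer words weigh more: 1 / log(count + 2)."""
    return 1.0 / math.log(counts.get(word, 0) + 2)


def weighted_wer(reference, hypothesis, counts):
    """Weighted edit distance divided by the total weight of the reference."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    ref_w = [word_weight(w, counts) for w in ref]
    hyp_w = [word_weight(w, counts) for w in hyp]

    # dist[i][j] = minimum weighted cost of turning ref[:i] into hyp[:j]
    dist = [[0.0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dist[i][0] = dist[i - 1][0] + ref_w[i - 1]  # delete reference words
    for j in range(1, len(hyp) + 1):
        dist[0][j] = dist[0][j - 1] + hyp_w[j - 1]  # insert hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            match_or_sub = dist[i - 1][j - 1]
            if ref[i - 1] != hyp[j - 1]:
                match_or_sub += max(ref_w[i - 1], hyp_w[j - 1])
            dist[i][j] = min(
                match_or_sub,
                dist[i - 1][j] + ref_w[i - 1],  # deletion
                dist[i][j - 1] + hyp_w[j - 1],  # insertion
            )

    total = sum(ref_w)
    return dist[len(ref)][len(hyp)] / total if total else 0.0


counts = Counter({"the": 1000, "i": 500, "am": 300, "cat": 5, "sat": 3})
print(weighted_wer("I am here", "I'm here", counts))       # contraction is not an error
print(weighted_wer("the cat sat", "cat sat", counts))      # dropped a common word
print(weighted_wer("the cat sat", "the sat", counts))      # dropped a rarer word
```

With these made-up counts, dropping "the" costs much less than dropping "cat", which is the behavior the weighting is after. Conjugation errors like 'like' vs 'liked' would need stemming or lemmatization on top of this, which the sketch does not attempt.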
|
In other words, measure LM perplexity on the ground-truth words, then on the predicted words, and minimize the difference in perplexities. Ideally, do this with a general model such as GPT-2 or BERT, one that you aren't using anywhere in your actual ASR.
This may even be more tolerant of errors in the ground-truth transcription than raw WER.
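
A minimal sketch of the perplexity comparison, in Python. To keep it self-contained it uses a toy add-one-smoothed unigram model as a stand-in for a general LM like GPT-2; the `UnigramLM` class and `perplexity_gap` function are hypothetical names for illustration, and a real setup would swap in a pretrained model's per-token log-probabilities.

```python
import math
from collections import Counter


class UnigramLM:
    """Toy add-one-smoothed unigram model, standing in for a general LM."""

    def __init__(self, corpus_words):
        self.counts = Counter(corpus_words)
        self.total = sum(self.counts.values())
        self.vocab_size = len(self.counts) + 1  # one extra slot for unseen words

    def log_prob(self, word):
        return math.log((self.counts.get(word, 0) + 1) / (self.total + self.vocab_size))

    def perplexity(self, words):
        if not words:
            raise ValueError("cannot score an empty word sequence")
        avg_log_prob = sum(self.log_prob(w) for w in words) / len(words)
        return math.exp(-avg_log_prob)


def perplexity_gap(lm, reference, hypothesis):
    """Absolute difference between reference and hypothesis perplexity."""
    ref_ppl = lm.perplexity(reference.lower().split())
    hyp_ppl = lm.perplexity(hypothesis.lower().split())
    return abs(ref_ppl - hyp_ppl)


lm = UnigramLM("the cat sat on the mat the dog sat on the rug".split())
print(perplexity_gap(lm, "the cat sat on the mat", "the cat sat on the mat"))
print(perplexity_gap(lm, "the cat sat on the mat", "the cat zat on the mat"))
```

One caveat with any version of this: two quite different sentences can have similar perplexity, so a small gap does not by itself show the transcript is right. The unigram stand-in makes that worse because it ignores word order entirely.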