Hacker News
by jturpin 1772 days ago
Wow, you're right. I'm conflicted about this, as many of the words are not pronounced properly at all. Maybe it doesn't matter to the accuracy of the speech-to-text system, but it feels like training it on bad data.
2 comments

That's the point! When the postal service has to OCR mailing addresses, they need to handle the messy scribbles even more than the professionally printed labels.
That's fair, I'll have to think about that.
Different accents aren't bad data. Your vision of the world of "English is only spoken with an American accent" is what leads to horrendous speech recognition APIs, like Google's.

If your ML model can't handle multiple accents, it is worthless.

There's a difference between an accent and pronouncing words wrong. I would expect an English speech recognition system to handle the various accents there are in the world (the US has several accents of course), but it shouldn't handle incorrect pronunciation of syllables if it comes at the expense of recognizing clean data. If it doesn't come at its expense then I guess it's fine.
Unfortunately, there's always a trade-off. You want quality data for your use case, but you also want lots of data so the model generalizes well. Those are conflicting goals.

Fortunately, splitting models into separate accent-specialized variants and helping them out with language model training will often help in case the model doesn't cope well enough with the cognitive dissonance.
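
A rough sketch of what that split could look like, in Python. Everything here is hypothetical and only for illustration: `AccentRouter`, the accent detector, the recognizers, the language-model scorer, and the threshold and weight values are all made-up names and numbers, not any real speech API. The idea is just to route audio to an accent-specialized model when the detector is confident, fall back to the general model otherwise, and let a language model rescore the candidate transcripts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a recognizer maps raw audio to scored candidate transcripts.
Hypothesis = Tuple[str, float]  # (transcript, acoustic log-score)
Recognizer = Callable[[bytes], List[Hypothesis]]


@dataclass
class AccentRouter:
    general: Recognizer                                   # model trained on all the data
    specialized: Dict[str, Recognizer]                    # accent label -> specialized model
    detect_accent: Callable[[bytes], Tuple[str, float]]   # returns (label, confidence)
    lm_score: Callable[[str], float]                      # language-model log-score of a transcript
    min_confidence: float = 0.7                           # assumed threshold, tune on held-out data
    lm_weight: float = 0.5                                # assumed interpolation weight

    def transcribe(self, audio: bytes) -> str:
        label, confidence = self.detect_accent(audio)
        # Only trust the specialized model when the accent guess is confident
        # and a model for that accent actually exists.
        if confidence >= self.min_confidence and label in self.specialized:
            recognizer = self.specialized[label]
        else:
            recognizer = self.general
        hypotheses = recognizer(audio)
        # Rescore: acoustic score plus weighted language-model score.
        best = max(hypotheses, key=lambda h: h[1] + self.lm_weight * self.lm_score(h[0]))
        return best[0]
```

The fallback to the general model is the part that matters for the trade-off: a wrong accent guess shouldn't make clean audio worse than it would have been without the split.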

"English is only spoken with an American accent"

Which American accent?