by hacker_9 2066 days ago
Because humans can do it

No we can't.

Insofar as we can understand people with different accents, it's because we have been trained on them. Even if they are not common around us, we've had some exposure from occasional visitors, travels, or media. When we hear an accent we've really never been exposed to, we aren't likely to understand it. A good example is foreign speakers trying to speak our native language... even if they've learned our language in school for years, their even slightly off pronunciation can make it very difficult to understand what they are saying.

Indeed. Or even native speakers speaking our native language. A fond memory of mine is having to translate a Glaswegian colleague when we were on business in Kentucky. By which I mean I basically just repeated what he said in my "standard" English accent for the benefit of a couple of locals we were talking to, because their reaction when he first said something was along the lines of "Hell, I didn't understand a word he just said!"
That is a well-known effect that has nothing to do with speech recognition:

https://www.youtube.com/watch?v=oLt5qSm9U80

But computers must have access to far more than a single (fluent) person's exposure to Glaswegian/Scots.
Yes, BUT that's not taken into account in these tests. What you get to train on is fixed beforehand, and, shall we say, "there are known problems" on this front.

So the benchmarks say how well model X does on this exact transcription task given this exact training data, and no other knowledge.

Even basic things, like female/male voices, don't match between the train and test sets.
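
To make that kind of mismatch concrete, here is a rough sketch of how one might compare speaker-label shares between two splits. The labels and counts are made up for illustration, and `label_shares` / `share_gaps` are hypothetical helper names, not part of any benchmark's tooling.

```python
from collections import Counter


def label_shares(labels):
    """Return each label's share of the total, e.g. {"male": 0.75, "female": 0.25}."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def share_gaps(train_labels, test_labels):
    """Absolute difference in share per label between the train and test splits."""
    train = label_shares(train_labels)
    test = label_shares(test_labels)
    return {
        label: abs(train.get(label, 0.0) - test.get(label, 0.0))
        for label in sorted(set(train) | set(test))
    }


# Made-up speaker metadata, purely illustrative (not from any real benchmark).
train_speakers = ["male"] * 3 + ["female"] * 1
test_speakers = ["male"] * 1 + ["female"] * 1

gaps = share_gaps(train_speakers, test_speakers)
print(gaps)
```

A large gap for any label would suggest the test set is measuring something the training data barely covers.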

I listened to some Glaswegian on YouTube and didn't understand much.