by mstoehr
5885 days ago
Actually, most research effort in speech goes into the language side rather than the signal processing of the speech signal, so I think many people share your intuition. Bear in mind, though, that humans significantly outperform machines on tasks where isolated nonsense syllables, or streams of them, are spoken: e.g. "badagaka" is said and humans can pick out the syllables, whereas computers can have a lot of difficulty (in noise in particular). Computers come closest to human performance when an utterance carries a lot of linguistic context. So it appears that humans are doing something other than using semantics.
Another thing I keep wondering about is why so little emphasis is put on dialog. When humans don't understand something, they ask, or they offer an interpretation and ask whether it's the right one. Speech recognition systems don't seem to do that. They say "Sorry, I could not understand what you said. Please repeat". That's not very helpful for the computer, of course, since it learns nothing about which part it got wrong. It should say: "Huh, peas? Why would anyone rest in peas, for heaven's sake??". Then the human could sharpen the "s" sound and say "PeaCCCEE!!! Not peas. I'm not talking about food, I'm talking about dying!".
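
To make the idea a bit more concrete, here is a rough sketch of what "offer an interpretation and ask" could look like. Everything in it is hypothetical: the `Hypothesis` type, the `respond` function, and the `accept_at` and `margin` thresholds are made up for illustration, and it assumes the recognizer hands back candidate transcriptions with confidence scores between 0 and 1. It is not how any particular system works.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One candidate transcription (hypothetical type, for illustration)."""
    text: str
    confidence: float  # assumed to lie in [0, 1]


def respond(hypotheses, accept_at=0.8, margin=0.2):
    """Decide how to react to a list of recognition candidates.

    Returns a pair (action, message) where action is one of
    "accept", "confirm", or "repeat". The thresholds are arbitrary
    illustrative values, not tuned numbers.
    """
    if not hypotheses:
        # Nothing usable at all: fall back to the unhelpful prompt.
        return ("repeat", "Sorry, I could not understand what you said. Please repeat.")

    ranked = sorted(hypotheses, key=lambda h: h.confidence, reverse=True)
    best = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else None

    # Accept only when the top candidate is both confident and clearly
    # ahead of the next one.
    clearly_ahead = (
        runner_up is None
        or best.confidence - runner_up.confidence >= margin
    )
    if best.confidence >= accept_at and clearly_ahead:
        return ("accept", best.text)

    # Otherwise offer an interpretation and ask, instead of asking
    # the speaker to repeat everything.
    if runner_up is not None:
        return ("confirm", f'Did you say "{best.text}" or "{runner_up.text}"?')
    return ("confirm", f'Did you say "{best.text}"?')


if __name__ == "__main__":
    candidates = [
        Hypothesis("rest in peas", 0.48),
        Hypothesis("rest in peace", 0.45),
    ]
    print(respond(candidates))
```

The point is only that the question names the two competing guesses, so the speaker knows which word to stress on the next try.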