by mstoehr
5837 days ago
|
|
Most commenters are focusing on relatively high-level features of decoding speech. It is important to also be aware that there is still great debate about what the acoustic correlates of linguistic events in speech are. It seems that our words are composed of subunits (usually taken to be phones from the IPA, though there's work on alternatives), but exactly which acoustics correspond to the phones is still unsettled: lots of debate and mediocre recognition performance. Undoubtedly there is much room for improvement on these higher-level features, but computers are still well behind humans in large-vocabulary isolated keyword spotting: this is a task where one word from a very large corpus of words is spoken and the human or computer has to guess what that word was. Computers do poorly relative to humans (particularly in noise), which suggests that many of the mistakes computers make come from not being able to interpret the acoustics correctly. |
|