| HN Mirror


	by mstoehr 5884 days ago
	Most commenters are focusing on relatively high-level features of decoding speech. It is important to also be aware that there is still great debate about what the acoustic correlates of linguistic events in speech are. It seems that our words are composed of subunits (usually taken to be phones from the IPA, though there is work on alternatives), but exactly which acoustics correspond to those phones is still unsettled: there is lots of debate and recognition performance is mediocre. Undoubtedly there is much room for improvement on the higher-level features, but computers are still well behind humans in large-vocabulary isolated keyword spotting. This is a task where one word from a very large vocabulary is spoken and the human or computer has to guess what that word was. Computers do poorly relative to humans (particularly in noise), which suggests that many of the mistakes computers make come from not being able to interpret the acoustics correctly.

1 comment

algorias 5884 days ago

And to make this problem worse, there's a great deal of variation among different dialects of the same language, which means that there's no 1:1 mapping of IPA phones to written letters.