by woodson 5891 days ago
The concept of phonemes isn't undisputed either. When analyzing actual speech it becomes clear that there are no real steady states, but rather a great deal of coarticulation between the "segments". Of course, part of it could be attributed to the fact that speech sounds are produced by articulatory gestures, which necessarily overlap in time. On the other hand, these coarticulation patterns are not language-independent. So a purely phonetic (articulatory/auditory) explanation of why these differences exist is rather unlikely. I know this seems rather off-topic with regard to speech recognition, but the question of the basic building blocks of language is kind of at the heart of the problem.

I agree that it's at the heart of it (and I'm presently writing a paper where I'm using articulatory-phonetic features rather than phonemes). Unfortunately, there is no large-vocabulary speech recognizer that uses articulatory phonetics (yet!). Every large-scale speech recognizer, and most small-scale ones, use phonemes and are trained on speech that has been transcribed into phonemes. There is almost no data annotated with articulatory phonetics (a problem I'm working on right now).
I guess that's in part because it's even more difficult to (manually) transcribe speech into articulatory-phonetic elements based on the acoustic signal (laryngeal gestures?? Clearly they are there in articulation, but their acoustic correlates are masked to some extent).

Automatic alignment methods are probably quite hard to implement, given the various coarticulation patterns in the signal depending on context/prosodic position etc.

Could you provide a link to papers or other materials dealing with articulatory features in speech recognition?

I guess I should take another look at Browman/Goldstein's Articulatory Phonology.