Hacker News
by natch 2691 days ago
LOL! You are asking the right person, because I used to work on both Chinese and English speech recognition systems, including the first large vocabulary continuous speech recognition system to deal well with Chinese tones. I can say they are essentially the same phenomenon under the hood, although linguists haven't grappled with this reality yet apparently.

However, I don't have any more evidence than you do, just my assertions to yours. So I'll wrap up with a fitting quote from Frederick Jelinek: "Every time I fire a linguist, the performance of the speech recognizer goes up."


Yeah, so I work a bit on both sides (though not on sound stuff): on theoretical things and on getting algorithms to do useful things with language. So, to follow through with the Jelinek quote, I will point out that getting the performance of your speech recogniser to go up doesn't mean that you gain any understanding of the underlying phenomenon.

So Mandarin lexical tone and English lexical stress are quite clearly functionally equivalent in many ways, and I certainly would be unsurprised if an ML algorithm treated them as representationally similar. But that's still different from English stress and Mandarin tone being the same phenomenon in phonetic terms (again, in terms of the actual acoustic signal).

Again, stress is different. In the contract example the all-caps is the stress. The other part, which we resort to accents to indicate, is what in Chinese is called tone.
Okay - it's confusing because what you're using accents to indicate is usually referred to as 'stress' in English, and what you're using all-caps for is usually 'emphasis' or 'focus prosody' or the like. I'm very interested in the ALLCAPS phenomenon, but I think it's largely irrelevant here (other than being acoustically similar to Mandarin tone).

So the áccent thing is acoustically different from Mandarin tone. Using the digest example:

a. dígest (such as a compilation of summaries)

b. digést (such as a creature processing food to extract nutrition from it)

The (a) one is usually realised as /ˈdaɪdʒɛst/, while the (b) one as /dəˈdʒɛst/. So not only does the first syllable in (a) have a different vowel than in (b), but the first syllable in (b) will also have a drastically shorter duration than the first syllable of (a). These acoustic correlates in English are very different from what occurs with tone in Mandarin, which doesn't affect syllable duration or vowel quality in the same fashion. You can visualise the acoustic wave-forms of the accent-thing in English and compare them against the acoustic wave-forms of the tone-thing in Mandarin and see that they involve different acoustic properties. (So no need for any linguistic theory etc.)

I guess you’re not well versed in Mandarin. Those accents are tone marks, at least in the context of talking about Chinese tones or tones akin to Chinese tones.

But... you introduced the accents first in this conversation! So you get to decide what you meant by them.

I guess we can’t communicate since we’re using a different language. Learn how pinyin works with tone markers and then we can talk.