by thaumasiotes 812 days ago
In this case we do know whether the effect generalizes to other languages. It cannot fail to; the larynx, lips, tongue, and jaw are almost all there is. For example, vowels are conventionally defined by jaw position ("height"), tongue position ("frontness"), and lip configuration ("rounded" or not).

You might miss some things like creaky voice or ejectives, and you'll probably miss aspiration, but all that does is give you a worst-case scenario analogous to a native speaker trying to understand someone with a foreign accent. Extremely high accuracy will be possible.

2 comments

This is a reasonable hypothesis but if only English has been studied then it would be unscientific to extrapolate at this time.
Sure, in the same sense that it would be "unscientific" to conclude that someone's amputated leg didn't regenerate by chance, because the sample size is only 1.

If you know how you're recognizing English, and you know that other languages do not differ from English in relevant ways, then you know you can recognize those other languages. Pretending you don't know something you do know is not scientific.

This seems like damned-either-way. If they had only tested English and asserted that it was universally applicable to all languages, it’s likely you (or someone else) would rightfully object that it’s annoying when English speakers assume that’s all there is.
That's not a similar claim. Anyone can be annoyed by anything; the idea that it's "unscientific" to state that a method of recognizing English by measuring the positions of the lips, tongue, and jaw alongside the activity of the larynx will apply to every other spoken language in the world is ludicrous on its face. It will, because those measurements capture nearly every dimension of phonetic variation that exists. No one could believe otherwise, except apparently for metabagel.
Is absolute belief in one’s own ability to estimate how every human language could possibly work terribly scientific?

Me, I like scoping claims to what is measured.

You say that like no one's ever bothered to measure what kinds of sounds can be used in human languages.

The opposite is the case; this is not a lightly studied field.

You don't know, though. You have a good working hypothesis and you can make reasoned predictions, but it remains untested. The core principle of science is that we test our hypotheses.
Other languages have different sounds which aren’t present in English.
So? They don't have sounds that are produced in a manner other than arranging the lips, tongue, and jaw.

(Actually, they do. So does English; I already mentioned aspiration. But those are minor elements.)

They're minor elements in English – and even then, you can construct sentences where the meaning changes based on aspiration.
Well, no, they're minor elements everywhere. You don't need to be able to capture every phonemic distinction in a language to get a near-perfect transcription, as witnessed by the fact that people understand foreign accents without difficulty. The much larger problem in understanding foreign speech is the odd word choices and lack of grammaticality, but those problems don't arise when you're transcribing native speech.

For some comparisons, think about the fact that Semitic languages are traditionally written without bothering to indicate the vowels, or that while modern English has a phonemic distinction between voiced and unvoiced fricatives, this has a very uneven correspondence to the same distinction as it exists in the writing system. In the case of the interdental fricatives, the writing system does not even contemplate a distinction. And there's nothing particularly problematic about this; if you delete all the voicing information from a stretch of English speech, it stays about as intelligible as it was before. (A voicing difference in stops is not even audible to English speakers. It's audible in fricatives, but no one is going to be confused.)

x1798DE captured my intent well. For example, tonal languages like Mandarin or Cantonese may be more difficult to decode if vocal cords aren’t vibrating, and languages with more phonemes that have both a voiced and unvoiced version might be more difficult. I still think decoding will be possible for general language, but that’s a hypothesis whereas I know it’s true for English.
> and languages with more phonemes that have both a voiced and unvoiced version might be more difficult.

I had the understanding that English is unusually rich in phonemes that occur in both a voiced and unvoiced version. But as I've mentioned sidethread, this just isn't very significant as far as transcribing English goes.

English has an almost full series of stop and fricative phonemes that exhibit voicing contrasts:

- Bilabial, alveolar, and velar stops /p, b, t, d, k, g/, though the distinction between /t/ and /d/ disappears intervocalically in American English. [In practice, English speakers differentiate these phonemes more by the contrast of aspiration than by the contrast of voicing.]

- Interdental, labiodental, alveolar, palatal, but generally not velar, fricatives /θ, ð, f, v, s, z, ʃ, ʒ/, along with palatal affricates /tʃ, dʒ/.

- Nasals and approximants are always voiced.
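
For anyone who wants to poke at this, the inventory above fits in a small lookup table. What follows is only a rough sketch: the pairs are copied from the list above, but the function name `strip_voicing` and the phoneme-list representation of words are my own assumptions for illustration, not anything from the study being discussed.

```python
# Voiceless/voiced pairs, copied from the stop, fricative, and affricate
# lists in the comment above. Each phoneme is a string; affricates are
# treated as single units.
VOICING_PAIRS = [
    ("p", "b"), ("t", "d"), ("k", "g"),               # stops
    ("θ", "ð"), ("f", "v"), ("s", "z"), ("ʃ", "ʒ"),   # fricatives
    ("tʃ", "dʒ"),                                     # affricates
]

# Map each voiced phoneme to its voiceless counterpart.
DEVOICE = {voiced: voiceless for voiceless, voiced in VOICING_PAIRS}


def strip_voicing(phonemes):
    """Return the phoneme sequence with every voicing contrast collapsed.

    Phonemes that do not take part in a contrast (vowels, nasals,
    approximants) pass through unchanged.
    """
    return [DEVOICE.get(p, p) for p in phonemes]


# Illustrative transcriptions (hypothetical, simplified): "zip" and "sip"
# become the same sequence once voicing is removed.
zip_word = ["z", "ɪ", "p"]
sip_word = ["s", "ɪ", "p"]
print(strip_voicing(zip_word) == strip_voicing(sip_word))
```

Running a pronunciation dictionary through something like this would show how many word pairs actually merge when voicing is dropped, which is the quantity the disagreement here turns on.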

Compare a language like Mandarin Chinese, where there are between zero and one pairs of phonemes that contrast by voicing (the sound represented by pinyin "r" may be a voiced fricative otherwise equivalent to "sh", or it may be an approximant; there is no contrasting voiceless approximant), or Spanish, where only the stops feature this contrast.

What are the languages that have more voicing contrasts than English does? It would almost be necessary for such a language to distinguish between voiced and unvoiced vowels. (Some quick research suggests that Icelandic at least has a comparable number of voicing contrasts, but it is not obviously more than English and appears to be actively shrinking.)

> tonal languages like Mandarin or Cantonese may be more difficult to decode if vocal cords aren’t vibrating

More difficult, yes, but in the sense that decoding may take more computation, not that the error rate will go up.

Again, we can already observe that e.g. Mandarin speakers do not have trouble understanding text that carries no information about tone, nor do they have trouble understanding songs, where lexical tone is overridden by the melody of the song.

(What happens here depends what you mean. If you want to decode speech into pinyin with tone marks omitted, the lack of ability to measure tones will fail to be a problem by definition. If you want to decode into Chinese characters, you'll need a robust model of the language, at which point lack of tones will also fail to be a problem - the language model will cover for it. If you want to decode into pinyin with tone marks, you won't be able to do that without using a language model.)
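
To make the first case concrete, here is a minimal sketch of what "pinyin with tone marks omitted" means as an output format. It uses only Python's standard `unicodedata` module; the function name `strip_tones` is made up for this example, and it assumes tones are written as the four standard diacritics rather than as trailing digits.

```python
import unicodedata

# The four combining marks used for pinyin tones 1-4:
# macron, acute, caron, grave.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tones(pinyin):
    """Remove pinyin tone diacritics while keeping the umlaut on 'ü'."""
    # Decompose so each tone mark becomes a separate combining character.
    decomposed = unicodedata.normalize("NFD", pinyin)
    # Drop only the tone marks; the diaeresis (U+0308) is left in place.
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # Recompose so 'u' + diaeresis becomes a single 'ü' again.
    return unicodedata.normalize("NFC", kept)


print(strip_tones("nǐ hǎo"))
print(strip_tones("mā") == strip_tones("mà"))
```

Syllables that differ only in tone, like "mā" and "mà", come out identical, which is exactly the ambiguity a language model would have to resolve in the other two cases.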