Hacker News
by Pabloski80 930 days ago
For some languages (like Polish) it is nearly sufficient. I imagine for others it will basically suffice (perhaps e.g. Malay and Indonesian). Good luck with Chinese and other tonal languages, as well as Japanese (no word boundaries), Korean (two or three phonemes on a single grapheme, and the order of phonemes on them), and Hebrew without nikud (little friendly smirk).

As for English, we come to a funny question here, as in it the mapping from graphemes to phonemes is extremely irregular.

3 comments

> As for English, we come to a funny question here, as in it the mapping from graphemes to phonemes is extremely irregular.

IME, it's mostly regular; there are patterns (pronunciation of "ou" in the middle of a word, "tion" at the end of a word, adding "ing" to a word), but there are also exceptions (plural of mouse vs plural of spouse).

Just by pattern recognition alone you'd likely cover 90% of common English words. Of course, that involves recognising the pattern: "lon" in "alone" and "along" is not the pattern, while "lo" is the pattern ("*[aeiou][^aeiou]e" is a pattern)

The way I've explained it to my toddler is by giving the middle vowel a different pronunciation when a word ends in a consonant followed by 'e' (lace, eke, fine, alone, mule) than when the 'e' is missing (can, ten, tin, don, pun).
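For what it's worth, that rule can be written down as a regex check. This is only a sketch: the name `has_magic_e` is made up for illustration, and it tests the spelling shape (vowel, one consonant, final 'e'), not the actual pronunciation.

```python
import re

# Vowel, then exactly one non-vowel character, then a word-final 'e'.
MAGIC_E = re.compile(r"[aeiou][^aeiou]e$")

def has_magic_e(word: str) -> bool:
    """Return True if the word ends in vowel + consonant + 'e'."""
    return MAGIC_E.search(word.lower()) is not None

# Word lists taken from the comment above.
with_e = ["lace", "eke", "fine", "alone", "mule"]
without_e = ["can", "ten", "tin", "don", "pun"]

for w in with_e + without_e:
    print(w, has_magic_e(w))
```

The check only says which spelling shape a word has; deciding how the vowel is actually said is still up to the reader.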

I've also switched a little to teaching syllables: 'tr' and 'am' can be pieced together. So can 'tr' and 'ai' and 'n'.

Still using the distar alphabet though.

> IME, it's mostly regular; there are patterns (pronunciation of "ou" in the middle of a word, "tion" at the end of a word, adding "ing" to a word), but there are also exceptions (plural of mouse vs plural of spouse).

I’d like to give you a _tour_ of my _doubts_ _about_ this, but the _courier_ has just arrived with _four_ _doughnuts_ (and their _colour_ is popular in Britain)

Not to mention the _lower_ _tower_ I see across the street.

We get used to these things at an early age, but compared to many other languages, English is highly irregular.

I have the happy experience of seeing my son learning Japanese and English at the same time, and it’s bizarre how different it is. Learning the kana might be a bit harder, but now that he’s there he can read any word out loud correctly.

In English? Hoo boy. Knowing how you spell a word doesn’t help you at all with how it’s said. Then on top of that the names of the letters are completely different from how you sound them. It’s no surprise that phonics is something ‘important’ in the English language when the whole concept doesn’t exist elsewhere.

> I’d like to give you a _tour_ of my _doubts_ _about_ this, but the _courier_ has just arrived with _four_ _doughnuts_ (and their _colour_ is popular in Britain)

Right, those are the exceptions I mentioned. There's more, but even in that list, those are not singular exceptions, they're different patterns.

They only stop being patterns when, as you did, you match the shortest subsequence and not the longest subsequence.

Even in the exception list, you have patterns: about and doubt rhyme. If you're Canadian, they also rhyme with dough. four and the 'cour' in courier rhyme. colour and honour rhyme.

If we're using regexes, for example, we match the longest subsequence, not the shortest, so "ough" is the pattern in "dough", not "ou". Then it rhymes with though, furlough.

The examples I gave, like `tion` as a suffix, should have made it clear that I meant matching the longest pattern (otherwise it would be matched as 'ti' and 'on').
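A rough sketch of what longest-match versus shortest-match does to these words. The pattern list and the `segment` function are hypothetical, chosen only to cover the examples in this thread. Note that Python's `re` alternation takes the first alternative that matches rather than the longest one, so the sketch sorts candidates by length explicitly instead of relying on the regex engine.

```python
# Hypothetical pattern inventory for illustration; not a real phonics table.
PATTERNS = ["ough", "tion", "ou", "ti", "on"]

def segment(word, patterns=PATTERNS, longest_first=True):
    """Split a word into known patterns, falling back to single letters."""
    # Try longer patterns first (or shorter ones first, for comparison).
    ordered = sorted(patterns, key=len, reverse=longest_first)
    pieces = []
    i = 0
    while i < len(word):
        for p in ordered:
            if word.startswith(p, i):
                pieces.append(p)
                i += len(p)
                break
        else:
            # No pattern starts here, so emit the single letter.
            pieces.append(word[i])
            i += 1
    return pieces

print(segment("dough"))                         # ['d', 'ough']
print(segment("dough", longest_first=False))    # ['d', 'ou', 'g', 'h']
print(segment("nation"))                        # ['n', 'a', 'tion']
print(segment("nation", longest_first=False))   # ['n', 'a', 'ti', 'on']
```

With the shortest pattern tried first, "dough" falls into the "ou" group; with the longest tried first, it lands in the "ough" group alongside "though".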

> We get used to these things at an early age, but compared to many other languages, English is highly irregular.

Sure, but I didn't dispute that, I contended that 90%+ of common usage is pattern recognition, like doubling of consonants, words ending in `e`, or starting with `in`, etc.

An english reader encountering `shibboleth` for the first time will pronounce it correctly, and I claim that that is true for 90% of words in common english usage, because even the simplest words have differing patterns and so readers are forced to learn pattern recognition as a very basic and foundational part of english.

It is not as dire as the phrase "English is highly irregular" would suggest. To my mind, a highly irregular language would have at least half the words following no pattern, for example rhyming "moot" with "dad". Examples of non-patterns like that are, to my knowledge, not in english.

I mean, you could claim that "caught" and "court" are pronounced exactly the same, and I'd point out that both are parts of larger patterns - 'taught', 'aught' and 'caught' are a pattern, while 'court', 'pour', 'rigour' are a different pattern, hence they are both examples of patterns, they're just not in the same group of patterns.

Look at your final example - lower and tower: lower, grower, mower are all part of one pattern. tower, bower and shower are all also part of a pattern, but it's a different pattern to the previous pattern.

You will not, in english, easily spell a word that is not part of some pattern[1].

[1] Although, if you're up to the challenge, I welcome examples of spelling that is not part of any pattern ... :-)

Your “at least half the words” requirement is a strong English bias. I suspect no language is highly irregular by that requirement.

In many languages though, the irregulars are at a single-digit percentage - sometimes even zero.

And there are easily some that are not part of a pattern: “colonel” (pronounced “kernel”). American “herbal” (pronounced “erbal”), autophagy (with the emphasis on “to”, unlike any other word that starts with “auto”).

And there are ambiguous ones which in fact fit multiple patterns - e.g. “route”, british more like “flute”, American like “house”. Not to mention to-mate-o to-ma-to and either. And injured vs insured.

I don’t think anyone whose first language is regular (like German, or Japanese) would agree with your claim that English is not highly irregular.

If you need an order of magnitude more patterns to properly pronounce words (and you do) it’s a difference in quality, not just quantity.

TLDR: I agree with everything you said - on the spectrum of regularity, english is at the extreme end of irregular. The exceptions are words from other languages that are part of english. Pointing out that UK english differs from US english is not an example of irregularity. Someone who learned one of them, learned one of them.

============================

But, that being said, it still has mostly patterns. After all, we started this conversation with you throwing out examples of what you thought were non-regular words, which all turned out to be pattern-based anyway.

You had to make multiple attempts to find a non-pattern word.

IOW, you are still learning patterns, mostly - you found 1 exception in colonel below; I offer 2 more with the words 'soldier' and 'lieutenant' (mostly to demonstrate that, yes, I agree with you that english has some non-regular words).

> And there are easily some that are not part of a pattern: “colonel” (pronounced “kernel”).

This is a good example of an actual non-regular word. All the other english words that are borrowed from other languages probably are each an example of a non-regular word (for example, rendezvous).

There's nothing you can do about this sort of thing. The only alternatives I can think of are:

1. Keep the language pure and not borrow any words from other languages,

2. Make up new words.

In this regard, borrowing seems to be the better option, with the result that non-regularity is introduced.

> American “herbal” (pronounced “erbal”),

Still a pattern: honor, homage, heir, all with silent 'h' for US english and non-silent in UK english.

Even for something with a larger pronunciation difference, such as 'solder' ('sodder' vs 'solder'), 'sodder' still fits some pattern - a silent 'l' (yolk, salmon, walk, talk).

> autophagy (with the emphasis on “to”, unlike any other word that starts with “auto”).

Autonomy/Autonomous, Automaton. There may be more, but that's certainly a pattern.

> If you need an order of magnitude more patterns to properly pronounce words (and you do) it’s a difference in quality, not just quantity.

Speaking as someone who is bilingual, I don't think it's even the number of patterns that matter (for someone speaking a language, the difference between knowing 10 patterns and 100 patterns is negligible - ask any native english speaker if they have problems with communication with other english speakers).

For example, in Kanji, for common usage, you still need to memorise around 3000 patterns. Native english speakers get by on maybe 300 patterns.

The problem isn't the number, I think, it's the ambiguity: which pattern to use for a specific word. It's still only a few patterns compared to a highly regular language like Kanji, but the ambiguity means that a little native language knowledge is necessary to determine the specific pattern.

Anyway, I think we've both said enough on this topic, so Cheers :-)

I think my main issue is with your choice of the word "pattern", as something you can match against .... Because the patterns are sometimes the entire word (lower vs. tower, cough vs dough). That's not the meaning I usually associate with pattern (in the context of pattern matching).

If you had used "classes", I probably wouldn't have bothered responding in the first place... "cough" falls in the same class as "rough", and "dough" does not. And those classes each match a terser pattern ("ough"). But having matched the terser pattern, you are no better off knowing how to pronounce it than knowing the entire word.
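To make the "classes" point concrete, here is a small sketch using the lower/tower groupings given earlier in the thread. The class labels "A" and "B" and the function name are made up for illustration; the word groupings are the ones from the comment above, not from a pronunciation dictionary.

```python
# Rhyme classes as grouped earlier in the thread; labels are arbitrary.
RHYME_CLASS = {
    "lower": "A", "grower": "A", "mower": "A",
    "tower": "B", "bower": "B", "shower": "B",
}

def classes_for_suffix(suffix, lexicon=RHYME_CLASS):
    """Return the set of rhyme classes among words ending in the suffix."""
    return {cls for word, cls in lexicon.items() if word.endswith(suffix)}

found = classes_for_suffix("ower")
print(sorted(found))    # ['A', 'B']
print(len(found) > 1)   # True: the suffix alone does not pick a class
```

Matching "ower" narrows things down to two classes, but picking between them still takes knowing the word.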

Thanks for an interesting discussion, and cheers !

Yeah, I honestly can't think of any language where spelling is as difficult as English. Japanese: the 2 phonetic alphabets (hiragana and katakana) are extremely easy to learn IMO, nearly all letters are either vowels or a consonant-vowel pair. Russian: very phonetic, there are some exceptions to how things are pronounced but the rules are very regular. Even French, which has tons of silent letters and heck, silent syllables, is very regular, and I nearly always know how to pronounce something if I read it.
For Japanese, the choice to talk only about the phonetic system is doing a lot of work, as it is (mostly) just used for grammatical purposes, with most content words being mostly or entirely composed of the non-phonetic Kanji. A typical Japanese student is still learning to write until high school. Presumably, Chinese is similarly difficult for the same reason, but I'm not familiar with that.

Hebrew's writing system has vowels, but they are typically omitted.

Are phonemes relevant when reading?
They were extremely important for someone who learned, as a native speaker, to read Polish around the age of three. The first part was learning the grapheme-to-phoneme mapping for a single symbol. Then came the second part: for a day or three, picking that symbol out of raw newspaper text and trying to pronounce its sound every time it was found. Then, as more and more symbols were learned, looking for one symbol for a while, then another for some time, and then, when nearing the whole alphabet, mixing up which symbols to look for. After memorizing some 20/26 symbols came learning to connect the sounds, and by then you could read simple words. But again, Polish is an extremely regular language in that sense, with only a very few irregular short symbol sequences.
Ok, what I really meant was "are phonemes required for reading?" given that was what was implied by the above thread.

Deaf people learn how to read (in every language), so clearly they are not required.

Yes. Subvocalisation is a fundamental part of reading. Even in scripts that (ostensibly) have no ruleset for phonemic mapping (e.g. Chinese), written language has a "voice" for accomplished readers.