by lucasoshiro 341 days ago
> excessive writing marks

In English I need to find out how each word is pronounced individually. What the hell is the difference between "men" and "man"? What's the difference between "bitch" and "beach"? Why does "though" sound closer to "throw" than to "through" or "thought"? Those differences are encoded in such an unclear way that there are more exceptions than rules.

Portuguese (my native language) is not perfect in that sense, but at least it has more rules than exceptions. Part of that is because we use the diacritic marks.

So I prefer excessive writing marks to an excess of unclear special cases.

7 comments

Rules exist, but most are never taught and instead only learned through exposure. It's why "ghoti" is a trick - you have to break several rules of English pronunciation to get "fish" out of that.

Here's a page where someone tried to reconstruct as many of those rules as possible: https://www.zompist.com/spell.html - obviously it can't eliminate all exceptions but it does surprisingly well.

Rules 6-8 are relevant to one of your examples, including the explanation afterwards.

The complexity of these rules, and the number of exceptions that you need to learn notwithstanding the rules, can be roughly estimated for any given language by training a language model on word <-> IPA correspondence for that language (using a subset of the vocabulary as a training set), and then seeing how well it can predict the remaining words. You can run it in either direction, too, to separately measure the difficulty of reading (word -> IPA) and writing (IPA -> word) that language.

This was actually done for a number of languages including English:

https://arxiv.org/abs/1912.13321

You can see how languages with true phonemic spellings tend to be in the >90% range on both reading and writing, with Esperanto at 99%. Spanish and German are in 60-80% range. English is dismal at ~30% for both, though, with only French and Chinese being harder to write, and all other languages tested being easier to read.
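A toy sketch of that train-on-a-subset, test-on-the-rest idea, in case it helps. Everything here is an assumption for illustration: the lexicon is invented (3-letter words over "ptkaiu", optionally with a word-initial "a" pronounced "e"), and a per-symbol lookup table stands in for the neural model, so this is not the paper's method or data. The helper names (`train_symbol_model`, `holdout_accuracy`, `toy_lexicon`, `flip`) are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product


def train_symbol_model(pairs):
    """Learn the most frequent output symbol for each input symbol."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        for src_sym, tgt_sym in zip(source, target):
            counts[src_sym][tgt_sym] += 1
    return {sym: c.most_common(1)[0][0] for sym, c in counts.items()}


def predict(model, source):
    """Map each input symbol independently; unseen symbols become '?'."""
    return "".join(model.get(sym, "?") for sym in source)


def holdout_accuracy(pairs, test_every=5):
    """Hold out every `test_every`-th pair, train on the rest, score exact matches."""
    test = pairs[::test_every]
    train = [p for i, p in enumerate(pairs) if i % test_every != 0]
    model = train_symbol_model(train)
    correct = sum(predict(model, src) == tgt for src, tgt in test)
    return correct / len(test)


def toy_lexicon(irregular):
    """Invented (word, sounds) pairs; if irregular, word-initial 'a' sounds like 'e'."""
    lexicon = []
    for letters in product("ptkaiu", repeat=3):
        word = "".join(letters)
        sounds = word
        if irregular and word.startswith("a"):
            sounds = "e" + word[1:]
        lexicon.append((word, sounds))
    return lexicon


def flip(pairs):
    """Swap direction: (word, sounds) becomes (sounds, word)."""
    return [(b, a) for a, b in pairs]


regular = toy_lexicon(irregular=False)
irregular = toy_lexicon(irregular=True)

reading_regular = holdout_accuracy(regular)            # word -> sounds
reading_irregular = holdout_accuracy(irregular)        # word -> sounds
writing_irregular = holdout_accuracy(flip(irregular))  # sounds -> word

print(f"regular spelling, reading:   {reading_regular:.2f}")
print(f"irregular spelling, reading: {reading_irregular:.2f}")
print(f"irregular spelling, writing: {writing_irregular:.2f}")
```

In this toy case the irregular spelling is harder to read than to write: the lookup table cannot tell when "a" should sound like "e", but the sound "e" is always spelled "a". A real experiment would need real word/IPA pairs and a model that can see context.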

Nice!
I couldn't help but look and see if the company behind the commercials that were burned into my brain 40 years ago is still a thing, and lo, Hooked on Phonics is still going strong!

This page[1] walks through the basics of phonemic awareness that children need to learn via exposure & repetition in order to learn to apply that aural learning to reading.

It makes me wonder if a program like this, aimed at English-speaking children, might help those adults learning to speak & read English if they could put up with being addressed as if they were a child.

[1] https://www.hookedonphonics.com/reading/phonemic-awareness/

> how each word is pronounced individually. What the hell is the difference between "men" and "man"? What's the difference between "bitch" and "beach"?

From what I could easily research, Portuguese has a pretty wide variety of vowel sounds, but it still pales in comparison to the Germanic languages that English took from; and across the spectrum of English dialects and accents you can end up hearing pretty much anything vowel-like that the human voice apparatus can generate. The strength of the difference between "men" and "man" will depend on who's speaking, but it's generally less than Portuguese phonology can accommodate. The "e" sound here should be familiar; the "a" sound not so much. Spanish (and, say, Japanese) learners of English will have much the same problem, but more so; their natural "e" is a bit off.

(From what Wikipedia is telling me, many Brazilian Portuguese dialects will use the right /ɪ/ sound for "bitch" in unstressed syllables. But then, my local accent contrasts /ɪ/ with /i/ quite strongly.)

On the flip side, I struggled with pronouncing Dutch when I made a brief attempt to pick it up; the individual sounds are all straightforward enough, but certain combinations are really unnatural.

> What the hell is the difference between "men" and "man"? What's the difference between "bitch" and "beach"?

Those words all have completely different vowels in English; they're not irregular spellings. If you can't tell the difference, you probably just haven't listened to enough English or have said them incorrectly too much to tell the difference.

I think that's probably more because English uses etymological orthography.

So spelling rules are based on four distinct "primary" systems of phonics that can be used depending on whether the word or morpheme has a Germanic, Greek, Latin or French origin. (Yes I know French comes from Latin origin, but the spelling rules differ depending on whether the word was imported directly from Latin, or came in via Norman French.) And then the Germanic and French origin words can get even messier because their spelling was standardized before the Great Vowel Shift. And then whenever we take loanwords from other languages that use the Latin alphabet, we preserve that language's spelling. Which creates a whole mess of special cases where the spelling doesn't follow any of the regular phonetic rules.

If you look at languages where the writing system is famously difficult to learn, a common element they all share is etymological orthography.

>but the spelling rules differ depending on whether the word was imported directly from Latin, or came in via Norman French

In fact it can be even more complicated, because English words can come from Norman dialects and "typical" French simultaneously. For example, "warden" and "guardian" come from the same word in Old French; the former is closer to how the Normans pronounced it and the latter is closer to its modern French pronunciation.

How can writing marks help in this regard? I can imagine a language with both a lot of exceptions and writing marks.
In Portuguese, they indicate which syllable is stressed and alternate ways to say the vowels, e.g. "país" is stressed on the "i" and means "country", while "pais" is stressed on the "a" and means "parents". The tilde (~) indicates that the vowel is nasal, e.g. the "ã" in "São Paulo" sounds like the "u" in "sun"; the default sound of "a" in Portuguese is the same as in "car".
Accent marks give additional phonetic information.
Because you know the stressed syllable by looking at the word. Take "desert" and "dessert": do we say DES-ert or des-ERT? Also, in Portuguese at least, I can know which "e" sound [1] each "e" in the word makes by knowing this (well, almost but not completely, but much better than in English).

[1]: https://en.wikipedia.org/wiki/IPA_vowel_chart_with_audio

Maybe Jazz Emu is onto something: https://www.youtube.com/watch?v=zJ69ny57pR0
Do men/man and bitch/beach sound the same to you? I am kinda confused here, these words have distinct meanings and sounds.
> Do men/man and bitch/beach sound the same to you?

Not exactly the same, but I differentiate them more based on the context than on the pronunciation.

Giving an example for Portuguese that has about the same difference: "roupa de lá" (clothes from there) and "roupa de lã" (wool clothes). If you write them in Google Translate or similar you'll see the difference, which is very subtle for non-Portuguese speakers but sounds completely different to us.

Portuguese has a ton of such examples.

"O meu canto" can mean "My corner" and "My singing".

"Conselho" means "advice" and "Concelho" means "council".

"Aço" means "steel", and "Asso" means "I roast".

All of these pairs sound exactly the same.