by stephen_g 606 days ago
Very true - and every demonstration of “English is hard to spell/pronounce” focuses directly on the exceptions, which exaggerates the problem. One analysis I’ve seen finds that with a single set of rules, 59% of a sample corpus of 5000 English words can be pronounced perfectly from the spelling (of course, there will be regional accent and dialect differences, so that percentage will be a bit different for each one) and up to 85% can be pretty close with only slight errors.

Then there’s a percentage that are just direct borrowings from other languages, where you need to have an idea of how that language pronounces words (especially French), so really only 10-15% or so of English words end up being true exceptions.

1. https://www.zompist.com/spell.html
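
For anyone wondering what "pronounced perfectly from the spelling" means operationally, here is a minimal Python sketch of that kind of measurement: apply an ordered list of rewrite rules to each spelling and count how many words come out exactly right. The four rules, the ad hoc sound symbols, and the toy lexicon are all made up for illustration; they are not the rules or the corpus from the linked page.

```python
import re

# Hypothetical toy rules, applied in order. Outputs use symbols ("I", "S")
# that no later pattern matches, so one rule cannot re-trigger another.
# Letters with no rule pass through unchanged.
RULES = [
    (r"ee", "I"),   # "ee" as in "sheep"
    (r"sh", "S"),   # "sh" as a single sound
    (r"ck", "k"),   # "ck" as in "deck"
    (r"c", "k"),    # any remaining "c" is treated as hard
]

# Hypothetical toy lexicon: spelling -> target pronunciation in the same
# ad hoc symbols. "been" is included as a deliberate exception.
LEXICON = {
    "sheep": "SIp",
    "deck": "dek",
    "cat": "kat",
    "been": "bin",
}

def pronounce(word, rules=RULES):
    """Apply each rewrite rule in order to the spelling."""
    out = word
    for pattern, replacement in rules:
        out = re.sub(pattern, replacement, out)
    return out

def coverage(lexicon, rules=RULES):
    """Fraction of words whose rule-derived pronunciation matches exactly."""
    hits = sum(pronounce(word, rules) == target for word, target in lexicon.items())
    return hits / len(lexicon)

print(f"exact-match coverage: {coverage(LEXICON):.0%}")
```

The "pretty close" figure would need a looser comparison than exact equality (for example, an edit distance threshold), which this sketch leaves out.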

5 comments

> a single set of rules, 59% of a sample corpus of 5000 English words can be pronounced perfectly from the spelling

To do this you need to know 56(!) rules.

I think this demonstrates how complex English pronunciation actually is.

And you still only get 59% of the way to the correct pronunciation.

As a non-native speaker of English, and a native speaker of a phonetic language, I strongly object to the notion that it's easy to guess an English word's pronunciation just by reading it.

And that's another reason why there are so many English speakers who don't know how to read properly. It is so much harder to read compared to more sensible languages like German (and many others).
Those numbers are very bad, given that proper phonemic orthographies can give you a 90+% confidence with far fewer rules.

There's a simple and consistent way to compare languages in this way, too: train a neural net to map spelling to pronunciation on one half of the dictionary, then test it on the other half. The more complicated and less consistent the orthography is, the more mistakes it'll make. People have in fact done this exact experiment, and English scores extremely poorly in it; for spelling, closer to Chinese, in fact, than many other European languages: https://aclanthology.org/2021.sigtyp-1.1/
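
A rough sketch of the split-half idea, in Python. To keep it short it swaps the neural net for a much cruder stand-in (most frequent sound per letter, assuming spellings and pronunciations are already aligned one letter to one sound), so it only illustrates the train-on-one-half, test-on-the-other procedure, not the linked paper's model. The function names, sound symbols, and toy lexicon are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def split_half(lexicon, seed=0):
    """Shuffle the (spelling, sounds) entries and cut them into two halves."""
    items = sorted(lexicon.items())
    random.Random(seed).shuffle(items)
    mid = len(items) // 2
    return items[:mid], items[mid:]

def train(pairs):
    """Stand-in model: remember the most frequent sound for each letter."""
    counts = defaultdict(Counter)
    for spelling, sounds in pairs:
        for letter, sound in zip(spelling, sounds):
            counts[letter][sound] += 1
    return {letter: c.most_common(1)[0][0] for letter, c in counts.items()}

def predict(model, spelling):
    """Map each letter to its learned sound; unseen letters become '?'."""
    return tuple(model.get(letter, "?") for letter in spelling)

def error_rate(model, pairs):
    """Fraction of held-out words whose predicted sounds are not an exact match."""
    if not pairs:
        return 0.0
    wrong = sum(predict(model, spelling) != tuple(sounds) for spelling, sounds in pairs)
    return wrong / len(pairs)

# Hypothetical toy lexicon with ad hoc sound symbols; the letter "o" is
# deliberately inconsistent here.
TOY = {
    "go": ("g", "oh"), "so": ("s", "oh"), "no": ("n", "oh"),
    "to": ("t", "oo"), "do": ("d", "oo"), "on": ("o", "n"),
}

train_pairs, test_pairs = split_half(TOY)
model = train(train_pairs)
print(f"held-out error rate: {error_rate(model, test_pairs):.2f}")
```

With a real pronunciation dictionary per language, the held-out error rate is the number you would compare across orthographies; a six-word toy set says nothing by itself.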

Maybe it's the right time to once again quote this poem:

https://jochenenglish.de/misc/dearest_creature.pdf

The joy of English pronunciation

George Nolst Trenité (1870–1946)

1 The text

Dearest creature in creation

Studying English pronunciation,

I will teach you in my verse

Sounds like corpse, corps, horse and worse.

I will keep you, Susy, busy,

Make your head with heat grow dizzy;

Tear in eye, your dress you’ll tear;

Queer, fair seer, hear my prayer.

Pray, console your loving poet,

Make my coat look new, dear, sew it!

Just compare heart, hear and heard,

Dies and diet, lord and word.

Sword and sward, retain and Britain

(Mind the latter how it’s written).

Made has not the sound of bade,

Say—said, pay—paid, laid but plaid.

Now I surely will not plague you

With such words as vague and ague,

But be careful how you speak,

Say: gush, bush, steak, streak, break, bleak,

Previous, precious, fuchsia, via,

Recipe, pipe, studding-sail, choir;

Woven, oven, how and low,

Script, receipt, shoe, poem, toe.

Say, expecting fraud and trickery:

Daughter, laughter and Terpsichore,

Branch, ranch, measles, topsails, aisles,

Missiles, similes, reviles.

... (7 pages of pain follow) ...

and the Oxford and US pronunciation (at the time, it has changed since) in phonetic notation.

Huge difference is: English is pretty much THE language that you can butcher and still have people perfectly understand (and hopefully politely correct) you. Even other European (stay mad) languages don't hold up to just how flexible English is in this regard.
Well yes, that's (I believe) the reason English actually works as an international language, despite being horrible in so many respects (pronunciation, tons of exceptions, etc etc): It also has so much redundancy that even if you get all the grammar wrong the meaning is still there. "I is strongs". When someone knows a tiny bit of English it's often easier to communicate in English than in that person's language, even if you're studying said language. Unfortunately, kind of, but that's how it is.
Yeah exactly. "Me arms big power" would make me go "Oh yeah you do have mighty biceps my dude".

And to the latter point I got that all the time in Japan, but I think main reasons are: they wanna practice, but even more they wanna practice with a native English speaker bc it's a novel experience for em!

Oh hurrah, I think that link is what I've been looking for for nearly a decade. I ran across it, or something like it, a long time ago and could never find it again. I don't remember all the special syntax, I think the one I found was written more in plain English with more examples (and I don't think the one I found back then mentioned ghoti either), but I can't be sure, it's been so long - maybe it was just that page and I don't remember it. It does have around the same number of rules I remember though.
This is satire, right? 56 rules to get 59% correct pronunciation on a corpus of 5000 words? And these rules don't even include the base sounds - it doesn't tell you how to actually pronounce "m", or "e". So in fact there are more than 70 rules required to get to a base pronunciation (you need to add at least one rule for each letter).