by lelanthran 931 days ago
> I’d like to give you a _tour_ of my _doubts_ _about_ this, but the _courier_ has just arrived with _four_ _doughnuts_ (and their _colour_ is popular in Britain)

Right, those are the exceptions I mentioned. There are more, but even in that list, those are not one-off exceptions; they're different patterns.

They only stop being patterns when, as you did, you match the shortest subsequence rather than the longest.

Even in the exception list you have patterns: 'about' and 'doubt' rhyme (if you're Canadian, they also rhyme with 'dough'), 'four' rhymes with the 'cour' in 'courier', and 'colour' rhymes with 'honour'.

If we're using regexes, for example, we match the longest subsequence, not the shortest, so the pattern in "dough" is "ough", not "ou". Then it rhymes with "though" and "furlough".

The examples I gave, like `tion` as a suffix, should have made it clear that I meant matching the longest pattern (otherwise it would be matched as 'ti' and 'on').
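The longest-versus-shortest distinction can be made concrete with a minimal Python sketch. One caveat: Python's `re` module tries alternatives left to right and takes the first one that matches at a given position (POSIX engines instead specify leftmost-longest), so "longest match" has to be arranged by ordering the alternatives. The unit list `UNITS` and the helpers `build` and `spelling_units` are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical spelling units, chosen only to illustrate matching order;
# this is not a real phonics inventory.
UNITS = ["ou", "ough", "ti", "on", "tion"]

def build(units, longest_first):
    # Python's re takes the first alternative that matches at a position,
    # so the order of the alternatives decides which unit wins.
    ordered = sorted(units, key=len, reverse=longest_first)
    return re.compile("|".join(map(re.escape, ordered)))

LONGEST = build(UNITS, longest_first=True)
SHORTEST = build(UNITS, longest_first=False)

def spelling_units(word, pattern):
    """Return the non-overlapping units found in word, scanning left to right."""
    return pattern.findall(word)

print(spelling_units("dough", LONGEST))    # ['ough']
print(spelling_units("dough", SHORTEST))   # ['ou']
print(spelling_units("nation", LONGEST))   # ['tion']
print(spelling_units("nation", SHORTEST))  # ['ti', 'on']
```

With the shorter units tried first, "nation" splits into 'ti' and 'on', which is exactly the reading the comment says it did not intend.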

> We get used to these things at an early age, but compared to many other languages, English is highly irregular.

Sure, but I didn't dispute that. I contended that 90%+ of common usage is pattern recognition: doubled consonants, words ending in `e` or starting with `in`, and so on.

An English reader encountering `shibboleth` for the first time will pronounce it correctly, and I claim the same is true for 90% of words in common English usage: even the simplest words follow differing patterns, so readers are forced to learn pattern recognition as a basic, foundational part of English.

It is not as dire as the phrase "English is highly irregular" would suggest. To my mind, a highly irregular language would have at least half its words following no pattern, for example rhyming "moot" with "dad". To my knowledge, English has no non-patterns like that.

I mean, you could claim that "caught" and "court" are pronounced exactly the same, and I'd point out that both are parts of larger patterns: 'taught', 'aught' and 'caught' are one pattern, while 'court', 'pour' and 'rigour' are a different one. So both are examples of patterns; they're just not in the same group of patterns.

Look at your final example, 'lower' and 'tower': 'lower', 'grower' and 'mower' are all part of one pattern, and 'tower', 'bower' and 'shower' are also part of a pattern, just a different one.

In English, you will not easily find a word whose spelling is not part of some pattern[1].

[1] Although, if you're up to the challenge, I welcome examples of spellings that are not part of any pattern... :-)


Your “at least half the words” requirement reflects a strong English bias. I suspect no language is highly irregular by that standard.

In many languages, though, irregular words are in the single-digit percentages, sometimes even zero.

And there are easily some that are not part of a pattern: “colonel” (pronounced “kernel”), American “herbal” (pronounced “erbal”), and autophagy (with the emphasis on “to”, unlike any other word that starts with “auto”).

And there are ambiguous ones that in fact fit multiple patterns, e.g. “route”: British more like “flute”, American more like “house”. Not to mention to-mate-o vs. to-ma-to, and “either”. And “injured” vs. “insured”.

I don’t think anyone whose first language is regular (like German, or Japanese) would agree with your claim that English is not highly irregular.

If you need an order of magnitude more patterns to properly pronounce words (and you do) it’s a difference in quality, not just quantity.

TLDR: I agree with everything you said. On the spectrum of regularity, English is at the extreme end of irregular. The exceptions are words from other languages that have become part of English. Pointing out that UK English differs from US English is not an example of irregularity: someone who learned one of them learned one of them.

============================

But, that being said, it is still mostly patterns. After all, we started this conversation with you throwing out examples of what you thought were non-regular words, which all turned out to be pattern-based anyway.

You had to make multiple attempts to find a non-pattern word.

IOW, you are still mostly learning patterns: you found one exception, 'colonel', below; I offer two more, 'soldier' and 'lieutenant' (mostly to show that, yes, I agree with you that English has some non-regular words).

> And there are easily some that are not part of a pattern: “colonel” (pronounced “kernel”).

This is a good example of an actual non-regular word. Probably every other English word borrowed from another language is likewise a non-regular word (for example, 'rendezvous').

There's nothing you can do about this sort of thing. The only alternatives I can think of are:

1. Keep the language pure and not borrow any words from other languages, or

2. Make up new words.

Of these, borrowing seems to be the better option, even though the result is that non-regularity is introduced.

> American “herbal” (pronounced “erbal”),

Still a pattern: 'honor', 'homage', 'heir', all with a silent 'h' in US English and a non-silent one in UK English.

Even for something with a larger pronunciation difference, such as 'solder' ('sodder' vs 'solder'), 'sodder' still fits a pattern: a silent 'l' (yolk, salmon, walk, talk).

> autophagy (with the emphasis on “to”, unlike any other word that starts with “auto”).

Autonomy/autonomous, automaton. There may be more, but that's certainly a pattern.

> If you need an order of magnitude more patterns to properly pronounce words (and you do) it’s a difference in quality, not just quantity.

Speaking as someone who is bilingual, I don't think it's even the number of patterns that matters (for someone speaking a language, the difference between knowing 10 patterns and 100 patterns is negligible; ask any native English speaker whether they have trouble communicating with other English speakers).

For example, for common usage of Kanji you still need to memorise around 3000 patterns. Native English speakers get by on maybe 300 patterns.

The problem, I think, isn't the number; it's the ambiguity of which pattern to use for a specific word. English still has only a few patterns compared to a highly regular language like Kanji, but the ambiguity means that a little native-language knowledge is needed to determine the specific pattern.

Anyway, I think we've both said enough on this topic, so Cheers :-)

I think my main issue is with your choice of the word "pattern" as something you can match against, because the patterns are sometimes the entire word (lower vs. tower, cough vs. dough). That's not the meaning I usually associate with "pattern" in the context of pattern matching.

If you had used "classes", I probably wouldn't have bothered responding in the first place... "cough" falls in the same class as "rough", and "dough" does not. Those classes each match a terser pattern ("ough"), but having matched the terser pattern, you are no closer to knowing how to pronounce the word; for that you still need to know the entire word.
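The difference between a class and a pattern can be sketched in Python: the regex `ough` matches the same four letters in every word below, yet the pronunciation class can only be looked up by the whole word. The table `OUGH_CLASS`, its labels, and the helpers `matched_part` and `ough_class` are hypothetical, hand-written for illustration, and cover only a handful of words.

```python
import re

OUGH = re.compile(r"ough")

# Hand-written, illustrative table (not exhaustive): how "ough" sounds
# in each whole word, following the classes described in the comment.
OUGH_CLASS = {
    "cough": "ends in an f sound",
    "rough": "ends in an f sound",
    "dough": "long o, as in 'go'",
    "though": "long o, as in 'go'",
    "through": "oo, as in 'too'",
    "bough": "ow, as in 'cow'",
}

def matched_part(word):
    """The substring the terse pattern matches (None if there is no match)."""
    m = OUGH.search(word)
    return m.group() if m else None

def ough_class(word):
    """The class comes from a whole-word lookup, not from the match."""
    return OUGH_CLASS.get(word)

# Every word yields the identical match...
print({matched_part(w) for w in OUGH_CLASS})       # {'ough'}
# ...yet the words fall into several different classes.
print(len(set(OUGH_CLASS.values())))               # 4
print(ough_class("cough") == ough_class("rough"))  # True
print(ough_class("cough") == ough_class("dough"))  # False
```

The match result carries no information that separates "cough" from "dough"; only the whole-word key does.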

Thanks for an interesting discussion, and cheers !