Hacker News
by abeppu 1065 days ago
I think the concept of "simplest naturalistic language" may be intrinsically broken -- a "naturalistic language" is not simple. Natural languages balance between regular rules (e.g. in English, we often add -ed to make the past tense of a verb) and exceptions especially for common cases ("went", "was", "had", "made", "did" because going, being, having, making, doing are all so common). This tension is partly about how much a language user must know/consider when speaking/listening and how efficiently you can say things.

I cannot find a citation quickly, but I recall years ago reading a paper about simulated agents "evolving" a language in a game context where agents had to indicate items to one another, by sending messages which were subject to a noisy channel. Items had multiple attributes (think "small red square", "big green triangle" etc), and experimenters could vary both the noise in the channel, and the entropy of the distribution over items. Naturally if "small red square" is 99% of the things you have to communicate, and there is low noise, agents invent an abbreviation for it. If there's a huge amount of noise and a relatively even distribution over items, then "small small green green triangle triangle" or similar becomes more likely. Languages very naturally reflect both the things people discuss and the environment in which they discuss them.
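To make the trade-off concrete, here is a rough Python sketch of the two pressures described above. It is not the setup from the paper I am half-remembering; the item names, probabilities, and the helper functions (`huffman_lengths`, `expected_length`, `majority_error`) are all made up for illustration. The first part uses a Huffman code to show that a skewed distribution over items rewards a short "abbreviation" for the common item; the second uses simple repetition with majority vote to show why a noisy channel rewards saying things more than once.

```python
import heapq
import math
from itertools import count


def huffman_lengths(probs):
    """Return the Huffman code length (in bits) for each item in probs."""
    tie = count()  # tie-breaker so the heap never has to compare lists
    heap = [(p, next(tie), [item]) for item, p in probs.items()]
    heapq.heapify(heap)
    lengths = {item: 0 for item in probs}
    while len(heap) > 1:
        p1, _, items1 = heapq.heappop(heap)
        p2, _, items2 = heapq.heappop(heap)
        # Every item under the merged node gets one more bit.
        for item in items1 + items2:
            lengths[item] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), items1 + items2))
    return lengths


def expected_length(probs):
    """Average message length in bits under the Huffman code for probs."""
    lengths = huffman_lengths(probs)
    return sum(probs[item] * lengths[item] for item in probs)


def majority_error(flip_prob, repeats):
    """Chance that a majority vote over `repeats` (odd) noisy copies of one
    binary symbol is wrong, with each copy flipped independently."""
    need = repeats // 2 + 1
    return sum(
        math.comb(repeats, k) * flip_prob**k * (1 - flip_prob) ** (repeats - k)
        for k in range(need, repeats + 1)
    )


# Illustrative (invented) distributions over four items.
skewed = {
    "small red square": 0.97,
    "big green triangle": 0.01,
    "small green square": 0.01,
    "big red triangle": 0.01,
}
uniform = {item: 0.25 for item in skewed}

if __name__ == "__main__":
    print("avg bits, skewed: ", expected_length(skewed))
    print("avg bits, uniform:", expected_length(uniform))
    for repeats in (1, 3, 5):
        print("repeats", repeats, "error", majority_error(0.2, repeats))
```

With the skewed distribution the common item ends up with a one-bit code and the average message is shorter than in the uniform case. With a 20% flip rate, each extra pair of repetitions lowers the chance of misunderstanding, at the cost of a longer message. Real languages obviously are not Huffman codes, so treat this as an analogy only.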

5 comments

Your general point is a good one but I don't think irregular verbs are the best example of error correcting redundancy, or evolved shortcutting. In most cases they are just a relic of genealogy, and don't serve those purposes:

> Most English irregular verbs are native, derived from verbs that existed in Old English. Nearly all verbs that have been borrowed into the language at a later stage have defaulted to the regular conjugation.

https://en.wikipedia.org/wiki/English_irregular_verbs#Develo...

Irregular verbs (go/went, and so on) conjugate (change according to tense and subject) using rules just like regular verbs, except that they have different rules. The irregular verbs use Germanic conjugations (cf. man/men, child/children) whereas the regular verbs use grammatical constructions from other source languages.
In every language, for every word, there will be some history and source. And one can always declare a "different rule" around exceptional cases ... but that's kind of vacuous, and speakers have to remember which words are subject to a minority "rule", so claiming they aren't "exceptions" seems disingenuous.

But if you look at the words in English for which we have "different rules", and at the words in other languages that have "different rules" ... they typically line up with frequency. You'll note that the small list of verbs listed above also happens to be irregular in a lot of languages.

That assertion about frequency really needs some data to support it, because the only example I can think of that is very common in that regard is the word "to be", which is special in many ways other than its frequency.
While completely true, I think this misses the point that makes minimal "natural" language interesting. Sure, you don't use one of these constructed languages in practice, the same way you don't build your websites with Turing machine tapes. The question of interest is not one of practice but of theory: what is the equivalent of Turing completeness for natural language? What are the minimum criteria of grammar and vocabulary needed to span the space of conversational ability? In other words, what is the minimum needed for a language to even theoretically be "naturalistic" (even if no naturally occurring language ever looks like it in practice)?
Not saying those papers are wrong, but after 136 years and millions of speakers from _most_ countries, Esperanto's speakers seem just fine without adding irregular verbs.
Turkish might be a better example: it's a real, natural language and highly regular.
Same with Japanese, apparently a completely unrelated, differently structured language.
Fun fact: A subset of the linguistic community conjectures that Japanese and Korean are part of the same family as the Turkic languages. https://en.wikipedia.org/wiki/Altaic_languages
Yes, but it's a really remote relationship, if it exists.
> Natural languages balance between regular rules (e.g. in English, we often add -ed to make the past tense of a verb) and exceptions especially for common cases ("went", "was", "had", "made", "did" because going, being, having, making, doing are all so common).

Yes, but different natural languages resolve this tension differently.

For example, Turkish is much more regular in its verbs (and in general) than English or German.