HN Mirror


by enderm 1685 days ago

I like the design of your website!

What do you mean when you say words are disentangled, standalone concepts? I see words as being very much related to each other.

I assume I may be misinterpreting what you mean by "disentangled, standalone concepts".

Barbara Tversky's research seems to contradict linguistic relativism. I definitely don’t think language is the foundation of cognition.


ericjang 1685 days ago

Thanks!

Words are considered a "discrete unit of meaning", i.e. 3/4 of a word doesn't really mean much. So words like "red" and "grass" are "standalone" in the sense that they mean something by themselves. I agree that words are very much related to each other, in the sense that you can combine them.

I was trying to draw a connection: the "disentangled representations" ML folks often talk about are but a special few-word case of grammars for combining distinct concepts.

solarmist 1685 days ago

Unfortunately, words aren't that simple, but it's close. Prefixes, suffixes, infixes, endings, etc., all have discrete meaning as well. And in Asian languages, this is much more obvious.

The discrete unit of meaning level is generally somewhere between a syllable and a word, with a few exceptions for shorter modifiers.

Unfortunately, in linguistics, the concept of a "word" is only as well defined as "planet" was before Pluto lost its status.

Similarly, when you look at riddles and crossword puzzle clues, the idea of words being discrete also falls apart. Words, very much like variables in algebra, only have meaning in relation to the other pieces of the context they are attached to.

While the mechanics you talk about don't seem to hold (the pieces of language, syntax and semantics alike, are not discretizable; just talk to anyone working on a dictionary), I do think the idea you're talking about does hold.

ericjang 1685 days ago

Fair enough, I agree that if we really examine the comment "word as a discrete unit of meaning", the edge cases start to accumulate and the semantics rapidly break down. But barring things like prefixes/suffixes/modifiers/composite word characters in traditional Chinese, words are fairly discrete and generally regarded as the primary layer for expressing singular units of "meaning".

solarmist 1685 days ago

They are, but only because we don't have better language to express them. Similar to a lot of the problems with Chomsky's works, the composability of language is only a subset of the whole breadth of what is expressible in a given language.

Or in other words, I believe the surface area of "edge cases" has a similar surface area as the rest of the language. The difference being they aren't invoked nearly as often because they require more effort and creativity.

Just look at the rise of words like "hangry". There are types of mashups that show up in creative uses of language that defy nearly any rule for any language you can come up with. In many languages, if you choose any of those supposed rules you can probably construct an algorithm to generate odd, but understandable words that defy that rule.

Mezzie 1684 days ago

> Or in other words, I believe the surface area of "edge cases" has a similar surface area as the rest of the language. The difference being they aren't invoked nearly as often because they require more effort and creativity.

Edge cases or exceptions do tend towards being highly used; this is because language is more likely to change the more it's used, so the most highly used words/phrases/sentences/etc tend to accumulate changes. One example of this is that if a language has verb conjugation and irregular verbs, then odds are some of its most common verbs will be irregular.

> Just look at the rise of words like "hangry". There are types of mashups that show up in creative uses of language that defy nearly any rule for any language you can come up with. In many languages, if you choose any of those supposed rules you can probably construct an algorithm to generate odd, but understandable words that defy that rule.

There are rules for that that would work, weirdly enough. There are just a ton of them.

solarmist 1685 days ago

My only point here is that any framework for generalization needs to be able to account for and incorporate these kinds of "exception-seeking" cases, similar to the way mathematics uses counter-examples to strengthen and reinforce the definitions chosen.

ericjang 1685 days ago

I agree with your comment "In many languages, if you choose any of those supposed rules you can probably construct an algorithm to generate odd, but understandable words that defy that rule." - it comes in many forms, from Goodhart's Law to the "hot dog vs. sandwich" debate.

I do mention this in my blog post - although I think Generalization is Language, I don't think it's possible to create a formal framework of language, precisely because of the "adversarial examples" that can be supplied for any formal definition.

Natural language itself, ignorant of formality, is able to account for these exceptions insofar as language is sufficient for people to convey a bare minimum of meaning. I am proposing to define language and generalization via the implicit understanding of large language models, in the same way you might use an image classifier to define "cat images" or "hot dogs".

rvense 1685 days ago

The problem with "word", as with many terms in linguistics, is that it's a prescientific unit of analysis.

I certainly think most linguistic typologists would say that there is no cross-linguistic unit that corresponds to our intuitive understanding of word, which is really grounded mostly in orthography.

And I think it's fairly easy to show that orthography should not have much say in this matter, though. Of course you can't get around it in language didactics, but in scientific description we need to be very careful with it. Bob Dixon and Alexandra Aikhenvald give some examples from Bantu languages in their Word: A cross-linguistic typology. In Sotho, the sentence "We will skin it with his knife" is written "Re tlo e bua ka thipa ya gagwe", while in the orthographies for Zulu and Xhosa, the same sentence would be rendered as "Retloebua kathipa yagagwe". You really need to look at each language to find a sensible set of analytical categories, and be very explicit about your criteria, be they syntactic, semantic or phonological.

Mezzie 1684 days ago

Linguistics has the distinction for what you're talking about: Morpheme versus word. Morphology is the study of this area. I freaking loved my Morphology classes.

rvense 1684 days ago

While I think there's a generally accepted definition of morpheme (as the smallest distinctive unit), that doesn't give you a good definition of the word. (Because there isn't one.)

Funny you use the term morphology like that. To me it's basically synonymous with inflection, very traditional, whereas morpheme is very much a structuralist term. But all my teachers were cognitive-functional linguists, so everything was cut rather differently and sometimes it's hard to talk.

Mezzie 1684 days ago

Yeah, my morphology teacher was a structuralist, and this was quite a while ago, so I have no doubt I'm biased there. (I actually preferred the cognitive stuff I was introduced to; I really liked working with metaphor in their systems and syntax/phonology/morphology were less my thing than semantics and sociolinguistics.)

You're definitely right that the definitions aren't cut-and-dried and that makes typology rather difficult.

leobg 1685 days ago

And there are also multi-word expressions (MWEs), where the meaning of the whole is different from the sum of its parts. E.g. “out of the blue”, “bite the bullet”.

solarmist 1685 days ago

Yup. Going the other direction is a thing as well.

Mezzie 1684 days ago

Actually, the discrete unit of meaning, linguistically, is the morpheme. It's a small difference, but it matters. Some words are morphemes, but not all, and not all morphemes are words.

Language, man. It's weird.

enderm 1685 days ago

I can see how this could work in English. I’m not sure if there are other languages in which 3/4 of a word carries more meaning. (I’m a primary English speaker, so this concern could be unfounded.)

PeterisP 1684 days ago

In many languages you have literally 3/4 of the word carry the meaning of the actual word and the remaining 1/4 sounds or letters devoted to grammatical markers for the gender/case/number/etc.

Using a classic Latin example from Monty Python, Romani ite domum / Romanes eunt domus;

the "Roman" part of Romanes/Romani carries most of the meaning, and the -es/-i ending carries information that's largely orthogonal to that.

canjobear 1685 days ago

All languages have something analogous to words in this way, although it can be hard to know where to draw the boundaries sometimes.

Technically, the smallest indivisible unit that bears meaning is the morpheme, not the word. For example, the word “cats” in English consists of two morphemes, cat+s. The first morpheme can stand on its own as a word, but the second can’t.
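
To make the cat+s split concrete, here is a minimal Python sketch. The two lexicons and the `segment` helper are made up for illustration; real morphological analysis needs far more than suffix stripping, so treat this as a toy under those assumptions.

```python
# Toy lexicons: illustrative assumptions, not a real morphological resource.
FREE_MORPHEMES = {"cat", "dog", "walk"}   # can stand alone as words
BOUND_SUFFIXES = {"s", "ed", "ing"}       # only occur attached to a stem


def segment(word):
    """Split a word into (morpheme, kind) pairs using the toy lexicons."""
    if word in FREE_MORPHEMES:
        return [(word, "free")]
    # Try longer suffixes first so a longer match wins over a shorter one.
    for suffix in sorted(BOUND_SUFFIXES, key=len, reverse=True):
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in FREE_MORPHEMES:
            return [(stem, "free"), (suffix, "bound")]
    # Anything the toy lexicons cannot explain is left unanalysed.
    return [(word, "unknown")]


print(segment("cats"))
print(segment("walked"))
```

Here "cat" comes back tagged as free and "s" as bound, which is the distinction above: the first can be a word by itself, the second cannot.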

solarmist 1685 days ago

I agree, but I think the trickier part is that the semantics of words are even blurrier/more ambiguous than the syntax.

canjobear 1685 days ago

Yeah, hence the turn away from dictionary definitions and things like WordNet towards continuous distributional vector representations in NLP.

I don’t think you could really give an uncontroversial symbolic definition for any natural word.
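
As a rough illustration of what "distributional vector representations" means, the sketch below builds count-based co-occurrence vectors from a tiny made-up corpus and compares words by cosine similarity. The corpus, the window size, and the function names are all assumptions for the example; modern NLP systems learn dense vectors rather than using raw counts.

```python
from collections import Counter, defaultdict
from math import sqrt

# Tiny made-up corpus, purely for illustration.
corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the dog ate the food",
    "the cat ate the food",
    "the car needs new tires",
]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = cooccurrence_vectors(corpus)
print("cat vs dog:  ", cosine(vectors["cat"], vectors["dog"]))
print("cat vs tires:", cosine(vectors["cat"], vectors["tires"]))
```

No word gets a symbolic definition here. "cat" and "dog" end up similar only because they appear in similar contexts, while "cat" and "tires" share no context words in this corpus.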