by fernly 453 days ago
I find the OP very difficult to comprehend, to the point that I question whether it has content at all. One difficulty is in understanding their use of the word "embedding", defined (so to speak) as "internal representations (embeddings)", and their free use of the word to relate, and even equate, LLM internal structure to brain internal structure. They are simply assuming that there is a brain "embedding" that can be directly compared to the matrix of numerical weights that comprise an LLM's training. That seems a highly dubious assumption, to the point of being hand-waving.

They mention a profound difference in the opening paragraph, "Large language models do not depend on symbolic parts of speech or syntactic rules". Human language models very obviously and evidently do. On that basis alone, it can't be valid to just assume that a human "embedding" is equivalent to an LLM "embedding", for input or output.

3 comments

> They are simply assuming that there is a brain "embedding" that can be directly compared to the matrix of numerical weights that comprise an LLM's training.

If there were no such structure, then their methods based on aligning neural embeddings with brain "embeddings" (really just vectors of electrode values or voxel activations) would not work.

> They mention a profound difference in the opening paragraph, "Large language models do not depend on symbolic parts of speech or syntactic rules". Human language models very obviously and evidently do. On that basis alone, it can't be valid to just assume that a human "embedding" is equivalent to an LLM "embedding", for input or output.

This feels like "it doesn't work the way I thought it would, so it must be wrong."

I think actually their point here is mistaken for another reason: there's good reason to think that LLMs do end up implicitly representing abstract parts of speech and syntactic rules in their embedding spaces.

> They mention a profound difference in the opening paragraph, "Large language models do not depend on symbolic parts of speech or syntactic rules". Human language models very obviously and evidently do.

Honestly, do they? To me, they clearly don't. Grammar is not how language works. It's a useful fiction. Language, even in humans, seems to be a very statistical process.

Yes! As somebody who speaks 2 languages, and sort of reads/understands 2 more, I cannot agree more. Human spoken languages do not follow any grammars. Grammars are just simplified representations of a reality that is probabilistic in nature.

This is something that Chomsky got very wrong, and the statistical/ML crowd got very right.

But still, grammars are a very useful model.

Languages definitely follow grammars. They don't follow the grammars that were written by observing them, but you can discover unwritten grammatical structures that are nevertheless followed by everyone who speaks a language, and who if asked wouldn't even be able to articulate the rules that they are following. It's the following that defines the grammar, not the articulation of the rules.

Statistics are just another way to record a grammar, all the way down to the detail of how one talks about bicycles, or the Dirty War in Argentina.

If a grammar is defined as a book that enumerates the rules of a language, then of course language doesn't require following a grammar. If a grammar is defined as a set of rules for communicating reasonably well with another person who knows those same rules, then language follows grammars.

> Languages definitely follow grammars

But it's the other way around! Grammars follow languages. Or, more precisely, grammars are (very lossy) language models.

They describe typical expectations of an average language speaker. Grammars try to provide a generalized system describing an average case.

I prefer to think of languages as a set of typical idioms used by most language users. A given grammar is an attempt to catch similarities between idioms within the set and turn 'em into a formal description.

A grammar might help with studying a language, and speed up the process of internalizing idioms, but the final learning stage is a set of things students use in certain situations aka idioms. And that's it.

> Statistics are just another way to record a grammar

I almost agree.

But it should be "record a language". These are two approaches to the problem of modeling human languages.

Grammars are an OK model. Statistical models are less useful to us humans, but given the right amount of compute they perform much better (see LLMs).
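To make the contrast concrete, here is a minimal sketch of the statistical side: a toy bigram model with add-one smoothing. It is far simpler than an LLM, and the corpus, the function names, and the smoothing choice are all assumptions made for illustration. The point it shows is only that a statistical model gives graded scores rather than an accept/reject verdict, so an unseen sentence in a familiar word order scores higher than the same words scrambled.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for this sketch.
CORPUS = [
    "the dog sees the cat",
    "the cat sees the dog",
    "the dog sleeps",
]

def train_bigrams(sentences):
    """Count bigrams and their left contexts, with sentence boundary markers."""
    pair_counts, context_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])  # every token that can follow a context
        for prev, word in zip(tokens, tokens[1:]):
            pair_counts[(prev, word)] += 1
            context_counts[prev] += 1
    return pair_counts, context_counts, vocab

def log_prob(sentence, model):
    """Log-probability of a sentence under add-one (Laplace) smoothing."""
    pair_counts, context_counts, vocab = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = pair_counts[(prev, word)] + 1
        denominator = context_counts[prev] + len(vocab)
        total += math.log(numerator / denominator)
    return total

MODEL = train_bigrams(CORPUS)
familiar = log_prob("the cat sleeps", MODEL)   # never seen, but familiar order
scrambled = log_prob("sleeps cat the", MODEL)  # same words, scrambled order
print(familiar > scrambled)
```

Neither sentence appears in the corpus, yet both get a nonzero probability and the model still ranks them, which is the sense in which statistics "record a language" without any explicit rule.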

This is a terminological difference. Linguists use "grammar" as a technical term for a speaker's implicit knowledge of how their language works. That knowledge could be statistical or rule-based in nature, although most linguistic theories say that it's rule-based. You're using grammars to mean human-produced descriptions of that knowledge.
That's correct.

Grammars, the way I understand them, are a family of human language models, typically discrete in nature. The approach was born out of Chomsky's research, culminating in the Universal Grammar idea.

This is just wrong. Languages follow certain inviolable rules, most notably, hierarchical structure dependence. There are experiments (Moro, the subject "Chris") that show that humans don't process synthetic languages that violate these rules the same as synthetic languages that do (specifically it takes them longer to process and they use non-language parts of the brain to do so).
This does not mean that language in humans isn't probabilistic in nature. You seem to think that because there is structure then it must be rule based but that doesn't follow at all.

When a group of birds flies, each bird discovers/knows that flying just a little behind another will reduce the number of flaps it needs to fly. When you have nearly every bird doing this, the flock forms an interesting shape.

'Birds fly in a V shape' is essentially what grammar is here - a useful fiction of the underlying reality. There is structure. There is meaning but there is no rule the birds are following to get there. No invisible V shape in the sky constraining bird flight.

What exactly is wrong? The fact that grammars are very limited models of human languages? My key thesis is that human languages operate in a way that non-probabilistic models (i.e. grammars) can describe only in a very lossy way.

Sure, LLMs are also lossy but also much more scalable.

I've spent quite a lot of time with 90s/2000s papers on the topic, and I don't remember any model that was better at generating human language than "stochastic parrots" are.

Moro is apparently a reference to Andrea Moro, but I can't find any writing of his titled 'The Subject "Chris"'.
How do you explain syntactic islands, binding rules, or any number of arcane linguistic rules that humans universally follow? Children can generalise outside of their training set in a way that LLMs simply cannot (e.g. Nicaraguan Sign Language or creolization).
Linguists however know that grammar is, indeed, important for linguistic comprehension. For example, the German "Ich sehe die Frau mit dem Fernglas" (I see the woman with the binoculars) is _unambiguous_ because "die Frau" and "mit dem Fernglas" match in both gender and case. If this weren't the case, it could be either "I see (the woman with the binoculars)" or "I see (the woman) with [using] the binoculars". Even in German you might encounter this e.g. if you instead had to say "Ich sehe das Mädchen mit dem Fernglas", as das Mädchen (the girl) is neuter rather than feminine in gender.
Both example sentences are equally ambiguous. The gender of the sentence's object is irrelevant. It does not affect the prepositional phrase.
Am German, can confirm. If there's a rule here, it exists only in the heads of linguists.
My point is that grammar is to language what Newton was to gravity, i.e. a useful fiction that works well enough for most scenarios, not that language has no structure.

The first 5 minutes of this video do a good job of explaining what I'm getting at - https://www.youtube.com/watch?v=YNJDH0eogAw

Wow a 1 hour video by some crank, guess all of linguistics and cognitive science has been a waste of time.
I said you need only watch the first 5 minutes to see what I was getting at.

You would also think emphasizing grammar's usefulness would make it plain that I do not think it is a waste of time.

> For example, the German "Ich sehe die Frau mit dem Fernglas" (I see the woman with the binoculars) is _unambiguous_ because "die Frau" and "mit dem Fernglas" match in both gender and case. If this weren't the case, it could be either "I see (the woman with the binoculars)" or "I see (the woman) with [using] the binoculars".

My German is pretty rusty, why exactly is it unambiguous?

I don't see how changing the noun would make a difference. "Ich sehe" followed by any of these: "den Mann mit dem Fernglas", "die Frau mit dem Fernglas", "das Mädchen mit dem Fernglas" sounds equally ambiguous to me.
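A toy sketch of that point, under stated assumptions: the grammar below is a hypothetical fragment (singular nouns only, no genitive, invented category names like `NP_acc`) that does enforce gender and case agreement between article and noun, with "sehe" taking an accusative object and "mit" taking a dative one. A CKY-style chart then counts the parses. Even with agreement enforced, both attachments of "mit dem Fernglas" survive for all three nouns, while a genuinely wrong article ("der Mann" as object) gets no parse at all.

```python
from collections import defaultdict

# Article forms -> set of (gender, case) readings (singular only, no genitive).
DET = {
    "der": {("m", "nom"), ("f", "dat")},
    "die": {("f", "nom"), ("f", "acc")},
    "das": {("n", "nom"), ("n", "acc")},
    "den": {("m", "acc")},
    "dem": {("m", "dat"), ("n", "dat")},
}
NOUN = {"Mann": "m", "Frau": "f", "Mädchen": "n", "Fernglas": "n"}
LEX = {"ich": ["NP_nom"], "sehe": ["V"], "mit": ["P"]}

# Binary rules as (parent, left, right).
RULES = [
    ("S", "NP_nom", "VP"),
    ("VP", "V", "NP_acc"),       # "sehen" governs the accusative
    ("VP", "VP", "PP"),          # PP modifies the seeing (instrument reading)
    ("PP", "P", "NP_dat"),       # "mit" governs the dative
    ("NP_acc", "NP_acc", "PP"),  # PP modifies the noun (possession reading)
    ("NP_dat", "NP_dat", "PP"),
]

def count_parses(sentence):
    """Count distinct parse trees rooted in S for a whitespace-split sentence."""
    words = sentence.split()
    n = len(words)
    chart = defaultdict(lambda: defaultdict(int))  # chart[(i, j)][category]
    for i, word in enumerate(words):
        for cat in LEX.get(word, []):
            chart[(i, i + 1)][cat] += 1
    # Article + noun forms an NP only if gender agrees; the NP inherits the case.
    for i in range(n - 1):
        det, noun = words[i], words[i + 1]
        if det in DET and noun in NOUN:
            for gender, case in DET[det]:
                if gender == NOUN[noun]:
                    chart[(i, i + 2)]["NP_" + case] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right in RULES:
                    ways = chart[(i, k)].get(left, 0) * chart[(k, j)].get(right, 0)
                    if ways:
                        chart[(i, j)][parent] += ways
    return chart[(0, n)].get("S", 0)

for obj in ("den Mann", "die Frau", "das Mädchen"):
    print(obj, count_parses(f"ich sehe {obj} mit dem Fernglas"))
```

In this fragment the article is fixed by the noun's gender and the case the verb or preposition demands, so it never gets to choose between the two attachments.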

It is indeed ambiguous. I don't understand which alternative the parent is implying.
Die Frau and dem Fernglas don’t bind tightly though.

In my view, this phrase is only unambiguous to those who feel the preposition tradition, and all the heavy lifting is done here by “mit” (and “durch” in the opposite case, if one wants to make it clear). Articles are irrelevant and are dictated by the verb and the preposition, whose requirements are sort of arbitrary (sehen Akk., mit Dat.) and fixed. There’s no article-controlled variation that could change the meaning; to my knowledge, a different article would simply be incorrect.

I’m also quite rusty on Deutsch, aber habe es nicht völlig vergessen, it seems.

I don’t disagree with any of your particular points, but I think you’re missing the forest here: their argument is primarily based in empirical results, not a theoretical framework/logical deduction. In other words, they’re trying to explain why LLMs work so well for decoding human neural content, not arguing that they do!

I think any reasonable scientist would, a priori, react to these claims the same way as to claims that neural networks alone can possibly crack human intuition: “that sounds like sci-fi speculation at best”. But that’s the crazy world we live in…