HN Mirror


by tripletao 174 days ago

> LLMs make no prediction at all as to whether or not natural languages should have wh-islands: they’ll happily learn languages with or without such constraints.

The human-designed architecture of an LLM makes no such prediction; but after training, the overall system including the learned weights absolutely does, or else it couldn't generate valid language. If you'd prefer to run in the opposite direction, then you can feed in sentences with correct and incorrect wh-movement, and you'll find the incorrect ones are much less probable.
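As a rough illustration of the minimal-pair test described above, here is a toy sketch. It assumes a tiny add-one-smoothed bigram model standing in for a trained LLM; the corpus, the sentence pair, and the names `train_bigram` and `log_prob` are all hypothetical, chosen only to show the procedure of comparing log-probabilities.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real test would score sentences with a trained LLM.
CORPUS = [
    "what did she buy",
    "what did he see",
    "who did she see",
    "she did buy the book",
    "he did see the film",
]

def tokenize(sentence):
    # Add sentence-boundary markers so start and end positions are modeled too.
    return ["<s>"] + sentence.lower().split() + ["</s>"]

def train_bigram(sentences):
    # Count how often each word appears as a context and each adjacent pair occurs.
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = tokenize(sentence)
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab

def log_prob(sentence, model):
    # Sum of add-one-smoothed bigram log-probabilities over the sentence.
    context_counts, bigram_counts, vocab = model
    v = len(vocab) + 1  # one extra slot reserved for unseen words
    tokens = tokenize(sentence)
    return sum(
        math.log((bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + v))
        for prev, cur in zip(tokens, tokens[1:])
    )

model = train_bigram(CORPUS)
good = "who did he see"   # wh-word fronted, as in the corpus
bad = "did who he see"    # wh-word in a position the corpus never shows
print(log_prob(good, model), log_prob(bad, model))
```

The misplaced wh-word here is only a crude stand-in: a bigram model cannot capture the long-distance dependencies involved in real wh-islands, so the sketch shows the scoring procedure and is not evidence about what LLMs learn.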

That prediction is commingled with billions of other predictions, which collectively model natural language better than any machine ever constructed before. It seems like you're discounting it because it wasn't made by and can't be understood by an unaided human; but it's not like the physicists at the LHC are analyzing with paper and pencil, right?

> There is no reason to think that a perfect theory in this domain would be of any particular help in generating plausible-looking text.

Imagine that claim in human form--I'm an expert in the structure of the Japanese language, but I'm unable to hold a basic conversation. Would you not feel some doubt? So why aren't you doubting the model here? Of course it would have been outlandish to expect that of a model five years ago, but it isn't today.

I see your statement that Chomsky isn't attempting to model the "many non-linguistic cognitive systems", but those don't seem to cause the LLM any trouble. The statistical modelers have solved problem after problem that was previously considered impossible, and the practical applications of that are (for better or mostly worse) reshaping major aspects of society. Meanwhile, every conversation I've had with a Chomsky supporter seems to reduce to "he is deliberately choosing not to produce any result evaluable by a person who hasn't spent years studying his theories". I guess that's true, but that mostly just makes me regret what time I've already spent.


foldr 174 days ago

> The human-designed architecture of an LLM makes no such prediction; but after training, the overall system including the learned weights absolutely does, or else it couldn't generate valid language.

It makes a prediction about whatever language(s) are in the training data, but it doesn’t make any (substantial) predictions about general constraints on human languages. It really seems that you’re missing the absolutely fundamental goal of Chomsky’s research program here. Remember that whole “universal grammar” thingy?

> I'm an expert in the structure of the Japanese language, but I'm unable to hold a basic conversation. Would you not feel some doubt?

I expect anyone learning Japanese as a second language will get a chuckle out of this one. It’s in fact a common scenario. You can learn a lot about the grammar of a language, but conversation requires the ability to use that knowledge immediately and fluidly in a wide variety of situations. It is like the difference between “knowing how to solve a differential equation” and being able to answer 50 questions within an hour in a physics exam.

> I see your statement that Chomsky isn't attempting to model the "many non-linguistic cognitive systems", but those don't seem to cause the LLM any trouble.

Of course they don’t, because researchers creating LLMs are (in the vast majority of cases) not attempting to model any particular cognitive system; they have engineering goals, not scientific ones. You seem to be stuck in the view that Chomsky is somehow trying and completely failing to do the thing that LLMs do successfully. This certainly makes for a good straw man (if Chomsky had the same goals, then yeah, he never got anywhere), but it’s a misunderstanding of his research program.

> "he is deliberately choosing not to produce any result evaluable by a person who hasn't spent years studying his theories"

You could say this of many perfectly respectable fields. Andrew Wiles has not produced any result evaluable by me or by almost anyone else. It would certainly take me a lot more than “a few years” of study to evaluate his work.

I’m afraid there are no intellectual shortcuts. If you want to evaluate Chomsky’s work, you will have to at least read it, and maybe even think about it a bit too! It seems a bit churlish to whine about that. All you are being deprived of by opting out of this time investment is the opportunity to make informed criticisms of his work on the internet.

(The good news is that generative linguistics is actually pretty accessible, and one year of part-time study would probably be enough to get the lay of the land.)


tripletao 174 days ago

> Andrew Wiles has not produced any result evaluable by me or by almost anyone else.

Fermat wrote the theorem in the margin long before Wiles was born. There is no question that many people tried and failed to prove it. There is no question that Wiles succeeded, because the skill required to verify a proof is much less than the skill required to generate it. I haven't done so myself; but lots of other people have, and there is no dispute by any skilled person that his proof is correct. So I believe that Wiles has accomplished something significant.

I don't think Chomsky has any similar accomplishment. I roughly understand the grandiose final goal; I just see no evidence that he has made any progress towards it. Everything that I'd see as an interesting intermediate goal is dismissed as out of scope, especially when others achieve it. On the rare occasion that Chomsky has made externally intelligible predictions on the range of human language, they've been falsified anthropologically. I assume you followed the dispute on Pirahã, which I believe clarified that features like recursion were in fact optional, rendering the theory safely non-falsifiable again.

So what's his progress? Everything that I see turns inward, valuable only within the framework that he himself constructed. Anyone can build such a framework, so that's not an accomplishment. Convincing others to spend years of their lives on that framework is a sort of achievement, but it's not a scientific one--homeopathy has many practitioners.

> I expect anyone learning Japanese as a second language will get a chuckle out of this one. It’s in fact a common scenario.

I think this view is just as wrong applied to a human as to a model. A beginning language student probably knows a lot more grammar rules than a native speaker, but their inability to converse doesn't come from their inability to quickly apply them. It comes from the fact that those rules capture only a small amount of the structure of natural language. You seem to acknowledge this yourself--if nothing Chomsky is working on would help a machine generate language, then it wouldn't help a human either. This also explains my teachers' usual advice to stop studying and converse as best I could, watch movies, etc.

Humans clearly learn language in a more structured way than LLMs do (since they don't need trillions of tokens), but they learn primarily from exposure, with partial structure but many exceptions. I don't think that's surprising, since most other things "designed" in an evolutionary manner have that same messy form. LLMs have succeeded spectacularly in modeling that, taking the usual definition in ML or other math for "modeling".

It's thus strange to me to see them dismissed as a source of insight into natural language. I guess most experts in LLMs are busy becoming billionaires right now; but if anything resembling Chomsky's universal grammar ever does get found to exist, then I'd guess it will be extracted computationally from models trained on corpora of different languages and not any human insight, in the same way that the Big Five personality traits fall out of a PCA.


foldr 173 days ago

> So what's his progress? Everything that I see turns inward, valuable only within the framework that he himself constructed.

It's really not true that the whole of generative linguistics is just some kind of self-referential parlor game. Many avenues of research in cognitive science that we take for granted today as legitimate were opened up as a direct consequence of Chomsky's critique of behaviorism and his insight that the mind is best understood as a computational system. Ironically, any respectable LLM will be perfectly happy to cover this in more detail if you probe it with some key terms like "behaviorism", "cognitive revolution" or "computational theory of mind".

> Pirahã

It's very unlikely that Everett's key claims about Pirahã are true (see e.g. https://dspace.mit.edu/bitstream/handle/1721.1/94631/Nevins-...). But anyway, the universality of recursive clausal embedding has never been a central issue in generative linguistics. Chomsky co-authored one speculative paper late in his career suggesting that recursion in some (vague) sense might be the core computational innovation responsible for the human language faculty. Everett latched on to that claim and the dispute went public, which has given a false impression of its overall centrality to the field.

> So what's his progress?

I don't see how we can discuss this question without getting into specifics, so let me try to push things in that direction. Here is a famous syntax paper by Chomsky: https://babel.ucsc.edu/~hank/On_WH-Movement.pdf It claims to achieve various things. Do you disagree, and if so, why?

> Japanese

A generative linguist studying Japanese wouldn't claim to be an expert on the structure of Japanese in your broad sense of the term. One thing to bear in mind is that generative linguistics is entirely opportunistic in its approach to individual languages. Generative linguists don't study Japanese because they give a fuck about Japanese as such (any more than physicists study balls rolling down inclined planes because balls and inclined planes are intrinsically fascinating). The aim is just to find data to distinguish competing hypotheses about the human language faculty, not to come to some kind of total understanding of Japanese (or whatever language).

> I guess most experts in LLMs are busy becoming billionaires right now; but if anything resembling Chomsky's universal grammar ever does get found to exist, then I'd guess it will be extracted computationally from models trained on corpora of different languages and not any human insight, in the same way that the Big Five personality traits fall out of a PCA.

This is a common pattern of argumentation. First, Chomsky's work is critically examined according to the highest possible scientific standards (every hypothesis must be strictly falsifiable, etc.). Then when we finally get to see the concrete alternative proposal, it turns out to be nothing more than a promissory note.


tripletao 173 days ago

> It's very unlikely that Everett's key claims about Pirahã are true

Everett achieved something unequivocally difficult--after twenty years of failed attempts by other missionaries, he was the first Westerner to learn Pirahã, living among the people and conversing with them in their language. In my view, that gives him significantly greater credibility than academics with no practical exposure to the language (and I assume you're aware of his response to the paper you linked).

I understand that to Chomsky's followers, Everett's achievement is meaningless, in the same way that LLMs saturating almost every prior benchmark in NLP is meaningless. But what achievements outside the "self-referential parlor game" are meaningful then? You must need something to ground yourself in outside reality, right?

> Then when we finally get to see the concrete alternative proposal, it turns out to be nothing more than a promissory note.

I'm certainly not claiming that statistical modeling has already achieved any significant insight into how physical structures in the brain map to an ability to generate language, and I don't think anyone else is either. We're just speculating that it might in future.

That seems a lot less grandiose to me than anything Chomsky has promised. In the present, that statistical modeling has delivered some pretty significant, strictly falsifiable, different but related achievements. Again, what does Chomsky's side have?

> I don't see how we can discuss this question without getting into specifics, so let me try to push things in that direction. Here is a famous syntax paper by Chomsky: https://babel.ucsc.edu/~hank/On_WH-Movement.pdf

And when I asked that before, you linked a sixty-page paper, with no further indication ("various things"?) of what you want to talk about. If you're trying to argue that Chomsky's theories are anything but a tarpit for a certain kind of intellectual curiosity, then I don't think that's helping.


foldr 173 days ago

Believe Everett if you want to, but it doesn’t make much difference to anything. Not every language has to exploit the option of recursive clausal embedding. The implications for generative linguistics are pretty minor. Yes, Everett responded to the paper I linked, and then there were further papers in the chain of responses (e.g. http://lingphil.mit.edu/papers/pesetsk/Nevins_Pesetsky_Rodri...).

> And when I asked that before, you linked a sixty-page paper, with no further indication ("various things"?) of what you want to talk about.

I was suggesting that we talk about the central claim of the paper (i.e. that the answer to question (50) is ‘yes’).

I don’t see how it’s reasonable to ask for something smaller than a paper if you want evidence that Chomsky’s research program has achieved some insight. That’s the space required to argue for a particular viewpoint rather than just state it.

In other words, if I concisely summarize Chomsky’s findings you’ll just dismiss them as bogus, and if I link to a paper arguing for a particular result, you’ll say it’s too long to read. So, essentially, you have decided not to engage with Chomsky’s work. That is a perfectly legitimate thing to do, but it does mean that you cannot make informed criticisms of it.


tripletao 173 days ago

> So, essentially, you have decided not to engage with Chomsky’s work. That is a perfectly legitimate thing to do, but it does mean that you cannot make informed criticisms of it.

Any criticism that I'd make of homeopathy would be uninformed by the standards of a homeopath--I don't know which poison to use, or how many times to strike the bottle while I'm diluting it, or whatever else they think is important. But to their credit they're often willing to put their ideas to the external test (like with an RCT), and I know that evidence in aggregate shows no benefit. I'm therefore comfortable criticizing homeopathy despite my unfamiliarity with its internals.

I don't claim any qualifications to criticize the internals of Chomsky's linguistics, but I do feel qualified to observe the whole thing appears to be externally useless. It seems to reject the idea of falsifiable predictions entirely, and if one does get made and then falsified then "the implications for generative linguistics are pretty minor". After dominating academic linguistics for fifty years, it has never accomplished anything considered difficult outside the newly-created field. So why is this a place where society should expend more of its finite resources?

Hardy wrote his "Mathematician's Apology" to answer the corresponding question for his more ancient field, explicitly acknowledging the uselessness of many subfields but still defending them. He did that with a certain unease though, and his promises of uselessness also turned out to be mistaken--he repeatedly took number theory as his example, not knowing that in thirty years it would underlie modern cryptography. Chomsky's linguists seem to me like the opposite of that, shouting down anyone who questions them (he called Everett a "charlatan") while proudly delivering nothing to the society funding their work. So why would I want to join them?
