Hacker News
by FridgeSeal 812 days ago
“Chomsky is wrong because I say he is, and because he doesn’t like the LLM I like”

Mmmhhmmm yes, very convincing argument there. I’d be willing to bet that Chomsky’s position doesn’t rely on such spurious arguments. “Automated plagiarism” reads like a fantastic description of current LLMs to me.


Do you disagree that LLMs appear to have learnt language? Whether they plagiarize or not is irrelevant as long as they are not quoting verbatim. If they are coming up with their own sequence of words, making sense, and remaining grammatical, then how can you argue that they have not learnt language?

Since they do at least (whatever other shortcomings they may have) appear to have learnt language, just by "listening to" a bunch of examples, and since they are based on a transformer, not some specialized Chomskian "language organ", then the thesis that a "language organ" is needed is disproved.

Even assuming LLMs have learnt language (whatever that means), it is completely irrelevant to what Chomsky is studying, which is how the human language ability works. As an analogy, training a neural network to successfully predict the weather doesn't falsify physics-based models of the weather.

Sure, but for our cortex not to be able to learn language without becoming specialized for it, while a transformer can, implies that a transformer is a more powerful prediction engine than our cortex, which seems unlikely.

If you look at the cortex as a whole, what seems most striking is how uniform it is - same basic architecture of six layers of neurons with a specific pattern of inter-layer connectivity. It seems nature has come up with a universal prediction architecture. Maybe Wernicke's area is fine-tuned for language, but to characterize it as a specialized "language organ" seems a bit of a stretch. Let's note too that language is only a million years old, while the cortex itself is hundreds of millions of years old, yet has this mostly uniform architecture that evidently works just as well for vision as for hearing, etc, etc.

So, sure, the ability of the ridiculously simple transformer architecture to learn language (and many other prediction tasks you throw at it), doesn't PROVE that the brain didn't have to evolve a highly specialized way to do it, but it certainly seems highly suggestive of it.

Since we now have an existence proof that a very simple architecture, not specialized for language, can learn language, it seems the onus is now on Chomsky to put some meat on the bones of his claim (without evidence) that the general cortical architecture is incapable of this without a high degree of specialization.

> Sure, but for our cortex not to be able to learn language without becoming specialized for it, while a transformer can, implies that a transformer is a more powerful prediction engine than our cortex, which seems unlikely.

On the contrary, it's extremely likely that we can engineer something better than random evolution. And we have: calculators are better at adding and subtracting than humans are, bikes are faster than running, etc.

> If you look at the cortex as a whole, what seems most striking is how uniform it is - same basic architecture of six layers of neurons with a specific pattern of inter-layer connectivity. It seems nature has come up with a universal prediction architecture. Maybe Wernicke's area is fine-tuned for language, but to characterize it as a specialized "language organ" seems a bit of a stretch. Let's note too that language is only a million years old, while the cortex itself is hundreds of millions of years old, yet has this mostly uniform architecture that evidently works just as well for vision as for hearing, etc, etc.

Language is ~100,000 years old, not a million. The brain is in fact not uniform: if you damage Broca's area you lose language capability. And to say that all cognitive function is the same general algorithm ignores the obvious fact that the brain performs different functions, and it doesn't answer the question of how and why this is so. There is a lot of cognitive behavior that is not understood; if you want to explain those behaviors, you implicitly have to distinguish them from one another.

> So, sure, the ability of the ridiculously simple transformer architecture to learn language (and many other prediction tasks you throw at it), doesn't PROVE that the brain didn't have to evolve a highly specialized way to do it, but it certainly seems highly suggestive of it.

It doesn't. Moro showed in a series of experiments that humans have difficulty learning non-hierarchical languages and use different parts of the brain to do so, which is highly suggestive that language is specialized.

> Since we now have an existence proof that a very simple architecture, not specialized for language, can learn language, it seems the onus is now on Chomsky to put some meat on the bones of his claim (without evidence) that the general cortical architecture is incapable of this without a high degree of specialization.

He has provided evidence and arguments, some of which I have pointed to above. Maybe you should actually read or listen to him.

> Moro showed in a series of experiments that humans have difficulty learning non-hierarchical languages and use different parts of the brain to do so, which is highly suggestive that language is specialized.

The cortex is built for hierarchical processing, because that's what's needed to model the world we live in. Physical objects are localized and have hierarchical detail, and larger visual scenes are the same. The kind of sequential (temporal) patterns relevant to us are also hierarchical, whether in visual, auditory or other domains.

The type of connectionist architecture needed to recognize hierarchical patterns is a layered one where the receptive field and hierarchical level of abstraction grow as you ascend the layers. In our brain those layers come from different patches of our cortical sheet being connected. This is the reason that neural network architectures like CNNs and transformers also work to recognize hierarchical patterns in visual and temporal domains - because they both also use these layered architectures, which is all that is needed.
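
To make the "receptive field grows as you ascend the layers" point concrete, here is a minimal sketch for the CNN case only, using the standard receptive-field recurrence for stacked convolutions. The function name and the example layer specs are hypothetical, picked purely for illustration. It says nothing about transformers, where attention lets every layer see the whole input.

```python
def receptive_fields(layers):
    """Return the receptive field size (in input units) after each layer.

    `layers` is a list of (kernel_size, stride) pairs, first layer first.
    The layer specs are illustrative assumptions, not a real network.
    """
    size = 1  # a single input unit only sees itself
    jump = 1  # spacing, in input units, between neighbouring units of the current layer
    sizes = []
    for kernel_size, stride in layers:
        # Each extra kernel tap widens the field by one "jump" of the layer below.
        size += (kernel_size - 1) * jump
        # Striding spreads neighbouring units of the next layer further apart.
        jump *= stride
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    print(receptive_fields([(3, 1), (3, 1), (3, 1)]))  # [3, 5, 7]
    print(receptive_fields([(3, 2), (3, 2), (3, 2)]))  # [3, 7, 15]
```

Each unit in a higher layer summarizes a wider stretch of the input than the units below it, and striding makes that growth much faster.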

The reason why the function of damaged brain areas can't always be taken over by other areas is largely due to plasticity. Our brain peaks in its ability to form new synapses in the first few years of life. If you haven't learned language by the age of 3, then you will never be able to learn more than a crude type of pidgin language, despite all your "language areas" being intact. The same would be true of a different part of your brain trying to learn language as an adult - the plasticity is no longer there.