Hacker News
by karmasimida 1773 days ago
> The "language models don't really understand anything"

This is still true. By all accounts, a human doesn't need to read 159GB of Python code to write Python; in fact, we simply couldn't.

But that doesn't necessarily mean language models aren't useful.

3 comments

Considering the sum total of data and computation that goes into creating an intelligent human mind, including the forces of natural selection in creating our innate structure and dispositions, it's not obvious that any conclusions can be drawn from the fact that so much data and compute goes into training these models.
Has this transfer of knowledge from one domain to another really been demonstrated by these models/learning processes? I know transfer learning is a thing (I have a couple of books on my shelf on it). But it seems far from what you are describing.
The AlphaZero algorithm swapped between board games pretty easily. OpenAI could also have been gesturing at this when they named the GPT-3 paper "Language Models are Few-Shot Learners".
DALL-E + CLIP models show a deep understanding of the relation between images and text.
They mention in the demo video that the inspiration for Codex came from GPT-3 users training it to respond to queries with code samples. I saw some pretty impressive demos of the original model creating SQL queries from plain questions. I'm not sure if that counts as switching domains, but it's something?
The problem with this (very popular) argument is that you can't give a CS course to a baby and expect them to get good at programming.

By the time we see our first line of code, most of us have seen a ridiculous amount of data. We've been trained in problem solving, logical reasoning, maths, natural language processing, ... Hell, we've been trained as pattern matchers since we were born.

By my account, humans actually need a large amount of training data. It might be the knowledge federation and generalisation that we're good at, but I don't think we're a clear winner in data efficiency.

Taking 11Mbps [1] as the raw uncompressed incoming data, and assuming 16 hours of waking environment consumption on average (likely high for children), a 13yo has taken in less than 400 TB of information (I used 11 * 60 * 60 * 16 * 365 * 13 / 8.) That's... surprisingly low.

[1] https://www.britannica.com/science/information-theory/Physio...
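
For anyone who wants to check the arithmetic above, here is a quick sketch in Python. The inputs are the ones stated in the comment (11 Mbps, 16 waking hours a day, 13 years); the function and variable names are made up for illustration, and the comparison against the 159GB figure from the parent comment is only a rough one, since raw sensory bits and source-code bytes are not the same kind of data.

```python
# Back-of-the-envelope check of the estimate in the comment above.
# All inputs are the commenter's assumptions, not measured values.

MBPS = 11             # megabits per second of raw visual input (figure cited from [1])
WAKING_HOURS = 16     # assumed waking hours per day
DAYS_PER_YEAR = 365
YEARS = 13

def lifetime_input_megabytes(mbps=MBPS, hours=WAKING_HOURS,
                             days=DAYS_PER_YEAR, years=YEARS):
    """Total raw input in megabytes: rate * waking seconds, then bits -> bytes."""
    waking_seconds = 60 * 60 * hours * days * years
    return mbps * waking_seconds / 8

total_mb = lifetime_input_megabytes()
total_tb = total_mb / 1_000_000       # decimal units: 1 TB = 10^6 MB
total_gb = total_mb / 1_000           # decimal units: 1 GB = 10^3 MB

# Rough ratio against the 159GB of Python code mentioned upthread.
ratio_vs_159gb = total_gb / 159

print(f"{total_tb:.1f} TB")           # about 375.8 TB, i.e. under 400 TB
print(f"{ratio_vs_159gb:.0f}x the 159GB figure")
```

So the "less than 400 TB" figure holds up under those assumptions, and it is a few thousand times larger than 159GB.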

Are we still limiting this to visual cues and leaving out the auditory, smell, taste, and touch data we're exposed to?
Visual input is so dense it's basically not worth tracking the other senses from a data rate pov.
I would argue humans ingest a lot more than 159GB before they can write code. Most of it isn't Python, and humans currently transfer knowledge a lot more efficiently than NNs, but I suspect that'll change as incorporating more varied data sources becomes feasible.
We generalize pretty well. One could say: "it took you 20 years to learn Python!", but actually I learned Python, Java, C#... software engineering, machine learning... how to play guitar, how to cook... how to speak Portuguese, how to speak English... and thousands and thousands of different things which build on each other.

You can give a programmer a few KB of code in a new language and that will give him a small grasp of how it works.