As apposed to humans who all derive the physics of heat transfer independently when given a question like this?
Not picking on you - this brings up something we could all get better at:
There should be a "First Rule of Critiquing Models": Define a baseline system to compare performance against. When in doubt, or for general critiques of models, compare to real world random human performance.
Without a real practical baseline to compare with, its to easy to fall into subjective or unrealistic judgements.
"Second Rule": Avoid selectively biasing judgements by down selecting performance dimensions. For instance, don't ignore difference in response times, grammatical coherence, clarity of communication, and other qualitative and quantitative differences. Lack of comprehensive performance dimension coverage is like comparing runtimes of runners, without taking into account differences in terrain, length of race, altitude, temperature, etc.
It is very easy to critique. It is harder to critique in a way that sheds light.
> As apposed to humans who all derive the physics of heat transfer independently when given a question like this?
Isn't that the difference between learning and memorizing, though? If you were taught Newton's Law of Cooling using this example and truly learned it, you could apply it to other problems as well. But if you only memorized it, you might be able to recite it when asked the same question, yet still be unable to apply it to anything else.
If an LLM has only that knowledge and nothing else (pieces of text saying that heat transfer is proportional to some function of the temp difference) such that is not trained on any texts that give problems and solutions in this area, it will not work this out, since it has nothing to generate tokens from.
Also, your knowledge doesn't come from anywhere near having scanned terabytes of text, which would take you multiple lifetimes of full time work.
Lecun's argument is based off a bad interpretation of how data is processed by the optic nerve, we don't receive that much raw data.
What we do have, is billions of years of evolution that has given a lot of innate knowledge which means we are radically more capable than LLMs despite having little data.
Not picking on you - this brings up something we could all get better at:
There should be a "First Rule of Critiquing Models": Define a baseline system to compare performance against. When in doubt, or for general critiques of models, compare to real world random human performance.
Without a real practical baseline to compare with, its to easy to fall into subjective or unrealistic judgements.
"Second Rule": Avoid selectively biasing judgements by down selecting performance dimensions. For instance, don't ignore difference in response times, grammatical coherence, clarity of communication, and other qualitative and quantitative differences. Lack of comprehensive performance dimension coverage is like comparing runtimes of runners, without taking into account differences in terrain, length of race, altitude, temperature, etc.
It is very easy to critique. It is harder to critique in a way that sheds light.