Compression is the reason these models are able to learn and understand.
My brain is doing the exact same thing.
I learned enough to compress concepts like a bike: what a bike does and what I can use it for.
Ask an LLM and it will answer much like a human would.
Blind people learn the concept of a bike too, and in a similar way: by description.
LLMs just have so much text data available, and are able to ingest all of it, that the LLM's compression algorithm doesn't have to be as good or as fine-tuned as ours.
But I would assume that Yann LeCun's JEPA or other breakthroughs in the next few years will get us there.
The man posits that clicking is instinctual for blind people, but they are told to quiet down in class and most never develop their echolocation abilities.
A blind person has touched warm and hot things and gotten burned before, and then they are told lava is this molten liquid that is even hotter than anything they have touched. That is enough for them to understand.
A blind person who has never touched a hot object wouldn't really know, though. There is a reason we dismiss talk from people who lack experience.
You don't know that. You don't know what someone would think if you told them the general concept of cold and warm:
the reaction you should have, the feeling, etc.
I asked ChatGPT how it would describe a scene without mentioning temperature. It was very good at describing what a human would describe.
I'm aware of the bias we have against LLMs, but I think people just underestimate how much data there is.
I'm not saying a robot or an LLM wouldn't be better with this information, and they actually do use temperature sensors in robots so they can control movement speed and dexterity when elements overheat, but the gap is small.
I don’t think this analogy holds. The whole way through the processing pipeline in the brain, different sensory data is ingested separately and processed separately; and we still don’t understand how that data is then integrated into a cohesive experience.
LLMs have the same fundamental input regardless of modality: tokens. There is a preprocessing step before the “brain”, which is more akin to some super-synesthesia where all senses are translated into sound before becoming experience.
Can't you say the same about the connectivity between the brain and your senses? Your eyes do 'preprocessing', but in the end the connection to your brain is just through electrical impulses. All senses get translated to some sort of electrical signal, just like in an LLM with tokens.