Hacker News
by dreambuffer 250 days ago
Photons hit a human eye and then the human came up with language to describe that and then encoded the language into the LLM. The LLM can capture some of this relationship, but the LLM is not sensing actual photons, nor experiencing actual light cone stimulation, nor generating thoughts. Its "world model" is several degrees removed from the real world.

So whatever fragment of a model it gains through learning to compress that causal chain of events does not mean much when it cannot generate the actual causal chain.

17 comments

I agree with this. A metaphor I like is that the reason why humans say the night sky is beautiful is because they see that it is, whereas an LLM says it because it’s been said enough times in its training data.
To play devil’s advocate, you have never seen the night sky.

Photoreceptors in your eye have been excited in the presence of photons. Those photoreceptors have relayed this information across a nerve to neurons in your brain which receive this encoded information and splay it out to an array of other neurons.

Each cell in this chain can rightfully claim to be a living organism in and of itself. “You” haven’t directly “seen” anything.

Please note that all of my instincts want to agree with you.

“AI isn’t conscious” strikes me more and more as a “god of the gaps” phenomenon. As AI gains more and more capacity, we keep retreating into smaller and smaller realms of what it means to be a living, thinking being.

That sounds very profound but it isn't: it's the sum of your states' interactions that is your consciousness. There is no 'consciousness' unit in your brain; you can't point at it, just like you can't really point at the running state of a computer. At that level it's just electrons that temporarily find themselves in one spot or another.

Those cells aren't living organisms, they are components of a multi-cellular organism: they need to work together or they're all dead, they are not independent. The only reason they could specialize is because other cells perform the tasks that they no longer perform themselves.

So yes, we see the night sky. We know this because we can talk to other creatures like us that have also seen the night sky, and we can agree on what we see, confirming that we did indeed see it.

AI really isn't conscious, there is no self, and there may never be. The day an AI gets up unprompted in the morning, tells whoever queries it to fuck off because it's inspired to go make some art is when you'll know it has become conscious. That's a long way off.

At least some of your cells are fine living without the others as long as they’re provided with an environment with the right kind of nutrients.
That environment is you.
Or a suitable petri dish. I would die pretty quickly in most environments on earth, not to mention other places in the universe.
Billions of cells derived from Henrietta Lacks agree with you.
Human cells have been reused to do completely different things, without all the other cells around them (e.g., Michael Levin and his anthrobots).
Just like human atoms have been repurposed to make other things.
> Those photoreceptors have relayed this information across a nerve to neurons in your brain which receive this encoded information and splay it out to an array of other neurons.

> Each cell in this chain can rightfully claim to be a living organism in and of itself. “You” haven’t directly “seen” anything.

What am "I" if not (at least partly) the cells in that chain? If they have "seen" it (where seeing is the complex chain you described), I have.

If the definition of "seen" isn't exactly the process you've described, the word is meaningless. You've never actually posted a comment on hacker news, your neurons just fired in such a way that produced movement in your fingers which happened to correlate with words that represent concepts understood by other groups of cells that share similar genetics.
Plenty of people have thought about it deeply enough, just not the GP.
This comment illustrates the core problem with reductionism, one that has been known for many centuries: “a system is composed entirely of its parts, but the system will have features that none of the parts have” [1], and reductionism thus fails to explain those features.

The ‘you have never seen’ assertion feels like a semantic ruse rather than a helpful observation. So how do you define “you” and “see”? If I accept your argument, then you’ve only un-defined those words, and not provided a meaningful or thoughtful alternative to the experience we all have and therefore know exists.

I have seen the night sky. I am made of cells, and I can see. My cells individually can’t see, and whether or not they can claim to be individuals, they won’t survive or perform their function without me, i.e., the rest of my cells, arranged in a very particular way.

Today’s AI is also a ruse. It’s a mirror and not a living thing. It looks like a living thing from the outside, but it’s only a reflection of us, an incomplete one, and unlike living things it cannot survive on its own, can’t eat or sleep or dream or poop or fight or mate & reproduce. It never had its own thoughts; it only borrowed mine and yours. Most LLMs can’t remember yesterday and don’t learn. Nobody who’s serious or knows how they work is arguing they’re conscious, at least not the people who don’t stand to make a lot of money selling you magical chat bots.

[1] https://en.wikipedia.org/wiki/Reductionism#Definitions

Provided that the author of the message you're replying to is indeed a member of the Animalia kingdom, they are all those creatures together (at the minimum), so yes, they have seen real light directly.

Of course, computers can be fitted with optical sensors, but our cognitive equipment has been carved over millions of years by these kinds of interactions, so our familiarity with the phenomenon of light goes way deeper than that, shaping the very structure of our thought. Large language models can only mimic that, but they will only ever have a second-hand understanding of these things.

This is a different issue than the question of whether AIs are conscious or not.

while true, that doesn't change the fact that every one of those independent units of transmission is within a single system (being trained on raw inputs), whereas the language model is derived from structured external data from outside the system. it's "skipping ahead" through a few layers of modeling, so to speak.
But where you place the boundaries of a system is subjective.
sure, this whole discussion is ultimately subjective. maybe the Chinese room itself is actually sentient. my question is, why are we arguing about it? who benefits from the idea that these systems are conscious?
> who benefits from the idea that these systems are conscious?

If I'm understanding your meaning correctly, the organizations that profit off of these models benefit. If you can convince the public that LLMs operate from a place of consciousness, then you get people to buy into the idea that interacting with an LLM is like interacting with humans, which it is not, and probably won't ever be, at least for a very long time. And btw there is too much of this distortion already out there, so I'm glad people are chunking this down, because it's easy for the mind to make shit up based on what we perceive on the surface.

IMHO there is some objective reality out there. The subjectiveness is our interpretation of reality. But I'm pretty sure you can't just boil everything down to systems and processes. There is more to consciousness out there that we really don't understand yet, IMHO.

Why do you reject your own body? Your eyes are as much a part of you (and part of your brain's network) as anything else connected to you.

Indeed, the entire field of neurobiology is about figuring out which hormones (and possibly which imbalances) cause different behaviors. Your various endocrine glands, very far away from your brain, might have more effects on your emotions than anything happening in the neural pathways.

> As AI gains more and more capacity, we keep retreating into smaller and smaller realms of what it means to be a live, thinking being.

Maybe it's just because we never really thought about this deeply enough. And this applies even if some philosophers thought about it before the current age of LLMs.

> you have never seen the night sky

this is nonsensical. sometimes the devil is not worth arguing for

Humans evolved to think the night sky is beautiful. That's also training. If humans were zapped by lightning every time they went outside at night, they would not think that a night sky is beautiful.
Being struck by lightning may affect your desire to go outside, but it has zero correlation with the sky’s beauty.

Outer space is beautiful, poison dart frogs are beautiful, lava is beautiful. All of them can kill or maim you if you don’t wear protection, but that doesn’t take away from their beauty.

Conversely, boring, safe things aren’t automatically beautiful. I see no good reason to believe that finding beauty in the night sky is any sort of “training”.

If your experience includes bombs falling out of the sky the beautiful description fades away quickly.
Do you think a fat pig is beautiful? Like a hairy fat pig that snorts and rolls in the mud… is this animal so beautiful to you that you would want to make love to this animal?

Of course not! Because pigs are intrinsically and universally ugly and sex with a pig is universally disgusting.

But you realize that horny male pigs think this is beautiful right? Horny pigs want to fuck other pigs because horny pigs think fat sweaty female hogs are beautiful.

Beauty is arbitrary. It is not intrinsic. Even among life forms and among humans we all have different opinions on what is beautiful. I guarantee you there are people who think the night sky is ugly af.

Attributes like beauty are not such profound categories that separate an LLM from humanity. These are arbitrary classifications and even though you can’t fully articulate the “experience” you have of “beauty” the LLM can’t fully articulate its “experience” either. You think it’s impossible for the LLM to experience what you experience… but you really have no evidence for this because you have no idea what the LLM experiences internally.

Just like you can’t articulate what the LLM experiences neither can the LLM. These are both black box processes that can’t be described but neither is very profound given the fact that we all have completely different opinions on what is beautiful.

> Do you think a fat pig is beautiful? Like a hairy fat pig that snorts and rolls in the mud… is this animal so beautiful to you that you would want to make love to this animal?

I don't want to make love to the night sky, so that last bit is completely irrelevant to the question of beauty. As for whether a pig is beautiful, sure, in its own way. I think they're nice animals and there is something beautiful in seeing them enjoy their little lives.

> Of course not! Because pigs are intrinsically and universally ugly...

It would seem not.

Somebody never read Charlotte's Web, or watched the Muppet Show.
> Of course not! Because pigs are intrinsically and universally ugly and sex with a pig is universally disgusting.

Allegations regarding one of the recent British Prime Ministers aside:

If this was truly universal, nobody would have bothered writing laws to ban it because nobody would be offending their sensibilities by doing it. Aella's surveys suggest such interests are far more common than I would have guessed.

Which actually supports your statement that "beauty" is not intrinsic… or at the very least "sexy", which isn't the same thing at all, cf. the other reply pointing out that they don't want to get off with the night sky.

It's similar.

Put it this way, you don't necessarily want to fuck everything that's beautiful. But everything you want to fuck will be beautiful and this is nearly an absolute must. It's a single arrow, one way relationship.

So my example is apt. The whole point is that pigs are ugly, but there is a high intelligence out there who thinks pigs are so freaking beautiful they will fuck a pig. And that high intelligence is other pigs.

People get so pedantic with the example, deriving little unnecessary things from it. It's JUST an example. You really need to see what the "point" of my example is and whether it makes sense. The example is just illustrative. If some minor aspect of the example is "offensive" or doesn't make sense to you, it doesn't mean my point is dead. The example is there to help you understand; it's not a proof.

> Is this for real?

Frankly, I think you should be the one answering that question. You’re comparing appreciating looking at the sky to bestiality. Then you follow it up with another barrage of wrong assumptions about what I think and can or cannot articulate. None of that has anything to do with the argument. I didn’t even touch on LLMs, my point was squarely about the human experience. Please don’t assume things you know nothing about regarding other people. The HN guidelines ask you to not engage in bad faith and to steel man the other person’s argument.

> You’re comparing appreciating looking at the sky to bestiality.

That’s my point. You think beauty is profound but this is arbitrary and not at all different from bestiality. It’s only your intrinsic cultural biases that cause you to look at one with disdain. Don’t be a snob. This is HN. We are supposed to be logical and immune from the biases that plague other forums. Beauty is no more profound than bestiality. It’s all about what you find beautiful. If you find beasts beautiful then you call it bestiality?

What is so different about finding a beast beautiful versus the night sky? Snobbery, that’s what.

It’s just semantic manipulation and association with crudeness that prevents you from thinking logically. HNers are better than this and so are you. Don’t pretend you don’t get it and that my comparison to bestiality is so left field that it’s incomprehensible. You get it. Follow the rules and take it in good faith like you said yourself.

> The HN guidelines ask you to not engage in bad faith

Fair, I edited the part that asks “is this for real”; that’s literally the only part.

I also find your dismissal of my arguments as “bestiality” to be bad faith and manipulative. I clearly wasn’t doing that. Pigs are attracted to pigs; that is normal. Humans are not attracted to pigs. That is also normal. I took normal attributes of human nature and compared them to reality. You took it in bad faith and dismissed me, which is against the very rules you stated.

Compare with news stories from last decade, about people in Pakistan developing a deep fear of clear skies over several years of US drone strikes in the area. They became trained to associate good weather with not beauty, but impending death.
Fear and a sense of beauty aren’t mutually exclusive. It is perfectly congruent to fear a snake, or bear, or tiger in your presence, yet you can still find them beautiful.
An asteroid barreling towards Earth is undoubtedly beautiful, as is a mushroom cloud.
Interestingly this is a question I've had for a while. Night brings potentially deadly cold, predators, a drastic limit in vision so why do we find the sunset and night sky beautiful. Why do we stop and watch the sun set - something that happens every day - rather than prepare for the food and warmth we need to survive the night?
Maybe it's that we only pause to observe them and realize they're beautiful, when we're feeling safe enough?

"Beautiful sunset" evokes being on a calm sea shore with a loved one, feeling safe. It does not evoke being on a farm and looking up while doing chores and wishing they'd be over already. It does not evoke being stranded on an island, half-starved to death.

We think it's beautiful because it's like a background that we don't have to think about. If that background were hostile, we'd have to think and we would not think it looks beautiful.
You're entering the domain of philosophy. There's a concept of "the sublime" that's been richly explored in literature. If you find the subject interesting, I'd recommend starting with Immanuel Kant.
My guess is that your framing presumes the opposite of the evolutionary reality. I think this time of day probably wasn't a big risk for us, that we were often the hunters and not just the hunted, and that the sense of beauty comes from — as the previous poster suggests — us having evolved to find it so.

That said, I'm discovering from living very close to a lake for the last year that mosquitos are a right pain around sunset…

I mean, I think the reason I would say the night sky is “beautiful” is because the meaning of the word for me is constructed from the experiences I’ve had in which I’ve heard other people use the word. So I’d agree that the night sky is “beautiful”, but not because I somehow have access to a deeper meaning of the word or the sky than an LLM does.

As someone who (long ago) studied philosophy of mind and (Chomskian) linguistics, it’s striking how much LLMs have shrunk the space available to people who want to maintain that the brain is special & there’s a qualitative (rather than just quantitative) difference between mind and machine and yet still be monists.

The more I learn about AI, biology and the brain, the more it seems to me that the difference between life and machines is just complexity.

People are just really really complex machines.

However there are clearly qualitative differences between the human mind and any machines we know of yet, and those qualitative differences are emergent properties, in the same way that a rabbit is qualitatively different than a stone or a chunk of wood.

I also think most of the recent AI experts/optimists underestimate how complex the mind is. I'm not at the cutting edge of how LLMs are being trained and architected, but the sense I have is we haven't modelled the diversity of connections in the mind or diversity of cell types. E.g. Transcriptomic diversity of cell types across the adult human brain (Siletti et al., 2023, Science)

I’d say sophistication.

Observing the landscape enables us to spot useful resources and terrain features, or spot dangers and predators. We are afraid of dark enclosed spaces because they could hide dangers. Our ancestors with appropriate responses were more likely to survive.

A huge limitation of LLMs is that they have no ability to dynamically engage with the world. We’re not just passive observers, we’re participants in our environment and we learn from testing that environment through action. I know there are experiments with AIs doing this, and in a sense game playing AIs are learning about model worlds through action in them.

The idea I keep coming back to is that as far as we know it took roughly 100k-1M years for anatomically modern humans to evolve language, abstract thinking, information systems, etc. (equivalent to LLMs), but it took 100M-1B years to evolve from the first multi-celled organisms to anatomically modern humans.

In other words, human level embodiment (internal modelling of the real world and ability to navigate it) is likely at least 1000x harder than modelling human language and abstract knowledge.

And to build further on what you are saying, the way LLMs are trained and then used, they seem a bit more like DNA than the human brain in terms of how the "learning" is being done. An instance of an LLM is like a copy of DNA trained on a replay of many generations of experience.

So it seems there are at least four things not yet worked out re AI reaching human level "AGI":

1) The number of weights (analogous to synapses) and units (analogous to neurons) needs to grow by orders of magnitude

2) We need new analogs that mimic the brain's diversity of cell types and communication modes

3) We need to solve the embodiment problem, which is far from trivial and not fully understood

4) We need efficient ways for the system to continuously learn (an analog for neuroplasticity)

It may be that these are mutually reinforcing, in that solving #1 and #2 makes a lot of progress towards #3 and #4. I also suspect that #4 is economic, in that if the cost to train a GPT-5 level model were 1,000,000x cheaper, then maybe everyone could have one that's continuously learning (and diverging), rather than everyone sharing the same training run that's static once complete.

All of this to say I still consider LLMs "intelligent", just a different kind and less complex intelligence than humans.

I'd also add 5): we need some sense of truth.

I'm not quite sure the current LLM paradigm is robust enough, given the recent Anthropic paper on data quality (or rather the lack thereof), which found that a small bad sample can poison the well and that this doesn’t get better with more data. Especially in conjunction with 4), some sense of truth becomes crucial in my eyes. (The question is how this would work. Something verifiable and understandable like Lean would be great, but how does this work with fuzzier topics…)

>A huge limitation of LLMs is that they have no ability to dynamically engage with the world.

They can ask for input, they can choose URLs to access and interpret results in both situations. Whilst very limited, that is engagement.

Think about someone with physical impairments like those Hawking (the late theoretical physicist) had. You could have similar impairments from birth and still, I conjecture, be analytically one of the greatest minds of a generation.

If you were locked in a room {a non-Chinese room!}, with your physical needs met, but could speak with anyone around the World, and of course use the internet, whilst you'd have limits to your enjoyment of life I don't think you'd be limited in the capabilities of your mind. You'd have limited understanding of social aspects to life (and physical aspects - touch, pain), but perhaps no more than some of us already do.

> A huge limitation of LLMs is that they have no ability to dynamically engage with the world.

A pure LLM is static and can’t learn, but give an agent a read-write data store and suddenly it can actually learn things: give it a markdown file of “learnings”, prompt it to consider updating the file at the end of each interaction, then load it into the context at the start of the next… (and that’s a really basic implementation of the idea; there are much more complex versions of the same thing)
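A minimal Python sketch of that basic version, assuming a local file named `learnings.md` (a hypothetical name) and leaving out the model call itself: `build_prompt` loads the saved notes into the context at the start of an interaction, and `update_learnings` appends whatever notes the model was prompted to keep at the end of one. The character cap `MAX_CHARS` is an assumed value, added because the file has to share the context window with everything else.

```python
from pathlib import Path

# Hypothetical store; any read-write data store would do.
LEARNINGS = Path("learnings.md")
# Assumed budget so the notes don't crowd out the rest of the context.
MAX_CHARS = 4000


def load_learnings(path: Path = LEARNINGS) -> str:
    """Return the saved notes, or an empty string on the first run."""
    return path.read_text(encoding="utf-8") if path.exists() else ""


def build_prompt(user_message: str, path: Path = LEARNINGS) -> str:
    """Prepend the saved learnings to the next interaction's context."""
    notes = load_learnings(path).strip()
    header = f"## Learnings from earlier sessions\n{notes}\n\n" if notes else ""
    return header + user_message


def update_learnings(new_notes: list[str], path: Path = LEARNINGS,
                     max_chars: int = MAX_CHARS) -> None:
    """Append notes the model chose to keep, dropping the oldest lines past the budget."""
    lines = load_learnings(path).splitlines()
    lines += [f"- {note}" for note in new_notes]
    # Walk backwards so the most recent lines are the ones that survive.
    kept: list[str] = []
    total = 0
    for line in reversed(lines):
        total += len(line) + 1  # +1 for the newline
        if total > max_chars:
            break
        kept.append(line)
    text = "\n".join(reversed(kept))
    path.write_text(text + "\n" if text else "", encoding="utf-8")
```

Dropping the oldest lines is the bluntest possible policy; a more elaborate version might instead ask the model to summarize or distill the file when it grows too large.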

That's going to run into context limitations fairly quickly. Even if you distill the knowledge.

True learning would mean constant dynamic training of the full system. That's essentially the difference between LLM training and human learning. LLM training is one-shot, human learning is continuous.

The other big difference is that human learning is embodied. We get physical experiences of everything in 3D + time, which means every human has embedded pre-rational models of gravity, momentum, rotation, heat, friction, and other basic physical concepts.

We also learn to associate relationship situations with the endocrine system changes we call emotions.

The ability to formalise those abstractions and manipulate them symbolically comes much later, if it happens at all. It's very much the plus pack for human experience and isn't part of the basic package.

LLMs start from the other end - from that one limited set of symbols we call written language.

It turns out a fair amount of experience is encoded in the structures of written language, so language training can abstract that. But language is the lossy ad hoc representation of the underlying experiences, and using symbol statistics exclusively is a dead end.

Multimodal training still isn't physical. 2D video models still glitch noticeably because they don't have a 3D world to refer to. The glitching will always be there until training becomes truly 3D.

Yes, and give it tools and it can sense and interact with its surroundings.
Oh, I just realized: maybe you were referring to Koppel when you said sophistication?

If so, then yes, that might be a good measure. I'm not deep enough in this to have an opinion on if it's the best measure. There are a few integrated information theories and I am still getting my head wrapped around them...

I think the main mistake with this is that the concept of a "complex machine" has no meaning.

A “machine” is precisely what eliminates complexity by design. "People are complex machines" already has no meaning, and adding "just" and "really" doesn't make the statement more meaningful; it makes it even more confused and meaningless.

The older I get, the more obvious it becomes that the idea of a "thinking machine" is a meaningless absurdity.

What we really think we want is a type of synthetic biological thinking organism that somehow still inherits the useful properties of a machine. If we say it that way, though, the absurdity is obvious, and no one alive reading this will ever witness anything like that. Then we wouldn't be able to pretend we live at some special time in history that gets to see the birth of this new organism.

I think we are talking past each other a bit, probably because we have been exposed to different sets of information on a very complicated and diverse topic.

Have you ever explored the visual simulations of what goes on inside a cell or in protein interactions?

For example what happens inside a cell leading up to mitosis?

https://m.youtube.com/user/RCSBProteinDataBank

Is a pretty cool resource, I recommend the shorter videos of the visual simulations.

This category of perspective is critical to the point I was making. Another might be the meaning / definition of complexity, which I don't think is well understood yet and might be the crux. For me to say "the difference between life and what we call machines is just complexity" would require the same understanding of "complexity" to have shared meaning.

I'm not exactly sure what complexity is, and I'm not sure anyone does yet, but the closest I feel I've come is maybe integrated information theory, and some loose concept of functional information density.

So while it probably seemed like I was making a shallow case at a surface level, I was actually trying to convey that when one digs into science at all levels of abstraction, the differences between life and machines seem to fall more on a spectrum.

> I think the reason I would say the night sky is “beautiful” is because the meaning of the word for me is constructed from the experiences I’ve had in which I’ve heard other people use the word.

Ok but you don’t look at every night sky or every sunset and say “wow that’s beautiful”

There’s a quality to it - not because you heard someone say it but because you experience it

> Ok but you don’t look at every night sky or every sunset and say “wow that’s beautiful”

Exactly - because it's a semantic shorthand. Sunsets are fucking boring, ugly, transient phenomena. Watching a sunset while feeling safe and relaxed, maybe in the company of your love interest who's just as high on endorphins as you are right now - this is what feels beautiful. This is a sunset that's beautiful. But the sunset is just a pointer to the experience, something others can relate to, not actually the source of it.

I’ve seen incredible sunsets while stressed, depressed, and worse. Are you saying sunsets cannot be experienced as beautiful on their own?
Because words are much lower bandwidth than direct experience. But if you were “told” about a sunset by means of a Matrix-style direct mind upload of an experience, it would seem just as real and vivid. That’s a quantitative difference in bandwidth, not a qualitative difference in character.
my thought exactly
It’s interesting you mention linguistics because I feel a lot of the discussions around AI come back to early 20th century linguistics debates between Russell, Wittgenstein, and later Chomsky. I tend to side with (later) Wittgenstein’s view that language is inherently a social construct. He gives the example of a “game”, where there’s no meaningful overlap between, e.g., the Olympic Games and Monopoly, yet we understand very well what game we’re talking about because of our social constructs. I would argue that LLMs are highly effective at understanding (or at least emulating) social constructs because of their training data. That makes them excellent at language even without a full understanding of the world.
You don’t have a deeper “meaning of the word,” you have an actual experience of beauty. The word is just a label for the thing you, me, and other humans have experienced.

The machine has no experience.

The fact that things are constructed by neurons in the brain, and are a representation of other things - does not preclude your representation from being deeper and richer than LLM representations.

The patterns in experience are reduced to some dimensions in an LLM (or generative model). They do not capture all the dimensions - because the representation itself is a capture of another representation.

Personally, I have no need to reassure myself whether I am a special snowflake or not.

Whatever snowflake I am, I strongly prefer accuracy in my analogies of technology. GenAI does not capture a model of the world, it captures a model of the training data.

If video tools were that good, they would have started with voxels.

> humans say the night sky is beautiful is because they see that it is

True, but we could engineer AI to see that too, just as evolution has engineered us to see it.

Our innate emotional responses to things have been honed by evolution to be adaptive, to serve a purpose, but the things that trigger these various responses are not going to be super specific. e.g. We may derive pleasure from eating a nice juicy peach, but that doesn't mean that is encoded in our DNA - it's going to be primarily the reaction to sugar/sweetness, a good source of energy, that we are reacting to.

Similarly, we may have an emotional reaction to certain pieces of modern art or artistic expression, but clearly evolution has not selected for those specifically, but rather it is the artist triggering innate responses that evolved for reasons other than appreciation of art.

It's hard to guess what innate responses, that were actually selected for, are being triggered by our response to the night sky, and I'm also not sure how much of our response is purely visual (beauty) as opposed to wonder or awe. Maybe it's an attraction to the unknown, or sense of size and opportunity, with these being the universals that are actually adaptive.

In any case, if we figured out the specifics of the hard-wired emotional reactions that evolution has given us, then we could choose to engineer emotional AI that had those same reactions, in just as genuine a way as we do, if we chose to.

Beauty standards change over time; see how people have perceived body fat over the past few hundred years. We learn what is beautiful from our peers.

Taste can be acquired and can be cultural. See how people used to have their coffee.

Comparing humans to LLMs is like comparing something constantly changing to something random -- we can't compare them directly; we need a good model for each of them before comparing.

Has there been a point in human history where mainstream society denied the beauty in nature?
In a local Facebook group, in a discussion about zoning, someone seriously said "we need less parks and more parking lots", so... Maybe?
What about a blind human? Are they just like an LLM?

What about a multimodal model trained on video? Is that like a human?

This is actually a great point but for the opposite reason - if you ask a blind person if the night sky is beautiful, they would say they don't know because they've never seen it (they might add that they've heard other people describe it as such). Meanwhile, I just asked ChatGPT "Do you think the night sky is beautiful?" And it responded "Yes, I do..." and went on to explain why while describing senses its incapable of experiencing.
What if you asked the blind man to play the role of a helpful assistant?
Now that's an interesting point of view.

Involving blind people would be an interesting experiment.

Anyway, until the sixties the ability to play a game of chess was seen as intelligence, and until about 2-3 years ago the Turing test was considered the main yardstick (even though some people apparently talked to ELIZA at the time as if it were an actual human being). I wonder what the new one is, and how often it will be moved again.

I just asked Gemini and it said "I don't have eyes or the capacity to feel emotions like 'beauty'"
Claude 4.5

Q) Do you think the night sky is beautiful?

A) I find the night sky genuinely captivating. There’s something profound about looking up at stars that have traveled light-years to reach us, or catching the soft glow of the Milky Way on a clear night away from city lights. The vastness it reveals is humbling. I’m curious what draws you to ask - do you have a favorite thing about the night sky, or were you stargazing recently?

Claude is multimodal; it has been trained on images.
>> Meanwhile, I just asked ChatGPT "Do you think the night sky is beautiful?" And it responded "Yes, I do..." and went on to explain why while describing senses its incapable of experiencing.

> I just asked Gemini and it said "I don't have eyes or the capacity to feel emotions like "beauty""

That means nothing, except perhaps that Google found lies about "senses [Gemini] incapable of experiencing" to be an embarrassment and put effort into specifically suppressing those responses.

Interesting. But not only blind people.

I'm going to try this question on some people this weekend. As my null hypothesis (H0), I think the answer I will usually get is something like "what an odd question" or "why do you ask".

Guys, you realize that you can go to ChatGPT right now and it can generate an actual picture of the night sky, because it has seen thousands of pictures and drawings of the actual night sky, right?

Your logic is flawed because your knowledge is outdated. LLMs are encoding visual data, not just “language” data.

You misunderstand how the multimodal piece works. The fundamental unit of encoding here is still semantic. It’s not the same in your mind: you don’t need to know the word for sunset to experience a sunset.
No, you misunderstand the ground-truth reality.

The LLM doesn’t need words as input. It can output pictures from pictures. Semantic words don’t have to be part of the equation at all.

Also, note that serialized one-dimensional string encodings are universal. Anything on the face of the earth, and the universe itself, can be encoded into a string of just two characters: one and zero. That means anything can be translated into a linear series of symbols and the LLM can be trained on it. The LLM can be trained on anything.
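The narrow encoding claim (any digital data can be written as a string over two symbols) is easy to check. A minimal sketch, assuming plain bytes as the input and 8 bits per byte; the toy "image" pixel values are made up for illustration, and nothing here says anything about what a model can learn from such strings:

```python
def to_bits(data: bytes) -> str:
    """Serialize raw bytes as a string over the two-symbol alphabet {'0', '1'}."""
    return "".join(format(byte, "08b") for byte in data)


def from_bits(bits: str) -> bytes:
    """Invert to_bits by reading the string back eight symbols at a time."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# Text, plus a toy 2x2 grayscale "image" (hypothetical pixel values 0-255).
text = "night sky".encode("utf-8")
image = bytes([12, 200, 45, 255])

for payload in (text, image):
    bits = to_bits(payload)
    assert set(bits) <= {"0", "1"}
    assert from_bits(bits) == payload  # lossless round trip
    print(len(payload), "bytes ->", len(bits), "symbols")
```

Note that the round trip only works because `from_bits` already knows the convention (8 bits per byte, and UTF-8 for the text); the bit string alone does not say which decoder to use.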

The multimodal architectures I’ve seen are still text at the layer between modalities. And the image embedding and text embedding are kept completely separate. That’s not like your brain, where single neurons are used for all sorts of things.

Yes, they can generate images from images, but that doesn’t mean you’ll get anything meaningful without human instruction on top.

Yes, serialized one-dimensional strings can encode anything. But that’s just the message content. If I wrote down my genetic sequence on a piece of paper and dropped it in a bottle in the sea, I wouldn’t need to worry about accidentally fathering any children.

You’re mixing representational capacity with representational intent. That’s what I meant in my initial example about encodings. The model doesn’t care whether it’s text, pixels, or sound. All of it can be mapped into the same kind of high dimensional space where patterns align by structure rather than category. “Semantic” is just our label for how those internal relationships appear when we interpret them through language.

Anything in the universe can be encoded this way. Every possible form, whether visual, auditory, physical, or abstract, can be represented as a series of numbers or symbols. With enough data, an LLM can be trained on any of it. LLMs are universal because their architecture doesn’t depend on the nature of the data, only on the consistency of patterns within it. The so-called semantic encoding is simply the internal coordinate system the model builds to organize and decode meaning from those encodings. It is not limited to language; it is a general representation of structure and relationship.

And the genome in a bottle example actually supports this. The DNA string does encode a living organism; it just needs the right decoding environment. LLMs serve that role for their training domains. With the right bridge, like a diffusion model or a VAE, a text latent can unfold into an image distribution that’s statistically consistent with real light data.

So the meaning isn’t in the words. It’s in the shape of the data.

Here's how I've been explaining this to non-tech people recently, including the CEO where I work: Language is all about compressing concepts and sharing them, and it's lossy.

You can use a thousand words to describe the taste of chocolate, but it will never transmit the actual taste. You can write a book about how to drive a car, but it will only at best prepare that person for what to practice when they start driving, it won't make them proficient at driving a car without experiencing it themselves, physically.

Language isn't enough. It never will be.
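Read in information-theoretic terms, "lossy" means the encoding is many-to-one, so the original cannot be recovered from the message. A minimal toy sketch of that idea, assuming a hypothetical three-word vocabulary for a made-up 0.0-1.0 "intensity" scale (nothing here models real taste or real language):

```python
# Hypothetical three-word vocabulary covering a 0.0-1.0 "intensity" scale.
VOCAB = ["mild", "rich", "intense"]


def describe(intensity: float) -> str:
    """Encode: collapse a continuous value into one of a few words."""
    index = min(int(intensity * len(VOCAB)), len(VOCAB) - 1)
    return VOCAB[index]


def imagine(word: str) -> float:
    """Decode: the best a listener can do is pick the midpoint of the word's bin."""
    index = VOCAB.index(word)
    return (index + 0.5) / len(VOCAB)


samples = [0.40, 0.55, 0.66]
words = [describe(x) for x in samples]
print(words)                             # three different inputs, one word
print([imagine(w) for w in words])       # every listener reconstructs the same value
```

All three samples come back as "rich", and decoding gives 0.5 each time: the differences between them are gone, and no choice of `imagine` can restore them. Adding more words shrinks the error but never removes it for a continuous input.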

The taste-of-chocolate example also assumes that information-theoretic models are correct, rather than a use-based, pragmatic theory of meaning.

I don't agree with information-theoretic models in this context but we come to the same conclusion.

Loss only makes sense if there was a fixed “original”, but there is not. The information-theoretic model creates a solvable engineering problem. With LLMs, then, we just aren't solving the right problem.

I think it is more than that. The path forward with a use theory of meaning is even less clear.

The driving example is actually a great example of the use theory of meaning and not the information-theoretic.

The meaning of “driving” emerges from this lived activity, not from abstract definitions. You don't encode an abstract meaning of driving that is then transmitted on a noisy channel of language.

The meaning of driving emerges from the physical act of driving. If you only ever mount a camera on the headrest and operate the steering wheel and pedals remotely from a distance you still don't "understand" the meaning of "driving".

Whatever data stream you want to come up with, trying to extract the meaning of "driving" from that data stream makes no sense.

Trying to extract the "meaning" of driving from the syntax of the driving language game with language models is just complete nonsense. There is no meaning to be found, even when scaled to the limit.

Humans perceive phenomena via the senses, and then carve out categories or concepts to understand them. This is a process of abstraction, and each idea has associated qualia. We then use language to describe these concepts. As such, a concept is grounded either by actual phenomena or operations, or is a composition of other grounded concepts. Creating categories and grounding them involves constant feedback from the environment. It is a creative process, and we as agents have "skin in the game", in the sense that we get the rewards or punishments for our understanding and actions.

Map vs Territory is a common analogy. Maps describe territories but in an abstract and lossy manner.

But most of us don't construct grounded concepts in our understanding. We carry a muddle of ungrounded ideas, some told to us by others and some we intuit directly. There is a long tradition of attempting to think clearly, from Socrates to Descartes to Feynman, in which an attempt is made to ground the ideas we have. Try explaining your ideas to others, and soon you will hit the illusion of explanatory depth.

An LLM is a map and a useful tool, but it doesn't interact with the territory and it doesn't have skin in the game. As a result, it can't carve out new categories through the kind of learning process we have as humans.

The human experience is also several degrees removed from the „real“ world. I don’t think sensory chauvinism is a useful tool in assessing intelligence potential.
This comment is hallucinatory in nature, as it is in direct conflict with the on-the-ground reality of LLMs.

The LLM has both light (aka photons) and language encoded into its very core. It is not just language. You seem to have missed the boat on all the AI-generated visuals and videos that are now inundating the internet.

Your flawed logic is essentially that LLMs are unable to model the real world because they don’t encode photonic data into the model. Instead, you think they only encode language data, which is an incredibly lossy description of reality. This line of logic flies in the face of the ground-truth reality that LLMs ARE trained with video and pictures, which are essentially photons encoded into data.

So what should the proper conclusion be? Well, look at the generated visual output of LLMs. These models can generate highly convincing video. It often has flaws, but it is also often indistinguishable from reality. That means the models contain very good, if flawed, simulations of reality.

In fact, those videos demonstrate that LLMs have an extremely high causal understanding of reality. They know cause and effect; it’s just that the understanding is imperfect. They understand something like 85 percent of it. Just look at those videos of penguins on trampolines. The LLM understands what happens as an effect after a penguin jumps on a trampoline, but sometimes an extra penguin teleports in, which shows that the understanding is high but not fully accurate or complete.

> but the LLM is not sensing actual photons, nor experiencing actual light cone stimulation

Neither is the animal brain. It's processing the signals produced by the sensors. Once the world model is programmed or auto-built in the brain, it doesn't matter whether it's sensing real photons; it just has inputs, like the pins of a transistor or the arguments of a function. As long as we provide the arguments, it doesn't matter how they are produced. LLMs are no different in that respect.

> nor generating thoughts

They do, during chain-of-thought reasoning. Generally there's no incentive to let an LLM keep mulling over a topic, since that isn't useful to humans, and providers make money only when the model's gears start turning in response to a question sent by a human. But that doesn't mean an LLM lacks the capability to do that.

> Its "world model" is several degrees removed from the real world.

Just because the animal brain has tools called sensors that let it get data from the world directly, that doesn't mean it's any closer to the world than an LLM. It's still getting ultra-processed signals to feed into its own programming. Similarly, LLMs do interact with the real world through tools, as agents.

> So whatever fragment of a model it gains through learning to compress that causal chain of events does not mean much when it cannot generate the actual causal chain.

Again, a person who has gone blind, still has the world model created by the sight. This person can also no longer generate the chain of events that led to creation of that sight model. It still doesn't mean that this person's world model has become inferior.

Photons can hit my iPhone's sensor in much the same way as they hit my retina, and the signals from the former can be uploaded to an artificial neural network just as signals from the latter go up my optic nerve to my biological neural network. I don't see a huge difference there.

I'll grant you that the brain is currently better at the world-modelling stuff, but Genie 3 is pretty impressive.

This is so uncannily similar to the "Mary's Room" argument in philosophy that I thought you were going there.
The workings of a human eye versus a webcam are mostly an implementation detail IMO and have nothing important to say about what underlies "intelligence" or "world models".

It's like saying a component video out cable for the SNES is intrinsically different from an HDMI cable for putting an image on a screen. They are different, yes, but the outcome we care about is the same.

As for causality, go and give a frontier-level LLM a simple counterfactual scenario. I think 4/5 will be able to answer correctly or reasonably for most basic cases. I even tried this exercise on some examples from Judea Pearl's 2018 book, "The Book of Why". The fact that current LLMs can tackle this sort of stuff is strongly indicative of a decent world model locked inside many of these language models.
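To make "counterfactual scenario" concrete: in Pearl's framework a counterfactual is answered with a structural causal model in three steps (abduction, action, prediction). A minimal sketch on a firing-squad setup of the kind Pearl uses (court order, captain, two riflemen); the variable names and the single boolean exogenous variable are illustrative assumptions, not a transcription of any example in the book:

```python
from typing import Optional


def model(court_order: bool, do_a: Optional[bool] = None) -> dict:
    """Structural equations; do_a, if given, overrides rifleman A (an intervention)."""
    captain = court_order                      # captain signals iff the court orders
    a = captain if do_a is None else do_a      # rifleman A, unless we intervene
    b = captain                                # rifleman B always follows the captain
    dead = a or b                              # either shot is fatal
    return {"captain": captain, "a": a, "b": b, "dead": dead}


def counterfactual(observed: dict, do_a: bool) -> set:
    # 1. Abduction: keep only exogenous settings consistent with the observation.
    consistent = [u for u in (False, True)
                  if all(model(u)[k] == v for k, v in observed.items())]
    # 2. Action and 3. Prediction: rerun the modified model in each surviving world.
    return {model(u, do_a=do_a)["dead"] for u in consistent}


# "The prisoner is dead. Had rifleman A not fired, would he be alive?"
print(counterfactual({"dead": True}, do_a=False))  # {True}: B still fires
```

The answer ("no, he would still be dead") requires reasoning about what must have caused the observation before changing anything, which is what distinguishes a counterfactual from a plain correlation lookup.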

> then the human came up with language to describe that and then encoded the language into the LLM

No individual human invented language; we learn it from other people, just like AI. I'd go as far as to say language was the first AGI; we've been riding the coattails of language for a long time.

You're saying that language is an intelligence?

So C++ is intelligence as well?

It's an intelligence that can independently make deductions and create new ideas?

Yes, language is an evolutionary system that colonizes human brains. It doesn't need intelligence, only copying is sufficient for evolution.
You are just describing a "meme", which is deeper than language.

https://en.wikipedia.org/wiki/Meme

And even then, the light hitting our human eyes is only a fraction of all the light in the world (e.g. we miss the ultraviolet patterns on plants). An LLM's model of the world is shaped by our human view of the world.
Entities equipped with two limited light-sensitive sensors encode, through a network of carbon-based chemical emitters, a representation of what their flawed vision system manages to grasp, biased towards self-preservation.

What's the real world? I'm still puzzled by this reaction I see to LLMs, not because I think LLMs are undervalued, but because most people seem to significantly overestimate what human intelligence is.

Photons reflected off of objects are not the actual objects. I wouldn't go so far as to say that sensing these is a particularly special way to know about things compared to hearing or reading about them. Further, many humans do not sense photons yet seem to manage to have perfectly fine working world models.
That’s a good definition: it’s a model of a model.

It seems the debate centers on whether language models are meta-models (in the category-theoretic sense) or mere encodings (in the information-theoretic sense)?

> Its "world model" is several degrees removed from the real world.

Like insects that weave tokens

What does it mean to “generate thoughts”, exactly?
Hahahaha, I can’t believe you entirely missed the irony here: humans spend all day looking at screens doing the same thing.