by solid_fuel 25 days ago
> I am getting tired of hearing "next token predictor" from carbon-based facial expression predictors.

That's not even a clever swipe, and it's tiring seeing such a knee-jerk reaction to a completely accurate description. LLMs are next token predictors. People are not. Humans have an inner world and subjective experience. Humans learn through their experiences, not just backprop.

Token predictors are lesser, they are not alive and will never be alive.

3 comments

That is not knowledge, that is assumption.

Let's assume we have infinite memory with constant time lookups. With a sufficiently large lookup table, you could exactly replicate the behavior of any person. You could encode it as a next-token predictor: you have precomputed every possible prefix and assigned it a next token. This is a Chinese room, but it is completely indistinguishable from an intelligent, sentient person. There is no experiment you can design to slip a piece of paper (a prompt) under the door to determine whether it is Bob or the lookup table clone of Bob inside the room.
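To make the lookup table idea concrete, here is a minimal sketch in Python. It is only a toy: the names `build_table`, `generate`, and `END` are made up for illustration, and the "table" is built from two tiny hand-written transcripts rather than from every possible prefix, which is what the thought experiment actually requires.

```python
from typing import Dict, List, Tuple

# Hypothetical end-of-reply marker, an assumption made for this sketch.
END = "<end>"

Table = Dict[Tuple[str, ...], str]


def build_table(transcripts: List[List[str]]) -> Table:
    """Map every prefix of every transcript to the token that follows it."""
    table: Table = {}
    for tokens in transcripts:
        for i in range(len(tokens)):
            # setdefault keeps the first continuation seen for a prefix,
            # so the table stays deterministic.
            table.setdefault(tuple(tokens[:i]), tokens[i])
        table.setdefault(tuple(tokens), END)
    return table


def generate(table: Table, prompt: List[str], max_tokens: int = 50) -> List[str]:
    """Next-token prediction by pure lookup: no learning, no computation."""
    out = list(prompt)
    for _ in range(max_tokens):
        nxt = table.get(tuple(out))
        if nxt is None or nxt == END:
            break
        out.append(nxt)
    return out[len(prompt):]


transcripts = [
    ["how", "are", "you", "fine", "thanks"],
    ["who", "are", "you", "bob"],
]
table = build_table(transcripts)
reply = generate(table, ["how", "are", "you"])
```

From outside the room, `generate` has the same interface as any other next-token predictor: a prefix goes in and a token comes out. Nothing about that interface tells you what is producing the tokens.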

Does that make the lookup table conscious or alive? Undefined. It's the wrong question. Or it's not a question science can address.

So we cannot dismiss on its face the idea that next token predictors "are not and never will be alive" unless by "alive" you simply mean "biological," but that's not really what's debatable.

The argument is also very brittle because they are not in fact all next token predictors. I doubt people making this argument would be willing to concede that diffusion models are more likely to be conscious than causal models (which I do not believe but is an implication of the argument).

I'm not saying that they are conscious or sentient to be clear, but the reductionist argument that they are next token predictors and therefore don't have some property humans have is not an argument. That's going from A directly to Z. You need to flesh out the bit in the middle because that doesn't follow.

Right. Humans are a biological computer. They have a state and they compute an output. I had to look this up (and use AI) but an estimate for the state of a human mind is about 5 peta-bits (10^15) and the estimated processing power is about 1 exa-FLOP (10^18). Compare this to the largest models at ~5 tera-bits (10^12) of state space and ~2 x 10^14 FLOPS (for one session with some reasonable token rate).

Assuming the above is anywhere near true (I think there's a lot of debate about the capacity of the human mind, where data is actually stored, and where compute happens) then we are talking about a 3 orders of magnitude win for humans in state and nearly 4 orders of magnitude in compute. And we're doing all that pretty energy-efficiently as well.
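For anyone who wants to check the arithmetic, here is a quick sketch. The four figures are copied straight from the estimates quoted above and are not independently verified; the variable names and the `orders_of_magnitude` helper are made up for this example.

```python
import math

# Rough estimates quoted in the comment above (assumptions, not measurements).
human_state_bits = 5e15   # ~5 peta-bits
human_flops = 1e18        # ~1 exa-FLOP
model_state_bits = 5e12   # ~5 tera-bits
model_flops = 2e14        # one session at a reasonable token rate


def orders_of_magnitude(a: float, b: float) -> float:
    """How many powers of ten separate a from b."""
    return math.log10(a / b)


state_gap = orders_of_magnitude(human_state_bits, model_state_bits)
compute_gap = orders_of_magnitude(human_flops, model_flops)

print(f"state gap:   {state_gap:.1f} orders of magnitude")
print(f"compute gap: {compute_gap:.1f} orders of magnitude")
```

With these inputs the state ratio is 1,000x and the compute ratio is 5,000x, so the compute gap is a bit under 4 orders of magnitude.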

The other big difference in humans is that we learn and the model only "learns" in context. Our "learn" space is much larger than the 1M tokens that frontier models struggle with.

Anyways, point is that a computer can appear to be alive. If we simulate the human brain perfectly and train it like a human then we'll have something that has human capabilities. LLMs have interesting capabilities but at least at this point not fully human ones (and the delta-state/compute would be a hint that there is still a large gap to cover).

Human context/memory could just be an Agents.md file too, one that gets read instantly before your next token prediction runs. The AI can make multiple such memory files and read them on demand depending on the topic, kind of like how, as a human, when you try to remember a math problem you don't go to your childhood bicycling Agents.md file either.

>LLMs are next token predictors

The point is that this is no more relevant, informative, or even accurate than "carbon-based facial expression predictors". Any phenomenon in the Universe can be described by a simple and/or insulting short phrase. In other comments you've also shouted out "autocomplete!" and "Markov chain!", as if these phrases are a knock-down argument.

"Pachinko machine", "avalanche", and "game of mad libs" have also been used:

https://news.ycombinator.com/item?id=47916405

>Humans learn through their experiences, not just backprop.

Sure, sure. And humans move through the act of walking, not just terrestrial locomotion.

>Token predictors are lesser, they are not alive and will never be alive.

And on and on it goes...

Which means what in the real world? What are we supposed to see now or in the near future? I assume you've been saying all of this stuff since at least the launch of ChatGPT. Probably longer than that.

> People are not. Humans have an inner world and subjective experience. Humans learn through their experiences, not just backprop.

https://en.wikipedia.org/wiki/Philosophical_zombie

But this is complicated and takes us sideways. Let's say we can somehow determine whether an LLM has an inner world and/or subjective experience. Will this newly gathered piece of information affect your estimate of the upper bounds of LLM capabilities? It does not affect my estimate.

The Philosophical Zombie thought process is dumb, because zombies don't exist, so the entire premise depends on something that quite frankly might be impossible for the very reason it is arguing against.