Hacker News
by fnordpiglet 8 days ago
However, it’s disingenuous to say the inference is on the next token, because it’s actually not: it’s in the model’s parameter space, across a set of nonlinear activation functions, and then effectively projected into the token. The idea that it’s predictive of the token isn’t actually the case; it really is a much more complex and more semantic relationship that ends in the series of tokens through the attention mechanism.
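For anyone who wants the "vector space, then projected into the token" step made concrete, here is a toy sketch in pure Python. Everything in it is hypothetical: a 4-dimensional hidden vector `h`, a single nonlinear layer with a residual connection (`W_MLP` plus a GELU activation), and an unembedding matrix `W_U` over a made-up three-word vocabulary. Real models stack many attention and MLP layers with learned weights; this only shows that the computation happens on vectors and is mapped onto token probabilities at the very last step.

```python
import math

def gelu(x: float) -> float:
    # tanh approximation of GELU, a common nonlinear activation in transformers
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def matvec(m, v):
    # plain matrix-vector product: one dot product per row of m
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(xs):
    # subtract the max for numerical stability before exponentiating
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_distribution(h, w_mlp, w_unembed):
    # hypothetical single layer: residual add of a nonlinear transform, h + gelu(W h)
    hidden = [hi + gelu(z) for hi, z in zip(h, matvec(w_mlp, h))]
    # only here is the vector projected onto one logit per vocabulary entry
    logits = matvec(w_unembed, hidden)
    return softmax(logits)

# Made-up vocabulary and weights, purely for illustration
VOCAB = ["cat", "dog", "the"]
h = [0.5, -1.0, 0.25, 2.0]
W_MLP = [
    [0.1, 0.2, 0.0, -0.1],
    [0.0, 0.3, 0.1, 0.0],
    [-0.2, 0.0, 0.4, 0.1],
    [0.0, -0.1, 0.0, 0.2],
]
W_U = [
    [1.0, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, -0.5],
    [0.0, 0.0, 1.0, 0.0],
]

probs = next_token_distribution(h, W_MLP, W_U)
print(VOCAB[probs.index(max(probs))], [round(p, 3) for p in probs])
```

The "next token" is just the readout of that final projection; whatever structure exists lives in the vectors and weights before it.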

The article also asserts that the model replays everything over and over again to produce each character one at a time, as a way to demonstrate the autoregressive self-attention mechanism, but that’s not accurate at all, and it trivializes what is going on.
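On the "replays everything" point, a toy sketch may help. It assumes a hypothetical single-head, single-layer model over a three-token vocabulary (`EMBED`, `W_Q`, `W_K`, `W_V`, `W_OUT` are made-up weights, with no positional encoding) and greedy decoding. `generate_naive` recomputes every earlier position at each step; `generate_cached` uses a key/value cache, a common implementation technique in which each step pushes only the newest token through the model and attends over stored keys and values. In typical LLMs generation also proceeds token by token, not character by character.

```python
import math

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(q, keys, values):
    # scaled dot-product attention of one query over all cached positions
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

# Hypothetical toy model: 3-token vocab, 2-dim embeddings, one attention head
EMBED = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
W_Q = [[1.0, 0.5], [0.0, 1.0]]
W_K = [[0.5, 0.0], [0.2, 1.0]]
W_V = [[1.0, -1.0], [0.5, 0.5]]
W_OUT = [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]  # projects to 3 vocab logits

def step(token, cache):
    # process only the newest token; earlier positions come from the cache
    x = EMBED[token]
    cache["k"].append(matvec(W_K, x))
    cache["v"].append(matvec(W_V, x))
    ctx = attend(matvec(W_Q, x), cache["k"], cache["v"])
    logits = matvec(W_OUT, [a + b for a, b in zip(x, ctx)])  # residual add, then project
    return logits.index(max(logits))  # greedy: pick the highest-scoring token

def generate_cached(prompt, n):
    cache = {"k": [], "v": []}
    for t in prompt[:-1]:
        step(t, cache)  # prefill the cache with the prompt
    out, tok = [], prompt[-1]
    for _ in range(n):
        tok = step(tok, cache)
        out.append(tok)
    return out

def generate_naive(prompt, n):
    # rebuild keys/values for the whole sequence from scratch at every step
    seq = list(prompt)
    for _ in range(n):
        cache = {"k": [], "v": []}
        for t in seq[:-1]:
            step(t, cache)
        seq.append(step(seq[-1], cache))
    return seq[len(prompt):]

print(generate_cached([0, 1], 5), generate_naive([0, 1], 5))
```

Both functions return the same tokens here; the cache changes how much work each step does, not what the model computes.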

I’m not asserting LLMs are aware or conscious; on the surface that’s profoundly absurd. And I do understand your point that, because it emits words that seem to speak to us, it gives off an air of humanity that isn’t real. However, there is a very real emergent phenomenon here: our language alone appears to embed a form of thought and understanding, latent in how we use it to communicate, and that is in fact coming through the model. It is not regurgitating its corpus or pattern matching, because the inference doesn’t operate on the patterns you input and it emits; it operates within an enormous vector space, through complex nonlinear activation functions with learned residuals, not in the language corpus.

It is not conscious or aware. It is something else, not human. But if you can not see it as amazing you have lost the capacity to dream.


> But if you can not see it as amazing you have lost the capacity to dream.

I completely disagree. I think if you think these things are amazing, your dreams are incredibly limited and boring.

I remember the first time I talked to a chatbot. Not an LLM, just a regular chatbot, like ELIZA or any other dumb bot.

For a few seconds, it felt magical, like I was talking to a computer that understood me, as its replies made sense given what I was saying. Then it said something incredibly stupid and jarring that made no sense, and that took the magic away. Oh, this is just a dumb computer program.

I remember the first time I talked to an LLM-powered chatbot. It was the exact same thing, except the magic feeling lasted a tiny bit longer and was a tiny bit more convincing. But it went away in the exact same way, for the exact same reason. Once you've seen the emperor without clothes, nothing brings back the magic.

That’s only if you believe the magic lies in the computer literally thinking, a little human stuffed in a box. What’s magical is that we’ve cracked NLP -and- abductive reasoning on a Turing machine. For those who have the capacity to dream, the fact that ELIZA wasn’t a person in a box didn’t break the magic at all; instead it inspired them that this mess of wires and cores could perform complex workflows of logic, not merely compute trigonometric functions. It could process text in protocols. That’s what inspired people to go forth and build things like email, phone, ytalk, IRC, gopher, etc.

If your dreams begin and end with making a box with a person inside, that’s exactly the point I made. You can make a person in a box in 9 months; that’s not interesting or cool. What we can build with an LLM, and genai in general, is MUCH MUCH cooler than a metal baby. (And fwiw, I think a conscious mind that we control and manufacture but whose rights we don’t recognize is a scarier horror movie than all the AI fear movies ever made put together, so I’m glad your dreams were broken.)

I wonder whether you feel the same boredom and complete lack of magic about the Apollo module once you learn how incredibly limited this tech was, and that it could never explore the whole solar system.
>it’s disingenuous to say the inference is on the next token, because it’s actually not: it’s in the model’s parameter space, across a set of nonlinear activation functions, and then effectively projected into the token. The idea that it’s predictive of the token isn’t actually the case; it really is a much more complex and more semantic relationship

Do you, or anyone reading, have any worthwhile links that make a strong case for this (that there is a stronger semantic relationship than simple next-token prediction)? I would like to read more about this.

LLMs are interesting tech wrapped in a movement that's so toxic and destructive that it's hard to separate the baby from the bathwater. I think chatting with a statistical model is neat and sometimes useful, but not at the cost of mass societal chaos. This isn't "disruption"; it's more like "billionaires forcing something on us that most people hate, so they can finally recreate feudalism, because they live under the delusion that they're the people best suited to pilot the world".