Hacker News
by sigbottle 52 days ago
I think a lot of models have to sprinkle a lot of "fluff" into their thinking to stay within the right distribution. Language is their only medium; the way we annotate context is via brackets, and then we train them to hopefully respect the brackets. I'd imagine that either top labs explicitly train the models, or the models implicitly learn through the RL process, to spam tokens that keep them 'within distribution', since everything goes through the same channel and there's no fine-grained separation between things.

Philosophically, it's not like you're a detached observer who simply reasons over all possible hypotheses. Ever get stuck in a dead end and find it hard to dig yourself out? If you were a detached observer, it'd be pretty easy to just switch gears. But it's not (for humans).

Language really only exists at the input and output surfaces of the models. In the middle it's all numerical values. You might be quick to treat those values as just a numeric cypher of the words, which, while not totally false, misses that they are also a numeric cypher of anything. You can train a transformer on anything that you can assign tokens to.
That's not my point. I'm talking about something far more mundane: transformers do inference over raw tokens and perform an n^2 loop over those tokens, but the tokens are themselves the context. So it's better to have more raw tokens in your input that all nudge the model toward the right idea space, even if it technically doesn't need all those tokens. ICL and CoT have been studied a lot at this point; these are well-known phenomena.

This applies to any transformer-based architecture, including JEPA, which tries to make the tokens predict some kind of latent space (I've separately heard arguments that the two are equivalent, but that's a different discussion).
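
To make the "n^2 loop over tokens" concrete, here is a minimal sketch of single-head scaled dot-product attention in plain Python. It is only an illustration under some assumptions: `attention` and `toy_embed` are hypothetical helper names, the embeddings are made-up numbers, and there is no causal mask, learned projection, or multi-head logic as a real model would have. The point is just that every token is scored against every other token, so the number of pairwise scores grows as n * n, and every token in the context feeds into every output.

```python
import math


def attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors.

    Returns the output vectors and the number of query/key pairs scored.
    """
    d = len(keys[0])
    outputs = []
    pair_count = 0
    for q in queries:
        # Score this query against every key: the inner half of the n^2 loop.
        scores = []
        for k in keys:
            scores.append(sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d))
            pair_count += 1
        # Numerically stable softmax over the scores.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Each output is a weighted mix of ALL value vectors in the context.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs, pair_count


def toy_embed(n, d=4):
    """Hypothetical stand-in for token embeddings: n vectors of size d."""
    return [[math.sin((i + 1) * (j + 1)) for j in range(d)] for i in range(n)]


for n in (4, 8, 16):
    x = toy_embed(n)
    _, pairs = attention(x, x, x)
    print(n, pairs)  # prints: 4 16, then 8 64, then 16 256
```

Doubling the context quadruples the pairwise work, and since each output mixes all the value vectors, extra tokens in the input do shift what comes out, whether or not they were strictly needed.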

Similarly, none of our comments actually exist as language on Hacker News—just numerical values from the ASCII table. We're deluding each other into thinking we're using language.
I believe it's reasonably clear that our thought processes generally occur outside of language. We do use language during explicit reasoning, but most thinking occurs heuristically. It's on par with the thinking of animals that don't use language but still exhibit complex behavior.

It's not clear to me how well that maps onto LLMs. Our wetware predates language, and isn't derived from it. Language is built on top. LLMs are derived from language. I think that means that the intermediate layers are very different from the brain's neurons, but I don't know. It's eerie how well the former emulates the latter.

There’s an interesting thing there that I believe varies person to person. My understanding is that some people think in a more symbolic/heuristic way, while others rely very heavily on their inner monologue to make sense of things (I am in the latter camp, and only have a single-core language processor, so I pretty much cannot come up with coherent thoughts if I’m concentrating on what someone else is saying).

Even more interesting, and getting off on a bit of a tangent, there is also a mode that I use for revealing emotions that I don’t have words for (alexithymia): I open up a text editor, stare off into space, and let my fingers type without “observing” the stream of words coming out. I then go back and read what I “wrote” and often end up understanding how I’m feeling much better than I did. It’s weird.

Edit: also, playing with local models through e.g. llama.cpp in “thinking mode” is super fascinating for me. The “thought process” that comes out before the real answer often feels pretty familiar when I reflect on my own inner monologue, although sometimes it’s frustrating because I can see where their “thinking” went off the rails and want to correct it.

"The great enemy of communication, we find, is the illusion of it" —William H. Whyte