Related to ways of understanding neural networks, I've seen these views expressed a lot, and to me they seem like misconceptions:

- LLMs are basically just slightly better `n-gram` models
- The idea of "just" predicting the next token, as if next-token prediction implies a model must be dumb

(I wonder if this popular response [1] to Karpathy's RNN post [2] is partly to blame for people equating language neural nets with n-gram models. The Stochastic Parrots paper [3] also somewhat equates LLMs and n-gram models, e.g. "although she primarily had n-gram models in mind, the conclusions remain apt and relevant". I guess there was a time when they were closer to equivalent, before the nets got really, really good.)

[1] https://nbviewer.org/gist/yoavg/d76121dfde2618422139
[2] https://karpathy.github.io/2015/05/21/rnn-effectiveness/
[3] https://dl.acm.org/doi/pdf/10.1145/3442188.3445922
The whole discourse of "stochastic parrots", "do models understand", and so on is deeply unhealthy. These should be scientific questions about mechanism, but people don't have a vocabulary for discussing the range of mechanisms that might exist inside a neural network. So instead we get lots of arguments where people project meaning onto very fuzzy ideas, and the argument never grounds out in scientific, empirical claims.
Our recent paper reverse engineers the computation neural networks use to produce their answers in a number of interesting cases (https://transformer-circuits.pub/2025/attribution-graphs/bio... ). We find computation that one might informally describe as "multi-step inference", "planning", and so on. I think it may be clarifying here, because it grounds out in very specific empirical claims about mechanism (which we test with intervention experiments).
Of course, one can disagree with the informal language we use. I'm happy for people to use whatever language they want! I think in an ideal world we'd move more towards talking about concrete mechanisms, and we need to develop ways to talk about those mechanisms informally.
There was previous discussion of our paper here: https://news.ycombinator.com/item?id=43505748