Since this post is based on my 2014 blog post (https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/), I thought I might comment.

I tried really hard to use topology as a way to understand neural networks, for example in these follow ups:

- https://colah.github.io/posts/2014-10-Visualizing-MNIST/
- https://colah.github.io/posts/2015-01-Visualizing-Representa...

There are places I've found the topological perspective useful, but after a decade of grappling with trying to understand what goes on inside neural networks, I just haven't gotten that much traction out of it.

I've had a lot more success with:

* The linear representation hypothesis - The idea that "concepts" (features) correspond to directions in neural networks.
* The idea of circuits - networks of such connected concepts.

Some selected related writing:

- https://distill.pub/2020/circuits/zoom-in/
- https://transformer-circuits.pub/2022/mech-interp-essay/inde...
- https://transformer-circuits.pub/2025/attribution-graphs/bio...
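To make "features correspond to directions" concrete, here is a toy sketch in plain Python. Everything in it is an assumption for illustration: the 16-dimensional "activations" are synthetic, the `concept` direction is random, and the helper names are made up. It is not taken from any of the linked posts and says nothing about how real networks are analyzed; it only shows that if a concept is a direction, reading it off is a dot product, and the direction can be roughly recovered from a difference of means.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def make_activation(direction, strength, noise, rng):
    # Synthetic activation: the concept direction scaled by `strength`,
    # plus small Gaussian noise in every coordinate.
    return [strength * d + rng.gauss(0.0, noise) for d in direction]

def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

rng = random.Random(0)
dim = 16

# Hypothetical "concept" direction (random, purely illustrative).
concept = unit([rng.gauss(0.0, 1.0) for _ in range(dim)])

# Activations where the concept is present (strength 3) or absent (strength 0).
with_concept = [make_activation(concept, 3.0, 0.1, rng) for _ in range(50)]
without_concept = [make_activation(concept, 0.0, 0.1, rng) for _ in range(50)]

# Reading the feature is a projection onto the direction.
scores_with = [dot(a, concept) for a in with_concept]
scores_without = [dot(a, concept) for a in without_concept]

# Estimating the direction from data: difference of class means.
diff = [a - b for a, b in zip(mean_vector(with_concept), mean_vector(without_concept))]
cosine = dot(unit(diff), concept)

print(min(scores_with) > max(scores_without))
print(round(cosine, 3))
```

In this toy setup the two groups separate cleanly along the direction, which is the whole point of the hypothesis; whether real features behave this nicely is an empirical question.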
- LLMs are basically just slightly better `n-gram` models
- The idea of "just" predicting the next token, as if next-token-prediction implies a model must be dumb
(I wonder if this popular response [1] to Karpathy's RNN post [2] is partly to blame for people equating neural language models with n-gram models. The stochastic parrots paper [3] also somewhat equates LLMs and n-gram models, e.g. "although she primarily had n-gram models in mind, the conclusions remain apt and relevant". I guess there was a time when they were closer to equivalent, before the nets got really, really good.)
[1] https://nbviewer.org/gist/yoavg/d76121dfde2618422139
[2] https://karpathy.github.io/2015/05/21/rnn-effectiveness/
[3] https://dl.acm.org/doi/pdf/10.1145/3442188.3445922
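
For anyone who hasn't seen one, an n-gram model really is just a table of counts over short token windows. A minimal bigram sketch (the toy corpus and the function names `train_bigram` and `predict_next` are mine, not from any of the links above):

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    # counts[prev][nxt] = how often `nxt` followed `prev` in the corpus.
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    # Most frequent follower of `prev`, or None if `prev` was never
    # seen with a follower (no smoothing or backoff in this sketch).
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

print(predict_next(model, "the"))
print(predict_next(model, "dog"))
```

The model conditions on exactly one previous token and has nothing to say about a context it never saw, which is the gap the "just a better n-gram" framing glosses over.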