Hacker News
by ImHereToVote 840 days ago
I wonder if there is a wave formulation for LLMs and transformers in general?
3 comments

This paper [1] models some simple (R)NNs as ODEs and uses ODE tools for training and inference. It's a start.

[1] https://arxiv.org/abs/1806.07366
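
To make the "NN as ODE" idea a bit more concrete, here is a minimal sketch of the usual starting observation: a residual update h <- h + dt * f(h, t) is one explicit Euler step of dh/dt = f(h, t). Everything in it is an assumption made for illustration: `f` is a hand-written toy dynamics function standing in for a learned layer, `residual_stack` is a hypothetical helper name, and plain fixed-step Euler is used instead of the solvers and training machinery the paper relies on.

```python
import math

def f(h, t):
    # Hypothetical toy "layer" dynamics dh/dt = -h.
    # A real model would put a learned network here.
    return [-x for x in h]

def residual_stack(h0, num_layers, t1=1.0):
    """Apply num_layers residual updates h <- h + dt * f(h, t).

    Each update is one explicit Euler step of dh/dt = f(h, t),
    so the whole stack integrates the ODE from t=0 to t=t1.
    """
    dt = t1 / num_layers
    h = list(h0)
    for k in range(num_layers):
        t = k * dt
        dh = f(h, t)
        h = [x + dt * d for x, d in zip(h, dh)]
    return h

if __name__ == "__main__":
    h0 = [1.0, 2.0]
    # For dh/dt = -h the exact solution at t=1 is h0 * exp(-1).
    exact = [math.exp(-1.0) * x for x in h0]
    for n in (4, 64, 1024):
        approx = residual_stack(h0, n)
        err = max(abs(a - e) for a, e in zip(approx, exact))
        print(f"layers={n:5d}  max error={err:.6f}")
```

With this toy `f`, the error against the closed-form solution shrinks as the number of "layers" grows, which is the sense in which a deep residual stack can be read as a discretized ODE.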

I don't know if this is exactly what you are thinking about, but there are some physicists working to understand what happens in transformers: https://proceedings.neurips.cc/paper_files/paper/2023/file/b...
Is it really true that we don't really understand why transformers work so well?

I mean we obviously understand how they work at a purely mechanical level, and we have this analogy with lookup (keys, queries, values) and "attention," but do we really get it? Can someone explain to me why that design works so much better than lots of other things like RNNs?

Or did we just tinker a lot (a method known as "graduate student descent") guided by mathematical hunches and loose analogies with biological brains until we found something that kinda worked?

It wouldn't be the first time. AFAIK we got the idea of wings from birds and figured out how to fly with them before we had a really solid fluid-mechanical understanding of how and why wings work the way they do. We just thought "hmm so birds fly, so let's try stuff that looks a bit like that..."

We really don't have a mathematical theory for systems of this complexity. We are kinda in the alchemy stage of this "science".
You can probably write down a differential equation which models them, but I doubt such a thing would be particularly interesting.
Perhaps neat to visualize.