Hacker News


	by ImHereToVote 840 days ago
	I wonder if there is a wave formulation for LLMs, and for transformers in general?

rfonseca 840 days ago

This paper [1] models some simple (R)NNs as ODEs and uses ODE solvers for both training and inference. It's a start.

[1] https://arxiv.org/abs/1806.07366
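The core idea behind the ODE view fits in a few lines: a residual block computes h + dt * f(h), which is exactly one forward-Euler step of dh/dt = f(h), so a deep stack of small residual steps approximates the continuous-time solution. Below is a minimal Python sketch under stated assumptions, not the paper's actual method (the paper uses black-box adaptive ODE solvers). The weight matrix `W` is hypothetical and hand-picked so the exact solution is a rotation, and the function names are made up for illustration.

```python
import math

# Hypothetical, hand-picked weights standing in for a trained layer.
# This W generates a rotation, so the exact ODE solution is known in closed form.
W = [[0.0, -1.0],
     [1.0, 0.0]]

def f(h):
    """Vector field dh/dt = W h (one tied-weight 'layer' applied continuously)."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def resnet_forward(h, n_layers, total_time=1.0):
    """Stack n_layers residual blocks h <- h + dt * f(h).

    With dt = total_time / n_layers, this is forward Euler
    applied to dh/dt = f(h) on the interval [0, total_time].
    """
    dt = total_time / n_layers
    for _ in range(n_layers):
        h = [x + dt * dx for x, dx in zip(h, f(h))]
    return h

def exact(h0, t=1.0):
    """Closed-form solution for this particular W: rotate h0 by angle t."""
    c, s = math.cos(t), math.sin(t)
    return [c * h0[0] - s * h0[1], s * h0[0] + c * h0[1]]

def error(n_layers):
    """Distance between the residual stack's output and the exact ODE solution."""
    return math.dist(resnet_forward([1.0, 0.0], n_layers), exact([1.0, 0.0]))

# Error shrinks roughly in proportion to 1 / n_layers (Euler is first order).
for n in (1, 10, 100, 1000):
    print(n, error(n))
```

Read this way, "more layers" corresponds to "smaller time steps", which is what lets a general-purpose ODE solver stand in for a fixed stack of layers.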

mikk14 840 days ago

I don't know if this is exactly what you are thinking about, but there are some physicists working to understand what happens in transformers: https://proceedings.neurips.cc/paper_files/paper/2023/file/b...

api 840 days ago

Is it true that we don't really understand why transformers work so well?

I mean, we obviously understand how they work at a purely mechanical level, and we have this analogy with lookup (keys, queries, values) and "attention," but do we really get it? Can someone explain why that design works so much better than many alternatives, such as RNNs?

Or did we just tinker a lot (a method known as "graduate student descent"), guided by mathematical hunches and loose analogies with biological brains, until we found something that kinda worked?

It wouldn't be the first time. AFAIK we got the idea of wings from birds and figured out how to fly with them before we had a really solid fluid-mechanical understanding of how and why wings work the way they do. We just thought, "Hmm, birds fly, so let's try stuff that looks a bit like that..."
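The keys/queries/values "lookup" analogy can at least be made concrete. Below is a minimal single-query, single-head sketch of scaled dot-product attention (softmax of q·k / sqrt(d), used to weight the values) in plain Python. It deliberately leaves out the learned query/key/value projections, multiple heads, and masking of a real transformer, and the function names are illustrative only. It shows the mechanics, not why the design works so well, which is the open question here.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    A hard dictionary lookup returns the value whose key equals the query.
    Attention instead scores every key by similarity to the query and
    returns a softmax-weighted average of all the values.
    """
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0], [20.0]]
print(attention([1.0, 0.0], keys, values))   # about 13.3: a soft blend leaning toward 10.0
print(attention([50.0, 0.0], keys, values))  # very close to 10.0: behaves like a hard lookup
```

With a sharply peaked match the softmax weights approach one-hot and the output approaches an ordinary lookup; with a weak match the output is a blend of all the values.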

ImHereToVote 840 days ago

We really don't have a mathematical theory for systems this complex. We are kinda in the alchemy stage of this "science."

eigenket 840 days ago

You can probably write down a differential equation that models them, but I doubt it would be particularly interesting.
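As a hedged illustration of the kind of equation meant here (not something the commenter wrote): ignore the MLP blocks and layer normalization, and assume every layer shares the same projection matrices Q, K, V (all simplifying assumptions). Then an attention-only residual layer acting on token vectors x_1, ..., x_n of dimension d reads as a forward-Euler step with a hypothetical step size Δt (a real layer effectively uses Δt = 1):

$$
x_i^{(\ell+1)} = x_i^{(\ell)} + \Delta t \sum_{j=1}^{n} A_{ij}^{(\ell)}\, V x_j^{(\ell)},
\qquad
A_{ij}^{(\ell)} = \frac{\exp\!\big(\langle Q x_i^{(\ell)}, K x_j^{(\ell)} \rangle / \sqrt{d}\big)}{\sum_{k=1}^{n} \exp\!\big(\langle Q x_i^{(\ell)}, K x_k^{(\ell)} \rangle / \sqrt{d}\big)}
$$

Letting Δt → 0 gives a system of interacting particles, with A_{ij}(t) defined by the same softmax evaluated at time t:

$$
\frac{d x_i}{dt} = \sum_{j=1}^{n} A_{ij}(t)\, V x_j(t)
$$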

ImHereToVote 840 days ago

Perhaps it would be neat to visualize.
