|
|
|
|
|
by aprilthird2021
766 days ago
|
|
It's very commonly understood by those of us who actually produce and consume AI research that "knowing how" LLMs (and Neural Nets for that matter) work doesn't mean knowing how to build one. It means mathematically proving and understanding "how" the steps we put the LLM through when training are able to produce the output we get when testing. We know how to build it. We don't understand how it's producing the output it does based off what we give it |
|
We know exactly how llms work (relatively simple maths), and to a large extent even why they work (backpropagation updates weights to more closely approximate the desired function). There are open questions relating to LLMs of course - we don't understand what the space of potential LLM-like things looks like and how the features in that space relate to subjective performance (although note that transformers were designed based on a theory that they would perform better, not just randomly generated or inspired by the muse). We also don't know to what extent the output of LLMs can be approximated by simpler symbolic systems, or how to extract such systems from LLMs when they do exist. Those are really interesting questions, but they're not questions about 'how LLMs work'.
I dislike the 'LLMs are magic' framing that seems to be taking over the world. Nobody thinks that Taylor expansion is magical, but LLMs are doing the same sort of thing - approximating a function through a bunch of weights on a bunch of simpler functions. Just because the function we're approximating (intelligent output) is not known in advance (but can be sampled), and multi-dimensional does not fundamentally change how mysterious the process is.