Just because you can make something doesn't mean you know why it's made.
There are thousands of people around the world trying to reverse engineer what is going on in the billions or trillions of parameters in an LLM.
It's a field called "Mechanistic Interpretability." The people who do the work jokingly call it "cursed" because it is so difficult and they have made so little progress so far.
Literally nobody can predict before they are released what capabilities new models will have in them.
And then, months after a model is released, people discover new abilities in it, such as decent chess playing.
I suspect that this is largely an illusion created by the fact that the datasets and training regimes used are not published.
It is also an artefact of how evals have been done on a pass/fail basis, so that an LLM that gets 90% of a question right is just as much a failure as one that gets 0% of it right.
As a result, skills appear to emerge suddenly and surprisingly only because of the flawed way that we are forced to study them. Take the training regime and partial success towards a goal into account, and emergence is far less prevalent. There was a paper on that recently; I'll see if I can find it.
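To make the pass/fail point concrete, here is a minimal sketch of the arithmetic. It assumes, purely for illustration, that an answer is a fixed number of tokens and that each token is right independently with the same probability; `exact_match_rate` and `ANSWER_LENGTH` are made-up names, not anything from a real eval suite.

```python
def exact_match_rate(per_token_accuracy: float, answer_length: int) -> float:
    """Chance that every token of an answer is right.

    Assumes (hypothetically) that tokens are correct independently
    and all with the same probability.
    """
    return per_token_accuracy ** answer_length


ANSWER_LENGTH = 10  # hypothetical answer length in tokens

# Partial-credit skill improves in even-ish steps...
for p in [0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]:
    # ...but the all-or-nothing score stays low for most of the range.
    rate = exact_match_rate(p, ANSWER_LENGTH)
    print(f"per-token accuracy {p:.2f} -> pass rate {rate:.3f}")
```

Under those assumptions a model that is 90% right per token still passes only about 35% of the time, so a smooth improvement in the underlying skill can look like a sudden jump on a pass/fail eval.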
Until less than 5 years ago, AI was almost entirely an academic field, and a theoretical one at that.
Those same academics admit themselves that they're surprised at how well LLMs do considering how simple(?) rudimentary(?) the logic underneath is.
I don't quite understand what you're saying. That these academics were being lazy by not properly investigating/publishing their findings? That doesn't seem right.
They may be black boxes but that doesn't change that they are operating on statistics. I see no evidence that "AI" so far is anywhere near to cracking reasoning. It doesn't matter how magical their inner workings are. They have been trained to spit out plausible text and images (and, to a more limited extent, video).
It's very commonly understood by those of us who actually produce and consume AI research that "knowing how" LLMs (and Neural Nets for that matter) work doesn't mean knowing how to build one. It means mathematically proving and understanding "how" the steps we put the LLM through when training are able to produce the output we get when testing.
We know how to build it. We don't understand how it's producing the output it does based on what we give it.
This still doesn't make sense to me. As far as I'm concerned the gold standard of understanding something is being able to construct a program that replicates it, which is exactly what we can do with LLMs.
We know exactly how LLMs work (relatively simple maths), and to a large extent even why they work (backpropagation updates weights to more closely approximate the desired function). There are open questions relating to LLMs of course - we don't understand what the space of potential LLM-like things looks like and how the features in that space relate to subjective performance (although note that transformers were designed based on a theory that they would perform better, not just randomly generated or inspired by the muse). We also don't know to what extent the output of LLMs can be approximated by simpler symbolic systems, or how to extract such systems from LLMs when they do exist. Those are really interesting questions, but they're not questions about 'how LLMs work'.
I dislike the 'LLMs are magic' framing that seems to be taking over the world. Nobody thinks that Taylor expansion is magical, but LLMs are doing the same sort of thing - approximating a function through a bunch of weights on a bunch of simpler functions. The fact that the function we're approximating (intelligent output) is not known in advance (but can be sampled) and is multi-dimensional does not fundamentally change how mysterious the process is.
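A toy version of "weights on a bunch of simpler functions", as a rough sketch only: it fits weights on the powers of x to samples of sin(x) using plain gradient descent. This is a stand-in for illustration, not how an LLM is trained (there the weights sit inside nonlinear layers and the gradients come from backpropagation). The names `features`, `predict`, `fit` and the choice of sin, degree 5, and the step size are all my own assumptions.

```python
import math


def features(x: float, degree: int) -> list[float]:
    """The simple functions being weighted: 1, x, x^2, ..., x^degree."""
    return [x ** k for k in range(degree + 1)]


def predict(weights: list[float], x: float) -> float:
    """Weighted sum of the simple functions."""
    feats = features(x, len(weights) - 1)
    return sum(w * f for w, f in zip(weights, feats))


def mean_squared_error(weights, samples) -> float:
    return sum((predict(weights, x) - y) ** 2 for x, y in samples) / len(samples)


def fit(samples, degree=5, learning_rate=0.1, steps=2000) -> list[float]:
    """Gradient descent on the mean squared error, starting from all-zero weights."""
    weights = [0.0] * (degree + 1)
    n = len(samples)
    for _ in range(steps):
        grads = [0.0] * (degree + 1)
        for x, y in samples:
            feats = features(x, degree)
            error = sum(w * f for w, f in zip(weights, feats)) - y
            for k, f in enumerate(feats):
                grads[k] += 2.0 * error * f / n
        # Nudge every weight a little in the direction that lowers the error.
        weights = [w - learning_rate * g for w, g in zip(weights, grads)]
    return weights


# The target function is only ever seen through samples, never as a formula.
samples = [(x, math.sin(x)) for x in (-1 + 2 * i / 40 for i in range(41))]

untrained = [0.0] * 6
trained = fit(samples)
print("error before training:", mean_squared_error(untrained, samples))
print("error after training: ", mean_squared_error(trained, samples))
```

The fitting loop never looks at the formula for sin, only at sampled input/output pairs, which is the sense in which the target can be "not known in advance (but can be sampled)".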
> the gold standard of understanding something is being able to construct a program that replicates it
Cloning animals or even humans did not automatically make us understand how brains work. In fact, these were quite unrelated endeavors.
> I dislike the 'LLMs are magic' framing that seems to be taking over the world
Don't take that out on me. That's not what I'm saying. I'm saying there is a lack of determinism (mathematically provable, per se) in our current understanding of all AI (LLMs included). There are many attempts to solve this problem. I've sat in on seminars about it myself. So far, we're not there yet.
> As far as I'm concerned the gold standard of understanding something is being able to construct a program that replicates it which is exactly what we can do with LLMs.
But we actually can't! We can build a program that can build a program that is the LLM, which is not the same! I'd argue that you're right insofar as training is concerned. We understand training very well. But the actual model, how it operates, what it actually knows, we don't know how to build that, we don't know what weights to put where.
Malbolge is an esoteric programming language designed to be impossible to use. The first program written in it wasn't written by a human, it was written by another program.
It's a bit similar to the problem of neuroscience. We understand how a single neuron works pretty well, or even a small number of them. We even understand a few subsystems, like balance or lower-level vision, and a bit about muscle control and the endocrine system.
We do not understand language, grammar, or music, we only partly understand emotions, and we especially do not understand sentience and consciousness.
Further, we don't understand how the disparate systems are integrated together.
They are black boxes.