|
Correction: hard work, clever people, and massive increases in computational power. I'm sure all three matter quite a lot here. I'm not saying that if your goal is to come up with a usable general learning algorithm, it's just "as simple as neural network and done." What I'm saying is the converse: that the general learning capabilities of LLMs are most likely explained by the fact that, well, they are general learners, via the universal approximation theorem. Your other comment, I think, suggests why we're only now starting to see more general learning capabilities out of neural networks, when the theory says that a single hidden layer is enough (given enough hidden units): with a single hidden layer, you really need to get all the weights pretty close to "right" to see general learning/universal approximator behavior. When you have more than one hidden layer, some of your weights can be wrong, as long as the errors are corrected in later layers. Now, I'm not an AI researcher or even anyone who works anywhere near this area, but I did take a course or two in grad school, and this seems at least intuitively plausible to me. If there are researchers in the field reading this, I'd definitely like to hear their takes, because I'm totally open to being completely wrong here. I'd rather be one of the lucky 10,000 than just hold on to this half-baked idea that seems right. :-) |
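To make the "a single hidden layer is enough" part concrete, here is a minimal sketch of a one-hidden-layer ReLU network whose weights are set by hand so that it linearly interpolates a target function. This is only an illustration of the approximation claim in one dimension: the function names (`build_one_hidden_layer_net`, `max_error`) are hypothetical, nothing is trained, and it says nothing either way about the idea that deeper networks tolerate wrong weights better.

```python
import math

def relu(z):
    return z if z > 0.0 else 0.0

def build_one_hidden_layer_net(f, lo, hi, n_units):
    """Return a one-hidden-layer ReLU net that linearly interpolates f on [lo, hi].

    The weights are chosen by hand (an assumption of this sketch), not learned.
    """
    step = (hi - lo) / n_units
    knots = [lo + i * step for i in range(n_units + 1)]
    # Slope of the interpolant on each segment between neighbouring knots.
    slopes = [(f(knots[i + 1]) - f(knots[i])) / step for i in range(n_units)]
    # Hidden unit i computes relu(x - knots[i]); its output weight is the
    # change in slope at that knot, so the slopes add up segment by segment.
    out_w = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_units)]
    bias = f(lo)

    def net(x):
        # zip stops after n_units pairs, so the last knot is not used.
        return bias + sum(w * relu(x - k) for w, k in zip(out_w, knots))

    return net

def max_error(f, net, lo, hi, samples=2000):
    """Largest absolute gap between f and net on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / samples for i in range(samples + 1)]
    return max(abs(f(x) - net(x)) for x in xs)

# More hidden units give a finer interpolation and a smaller worst-case error.
for n in (5, 50, 500):
    net = build_one_hidden_layer_net(math.sin, 0.0, 2 * math.pi, n)
    print(n, max_error(math.sin, net, 0.0, 2 * math.pi))
```

The catch this shows is that every output weight has to match the target's change in slope at its knot, which fits the "all the weights pretty close to right" intuition for the single-layer case.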
No matter how clever the programmer, there’s no encoding GPT4 with that. It was the hardware constraints that required programmers to be clever to begin with. These days it’s much more “copy-paste the math directly, because our data set is so robust and our hardware and networks are so performant that clever low-level hacks don’t matter.”
That's especially true at big tech, where they’ve used their own AI to guide them. The ability to just ask an ML system to simplify math has existed for a few years now, and we’ve all seen how clever outputs were set aside for safe linear hacking.
Truly clever work is occurring in more traditional sciences like chemistry and biology these days.