Quoting Rich Sutton, who wrote the perfect response some years ago: "The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation."[a]

The very smart folks at the Alan Turing Institute are learning firsthand how bitter the lesson can be.

[a] http://incompleteideas.net/IncIdeas/BitterLesson.html

In the short term, domain-specific AI produces better outputs than more general AI, ceteris paribus. But those better outputs can later be fed into more powerful general-purpose AIs, and training on what the domain-specific models produce (output that embodies both their data and their compute) is a very effective way to teach domain-specific behavior.
Today we see this in reverse: people are training smaller models on outputs from GPT-4. However, I expect we'll see more and more training go in the opposite direction: domain-specific generative models will be used to build scenarios for large general-purpose AIs to train against.
Here's a concrete example: image diffusion models are really bad at physics. You can't tell one to draw a person upside down, because that pose isn't well represented in the dataset, and if you force it with something like ControlNet you typically get a disfigured, horrific image. So diffusion models are obviously not the best long-term solution for image generation. But how do you get the concept of "upside down" into an AI model? Maybe you add some neat segmentation technique that uses several diffusion models and rotates and stitches together their outputs. Great, you've made an upside-down generator.
Now you generate 100,000 images of upside-down people, and the next advance in image-generation AI can come along and learn that concept with ease, thanks to the larger dataset available to it.
So it's not just that "more compute wins." It's more like this: not only does more compute win, it wins by even more, because short-term improvements feed directly into the data pipeline that lets it win.