by jmole 1134 days ago
I think the problem is actually worse than the article implies, because there are two things being leveraged here: compute and data.

Short-term improvements made by domain-specific AI result in better outputs than more general AIs, ceteris paribus. But these better outputs can then later be fed back into more powerful general purpose AIs, and consuming the data*compute product from the domain-specific models is a very effective way to train domain-specific behavior.

Today, we see this in reverse – people are training smaller models based on outputs from GPT-4. However, I expect that we'll start to see more and more training going the opposite direction in the future: Domain-specific generative models will be used to build scenarios for large general-purpose AIs to train against.

Here's a concrete example: image diffusion models are really bad at physics. You can't tell one to draw a person upside down, because that pose isn't well-represented in the dataset, and if you force it with something like ControlNet you typically get a disfigured and horrific image. So obviously diffusion models are not the best long-term solution for image generation. But how do you get this concept of "upside down" into an AI model? Well, maybe you add some kind of neat segmentation technique that involves using several diffusion models and rotating and stitching together their outputs. Great, you made an upside-down generator.

Now, you generate 100,000 images of "upside down" people, and the next advance in image generation AI can come along and learn that concept with ease thanks to the larger data set that it has.
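
To make the data-pipeline step concrete, here's a rough sketch of how a narrow "upside-down generator" could emit captioned training pairs for a later, larger model. Everything in it is a stand-in I made up for illustration: `upright_sample` fakes a base generator with random pixel grids, and `rotate_180` and `build_dataset` are hypothetical helpers, not part of any real diffusion toolkit.

```python
import random


def upright_sample(rng, size=4):
    """Hypothetical stand-in for a base generator: a tiny grayscale image as a list of rows."""
    return [[rng.randint(0, 255) for _ in range(size)] for _ in range(size)]


def rotate_180(image):
    """Rotate an image by 180 degrees: reverse the row order, then reverse each row."""
    return [row[::-1] for row in image[::-1]]


def build_dataset(n, seed=0):
    """Produce n (caption, image) pairs that a later, larger model could train on."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n):
        image = rotate_180(upright_sample(rng))
        # The caption carries the concept the base generator could not draw on its own.
        dataset.append({"caption": "a person, upside down", "image": image})
    return dataset


dataset = build_dataset(1000)
```

The point is only the shape of the loop: the narrow trick runs once per sample, and its labeled outputs become ordinary training data for whatever comes next.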

So it's not just that "more compute wins". It's more like: not only does more compute win, it wins even more because short-term improvements feed directly into the data pipeline that enables it to win.