by exdsq 1607 days ago
How far can we actually take current machine learning technologies by scaling the underlying hardware? Are we going to see some AI algorithms that are 20% better or an order of magnitude better? And what will that realistically look like to an end user? This will have cost a lot of money, and maybe the news alone will push stock prices enough that it has paid for itself, but is it actually going to result in a substantially better product?
4 comments

I was just in a Twitter Spaces room and they have a live transcription feature, so as to be accessible and all, except the transcript was gibberish. If Facebook wants live translation in the Metaverse, they should hope this brings orders-of-magnitude improvement to voice recognition, especially in languages other than English (which has by far the largest training set available).
I obviously don't know the parameters of the room you're referencing, but is it possible that the majority of the issue is on the side of poor user audio and a large number of simultaneous speakers? I find YouTube's transcription to be quite impressive with a handful of speakers and moderate audio quality.
You may find this blog post useful for thinking about AI scaling: https://www.alignmentforum.org/posts/k2SNji3jXaLGhBeYP/extra...

For general tasks like language modeling, we are still seeing predictable improvements (on the next-token-prediction loss) with increasing compute. We will very likely be able to scale things up by 10,000x or so and continue to see increasing performance.

But what does this mean for end users? We are probably going to see sigmoid-like curves, where qualitative features of these models (like being able to do math, or tell jokes, or tutor you in French, or provide therapy, or mediate international conflicts) will suddenly get a *lot* better at some point in the scaling curve. We saw this for simple arithmetic in the GPT-3 paper, where the small <1B param models were terrible at it, and then at the 100B scale the model could suddenly do arithmetic with 80%+ accuracy.
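
One way to see how a smooth loss curve can still produce a sudden-looking jump is a toy model. This is only a sketch under loud assumptions, not anything taken from the GPT-3 paper: per-token loss is assumed to follow a made-up power law in compute, tokens are assumed independent, and a task counts as solved only if all `k` answer tokens are right. The constants `a`, `b`, and `k` are hypothetical.

```python
import math

def per_token_loss(compute, a=4.0, b=0.3):
    """Hypothetical power law: loss (in nats) falls smoothly with compute."""
    return a * compute ** (-b)

def exact_match_accuracy(compute, k=10):
    """Chance of getting all k answer tokens right.

    Assumes each token is correct independently with probability
    exp(-loss), so the whole answer is correct with exp(-k * loss).
    """
    return math.exp(-k * per_token_loss(compute))

if __name__ == "__main__":
    # Loss improves steadily, but exact-match accuracy sits near zero
    # for a long time and then climbs quickly on a log-compute axis.
    for exponent in range(0, 10, 2):
        c = 10.0 ** exponent
        print(f"compute=1e{exponent}  loss={per_token_loss(c):.4f}  "
              f"accuracy={exact_match_accuracy(c):.4f}")
```

In this toy setup nothing discontinuous happens to the loss; the sharp rise comes entirely from requiring every token to be correct at once.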

Personally, I would not expect diminishing returns with increased scale; instead, there will be sudden leaps in ability that will be very economically valuable. And that is why Meta and others are so interested in scaling up these models.

It's linear for now (check GPT-2 vs GPT-3), but we're close to the point of diminishing returns.
It's actually not linear; it's a power law. That means every constant-factor improvement in loss requires multiplying compute, data, and model parameters by a constant factor, so steady gains need exponentially growing resources.
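
To make the power-law point concrete, here is a minimal sketch. The form `loss = a * compute**(-b)` and the constants `a=10.0`, `b=0.05` are illustrative assumptions, not fitted values from any paper or model.

```python
import math

def loss(compute, a=10.0, b=0.05):
    """Hypothetical power law relating compute to loss."""
    return a * compute ** (-b)

def compute_multiplier(loss_ratio, b=0.05):
    """How much compute must grow to multiply loss by loss_ratio (< 1).

    Solving (m * C)**(-b) / C**(-b) = loss_ratio for m gives
    m = loss_ratio ** (-1 / b).
    """
    return loss_ratio ** (-1.0 / b)

if __name__ == "__main__":
    # Every 10x of compute shaves the same fraction off the loss,
    # no matter where on the curve you start.
    for c in (1.0, 10.0, 100.0, 1000.0):
        print(f"compute={c:>7.0f}  loss={loss(c):.4f}  "
              f"ratio after 10x={loss(10 * c) / loss(c):.4f}")
    # With b=0.05, halving the loss takes about a million times more compute.
    print(f"compute multiplier to halve loss: {compute_multiplier(0.5):.0f}")
```

Under these made-up constants, the loss is a straight line on a log-log plot, which is why neither "linear" nor "diminishing to zero" describes it well.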
Part of the problem, though, is that we don't know for sure what non-linearities may be lurking out there. Maybe we add 100 more "neurons" to the net and it "goes exponential," so to speak. Or maybe not. There's still a lot we don't know about the emergent properties of these systems as they scale up.
I think these things scale sub-linearly.