Hacker News
by arnaudsm 1609 days ago
It's linear for now (check GPT-2 vs GPT-3), but we're close to the point of diminishing returns.

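It's actually not linear, it's a power law. That means we need exponentially more compute, data, and model parameters to see linear improvements in performance: under a power law, each fixed reduction in log-loss costs a constant multiplicative increase in resources.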
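A minimal sketch of that arithmetic, assuming a pure power law L(C) = a * C^(-alpha) between loss and compute. The constants a = 10 and alpha = 0.05 are hypothetical, picked for round numbers, and are not fitted to GPT-2, GPT-3, or any published scaling study. `loss` and `compute_multiplier` are illustrative helper names.

```python
import math

def loss(compute: float, a: float, alpha: float) -> float:
    """Loss under an assumed power law L(C) = a * C**(-alpha)."""
    return a * compute ** (-alpha)

def compute_multiplier(loss_ratio: float, alpha: float) -> float:
    """Factor by which compute must grow to multiply loss by loss_ratio.

    From L2 / L1 = (C2 / C1) ** (-alpha) it follows that
    C2 / C1 = loss_ratio ** (-1 / alpha).
    """
    return loss_ratio ** (-1.0 / alpha)

# Hypothetical constants for illustration only; not fitted to any real model.
A, ALPHA = 10.0, 0.05

if __name__ == "__main__":
    # Cost (in compute) of one halving of the loss.
    step = compute_multiplier(0.5, ALPHA)
    print(f"one halving of loss costs {step:.3g}x compute")
    # k equal steps in log-loss cost step**k, i.e. exponential in k.
    for k in range(1, 4):
        c = step ** k
        print(f"after {k} halvings: compute x{c:.3g}, loss {loss(c, A, ALPHA):.3f}")
```

With these made-up numbers, every halving of the loss needs the same 2^20-fold jump in compute. So compute grows exponentially in the number of equal log-loss steps, while the loss itself only shrinks geometrically.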
Part of the problem, though, is that we don't know for sure what non-linearities may be lurking out there. Maybe we add 100 more "neurons" to the net and it "goes exponential," so to speak. Or maybe not. There's still a lot we don't know about the emergent properties of these systems as they scale up.