Hacker News
by pizza 1241 days ago
5 bucks says within a year there’ll be some innovation that shrinks this by 2 orders of magnitude, either from much cheaper compute (e.g. OPUs) or from much more efficient training. Hell, there ought to be some way to leapfrog these innovations so that the huge model of yesteryear itself becomes a more powerful optimizer/loss function. That’d just about solve the “hands off my unique shapes!” problem of acceptable training data trawling too :)