by sgt101 334 days ago
I guess the gamble was that there would be a certain point where the edge cases disappeared into the noise and it didn't matter that the approximation was/is "wrong" because the behavior of the car would match the requirements of the situation even if it wasn't for the right reasons.

To be fair, I remember reading about GPT-2 and thinking that LLMs would blow out for similar reasons.

1 comments

Which they more or less have. Larger models are seeing negligible returns. It just turned out that scaling would hold out just enough longer to make LLMs generally useful.
Yup, and if you normalise "improvements vs time" graphs not to linear time but to GPU hours invested per unit of improvement, we've been in extremely incremental/small-improvement territory since about a year ago. There are no major jumps coming. There are no more GPU hours to dump onto this particular bonfire to keep things looking like exponential improvement, all to keep that VC cash flowing.