by sendtown_expwy 1914 days ago
I would guess that an average FAANG ML engineer could code up and successfully execute a forward/backward pass on a GPT-1 or GPT-2 model with a day of effort or less (GPT-3 would be a little harder, but not significantly). But is that model actually going to perform well? Most likely not. Model performance varies significantly due to subtle details in data processing implementations, seemingly insignificant details in code, and even different numerical methods of computing the same mathematical quantity.
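To make the numerical point concrete, here is a toy sketch (not taken from any GPT codebase) of two orderings of the same sum giving different floating-point answers. The helper name `running_sum` is made up for illustration. It uses an explicit loop because newer Python versions apply compensated summation inside the built-in `sum`.

```python
import math

def running_sum(values):
    """Add values strictly left to right in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total

# The same three numbers, mathematically summing to 1, in two orders.
a = [1e16, 1.0, -1e16]
b = [1e16, -1e16, 1.0]

order_a = running_sum(a)   # the 1.0 is absorbed into 1e16 before the cancellation
order_b = running_sum(b)   # the large terms cancel first, so the 1.0 survives
reference = math.fsum(a)   # correctly rounded sum, independent of order

print(order_a, order_b, reference)
```

A model's reductions (softmax denominators, layer norm statistics, gradient accumulation) are the same kind of operation at much larger scale and usually at lower precision, so a change in kernel or reduction order can plausibly shift results without any visible change in the math.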

If you don't believe me, consider that many ML researchers track their commits (or exact code versions) extremely carefully, because oftentimes they will make some change (or changes) they think are inconsequential and later find that actually, their model broke. If they made too many changes, whoops, guess you have to binary search over the diff to see what happened since your last "good run".
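The binary search itself is simple; the pain is that every probe is a training run. Here is a minimal sketch, assuming the changes can be applied in order and that the run stays broken once the offending change is in. `first_bad_change` and `is_good` are hypothetical names; in practice `is_good` would mean "train with these changes applied and check the metric". `git bisect` automates the same idea over commits.

```python
from typing import Callable, Sequence

def first_bad_change(
    changes: Sequence[str],
    is_good: Callable[[Sequence[str]], bool],
) -> str:
    """Return the first change whose inclusion makes is_good fail.

    Assumes is_good([]) is True, is_good(changes) is False, and that
    once a prefix is bad every longer prefix is bad too.
    """
    lo, hi = 0, len(changes)  # prefix of length lo is good, length hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(changes[:mid]):
            lo = mid
        else:
            hi = mid
    return changes[hi - 1]

# Toy stand-in for "retrain and evaluate": the run breaks once "d" is applied.
changes = ["a", "b", "c", "d", "e"]
probes = []

def toy_is_good(applied):
    probes.append(list(applied))
    return "d" not in applied

culprit = first_bad_change(changes, toy_is_good)
print(culprit, len(probes))
```

The number of probes grows with the log of the number of changes, which sounds cheap until each probe costs hours or days of compute.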

If the people who spent months (if not years) tuning a model can't tell whether it will work from the code, how could anyone else? Most ML researchers will not bother with code that doesn't come with proof of results (in the form of a model that can actually be evaluated), because it is just so unlikely that it will actually work well. Now, it might "work" in the sense that it converges and does something when you prompt it with examples. But will this GPT-3 reimplementation actually outperform, say, the 10x smaller T5 checkpoint that was released by Google, or the other smaller language models others have released? If it doesn't, it's hard to argue that it's very useful at all.

I think that's the spirit of why the original commenter said what they did, but I still do applaud the efforts of this team (and hope that their implementation is, in fact, highly performant!)