Hacker News new | ask | show | jobs
by nickpsecurity 845 days ago
GPT3-176B cost $30 million dollars in compute plus millions in design, preprocessing, and operations. Then, it was able to perform as much better than prior architectures as it does today. You might want to include that in your challenge for competing models.

Let’s rephrase it. If their architecture is superior, and they have $30 million dollars, and similar preparation for training, and similar operational teams during training, then we can see if they can beat the model they’re comparing themselves to. Except, the alternatives don’t have tens of millions of dollars with the best support teams. So, the proof you seek hasn’t had a chance to happen due to severe lack of resources.

Hence, comparisons to GPT2 and small versions of GPT3. Even that might not be fair given the money and teams behind even small GPT3’s. Execution of the project is as critical for success as the model architecture.