Hacker News
by supermatt 1276 days ago
That's a real shame. Any idea why GPT-3's generations differ so much? Maybe EleutherAI (or whoever) could look at refining their model in a similar way.
1 comment

It's most likely because of the sheer runtime size of the model. All the open models are sized to run on consumer-grade hardware, and so are 10x-100x smaller than whatever OpenAI runs (probably 100+ GB of VRAM, maybe even some multiples more). This is one of the main reasons their API makes business sense: it's not practical at all to run models like GPT-3 yourself, and training them costs an incredible amount of money too.
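
For a rough sense of scale, here is a back-of-the-envelope sketch in Python. It assumes 2 bytes per parameter (fp16) and counts the weights only, ignoring activations and any other runtime overhead. The function name `weight_memory_gb` is made up for illustration. The parameter counts are the published ones: 175B for GPT-3, and 2.7B and 6B for EleutherAI's GPT-Neo and GPT-J. What OpenAI actually serves behind the API isn't public, so treat the GPT-3 figure as an estimate for the published model, not for their deployment.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    Assumes every parameter is stored at the same width (default: fp16,
    2 bytes). Activations and other runtime overhead are not counted.
    """
    return num_params * bytes_per_param / 1e9


# Published parameter counts; the fp16 storage width is an assumption.
models = {
    "GPT-Neo 2.7B": 2.7e9,
    "GPT-J 6B": 6e9,
    "GPT-3 175B": 175e9,
}

for name, params in models.items():
    ratio = 175e9 / params  # how many times smaller than GPT-3
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB of weights, "
          f"{ratio:.0f}x smaller than GPT-3")
```

Under those assumptions GPT-3's weights alone come to roughly 350 GB, about 30x the size of GPT-J, which lines up with the 10x-100x gap above.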