
It's most likely because of the actual runtime size of the model. All the open models are sized for consumer-grade devices, and are thus 10x-100x smaller than whatever OpenAI runs (probably 100+ GB of VRAM, maybe even several multiples of that). This is one of the main reasons why their API makes business sense - it's not practical at all to run models like GPT-3 yourself, and training them costs an incredible amount of money too.
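
To put rough numbers on that, here's a back-of-the-envelope sketch of the memory needed just to hold the weights. It assumes the publicly reported 175B parameter count for GPT-3, 2 bytes per parameter (fp16), and a hypothetical 7B-parameter open model for comparison. Whatever OpenAI actually serves may differ, and activations and caches add more on top.

```python
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bytes_per_param / 1e9

# Assumption: 175B parameters as publicly reported for GPT-3, stored in fp16.
gpt3_gb = weight_memory_gb(175e9)

# Assumption: a hypothetical 7B-parameter open model, also in fp16.
small_gb = weight_memory_gb(7e9)

# How many times larger the big model's weights are.
ratio = gpt3_gb / small_gb

print(f"GPT-3-sized model: {gpt3_gb:.0f} GB")
print(f"7B open model:     {small_gb:.0f} GB")
print(f"ratio:             {ratio:.0f}x")
```

Under those assumptions the weights alone come to about 350 GB versus 14 GB, a 25x gap. That sits inside the 10x-100x range above and is well beyond a single consumer GPU.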