Hacker News
by throwifasd 1250 days ago
This post [0] is a good primer. It covers a 20B model, whereas GPT-3 has 175B parameters. Their other posts go into more detail, but yes, it really is a massive operation.

Some hard facts from here [1], about BLOOM with 175B parameters:

> Installing the full 175B version is a challenge though as it requires around 350GB of GPU VRAM, which is not something one can easily afford.
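The 350GB figure is consistent with storing each parameter as a 16-bit float (2 bytes). A rough sketch of weight-only memory at a few precisions, assuming decimal gigabytes (1e9 bytes) and ignoring activations, KV cache, and framework overhead, which all add to the real requirement:

```python
# Rough weight-only memory estimate.
# Assumptions: decimal GB (1e9 bytes); activations, KV cache and
# runtime overhead are ignored, so real usage is higher.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0}


def weight_memory_gb(n_params: float, precision: str) -> float:
    """Return the memory needed just to hold the weights, in GB."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


for name, n_params in [("GPT-NeoX-20B", 20e9), ("175B model", 175e9)]:
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(n_params, precision)
        print(f"{name:>12} {precision}: {gb:7.1f} GB")
```

At FP16 this gives 40 GB for the 20B model and 350 GB for a 175B model, which matches the quoted figure.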

But hey, why deploy a model like ChatGPT at all when you guys can be confidently incorrect all by yourselves?

[0] https://nlpcloud.com/deploying-gpt-neox-20-production-focus-...

[1] https://nlpcloud.com/chatgpt-open-source-alternatives.html


So it sounds like this is a question of loading the model into VRAM, and not a question of the cost of a single query. I assume once a model is loaded, many queries can be serviced by that model quickly.

There's nothing incorrect about my assertion. If it actually took many GPUs to service one query, there would be no cost-viable mass-market consumer product. That's just a clear economic fact, regardless of whether a model could theoretically be spun up in a cost-inefficient manner.

And even hundreds of GB of VRAM is not far off for consumer hardware. Look at how quickly graphics RAM has grown over time: roughly 10x in about 10 years for high-end cards, at a cursory glance at various Nvidia cards. On the same trajectory we could see a 400GB VRAM card within the next decade (though that rests on lots of assumptions).
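The extrapolation above is simple compound growth. A sketch assuming a 24 GB starting point (the VRAM of high-end consumer cards such as the RTX 3090/4090) and exactly the ~10x-per-decade rate quoted; both inputs are assumptions for illustration, not a forecast:

```python
import math


def projected_vram_gb(start_gb: float, growth_per_decade: float, years: float) -> float:
    """Compound-growth projection of VRAM capacity after `years`."""
    return start_gb * growth_per_decade ** (years / 10)


def years_to_reach(target_gb: float, start_gb: float, growth_per_decade: float) -> float:
    """Years needed to grow from start_gb to target_gb at the given rate."""
    return 10 * math.log(target_gb / start_gb) / math.log(growth_per_decade)


print(projected_vram_gb(24, 10, 10))  # 240.0 GB after ten years
print(years_to_reach(400, 24, 10))    # about 12.2 years to reach 400 GB
```

Under these assumptions a 24 GB card grows to about 240 GB in ten years, and 400 GB arrives in roughly twelve, so the claim is sensitive to both the starting point and the growth rate.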

> I assume once a model is loaded, many queries can be serviced by that model quickly.

Depends. If you have room to load the whole model, yes. If you need to swap parts of the model in and out, then it matters whether you have enough RAM.
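Why swapping hurts: in autoregressive generation each new token needs a forward pass over all of the weights, so any weights that don't fit in VRAM have to be streamed in again on every pass (that cost is shared by all sequences in a batch, which this ignores). A back-of-the-envelope sketch, assuming 350 GB of FP16 weights, a 24 GB card, and about 32 GB/s of host-to-device bandwidth (roughly the theoretical peak of PCIe 4.0 x16); it counts transfer time only:

```python
def seconds_per_forward_pass(model_gb: float, vram_gb: float, bus_gb_per_s: float) -> float:
    """Time spent just moving non-resident weights for one forward pass.

    Assumes the resident portion adds no transfer cost and ignores
    compute time entirely; only bus transfer time is counted.
    """
    offloaded_gb = max(model_gb - vram_gb, 0.0)
    return offloaded_gb / bus_gb_per_s


print(seconds_per_forward_pass(350, 24, 32))   # about 10.2 s per token
print(seconds_per_forward_pass(350, 400, 32))  # 0.0: model fits in VRAM
```

Even before any compute, streaming the missing weights costs on the order of ten seconds per generated token in this scenario, versus no transfer at all once the whole model is resident.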

You really are like a chatbot... Look at the last three node sizes and the RAM density in them. It's not going to happen as fast as you dream, especially not with the discounts of the last gens. The hope is to go to FP4 if you want to run it on consumer hardware, and even then we are still talking about 2-3 cards. Why not at least try to Google before hammering out stupid, uninformed hot takes?
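The FP4 card count follows from the same arithmetic: at 4 bits (0.5 bytes) per parameter, a 175B model needs about 87.5 GB for its weights alone. A sketch of how many cards that takes, assuming the weights split evenly across cards, decimal GB, and card sizes chosen for illustration; activation and KV-cache memory are ignored and would push the real count up:

```python
import math


def cards_needed(n_params: float, bits_per_param: int, card_gb: float) -> int:
    """Minimum number of cards whose combined VRAM holds the weights."""
    weight_gb = n_params * bits_per_param / 8 / 1e9
    return math.ceil(weight_gb / card_gb)


for card_gb in (24, 48):
    print(f"{card_gb} GB cards: {cards_needed(175e9, 4, card_gb)}")
# 24 GB cards: 4
# 48 GB cards: 2
```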