by jesus_666
366 days ago
But that's a tiny model; it's the smallest version of Llama 3.1. The commercially marketed models are way bigger: GPT-4, for example, has been estimated to use about 1.76 trillion parameters, 220 times as many as the Llama build you mentioned. Their resource and performance requirements are vastly different. You're essentially arguing that shipping marine diesel engines must be trivial because you can fit a dozen moped motors on the bed of your pickup truck just fine.
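For a rough sense of scale, here's a back-of-the-envelope sketch of that comparison. The 8 billion figure is the size of the smallest Llama 3.1 build, the 1.76 trillion figure is the unconfirmed estimate quoted above, and the 2 bytes per parameter (16-bit weights) is purely my assumption for illustration. It only counts memory for the weights and ignores everything else needed for serving.

```python
# Back-of-the-envelope comparison; all figures are estimates, not measurements.
LLAMA_3_1_SMALLEST_PARAMS = 8e9   # Llama 3.1 8B
GPT4_ESTIMATED_PARAMS = 1.76e12   # unconfirmed public estimate quoted above
BYTES_PER_PARAM_FP16 = 2          # assumption: weights stored as 16-bit floats


def weight_memory_gb(params, bytes_per_param=BYTES_PER_PARAM_FP16):
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


ratio = GPT4_ESTIMATED_PARAMS / LLAMA_3_1_SMALLEST_PARAMS
llama_gb = weight_memory_gb(LLAMA_3_1_SMALLEST_PARAMS)
gpt4_gb = weight_memory_gb(GPT4_ESTIMATED_PARAMS)

print(f"parameter ratio: {ratio:.0f}x")
print(f"Llama 3.1 8B weights: {llama_gb:.0f} GB")
print(f"GPT-4 (estimated) weights: {gpt4_gb:.0f} GB")
```

Under those assumptions the small model fits on a single consumer GPU, while the estimated large one needs its weights spread across many accelerators before it can serve a single request.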
I have no insight into how many GPT-4 users are served per GPU, but I would assume OpenAI optimizes heavily for that, considering what that thing costs to run. It's probably in the same ballpark: hundreds to thousands of concurrent user requests per GPU. That's still better than one GPU per gamer, even if it requires 10x the energy.