| HN Mirror



	by jesus_666 366 days ago
	But that's a tiny model; it's the smallest version of Llama 3.1. The commercially marketed models are far bigger: GPT-4, for example, has been estimated at about 1.76 trillion parameters, 220 times as many as the Llama build you mentioned. Their resource and performance requirements are vastly different. You're essentially arguing that shipping naval diesel engines must be trivial because a dozen moped motors fit on the bed of your pickup truck just fine.
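
	For anyone checking the 220x figure, here is a quick sketch of the arithmetic. It assumes the "Llama build you mentioned" is the 8B variant (the smallest Llama 3.1 release) and takes the 1.76 trillion number at face value, even though it is an unofficial estimate. The variable names are made up for illustration.

```python
# Unofficial GPT-4 parameter estimate quoted in the comment above.
gpt4_params_estimate = 1.76e12

# Assumption: the smallest Llama 3.1 build is the 8B-parameter variant.
llama_31_smallest_params = 8e9

# How many times larger the estimated GPT-4 parameter count is.
ratio = gpt4_params_estimate / llama_31_smallest_params
print(f"GPT-4 estimate is about {ratio:.0f}x the smallest Llama 3.1")
```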


elpocko 366 days ago

Okay, but these tiny models are being used by people and businesses instead of GPT-4. My point was that they consume less energy per user than a gaming rig.

I have no insight into how many GPT-4 users are served per GPU, but I would assume OpenAI optimizes heavily for that, considering what it costs to run. It's probably in the same ballpark: hundreds to thousands of concurrent user requests per GPU. That's still better than one GPU per gamer, even if each GPU draws 10x the energy.
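
To make the per-user comparison concrete, here is a rough sketch. Every number in it is a placeholder, not a measurement: a hypothetical 300 W gaming GPU serving one person, and a serving GPU assumed to draw 10x that while handling 100 concurrent requests (the low end of the guess above). The function name `watts_per_user` is invented for this example.

```python
def watts_per_user(gpu_watts: float, concurrent_users: int) -> float:
    """Average power attributable to each user sharing one GPU."""
    if concurrent_users <= 0:
        raise ValueError("concurrent_users must be positive")
    return gpu_watts / concurrent_users

# Hypothetical figures, chosen only to illustrate the argument.
GAMING_GPU_WATTS = 300.0                      # one GPU, one gamer
SERVING_GPU_WATTS = 10 * GAMING_GPU_WATTS     # "10x the energy" assumption
CONCURRENT_REQUESTS = 100                     # low end of "hundreds to thousands"

gamer = watts_per_user(GAMING_GPU_WATTS, 1)
served = watts_per_user(SERVING_GPU_WATTS, CONCURRENT_REQUESTS)

print(f"gamer: {gamer:.0f} W per user")
print(f"served LLM user: {served:.0f} W per user")
```

With these made-up inputs the shared GPU comes out at a tenth of the gamer's per-user draw; the conclusion flips only if the real concurrency is below the power multiplier.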
