| HN Mirror



	by 9dev 38 days ago
	It would be awful if running models locally became the primary way of using LLMs. On dedicated servers sharing GPUs across requests, energy usage and environmental impact are far lower overall than if everyone and their mother suddenly needs beefy GPUs. It’s the equivalent of everyone commuting alone in their own car instead of a train picking up hundreds at once.

4 comments

zozbot234 38 days ago

You can batch requests when running locally too, if you're using a model with low-enough requirements for KV-cache; essentially targeting the same resource efficiencies that the big providers rely on. This is useful since it gives you more compute throughput "for free" during decode, even when running on very limited hardware.
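A rough back-of-the-envelope sketch of why that works, assuming decode is limited by memory bandwidth rather than by compute: each decode step has to stream the model weights once no matter how many sequences are in the batch, so extra sequences mostly cost only their own KV-cache reads. The function name and every number below are hypothetical, picked only to illustrate the shape of the effect, not measured on any real hardware.

```python
def decode_tokens_per_second(batch_size, weight_bytes, kv_bytes_per_seq,
                             mem_bandwidth, flops_per_token, peak_flops):
    """Toy roofline estimate of decode throughput (all inputs are assumptions)."""
    # Memory traffic per step: weights are read once for the whole batch,
    # the KV-cache is read once per sequence.
    mem_time = (weight_bytes + batch_size * kv_bytes_per_seq) / mem_bandwidth
    # Compute per step grows linearly with the batch.
    compute_time = batch_size * flops_per_token / peak_flops
    # The step takes as long as the slower of the two.
    step_time = max(mem_time, compute_time)
    # One token is produced per sequence per step.
    return batch_size / step_time


# Hypothetical small machine: 4 GB of weights, 0.1 GB of KV-cache per
# sequence, 100 GB/s memory bandwidth, 14 GFLOP per token, 10 TFLOP/s peak.
HW = dict(weight_bytes=4e9, kv_bytes_per_seq=0.1e9,
          mem_bandwidth=100e9, flops_per_token=14e9, peak_flops=10e12)

single = decode_tokens_per_second(1, **HW)
batched = decode_tokens_per_second(8, **HW)
print(f"batch=1: {single:.1f} tok/s, batch=8: {batched:.1f} tok/s")
```

With these made-up numbers the batch of 8 stays memory-bound, so total throughput rises almost in proportion to the batch size. The gain shrinks as the per-sequence KV-cache grows relative to the weights, which is why the low-KV-cache condition matters.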


9dev 38 days ago

That’s still orders of magnitude less efficient, and also not how most people use AI, or probably will use AI.


amelius 38 days ago

It's even more awful if the compute capital is owned by only a handful of players.


doctorwho42 38 days ago

So instead, we are building data centers for capacity we aren't sure exists...

Data centers that are orders of magnitude more resource intensive than anything that came before. Hell, there is one planned for Utah that I saw would consume 2x the power of the state's current usage, which would thereby triple a single state's power usage overnight.

Tell me how that is somehow more efficient?


9dev 38 days ago

It's not. But the choice isn't "use cloud AI from a data center" vs. "do not use AI." Way too much money has been sunk into the technology, and it is far too seductive to ignore, so AI is here to stay.

So the question is rather what the most efficient way to serve AI is: locally, or from the cloud, which means data centres.


duskdozer 38 days ago

Maybe people would target their use more appropriately, then.


9dev 38 days ago

Just like people would drive their car as little as possible out of concern for the environment..?
