| HN Mirror


by miki123211 80 days ago

> would use less electricity

Sorry to shatter your bubble, but this is patently false: LLMs are far more efficient on hardware that simultaneously serves many requests at once.

There's also the (environmental and monetary) cost of producing overpowered devices that sit idle when you're not using them, in contrast to a cloud GPU, which can be rented out to whoever needs it at a given moment, potentially at a lower cost during periods of lower demand.

Many LLM workloads aren't even that latency sensitive, so it's far easier to move them closer to renewable energy than to move that energy closer to you.
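To make the batching argument concrete, here is a back-of-the-envelope sketch of energy per generated token as power draw divided by aggregate throughput. The function name and every number below are hypothetical placeholders, not measurements of any real device; plugging in different figures can tilt the comparison either way.

```python
def joules_per_token(power_watts: float, tokens_per_second: float) -> float:
    """Energy per generated token = power draw / aggregate throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return power_watts / tokens_per_second


# Hypothetical figures, chosen only to show the shape of the calculation.
# A local device serving a single user:
local = joules_per_token(power_watts=60.0, tokens_per_second=20.0)
# A server accelerator whose throughput is summed over many batched requests:
server = joules_per_token(power_watts=700.0, tokens_per_second=2000.0)

print(f"local:  {local:.2f} J/token")
print(f"server: {server:.2f} J/token")
```

The point of the sketch is only that batching raises the denominator: the server draws more power, but the energy is shared across every request in the batch.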

4 comments

zozbot234 80 days ago

> LLMs are far more efficient on hardware that simultaneously serves many requests at once.

The LLM inference itself may be more efficient (though this may be impacted by different throughput vs. latency tradeoffs; local inference makes it easier to run with higher latency) but making the hardware is not. The cost for datacenter-class hardware is orders of magnitude higher, and repurposing existing hardware is a real gain in efficiency.


Tepix 80 days ago

Seems doubtful. The utilisation will be super high for data center silicon whereas your PC or phone at home is mostly idle.


zozbot234 80 days ago

> your PC or phone at home is mostly idle

If you're purely repurposing hardware that you need anyway for other uses, that doesn't really matter.

(Besides, for that matter, your utilization might actually rise if you're making do with potato-class hardware that can only achieve low throughput and high latency. You'd be running inference in the background, basically at all times.)
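The disagreement above about manufacturing cost versus utilization can be framed as an amortization problem. The sketch below is a rough model, not anyone's actual accounting: `attributable_share` is a hypothetical parameter for how much of the hardware's embodied cost is charged to inference (0.0 if the device would have been bought anyway, 1.0 if it was bought for inference), and all numbers are invented for illustration.

```python
def embodied_cost_per_inference_hour(
    embodied_cost: float,
    lifetime_hours: float,
    inference_utilization: float,
    attributable_share: float = 1.0,
) -> float:
    """Embodied (manufacturing) cost charged to each hour spent on inference.

    attributable_share is a hypothetical knob: the fraction of the device's
    embodied cost that is blamed on inference at all.
    """
    if not 0.0 < inference_utilization <= 1.0:
        raise ValueError("inference_utilization must be in (0, 1]")
    if not 0.0 <= attributable_share <= 1.0:
        raise ValueError("attributable_share must be in [0, 1]")
    inference_hours = lifetime_hours * inference_utilization
    return embodied_cost * attributable_share / inference_hours


# Invented numbers in arbitrary cost units, purely illustrative.
datacenter = embodied_cost_per_inference_hour(30_000.0, 40_000.0, 0.8)
dedicated_pc = embodied_cost_per_inference_hour(1_000.0, 40_000.0, 0.1)
repurposed_pc = embodied_cost_per_inference_hour(1_000.0, 40_000.0, 0.1, attributable_share=0.0)

print(datacenter, dedicated_pc, repurposed_pc)
```

Under this framing, low utilization hurts a device bought specifically for inference, while hardware that is purely repurposed carries no additional embodied cost regardless of how idle it is.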


ysleepy 80 days ago

I'm actually not sure that's true. Apart from the fact that people buy the device with or without the neural accelerator, the perf/watt could be on par with or better than the big iron. The efficiency sweet spot is usually below the peak performance point; see big.LITTLE architectures, etc.
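The sweet-spot claim can be illustrated with a common textbook approximation, which is an assumption here and not something stated in the thread: dynamic power scales roughly as C·V²·f, and if voltage scales about linearly with frequency, total power is roughly `static + k * f**3` while performance scales with `f`. The constants below are hypothetical.

```python
def perf_per_watt(freq: float, static_power: float, k: float) -> float:
    """Performance (taken as proportional to freq) per watt of total power.

    Assumed model: power = static_power + k * freq**3.
    """
    return freq / (static_power + k * freq ** 3)


def most_efficient_frequency(static_power: float, k: float) -> float:
    """Frequency maximizing perf/watt under the assumed model.

    Setting d/df [f / (s + k f^3)] = 0 gives s - 2 k f^3 = 0,
    so f* = (s / (2 k)) ** (1/3).
    """
    return (static_power / (2.0 * k)) ** (1.0 / 3.0)


# Hypothetical constants in arbitrary units.
s, k, peak = 2.0, 1.0, 3.0
f_star = most_efficient_frequency(s, k)
print(f"sweet spot at f = {f_star:.2f}, peak at f = {peak:.2f}")
print(f"perf/watt at sweet spot: {perf_per_watt(f_star, s, k):.3f}")
print(f"perf/watt at peak:       {perf_per_watt(peak, s, k):.3f}")
```

Under these assumptions the most efficient operating point sits well below the peak frequency, which is the intuition behind pairing efficiency cores with performance cores.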


kortilla 80 days ago

Well, this is an article about running on hardware I already have in my house. In the winter, that’s just a little extra electricity that converts into “free” resistive heating.


woadwarrior01 80 days ago

> Sorry to shatter your bubble, but this is patently false, LLMs are far more efficient on hardware that simultaneously serves many requests at once.

You might want to read this: https://arxiv.org/abs/2502.05317v2
