HN Mirror


	by GavinAnderegg 96 days ago
	Author here. The reason I wrote that local hardware is "sipping power most of the time" is that most of the time it's not doing LLM-related work. If you're using your local machine (or eventually maybe even your phone) for local LLM tasks, you're not doing that all day. I agree that data centres will be set up to be more efficient, but we're also going to need fewer of them if local LLMs take off. If that's true, overbuilt data centres add to the revenue pressure on AI companies.

benlivengood 96 days ago

Electricity is more expensive at home than where data centers are built. Batched inference gets more work out of each GPU/TPU per watt. Data center power supplies are more efficient than those in average consumer devices. Entire racks can be fully powered off when not in use, instead of sitting at standby power draw. And of course the hardware investment is amortized across many users. All of this gives more people access to larger models than if everyone bought an M3 Ultra.
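
To make the amortization and electricity-price points concrete, here is a back-of-the-envelope sketch of yearly cost per user. The `Deployment` class and every figure in it are hypothetical placeholders, not numbers from this thread or from any vendor's pricing. It also ignores batching, cooling, networking, and the fact that a shared server would likely run a larger model.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24


@dataclass
class Deployment:
    """Hypothetical cost model: purchase price spread evenly over the
    hardware's lifetime, plus electricity at a flat average draw."""
    hardware_cost_usd: float   # up-front purchase price
    users: int                 # people sharing the hardware
    lifetime_years: float      # amortization period
    avg_power_watts: float     # average draw over the year, idle included
    price_per_kwh_usd: float   # flat electricity price

    def yearly_cost_per_user(self) -> float:
        # Hardware cost per year of service life.
        capex = self.hardware_cost_usd / self.lifetime_years
        # Energy used in a year at the average draw, in kWh.
        energy_kwh = self.avg_power_watts * HOURS_PER_YEAR / 1000
        opex = energy_kwh * self.price_per_kwh_usd
        # Split the total evenly across everyone sharing the hardware.
        return (capex + opex) / self.users


if __name__ == "__main__":
    # Made-up inputs: one personal workstation vs. one shared server.
    home = Deployment(4_000, 1, 4, 20, 0.30)
    shared = Deployment(300_000, 500, 4, 10_000, 0.08)
    print(f"home:   ${home.yearly_cost_per_user():,.2f} per user-year")
    print(f"shared: ${shared.yearly_cost_per_user():,.2f} per user-year")
```

With these inputs the workstation comes to about $1,052.56 per user-year and the shared server to about $164.02. The server costs 75 times as much up front, but the gap comes mostly from dividing that cost across 500 users; with far fewer users the conclusion flips, which is why the assumptions are spelled out.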

The economy of scale that data centers have is actually a good thing economically and environmentally for many kinds of demand.

I think the most capable models will remain in high demand across the market at least until they reach "a datacenter of PhDs" level of capability. At that point I can see a transition to more local model use, if consumer hardware is affordable for the median human on Earth. If that turns out to be true, hyperscaling will plateau at the level needed to sustain commercial/industrial "PhD"-level demand, which we haven't reached yet (all providers are still struggling to meet current demand).

sponaugle 96 days ago

What I was commenting on was the idea that a small model at home is somehow more efficient. A fair comparison would be many people each running a small model at home vs. those same people using what would likely be a shared resource in a datacenter.

The core concept is that tokens/watt is tokens/watt (for a given model, of course). A computer at home is actually less efficient overall, because most of the time it is not generating tokens but is still drawing a small amount of power.
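
A rough way to see the idle-power argument: average the machine's draw over the whole period, then divide by the tokens it actually produced. This sketch assumes one machine running one fixed model, a constant generation rate while active, and a constant idle draw otherwise; the `joules_per_token` helper and all numbers are hypothetical, not measurements. The result is in joules per token (watts divided by tokens per second), the reciprocal of the tokens-per-energy figure that "tokens/watt" loosely refers to.

```python
def joules_per_token(active_watts: float, idle_watts: float,
                     tokens_per_sec: float, duty_cycle: float) -> float:
    """Average energy per generated token for a machine that generates
    tokens only a fraction `duty_cycle` of the time and idles otherwise.

    Hypothetical helper for illustration; assumes constant power draw in
    each state and a constant generation rate while active.
    """
    if not 0 < duty_cycle <= 1:
        raise ValueError("duty_cycle must be in (0, 1]")
    # Time-weighted average power over the whole period, idle time included.
    avg_watts = active_watts * duty_cycle + idle_watts * (1 - duty_cycle)
    # Tokens produced per second, averaged over the same period.
    avg_tokens_per_sec = tokens_per_sec * duty_cycle
    # W / (tokens/s) = J/token
    return avg_watts / avg_tokens_per_sec


if __name__ == "__main__":
    # Hypothetical numbers: the same box, busy 5% of the day at home
    # vs. 90% of the day as a shared server.
    for label, duty in [("home, 5% busy", 0.05), ("shared, 90% busy", 0.90)]:
        print(label, round(joules_per_token(150, 20, 30, duty), 2), "J/token")
```

With these made-up inputs the home case comes out around 17.67 J/token and the shared case around 5.07 J/token, against a floor of 5 J/token at full utilization. The per-token cost while generating is identical; the idle draw is what separates them. Batching, which benlivengood mentions, would lower the shared figure further and is not modeled here.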

The revenue pressure is an interesting problem, but I suspect the actual demand math will be much more complicated.

I find local models interesting for sure, and run several on my own personal DGX cluster. I am however most certainly not power efficient!