HN Mirror



	by althea_tx 634 days ago
	Does anyone else have a hard time accepting these calculations? I don’t doubt the serious environmental costs of AI, but some of the claims in this infographic seem far-fetched. Inference costs should be much lower than training costs. And if a 100-word email with GPT-4 requires 0.14 kWh of energy, power AI users and developers must be consuming 100x as much. Also, what about running models like Llama-3 locally? Would love to see someone with more expertise either debunk or confirm the troubling claims in this article. It feels like someone accidentally shifted a decimal point a few places to the right.


marginalia_nu 634 days ago

If I run some simple inference locally on a 4090 (a 450 W TDP card), it takes on the order of seconds with that sucker going full blast, so you're looking at something on the order of 1 kJ, which is significantly higher than what is quoted in the article.

Article numbers line up better with CPU inference for ~1s.
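The ~1 kJ figure is just power times time. A minimal sketch, assuming the card draws its full 450 W rating for about 2 seconds ("order of seconds"); the helper name is made up for illustration:

```python
def energy_joules(power_watts: float, seconds: float) -> float:
    """Energy drawn at a constant power for a given duration (E = P * t)."""
    return power_watts * seconds

# Assumption: the 4090 runs at its full 450 W for roughly 2 seconds.
gpu_joules = energy_joules(450, 2)
print(gpu_joules)  # 900
```

Real draw fluctuates during a run, so treat this as an upper-end, order-of-magnitude estimate.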


Panzer04 634 days ago

1 kJ is nothing. That's about 0.3 Wh, or 0.0003 kWh.
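The conversion follows from 1 Wh = 1 W for 3600 s = 3600 J. A small sketch (the function names are illustrative only):

```python
J_PER_WH = 3600  # 1 W sustained for one hour = 3600 J

def joules_to_wh(joules: float) -> float:
    return joules / J_PER_WH

def joules_to_kwh(joules: float) -> float:
    return joules / (1000 * J_PER_WH)

print(round(joules_to_wh(1000), 2))   # 0.28
print(round(joules_to_kwh(1000), 4))  # 0.0003
```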


marginalia_nu 634 days ago

That's for a single inference though. You can do about 3600 of them in an hour.
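Taking the thread's figures at face value (~1 kJ per inference, about one per second), an hour of back-to-back inference works out to roughly 1 kWh. For comparison, the sketch also computes a 450 W card running flat out for an hour; both numbers are rough assumptions from the comments above, not measurements:

```python
JOULES_PER_INFERENCE = 1_000   # ~1 kJ each, rough figure from the thread
INFERENCES_PER_HOUR = 3_600    # back-to-back, about one per second
J_PER_KWH = 3.6e6              # 1 kWh = 3.6 MJ

hourly_kwh = JOULES_PER_INFERENCE * INFERENCES_PER_HOUR / J_PER_KWH
card_hour_kwh = 450 * 3600 / J_PER_KWH  # a 450 W card at full power for 1 h

print(hourly_kwh)     # 1.0
print(card_hour_kwh)  # 0.45
```

Either way, the hourly total lands in the same order of magnitude as the article's per-email figure.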


aubanel 633 days ago

Yes, but the article's setting is precisely one email, so one inference, and their number is 0.14 kWh, which is way off.
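To put "way off" in numbers: converting the article's 0.14 kWh to joules and dividing by the ~1 kJ local-GPU estimate from upthread gives a factor of about 500. This is a back-of-the-envelope sketch using only figures quoted in the thread:

```python
ARTICLE_KWH_PER_EMAIL = 0.14  # figure quoted from the article
LOCAL_GPU_JOULES = 1_000      # ~1 kJ per local 4090 inference (thread estimate)

article_joules = ARTICLE_KWH_PER_EMAIL * 3.6e6  # kWh -> J
ratio = article_joules / LOCAL_GPU_JOULES

print(round(article_joules))  # 504000
print(round(ratio))           # 504
```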


gcr 634 days ago

I’m still kind of skeptical. M-series Apple hardware doesn’t even get warm during inference with some local models.

Edit: Nah, I’m convinced; look at table 1. Inference costs are around 20 mL of water in a datacenter environment.


marginalia_nu 634 days ago

For reference, 1 kJ is enough to heat 1 L (33 oz) of water by ~0.25 °C (~0.5 °F). The machine will probably heat up a few degrees if you run inference once, but since it's essentially one big heatsink, the heat will spread through the body and dissipate into the air. The problem begins when you run it continuously, as you would in a datacenter.
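The water comparison comes from ΔT = Q / (m·c). A minimal check, assuming 1 L of water weighs about 1 kg and using water's approximate specific heat near room temperature:

```python
SPECIFIC_HEAT_WATER = 4186  # J/(kg*K), approximate, near room temperature

def temperature_rise_c(joules: float, kg_water: float) -> float:
    """Temperature rise from putting `joules` of heat into water: dT = Q / (m * c)."""
    return joules / (kg_water * SPECIFIC_HEAT_WATER)

rise_c = temperature_rise_c(1_000, 1.0)  # 1 L of water ~ 1 kg
rise_f = rise_c * 9 / 5                  # a temperature *difference*, so no +32 offset

print(round(rise_c, 2), round(rise_f, 2))  # 0.24 0.43
```

That is consistent with the rounded ~0.25 °C (~0.5 °F) in the comment.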


MSFT_Edging 634 days ago

Datacenters aren't running M-series chips.


guitarlimeo 634 days ago

Well, not M-series chips specifically, but chips optimized for these kinds of workloads (as the Neural Engine in M-series chips is).


dartos 634 days ago

IIRC, the M-series chip isn’t specifically optimized for ML workloads. Its biggest advantage is unified CPU and GPU memory, since transferring layers between the two is a big bottleneck on non-Apple systems.

Real ML hardware (like Nvidia H100s) that can handle the kind of inference traffic you see in production gets hot and uses quite a bit of energy, especially when it runs at full blast 24/7.
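To see why running 24/7 matters, here is a rough per-day estimate for a single accelerator. The ~700 W board power is an assumption (roughly the rated maximum of an H100 SXM module), and the sketch ignores cooling, host CPUs, and other datacenter overhead:

```python
def daily_kwh(board_power_watts: float, utilization: float = 1.0) -> float:
    """Energy per day for one accelerator at a given average utilization."""
    return board_power_watts * utilization * 24 / 1000

# Assumption: ~700 W board power per accelerator; overhead not included.
print(daily_kwh(700))       # 16.8
print(daily_kwh(700, 0.5))  # 8.4
```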


gcr 634 days ago

Google’s TPU energy usage is a well-kept secret and a competitive advantage. If energy efficiency isn’t a major concern for them yet, I bet it will be in a couple of years.


guitarlimeo 634 days ago

Even if the costs were lower today, the trend is toward more inference-time compute (o1), so these costs might become accurate in the future.


viraptor 634 days ago

I'm not sure how comparable o1 is in total usage. Remember that people will either adjust the prompt or continue the conversation as needed. If o1 spends more time on each answer but gets there in fewer steps, it may come out ahead on energy use. It may also skip the planning and self-reflection steps in agent workflows completely. It's going to be hard to estimate the real change in usage.
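The trade-off can be framed as total energy = energy per turn × number of turns. The numbers below are purely hypothetical (arbitrary units), chosen only to show how a model that is more expensive per answer could still use less energy overall:

```python
def total_energy(energy_per_turn: float, turns: int) -> float:
    """Total energy of a conversation: per-turn cost times number of turns."""
    return energy_per_turn * turns

# Hypothetical: a "reasoning" model costs 3x as much per turn
# but resolves the task in 2 turns instead of 8.
baseline = total_energy(1.0, 8)
reasoning = total_energy(3.0, 2)

print(baseline, reasoning)   # 8.0 6.0
print(reasoning < baseline)  # True
```

With different assumed ratios (say 5x per turn for the same 2 turns), the comparison flips, which is why the real change is hard to estimate.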


FrojoS 634 days ago

I assume you meant 0.14 kWh (kilowatt-hours) of energy.

(I can't access the article.)
