by althea_tx 634 days ago
Does anyone else have a hard time accepting these calculations? I don’t doubt the serious environmental costs of AI but some of the claims in this infographic seem far-fetched. Inference costs should be much lower than training costs. And, if a 100-word email with GPT-4 requires 0.14 kWh of energy, power AI users and developers must be consuming 100x as much. Also, what about running models like Llama-3 locally? Would love to see someone with more expertise either debunk or confirm the troubling claims in this article. It feels like someone accidentally shifted a decimal point over a few places to the right.
3 comments

If I run some simple inference locally on a 4090 (a 450 W TDP card), it takes on the order of seconds with that sucker going full blast, so you're looking at on the order of 1 kJ, which is significantly higher than what is quoted in the article.

Article numbers line up better with CPU inference for ~1s.
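For anyone who wants to redo the estimate above, here is a minimal Python sketch of energy = power x time. It assumes the card draws its full rated 450 W for the whole run, which is an upper bound rather than a measurement, and the function name and durations are only illustrative.

```python
# Back-of-the-envelope GPU energy: energy (J) = power (W) * time (s).
# Assumption: the card pulls its full 450 W rating for the entire run.

GPU_POWER_W = 450.0  # rated board power quoted in the comment above


def inference_energy_joules(power_watts: float, seconds: float) -> float:
    """Energy in joules for a run at constant power."""
    return power_watts * seconds


# Illustrative durations only; "order of seconds" is all the comment says.
for seconds in (1, 2, 5):
    joules = inference_energy_joules(GPU_POWER_W, seconds)
    print(f"{seconds} s at {GPU_POWER_W:.0f} W -> {joules / 1000:.2f} kJ")
```

A couple of seconds at full power lands near the 1 kJ figure mentioned above.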

1 kJ is nothing. That's about 0.3 Wh, or 0.0003 kWh.
That's for a single inference though. You can do about 3600 of them in an hour.
Yes, but the article's setting is precisely about 1 email, so 1 inference, and their number is 0.14 kWh, which is way off.
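The unit conversions in this sub-thread are easy to check. The sketch below is plain arithmetic, assuming the 1 kJ local estimate and the 0.14 kWh article figure quoted above; the variable and function names are made up for illustration.

```python
# Convert the 1 kJ local estimate to Wh / kWh and compare it with the
# 0.14 kWh figure attributed to the article.

JOULES_PER_WH = 3600.0  # 1 Wh = 3600 J by definition


def joules_to_wh(joules: float) -> float:
    return joules / JOULES_PER_WH


def joules_to_kwh(joules: float) -> float:
    return joules_to_wh(joules) / 1000.0


local_joules = 1000.0   # ~1 kJ per local inference (estimate from the thread)
article_kwh = 0.14      # per-email figure quoted from the article

local_kwh = joules_to_kwh(local_joules)
ratio = article_kwh / local_kwh          # how many 1 kJ inferences fit in 0.14 kWh
hourly_kwh = 3600 * local_kwh            # 3600 back-to-back 1 s inferences

print(f"1 kJ = {joules_to_wh(local_joules):.3f} Wh = {local_kwh:.6f} kWh")
print(f"0.14 kWh is about {ratio:.0f}x the 1 kJ estimate")
print(f"3600 inferences of 1 kJ each = {hourly_kwh:.2f} kWh")
```

So under these assumptions the article's per-email number is roughly 500 times the local estimate, and an hour of nonstop 1 kJ inferences comes to about 1 kWh.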
I’m still kind of skeptical. M-series Apple hardware doesn’t even get warm during inference with some local models.

Edit: Nah I’m convinced, look at table 1. Inference costs are around 20mL in a datacenter environment.

1 kJ is, for reference, enough to heat 1 L (33 oz) of water by ~0.25 °C (~0.5 °F). The machine will probably heat up a few degrees if you run inference once, but since it's essentially one big heatsink, the heat will dissipate throughout the body and into the air. The problem begins when you run it continuously, as you would in a datacenter.
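The water-heating comparison follows from delta_T = E / (m * c). A quick sketch, assuming 1 L of water weighs about 1 kg and using the textbook specific heat of roughly 4186 J/(kg·K); the function name is just for illustration.

```python
# Temperature rise of water for a given energy input: delta_T = E / (m * c).
# Assumptions: 1 L of water ~ 1 kg; specific heat ~ 4186 J/(kg*K); no losses.

WATER_SPECIFIC_HEAT = 4186.0  # J/(kg*K), approximate textbook value


def temperature_rise_celsius(energy_joules: float, water_kg: float = 1.0) -> float:
    return energy_joules / (water_kg * WATER_SPECIFIC_HEAT)


delta_c = temperature_rise_celsius(1000.0)  # 1 kJ into 1 L of water
delta_f = delta_c * 9.0 / 5.0               # a temperature difference, so no +32

print(f"1 kJ heats 1 L of water by about {delta_c:.2f} C ({delta_f:.2f} F)")
```

That comes out close to a quarter of a degree Celsius, in line with the rough figure above.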
Datacenters aren't running M-series chips.
Well, not M-series chips specifically, but chips optimized for this kind of workload (like the neural engine in M-series chips is).
IIRC the M-series chips aren't specifically optimized for ML workloads; the biggest gain is having unified video and CPU memory, since transferring layers between the two is a big bottleneck on non-Apple systems.

Real ML hardware (like the Nvidia H100s) that can handle the kind of inference traffic you see in production gets hot and uses quite a bit of energy, especially when it runs at full blast 24/7.

Google’s TPU energy usage is a well-kept secret / competitive advantage. If energy efficiency isn’t a major concern for them, I bet it will be in a couple years.
Even if the costs were lower, the trend is towards more inference compute time (o1), so these costs might be valid for the future.
I'm not sure how comparable o1 is in total usage. Remember that people will either adjust the prompt or continue the conversation as needed. If o1 spends more time on the answer, but responds in fewer steps, it may be a net positive on energy use. Also it may skip the planning and self-reflection steps in agent usage completely. It's going to be hard to estimate the real change in usage.
I assume you meant 0.14 kWh (kilowatt-hours) of energy.

(I can't access the article.)