Hacker News
by Apfel 636 days ago
Is this true in a marginal cost sense? I was under the impression most of the environmental impact occurred during the training stage, and that it was significantly less costly post training?
3 comments

You could argue that this is no longer the case once the model is done; the cost per request will go down over time, as the set amount of power and coolant pumped through data centres gets divided over more people.

However, AI companies can't afford to stand still. They have to keep training or they risk being made irrelevant by whatever AI company comes next.

Furthermore, a not insignificant amount of energy and cooling goes into generating responses as well. It's plainly obvious when you run even the very modest AI models at home how much power these things take.

The paper[1] mentions the statistics used to calculate these numbers. It has a separate column for inference, with numbers ranging from 10mL to 50mL of water per inference depending on the data centre sampled.

The numbers seem bad, but the authors also call out that more transparency is needed. With all the bad rep out there from independent estimations and no AI companies giving detailed environmental impact data, I have to assume the real cost is worse than estimated, or companies would've tried to greenwash themselves already.

[1] https://arxiv.org/pdf/2304.03271

> It's plainly obvious when you run even the very modest AI models at home how much power these things take.

Really good point to put this into perspective. I tried models locally and my GPU was running red hot. Granted, I think server boards like the H100 are more optimized for AI workloads, so they run more efficiently than consumer GPUs, but I don't believe they are more than one order of magnitude more efficient.

Another corollary is that AI companies don’t train one model at a time. Typical engineers will have maybe 5-10 models training at once. Large hyperparameter grid searches might have hundreds or thousands. Most of these will turn out to be duds. Only one model gets released, and that one’s energy efficiency is what’s reported.

The trend is towards more inference-time compute (o1), so post-training costs will increase as companies scale that too.

Llama 405B takes on the order of a kilowatt-minute to respond on our local GPU server, or about 10 grams of CO2 per email. Last I checked, add another 20 grams of amortized manufacturing emissions. A typical commute is on the order of 5-10 kg of CO2.

this article is alarmist bullshit. (for entirely unrelated reasons openai delenda est)

So you can double your commute's environmental impact by using Llama 1000x per day?

That sounds pretty bad still, no?
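
For anyone who wants to check the arithmetic in this exchange, here is a rough sketch in Python. The kilowatt-minute per response, the 20 g of amortized manufacturing emissions, and the 5-10 kg commute are the figures quoted above. The grid carbon intensity of 0.6 kg CO2 per kWh is my own assumption, picked because it reproduces the quoted 10 g per response; the real value varies a lot by region. Function and constant names are made up for illustration.

```python
# Back-of-envelope check of the figures quoted in the thread.
# ASSUMPTION: grid carbon intensity, not a number from the thread.
GRID_KG_CO2_PER_KWH = 0.6

ENERGY_PER_QUERY_KWH = 1.0 / 60.0   # one kilowatt-minute, as quoted
AMORTIZED_MANUFACTURING_G = 20.0    # grams CO2 per query, as quoted
COMMUTE_KG_RANGE = (5.0, 10.0)      # kg CO2 per commute, as quoted


def operational_grams_per_query(kwh=ENERGY_PER_QUERY_KWH,
                                intensity=GRID_KG_CO2_PER_KWH):
    """Grams of CO2 from electricity alone for one response."""
    return kwh * intensity * 1000.0


def daily_kg(queries_per_day, include_manufacturing=False):
    """Kilograms of CO2 per day for a given number of queries."""
    grams = operational_grams_per_query()
    if include_manufacturing:
        grams += AMORTIZED_MANUFACTURING_G
    return queries_per_day * grams / 1000.0


if __name__ == "__main__":
    low, high = COMMUTE_KG_RANGE
    for n in (100, 1000):
        print(f"{n} queries/day: {daily_kg(n):.1f} kg operational, "
              f"{daily_kg(n, include_manufacturing=True):.1f} kg with manufacturing "
              f"(commute: {low:.0f}-{high:.0f} kg)")
```

Under these assumptions, 1000 queries a day lands at the top of the quoted commute range on electricity alone, and 100 queries at roughly a tenth of that, so the disagreement is really about how many queries a day is realistic.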

A thousand times? I’d have a hard time typing out that many queries in 8 hours. Even 100 seems like a stretch for someone who uses it within something like Cursor.

More and more environments offer LLM aid without requiring you to explicitly type in a query. E.g. trigger inference whenever static analysis fails (e.g. on a compile error). Or trigger an LLM-aided auto-complete with Ctrl-Space. I don't think it'll be particularly unusual to reach 1000 queries in a working day that way.

Coding models these days run an inference every time you stop typing. Let’s say it’s 0.1 inferences per keystroke. If you keep VSCode open all day, I could believe it’s a significant energy draw.

Google now uses several inferences per Google search.

The average user’s #inferences-per-day is going to skyrocket.

My point is that it’s understandable to consider AI a significant contributor to the average professional’s energy budget. It’s not an insult to point this out.
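
The keystroke estimate above can be turned into a quick calculation. This is only a sketch: the 0.1 inferences per keystroke is the guess from this comment, and the keystroke counts in the example are hypothetical inputs, not measurements. Autocomplete models are also usually much smaller than the large models discussed earlier, so an inference here is not directly comparable to a full chat response.

```python
# Rough estimate of editor-triggered inferences per day.
# The rate is the guess from the comment above; keystroke counts are hypothetical.
INFERENCES_PER_KEYSTROKE = 0.1


def inferences_per_day(keystrokes_per_day, rate=INFERENCES_PER_KEYSTROKE):
    """Number of inferences triggered by a day's typing at the given rate."""
    return keystrokes_per_day * rate


def keystrokes_needed(target_inferences, rate=INFERENCES_PER_KEYSTROKE):
    """Keystrokes required to trigger a target number of inferences."""
    return target_inferences / rate


if __name__ == "__main__":
    for keys in (5_000, 10_000, 20_000):  # hypothetical daily keystroke counts
        print(f"{keys} keystrokes -> ~{inferences_per_day(keys):.0f} inferences")
    print(f"Keystrokes for 1000 inferences: ~{keystrokes_needed(1000):.0f}")
```

At that rate, the 1000-queries-a-day figure debated earlier corresponds to about 10,000 keystrokes.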