Hacker News
by phillipcarter 1269 days ago
The problem isn't unsolvable, but I think the author is understating it a lot:

> Also, let's put things in perspective: yes, it is environmentally costly, but we aren't training that many of them, and the total cost is minuscule compared to all the other energy consumptions we humans do.

Part of the reason LLMs aren't that big in the grand scheme of things is because they haven't been good enough and businesses haven't started to really adopt them. That will change, but the costs will be high because they're also extremely expensive to run. I think the author is focusing on the training costs for now, but that will likely get dwarfed by operational costs. What then? Waving one's arms and saying it'll just "get cheaper over time" isn't an acceptable answer because it's hard work and we don't really know how cheap we can get right now. It must be a focus if we actually care about widespread adoption and environmental impact.

5 comments

Think about aluminum smelting. At some point in the past, only a few researchers could smelt aluminum, and while it used a ton of energy, it was just a few research projects. Then people realized that aluminum was lighter than steel and could replace it... so suddenly everybody was smelting aluminum. The method involves massive amounts of electricity... but it was fine, because the value of the product (to society) was more than high enough to justify it. Eventually, smelters moved to places with natural sources of energy... for example, a dam in the Columbia Gorge was used to power a massive smelter. Guess where Google put their west coast data center? Right there, because the aluminum smelter left behind a Superfund site and we exported smelting to developing countries for pollution reasons. So there is lots of "free, carbon-neutral" power from hydro plants.

The interesting details are: the companies with large GPU/TPU fleets are already running them in fairly efficient setups, with high utilization (so you're not blowing carbon emissions on idle machines), and can scale those setups if demand increases. This is not irresponsible. And the scale-up will only happen if the systems are actually useful.

Basically, there are 100 other things I'd focus on to trim environmental impact before LLMs.

I think quantization (e.g. 4-bit, https://arxiv.org/abs/2212.09720) and sparsity (e.g. SparseGPT, https://arxiv.org/abs/2301.00774) will bring down inference cost.

Edit: This isn’t handwaving btw, this is to say some fairly decent solutions are available now.
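
For anyone who wants to see what those two ideas mean mechanically, here is a minimal sketch. It is naive round-to-nearest 4-bit quantization and plain magnitude pruning, which are much simpler than the methods in the linked papers; the function names, the toy weights, and the parameter count are all made up for illustration.

```python
# Illustrative sketch only: naive round-to-nearest quantization and magnitude
# pruning, NOT the methods from the linked papers.

def quantize_4bit(weights):
    """Map floats to signed 4-bit codes in [-7, 7] using one absmax scale."""
    # Fall back to a scale of 1.0 if every weight is zero.
    scale = max(abs(w) for w in weights) / 7 or 1.0
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from the 4-bit codes."""
    return [c * scale for c in codes]

def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def weight_bytes(n_params, bits):
    """Storage for the weights alone, ignoring scales and sparse-index overhead."""
    return n_params * bits / 8

if __name__ == "__main__":
    w = [7.0, -3.0, 1.2, 0.0]  # made-up weights
    codes, scale = quantize_4bit(w)
    print(codes, dequantize(codes, scale))
    print(magnitude_prune([0.5, -0.1, 0.3, 0.05], 0.5))
    n = 1_000_000_000  # hypothetical parameter count
    print(weight_bytes(n, 16) / weight_bytes(n, 4))
```

The storage ratio is just bits over bits, so going from 16-bit to 4-bit weights cuts weight memory by 4x before any overhead. Whether quality holds up at that precision is the part the papers actually address; this sketch says nothing about that.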

>Part of the reason LLMs aren't that big in the grand scheme of things is because they haven't been good enough and businesses haven't started to really adopt them. That will change, but the costs will be high because they're also extremely expensive to run. I think the author is focusing on the training costs for now, but that will likely get dwarfed by operational costs. What then?

Now maybe I'm naive somehow because I'm a machine-learning person who doesn't work on LLMs/big-ass-transformers, but uh... why do they actually have to be this large to get this level of performance?

Dunno! It could be the case that there just needs to be a trillion parameters to be useful enough outside of highly-constrained scenarios. But I would certainly challenge those who work on LLMs to figure out how to require far less compute for the same outcome.

This could, perhaps, become a significant issue if and when such systems achieve commercial viability, but right now, they are research projects, and it seems beside the point to balance their usefulness as such against what would be their energy consumption if their use were scaled up to the level of a major commercial activity.

To add a research-oriented comparison to the others being presented here, the LHC's annual energy budget is about 3,000 times that of training GPT-3.

The models will become much smaller; there are already some papers that show promising results with pruned models.

And transformers are not even the final model; who knows what will come next?