by niklasd 1699 days ago
> "training BERT on GPU is roughly equivalent to a trans-American flight"

This gets cited often, but the number doesn't impress me much. Given that (in theory) a language model like BERT has to be pre-trained only once[1], and that anyone can then reuse these weights to fine-tune the model[2], the number doesn't seem outrageously high.

I think it's good to examine the environmental impact, but there are so many unnecessary flights every day that could be replaced by trains that pre-training a state-of-the-art ML model for the CO2 cost of a single flight doesn't seem very high to me. Or am I overlooking something?

[1] Of course, in reality it's not that simple: there is hyperparameter search, domain adaptation, further experiments, etc.

[2] Also because of the great open source work by Hugging Face.
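The amortization argument above can be put into a small sketch. It assumes the pre-training cost is roughly one trans-American flight, as in the quoted claim, and treats the fine-tuning cost and the number of reusers as purely hypothetical placeholders, not measured values. The function name `per_user_cost` is made up for illustration.

```python
def per_user_cost(pretrain_cost: float, finetune_cost: float, n_users: int) -> float:
    """Emissions attributable to each user when one pre-training run is shared."""
    if n_users < 1:
        raise ValueError("n_users must be at least 1")
    # The one-off pre-training cost is split evenly; fine-tuning is paid per user.
    return pretrain_cost / n_users + finetune_cost


# Units: one trans-American flight = 1.0 (the figure from the quote).
PRETRAIN = 1.0
# Assumed for illustration only; the thread gives no fine-tuning figure.
FINETUNE = 0.01

for n in (1, 10, 1000):
    print(f"{n} user(s): {per_user_cost(PRETRAIN, FINETUNE, n)} flight-equivalents each")
```

Under these assumptions the shared pre-training term shrinks as more people reuse the weights, while the caveats in [1] (hyperparameter search, repeated experiments) would show up as a larger `pretrain_cost`.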

1 comment

Also, the likes of Google, Amazon, MS, etc. generally use renewable power for their data centers, and each is actively investing in making its entire operations carbon neutral. So yes, they produce and use a lot of energy, but not a lot of carbon.