by niklasd
1699 days ago
> "training BERT on GPU is roughly equivalent to a trans-American flight"

This figure gets cited often, but it doesn't impress me very much. In theory, a language model like BERT has to be pre-trained only once[1], and anyone can then reuse those weights to fine-tune the model[2]. Given that, the number doesn't seem unreasonably high. I think it's good to examine the environmental impact, but so many unnecessary flights happen every day that could be replaced by train journeys. Against that backdrop, pre-training a state-of-the-art ML model for the CO2 cost of a single flight doesn't seem very high to me. Or am I overlooking something?

[1] Of course, in reality it's not that simple: there is hyperparameter search, domain adaptation, further experiments, etc.

[2] Also thanks to the great open-source work by Hugging Face.