by free_rms
2009 days ago
There's also:
* this generation of language models leaning into transfer learning, which reduces the total number of training runs needed for different applications
* TPUs being more power-efficient than GPUs (the numbers used in the paper were based on GPUs)
* other energy-centric stuff that's not just offsets: efficiency, like you mention, in addition to sourcing from renewables