by free_rms 2009 days ago
There's also

* this generation of language models leaning into transfer learning, which reduces the total number of training runs needed across different applications

* TPUs being more power efficient than GPUs (the numbers they used in the paper were based on GPUs)

* other energy-centric stuff that's not just offsets: efficiency, like you mention, in addition to sourcing from renewables