by priansh
1524 days ago
I’ve been saying this for years: language models are the ML equivalent of the billionaire space race. It’s just a bunch of orgs with unlimited funding spending millions of dollars on compute to get more parameters than their rivals. It could be decades before we start to see them scale down or make meaningful optimizations. This paper is a good start, but I’d be willing to bet everyone will ignore it and continue breaking the bank. Can you say that about any other task in ML? When Inception v3 came out I was able to run the model pretty comfortably on a 1060. Even pix2pix and most GANs fit comfortably in commercial compute, and the top-of-the-line massive models can still run inference on a 3090. It’s so unbelievably ironic that one of the major problems Transformers aimed to solve when introduced was the compute inefficiency of recurrent networks, and it’s devolved into “how many TPUs can daddy afford” instead.