by theferalrobot
2462 days ago
Your goalposts moved by a few figures. Furthermore, $1 million+ was not a university compute budget; that was money for a single lab on campus (at a general state school, no less) for a specific project. You still have yet to provide any concrete sources to back up your claims. We're talking about contributing to research here. If multi-million-dollar training jobs are what it takes to be at the cutting edge, you should be able to provide ample sources for that claim.

- "the current version of OpenAI Five has consumed 800 petaflop/s-days" [2].
- Check out the Green AI paper. It has good numbers on the amount of compute needed to train a model, and you can translate those into dollar figures.
- https://medium.com/syncedreview/the-staggering-cost-of-train.... NOTE: That XLNet number has to be wrong; it should be five figures, not six.
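To make the "translate that into dollars" step concrete, here is a rough back-of-the-envelope sketch. One petaflop/s-day is 10^15 floating-point operations per second sustained for one day, about 8.64 × 10^19 operations. The per-GPU sustained throughput (25 TFLOP/s) and the cloud price ($2 per GPU-hour) below are hypothetical placeholders picked for illustration, not real quotes. Real utilization and pricing vary a lot, so plug in your own numbers.

```python
# Back-of-the-envelope: convert a compute budget in petaflop/s-days into dollars.
# ASSUMPTION: the throughput and price used below are hypothetical placeholders,
# not measured numbers or real cloud quotes.

SECONDS_PER_DAY = 86_400
FLOPS_PER_PETAFLOP = 1e15


def pfs_days_to_flop(pfs_days: float) -> float:
    """Total floating-point operations contained in `pfs_days` petaflop/s-days."""
    return pfs_days * FLOPS_PER_PETAFLOP * SECONDS_PER_DAY


def training_cost_usd(pfs_days: float,
                      sustained_flops_per_gpu: float,
                      usd_per_gpu_hour: float) -> float:
    """Estimated cost of a run, assuming every GPU sustains the given throughput."""
    gpu_seconds = pfs_days_to_flop(pfs_days) / sustained_flops_per_gpu
    gpu_hours = gpu_seconds / 3600
    return gpu_hours * usd_per_gpu_hour


if __name__ == "__main__":
    # 800 petaflop/s-days is the OpenAI Five figure quoted above.
    # 25 TFLOP/s sustained per GPU and $2/GPU-hour are hypothetical inputs.
    cost = training_cost_usd(800, sustained_flops_per_gpu=25e12, usd_per_gpu_hour=2.0)
    print(f"~${cost:,.0f}")
```

With those made-up inputs the sketch prints about $1.5M. The estimate is only as good as the throughput and price you feed it: halve the sustained throughput and the cost doubles.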
I'm not an expert in on-prem ML costs, but I know many of the world's best on-prem ML users use the cloud to handle the variability of their workloads, so I don't think on-prem is a magic bullet cost-wise.
$1M annually per project (vs. per lab) isn't bad at all. It's also way out of whack with what I saw when I was doing AI research in academia, but that was before the deep learning revolution, so what do I know.
On the moving goalposts: the distinction is between the cost of a single training run and the cost of a paper-worthy research result. Because of inherent variability, architecture search, hyperparameter search, and possibly data-cleaning work, the total cost is a couple of orders of magnitude more than the cost of one training run (the multiple varies a lot by project and lab).
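The "couple of orders of magnitude" multiplier is essentially the number of runs a publishable result needs. Here is a minimal sketch, assuming a plain grid search repeated over several random seeds. The search space, seed count, and $10,000 per-run cost are hypothetical, and architecture search or data cleaning would only push the multiple higher.

```python
# Sketch: total cost of a research result = (number of runs) x (cost per run).
# ASSUMPTION: a full grid search repeated over several seeds; all values are hypothetical.
from math import prod


def research_cost_usd(cost_per_run: float,
                      search_space: dict[str, list],
                      seeds: int) -> tuple[int, float]:
    """Return (number of training runs, total cost) for a full grid search."""
    # Every combination of hyperparameter values is one configuration.
    n_configs = prod(len(values) for values in search_space.values())
    # Each configuration is trained once per random seed.
    n_runs = n_configs * seeds
    return n_runs, n_runs * cost_per_run


if __name__ == "__main__":
    # Hypothetical search space: 5 x 4 x 2 = 40 configurations.
    space = {
        "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3, 1e-2],
        "batch_size": [256, 512, 1024, 2048],
        "architecture": ["baseline", "variant"],
    }
    runs, total = research_cost_usd(cost_per_run=10_000, search_space=space, seeds=3)
    print(f"{runs} runs, ~${total:,.0f}")
```

In this made-up case, 120 runs turn a $10,000 training run into a $1.2M result, which is roughly two orders of magnitude.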
I understand why you don't trust what I'm saying. I wish I could give hard numbers, but I'm limited in what I can say publicly, so this is the best I can do.
[1] https://medium.com/syncedreview/yoshua-bengio-on-the-turing-...
[2] https://openai.com/blog/how-to-train-your-openai-five/