by simonw
219 days ago
If you have 1,000 researchers working for your company and you constantly have dozens of different training runs on the go, overlapping each other, how would you split those salaries between those different runs? Calculating the cost in terms of GPU-hours is a whole lot easier from an accounting perspective. The papers I've seen that talk about training cost all do it in terms of GPU-hours. The gpt-oss model card said 2.1 million H100-hours for gpt-oss:120b. The Llama 2 paper said 3.31M GPU-hours on A100-80G. They rarely give actual dollar costs and I've never seen any of them include staffing hours.
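To show why GPU-hours are the easier unit, here's a rough sketch of how you'd turn a reported GPU-hour figure into a dollar estimate. The GPU-hour numbers are the two quoted above; the hourly rate is a made-up placeholder (not from either paper), and `estimate_cost_usd` is just a hypothetical helper name. Swap in whatever rate you think is realistic.

```python
# ASSUMPTION: a placeholder rental price per GPU-hour in USD.
# This is NOT a figure from any paper or model card; change it to taste.
ASSUMED_USD_PER_GPU_HOUR = 2.00

# GPU-hour figures as quoted in the comment above.
REPORTED_GPU_HOURS = {
    "gpt-oss:120b (H100)": 2_100_000,
    "Llama 2 (A100-80G)": 3_310_000,
}


def estimate_cost_usd(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Compute-only estimate: GPU-hours times an assumed hourly rate."""
    return gpu_hours * usd_per_gpu_hour


for name, hours in REPORTED_GPU_HOURS.items():
    cost = estimate_cost_usd(hours, ASSUMED_USD_PER_GPU_HOUR)
    print(f"{name}: {hours:,} GPU-hours -> ${cost:,.0f} at the assumed rate")
```

The whole calculation is one multiplication per run, which is the point: there's no equivalent one-liner for apportioning salaries across overlapping runs.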