by Der_Einzige
219 days ago
Citation needed on "generally when people talk about training costs like this they include more than just the electricity but exclude staffing costs". It would be simply wrong to exclude the staffing costs. When each engineer costs well over 1 million USD per year in total costs, you sure as hell account for them.

Calculating the cost in terms of GPU-hours is a whole lot easier from an accounting perspective.
The papers I've seen that talk about training cost all do it in terms of GPU hours. The gpt-oss model card said 2.1 million H100-hours for gpt-oss:120b. The Llama 2 paper said 3.31M GPU-hours on A100-80G. They rarely give actual dollar costs and I've never seen any of them include staffing hours.
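
For a rough sense of how a GPU-hour figure turns into a dollar figure, here is a minimal sketch. The GPU-hour numbers are the ones quoted above; the hourly rate is a made-up placeholder (not from either paper), and the function name is just for illustration. The result is compute only and leaves out staffing entirely, which is the point of disagreement in this thread.

```python
def gpu_hours_to_dollars(gpu_hours: float, dollars_per_gpu_hour: float) -> float:
    """Compute-only cost estimate: GPU-hours times an hourly rate.

    Deliberately excludes staffing, storage, networking, and failed runs.
    """
    if gpu_hours < 0 or dollars_per_gpu_hour < 0:
        raise ValueError("gpu_hours and dollars_per_gpu_hour must be non-negative")
    return gpu_hours * dollars_per_gpu_hour


# GPU-hour figures as quoted in the comment above.
REPORTED_GPU_HOURS = {
    "gpt-oss:120b (H100)": 2.1e6,
    "Llama 2 (A100-80G)": 3.31e6,
}

# ASSUMPTION: hypothetical flat rate in USD per GPU-hour, for illustration only.
# Real rates differ by GPU type, provider, and contract.
ASSUMED_RATE = 2.00

for name, hours in REPORTED_GPU_HOURS.items():
    cost = gpu_hours_to_dollars(hours, ASSUMED_RATE)
    print(f"{name}: {hours:,.0f} GPU-hours at ${ASSUMED_RATE:.2f}/GPU-hour = ${cost:,.0f}")
```

Swap in whatever rate you think is realistic; the dollar number moves linearly with it, which is probably why papers report GPU-hours and leave the pricing to the reader.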