by kristjansson
1173 days ago
I don’t think it’s fair to just ignore the capex part of the model training costs. At AWS pricing, the 21 days of training for the 65B model cited in the LLaMA paper would cost about $2.6M at reserved prices. There’s a lot of AWS profit in that number, but it’s a reasonable first approximation of the TCO of the hardware. Even if the real TCO is a third of that, it’s still nearly a million dollars to train 65B, never mind the staff costs.
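For anyone who wants to check the arithmetic, here is a rough sketch. The 2048-GPU count is my assumption, taken from the LLaMA paper rather than from the numbers above, and the per-GPU-hour rate is only what the $2.6M figure implies, not a quoted AWS price.

```python
# Back-of-the-envelope check of the training cost estimate.
# Assumption: 2048 A100 GPUs, as reported in the LLaMA paper (not stated in the comment).
GPUS = 2048
DAYS = 21                 # training time for the 65B model, from the comment
AWS_COST_USD = 2.6e6      # reserved-price estimate, from the comment
TCO_FRACTION = 1 / 3      # "even if real TCO is a third"

gpu_hours = GPUS * DAYS * 24
implied_rate = AWS_COST_USD / gpu_hours      # USD per GPU-hour implied by $2.6M
tco_estimate = AWS_COST_USD * TCO_FRACTION   # hardware-only cost at a third of AWS

print(f"GPU-hours:         {gpu_hours:,}")
print(f"Implied rate:      ${implied_rate:.2f} per GPU-hour")
print(f"TCO at one third:  ${tco_estimate:,.0f}")
```

Under those assumptions the run is about a million GPU-hours, so the one-third figure lands a bit under $900K before any staff costs.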