by duchenne
594 days ago
Training a 1B model on 1T tokens is cheaper than people might think.
An H100 GPU can be rented for about $2.50 per hour and can train around 63k tokens per second for a 1B model.
So you would need around 4,400 GPU-hours of training (1T tokens / 63k tokens/s ≈ 15.9M seconds), costing only about $11k.
And costs will keep going down.
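The back-of-the-envelope estimate can be reproduced with a few lines of Python. This is a minimal sketch that plugs in the quoted price and throughput. Both are the figures stated in this comment, not benchmarks. It also assumes the work splits perfectly across GPUs, so the total number of GPU-hours is what you pay for.

```python
# Back-of-the-envelope training cost, using the figures quoted in the comment.
# Throughput and price are the comment's stated numbers, not measurements.
TOKENS = 1e12            # 1T training tokens
TOKENS_PER_SEC = 63_000  # quoted H100 throughput for a 1B model
PRICE_PER_HOUR = 2.5     # quoted rental price, USD per GPU-hour


def training_cost(tokens, tokens_per_sec, price_per_hour):
    """Return (gpu_hours, usd), assuming cost scales linearly with GPU-hours."""
    gpu_hours = tokens / tokens_per_sec / 3600  # seconds -> hours
    return gpu_hours, gpu_hours * price_per_hour


hours, usd = training_cost(TOKENS, TOKENS_PER_SEC, PRICE_PER_HOUR)
print(f"{hours:,.0f} GPU-hours, ${usd:,.0f}")  # prints "4,409 GPU-hours, $11,023"
```

Because the cost is linear in both inputs, halving the rental price or doubling the throughput halves the bill.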