by anthonix1
745 days ago
It converges similarly on smaller datasets. I'm about to kick off a from-scratch training run on the same fineweb-10B, which at 324k toks/sec should take about 8.6 hours. At my electricity rate (per kWh), that comes to about $2.50 to train. Will report back tomorrow when the training has finished.
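For anyone checking the arithmetic, here is a rough sketch of the estimate. It assumes fineweb-10B means 10 billion tokens seen in a single pass and that throughput holds steady at the quoted rate. The power draw and price per kWh are not stated, so the script only backs out the hourly cost implied by the quoted $2.50. Variable names are illustrative.

```python
# Back-of-the-envelope check of the quoted training time and cost.
# Assumptions (not stated in the comment): 10e9 tokens, one pass, constant throughput.
TOKENS = 10e9            # fineweb-10B, taken as 10 billion tokens
THROUGHPUT = 324_000     # tokens per second, as quoted
QUOTED_COST = 2.50       # dollars for the whole run, as quoted

seconds = TOKENS / THROUGHPUT
hours = seconds / 3600

# Hourly electricity cost implied by the quoted total; the underlying
# wattage and $/kWh are unknown, so they are not modeled here.
implied_cost_per_hour = QUOTED_COST / hours

print(f"{hours:.1f} hours")                       # 8.6 hours
print(f"${implied_cost_per_hour:.2f} per hour")   # $0.29 per hour
```

So the 8.6 hour figure follows directly from the token count and throughput, and the $2.50 works out to roughly 29 cents of electricity per hour of training.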