Hacker News
by jeffbee 741 days ago
I don't see how you can evaluate better and worse for training without doing so on a cost basis. If it costs less and eventually finishes, then it's better.
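To make the cost-basis comparison concrete, here is a minimal Python sketch. The prices, chip counts, and hours are made-up placeholders, not real TPU or Nvidia pricing. The only point is that total compute cost is price per chip-hour times chips times wall-clock hours, so a slower run can still be the cheaper one.

```python
from dataclasses import dataclass


@dataclass
class TrainingRun:
    name: str
    price_per_chip_hour: float  # hypothetical USD rate, not real cloud pricing
    num_chips: int
    hours: float  # wall-clock hours until training finishes

    def total_cost(self) -> float:
        # Compute cost = rate * chips * wall-clock time
        return self.price_per_chip_hour * self.num_chips * self.hours


def cheaper(a: TrainingRun, b: TrainingRun) -> TrainingRun:
    """Return whichever run costs less overall, ignoring how long it takes."""
    return a if a.total_cost() <= b.total_cost() else b


# Placeholder numbers purely for illustration
fast = TrainingRun("accelerator A", price_per_chip_hour=4.0, num_chips=8, hours=10)
slow = TrainingRun("accelerator B", price_per_chip_hour=1.5, num_chips=8, hours=20)
print(fast.total_cost(), slow.total_cost())  # 320.0 240.0
print(cheaper(fast, slow).name)  # accelerator B
```

Here the run that takes twice as long still wins on cost, which is exactly the "costs less and eventually finishes" criterion.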

This assumes that you can linearly scale up the number of TPUs to get equal performance to Nvidia cards for less cost. Like most things distributed, this is unlikely to be the case.
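The linear-scaling assumption can be expressed as a parallel-efficiency factor. In this idealized sketch (hypothetical numbers, and a deliberately simple model that ignores things like communication topology), `efficiency = 1.0` means perfectly linear scaling. Anything lower means each added chip contributes less, so the total chip-hours, and therefore the cost, rise as efficiency drops.

```python
def wall_clock_hours(single_chip_hours: float, num_chips: int, efficiency: float) -> float:
    """Idealized time to finish on num_chips, given a parallel efficiency in (0, 1]."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return single_chip_hours / (num_chips * efficiency)


def compute_cost(price_per_chip_hour: float, single_chip_hours: float,
                 num_chips: int, efficiency: float) -> float:
    # Cost = rate * chips * wall-clock time; the chip count cancels out,
    # leaving rate * single_chip_hours / efficiency.
    hours = wall_clock_hours(single_chip_hours, num_chips, efficiency)
    return price_per_chip_hour * num_chips * hours


# Hypothetical job: 1,000 chip-hours of work at $2 per chip-hour on 64 chips
print(compute_cost(2.0, 1000, 64, efficiency=1.0))  # 2000.0
print(compute_cost(2.0, 1000, 64, efficiency=0.5))  # 4000.0
```

Under this model, a per-chip price advantage only survives if scaling efficiency stays high enough; halving efficiency doubles the bill.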
This is absolutely the case; TPUs scale very well: https://github.com/google/maxtext
The repo mentions a Karpathy tweet from January 2023. Andrej has since created llm.c, and the same model now trains about 32x faster on the same Nvidia hardware mentioned in the tweet. I don't think the performance estimate the repo used (based on that early tweet) accurately reflects what the Nvidia hardware itself can do.
Time is money. You might be a lab with long queues to train, leaving expensive staff twiddling their thumbs.
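Putting the "time is money" point into the cost model: if staff are blocked while a run finishes, their time belongs in the total alongside compute. A minimal sketch, where `staff_cost_per_hour` and all figures are hypothetical placeholders:

```python
def total_cost(compute_cost: float, wall_clock_hours: float,
               staff_cost_per_hour: float) -> float:
    """Compute spend plus the cost of staff waiting for the run to finish."""
    return compute_cost + staff_cost_per_hour * wall_clock_hours


# Hypothetical comparison: a cheaper but slower run versus a pricier, faster one
slow_run = total_cost(compute_cost=240.0, wall_clock_hours=20, staff_cost_per_hour=50.0)
fast_run = total_cost(compute_cost=320.0, wall_clock_hours=10, staff_cost_per_hour=50.0)
print(slow_run, fast_run)  # 1240.0 820.0
```

Once waiting time is priced in, the run with the lower compute bill can end up costing more overall.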