| HN Mirror



	by jeffbee 741 days ago
	I don't see how you can evaluate better and worse for training without doing so on a cost basis. If it costs less and eventually finishes, then it's better.

2 comments

tmostak 741 days ago

This assumes that you can linearly scale up the number of TPUs to get equal performance to Nvidia cards for less cost. Like most things distributed, this is unlikely to be the case.
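
To make the scaling point concrete, here is a minimal sketch of the arithmetic, not a benchmark. The function name `training_cost`, the `scaling_efficiency` parameter, and all numbers are hypothetical and chosen only for illustration. Under perfectly linear scaling the total cost does not depend on chip count; any efficiency below 1.0 raises the cost by a factor of 1 / efficiency.

```python
def training_cost(total_work, chips, throughput_per_chip,
                  price_per_chip_hour, scaling_efficiency=1.0):
    """Return (hours, dollars) for a training job.

    All inputs are hypothetical. scaling_efficiency is the fraction of
    ideal linear speedup actually achieved at this chip count.
    """
    effective_throughput = chips * throughput_per_chip * scaling_efficiency
    hours = total_work / effective_throughput
    dollars = hours * chips * price_per_chip_hour
    return hours, dollars


# Made-up workload: 1000 units of work, 1 unit per chip-hour, $2 per chip-hour.
small_ideal = training_cost(1000.0, 8, 1.0, 2.0)
large_ideal = training_cost(1000.0, 64, 1.0, 2.0)
large_lossy = training_cost(1000.0, 64, 1.0, 2.0, scaling_efficiency=0.7)

print("8 chips, ideal scaling:  ", small_ideal)
print("64 chips, ideal scaling: ", large_ideal)
print("64 chips, 70% efficiency:", large_lossy)
```

With ideal scaling the two chip counts cost the same and the larger one just finishes sooner. At 70% efficiency the 64-chip run costs more than either ideal case, which is why a lower per-chip price alone does not settle the comparison.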


logicchains 741 days ago

This is absolutely the case; TPUs scale very well: https://github.com/google/maxtext


pama 741 days ago

The repo mentions a Karpathy tweet from Jan 2023. Andrej has recently created llm.c, and the same model trained about 32x faster on the same NVIDIA hardware mentioned in the tweet. I don't think the performance estimate that the repo used (based on that early tweet) was accurate for the performance of the NVIDIA hardware itself.


fbdab103 741 days ago

Time is money. You might be a lab with long queues to train, leaving expensive staff twiddling their thumbs.
