| HN Mirror



	by matt-p 741 days ago
	Exactly, and they are still about 1/18th as good at training LLMs as an H100. Maybe they are less than 1/18th the cost, so Google technically has a marginally better unit cost, but I doubt it when you consider the R&D cost. They are less bad at inference, but still much worse than even an A100.

4 comments

derefr 741 days ago

Given that Google invented the Transformer architecture (and Google AI continues to do foundational R&D on ML architecture), and that Google's TPUs don't even support the most common ML standards but require their own training and inference frameworks, I would assume that "the point" of TPUs, from Google's perspective, has less to do with running LLMs and more to do with running weird experimental custom model architectures that don't even exist as journal papers yet.

I would bet money that TPUs are at least better at doing AI research than anything Nvidia will sell you. That alone might be enough for Google to keep getting some new ones fabbed each year. The TPUs you can rent on Google Cloud might very well just be hardware requisitioned by the AI team, for the AI team, that they aren't always using to capacity, and so is "earning out" its CapEx through public rentals.

TPUs are maybe also better at other things Google does internally, too. Running inference on YouTube's audio+video-input timecoded-captions-output model, say.


UCBdaPatterson 741 days ago

If you're interested in a peer-reviewed scientific comparison, Google writes retrospective papers after contemporary TPUs and GPUs are deployed, rather than speculating about future products. The most recent compares TPU v4 and A100. (TPU v5 versus H100 is for a future paper.) Here is a quote from the abstract:

"Deployed since 2020, TPU v4 outperforms TPU v3 by 2.1x and improves performance/Watt by 2.7x. ... For similar sized systems, it is ~4.3x--4.5x faster than the Graphcore IPU Bow and is 1.2x--1.7x faster and uses 1.3x--1.9x less power than the Nvidia A100. TPU v4s inside the energy-optimized warehouse scale computers of Google Cloud use ~2--6x less energy and produce ~20x less CO2e than contemporary DSAs in typical on-premise data centers."

Here is a link to the paper: https://dl.acm.org/doi/pdf/10.1145/3579371.3589350


coder543 741 days ago

That quote is referring to the A100... the H100 used ~75% more power to deliver "up to 9x faster AI training and up to 30x faster AI inference speedups on large language models compared to the prior generation A100."[0]

Which sure makes the H100 sound both faster and more efficient (per unit of compute) than the TPU v4, given what was in your quote. I don't think your quote does anything to support the position that TPUs are noticeably better than Nvidia's offerings for this task.
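
A rough sketch of that arithmetic, normalizing the A100 to speed 1.0 and power 1.0. The inputs are only the figures quoted in this thread; pairing the low ends and the high ends of the TPU v4 ranges is an assumption (the paper may not report them for the same workloads), and Nvidia's number is an "up to" figure, so the H100 result is an upper bound rather than a typical value.

```python
def perf_per_watt(speedup: float, power_ratio: float) -> float:
    """Performance per watt relative to an A100 (A100 == 1.0)."""
    return speedup / power_ratio

# TPU v4 vs A100, from the quoted abstract: 1.2x-1.7x faster,
# while using 1.3x-1.9x less power (power ratio = 1/1.3 .. 1/1.9).
tpu_v4_low = perf_per_watt(1.2, 1 / 1.3)
tpu_v4_high = perf_per_watt(1.7, 1 / 1.9)

# H100 vs A100, from Nvidia's claim: up to 9x faster training
# at roughly 75% more power (best case only).
h100_best_case = perf_per_watt(9.0, 1.75)

print(f"TPU v4: {tpu_v4_low:.2f}x to {tpu_v4_high:.2f}x the A100's perf/W")
print(f"H100:   up to {h100_best_case:.2f}x the A100's perf/W")
```

On these inputs the TPU v4 lands at roughly 1.6x to 3.2x the A100's performance per watt, and the H100 at up to about 5.1x, which only holds if the "up to 9x" claim applies to the workload in question.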

Complicating this is that the TPU v5 generation has already come out, and the Nvidia B100 generation is imminent within a couple of months. (So, no, a comparison of TPUv5 to H100 isn't for a future paper... that future paper should be comparing TPUv5 to B100, not H100.)

[0]: https://developer.nvidia.com/blog/nvidia-hopper-architecture...


nomel 740 days ago

As someone unfamiliar with this area, can one of the downvoters explain why they chose to downvote this? Is it wrong?


matt-p 740 days ago

I'm sure it probably is faster for their own workloads (which they are choosing to benchmark on); why bother making it if not? But that is clearly not universally true; a GPU is clearly more versatile. This means nothing to most people if they can't, for example, train an LLM on them.


jeffbee 741 days ago

I don't see how you can evaluate better and worse for training without doing so on a cost basis. If it costs less and eventually finishes, then it's better.


tmostak 741 days ago

This assumes that you can linearly scale up the number of TPUs to get equal performance to Nvidia cards for less cost. Like most things distributed, this is unlikely to be the case.
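
To make the scaling concern concrete, here is a toy model (not a measurement of any real TPU or GPU). It assumes each chip is `slowdown` times slower than the reference card, and that every extra chip adds a fixed fractional communication `overhead`. Both the model and the overhead values are hypothetical; only the 18x figure comes from this thread.

```python
from typing import Optional

def throughput(n_chips: int, slowdown: float, overhead: float) -> float:
    """Aggregate throughput relative to one reference card (== 1.0).

    Toy model: ideal throughput n/slowdown, divided by a penalty that
    grows linearly with the number of additional chips.
    """
    return (n_chips / slowdown) / (1.0 + overhead * (n_chips - 1))

def chips_to_match(slowdown: float, overhead: float,
                   max_chips: int = 10_000) -> Optional[int]:
    """Smallest chip count that matches one reference card, or None."""
    for n in range(1, max_chips + 1):
        if throughput(n, slowdown, overhead) >= 1.0:
            return n
    return None

for overhead in (0.0, 0.02, 0.06):
    print(overhead, chips_to_match(18, overhead))
```

With zero overhead the answer is 18 chips, as expected. With a 2% overhead per extra chip it takes 28, so each chip would have to cost under 1/28th of the reference card to break even, not 1/18th. With 6% the toy model never catches up at all.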


logicchains 741 days ago

This is absolutely the case; TPUs scale very well: https://github.com/google/maxtext


pama 741 days ago

The repo mentions a Karpathy tweet from Jan 2023. Andrej has recently created llm.c, and the same model trained about 32x faster on the same Nvidia hardware mentioned in the tweet. I don't think the performance estimate that the repo used (based on that early tweet) was accurate for the performance of the Nvidia hardware itself.


fbdab103 741 days ago

Time is money. You might be a lab with long queues to train, leaving expensive staff twiddling their thumbs.


blharr 741 days ago

Also energy cost: with 18 chips vs. 1, it probably costs a lot more to run the 18.
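
Whether that holds depends on per-chip power draw, which nobody in this thread has given. A minimal sketch, assuming the 18 chips finish in the same wall-clock time as the single card; the wattages below are made-up placeholders, not specs for any real part.

```python
def energy_ratio(n_small: int, watts_small: float, watts_big: float,
                 time_ratio: float = 1.0) -> float:
    """Energy of the many-chip job divided by energy of the one-card job.

    time_ratio is (many-chip wall-clock time) / (one-card wall-clock time);
    1.0 assumes perfect scaling, which is itself an assumption.
    """
    return (n_small * watts_small * time_ratio) / watts_big

BIG_CARD_WATTS = 700.0  # hypothetical placeholder

# Per-chip power at which 18 small chips use exactly as much energy.
break_even_watts = BIG_CARD_WATTS / 18

print(energy_ratio(18, 200.0, BIG_CARD_WATTS))  # hypothetical power-hungry small chip
print(energy_ratio(18, 30.0, BIG_CARD_WATTS))   # hypothetical frugal small chip
print(break_even_watts)
```

A ratio above 1.0 means the 18 chips burn more energy for the same job; below 1.0 means they burn less. The 18 chips only lose on energy if each one draws more than 1/18th of the big card's power, or if imperfect scaling pushes `time_ratio` above 1.0.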


jeffbee 741 days ago

Google claims the opposite in "TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings": https://arxiv.org/abs/2304.01433

Despite various details, I don't think that this is an area where Facebook is very different from Google. Both have terrifying amounts of datacenter to play with. Both have long experience making reliable products out of unreliable subsystems. Both have innovative orchestration and storage stacks. Meta hasn't published much, if anything, about things like reconfigurable optical switches, but that doesn't mean they don't have such a thing.
