	by sanketsarang 1762 days ago
	On the same basis, it would also help if you could provide a comparison between GPUs commonly used for ML: Tesla K80, P100, T4, V100 and A100. How has the architecture evolved to make the A100 significantly faster? Is it just the 80GB RAM, or is there more to it from an architecture standpoint?

4 comments

einpoklum 1762 days ago

> How has the architecture evolved to make the A100 significantly faster?

Oh, very much so. By way more than an order of magnitude. For a deeper read, have a look at the "architecture white papers" for Kepler, Pascal, Volta/Turing, and Ampere:

https://duckduckgo.com/?t=ffab&q=NVIDIA+architecture+white+p...

or check out the archive of NVIDIA's parallel4all blog ... hmm, that's weird, it seems like they've retired it. They used to have really good blog posts explaining what's new in each architecture.

You could also have a look here:

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index....

for the table of various numeric sizes and limits which change with different architectures. But that's not a very useful resource in and of itself.


M277 1762 days ago

You may find this[0] helpful (note -- download link to a .PDF). It's the GA100 whitepaper.

[0]: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Cent...


kkielhofner 1762 days ago

For starters, the T4 is heavily optimized for low power consumption on inference tasks. IIRC it doesn't even require additional power beyond what the PCIe bus can provide, but it's basically useless for training, unlike the others.


touisteur 1761 days ago

One day I'll get my hands on both an A40 and an A100 and maybe I'll get an answer to the question: does the 5120-bit memory bus help that much? The A100 has fewer CUDA cores and around 1/4 more tensor cores, but seems to be the preferred 'compute' and 'AI training' option all around. What gives?
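
For a rough sense of what the bus width buys, here is a back-of-the-envelope sketch of peak memory bandwidth (bus width times per-pin data rate, divided by 8 bits per byte). The per-pin rates are assumptions taken from public spec sheets (about 2.43 Gbit/s for the HBM2 on the 40GB A100, 14.5 Gbit/s for the A40's GDDR6 on a 384-bit bus), and the function name is made up for illustration. It is a theoretical peak, not a measurement, and says nothing about how a given workload actually behaves.

```python
def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbit_per_pin):
    """Theoretical peak memory bandwidth in GB/s.

    bus_width_bits: width of the memory interface in bits.
    data_rate_gbit_per_pin: effective transfer rate per pin in Gbit/s.
    """
    # Total Gbit/s across the whole bus, converted to gigabytes per second.
    return bus_width_bits * data_rate_gbit_per_pin / 8


# Assumed spec-sheet figures, not measured values.
a100_40gb = peak_bandwidth_gb_s(5120, 2.43)  # wide HBM2 bus, slow per pin
a40 = peak_bandwidth_gb_s(384, 14.5)         # narrow GDDR6 bus, fast per pin

print(f"A100 40GB: {a100_40gb:.0f} GB/s")
print(f"A40:       {a40:.0f} GB/s")
print(f"ratio:     {a100_40gb / a40:.2f}x")
```

Under those assumed figures the A100 comes out at a bit over twice the A40's peak bandwidth, which would matter most for memory-bound kernels. Whether it explains the preference for training is still the open question.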
