Hacker News
by winwang 582 days ago
RTX 4090 tensor performance (FP8): 660 TFLOPS dense, 1320 TFLOPS "with sparsity" (i.e. the theoretical max, with zeroes in the right places).

https://images.nvidia.com/aem-dam/Solutions/geforce/ada/nvid...

But at these levels of compute, memory and interconnect bandwidth become the bottleneck.
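
A rough roofline-style sketch of why bandwidth ends up mattering. The 660 TFLOPS figure is the dense FP8 number quoted above; the ~1008 GB/s memory bandwidth is my assumption for the 4090 and is not from the linked PDF excerpt. The helper names are made up for illustration, and the GEMM traffic model (every operand read or written exactly once, 1 byte per element including the output) is a best-case simplification.

```python
# Roofline back-of-the-envelope: how many FLOPs per byte of memory traffic
# a kernel needs before the tensor cores, not DRAM, become the limit.

PEAK_FP8_FLOPS = 660e12      # dense FP8 figure quoted in the comment
MEM_BW_BYTES_PER_S = 1008e9  # ASSUMPTION: ~1008 GB/s GDDR6X bandwidth


def ridge_point(peak_flops: float, mem_bw: float) -> float:
    """Arithmetic intensity (FLOP/byte) where compute and memory limits meet."""
    return peak_flops / mem_bw


def gemm_intensity(m: int, n: int, k: int, bytes_per_elem: int = 1) -> float:
    """FLOP/byte for C[m,n] = A[m,k] @ B[k,n], assuming each matrix
    crosses the memory bus exactly once (idealized, hypothetical model)."""
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def attainable_flops(intensity: float, peak_flops: float, mem_bw: float) -> float:
    """Roofline bound: the lower of the compute roof and the bandwidth roof."""
    return min(peak_flops, intensity * mem_bw)


if __name__ == "__main__":
    print("ridge point (FLOP/byte):", ridge_point(PEAK_FP8_FLOPS, MEM_BW_BYTES_PER_S))
    # Matrix-vector-like shape (e.g. batch-1 inference) vs. a large square GEMM.
    for shape in [(1, 4096, 4096), (8192, 8192, 8192)]:
        ai = gemm_intensity(*shape)
        bound = attainable_flops(ai, PEAK_FP8_FLOPS, MEM_BW_BYTES_PER_S)
        print(shape, "intensity:", ai, "bound (TFLOPS):", bound / 1e12)
```

Under these assumptions a kernel needs several hundred FLOPs per byte moved to reach the compute roof. A batch-1, matrix-vector-shaped workload sits around 2 FLOP/byte, so it is bandwidth-bound by a wide margin; only large, well-tiled GEMMs get near the headline number.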