by cavisne 419 days ago
"An H100 GPU has 989 TFLOPs of half-precision matrix multiply compute, and ~60 TFLOPs of “everything else”"

I always thought there was a lot of crossover between gaming GPUs and DC GPUs (and that the volume is why NVIDIA is so far ahead). Are tensor cores somehow related to the pre-tensor-core SMs (like an abstraction on top of SMs)?

1 comment

The "tensor cores" are like the earlier "CUDA cores" in that they live inside each SM, alongside the rest of the required machinery (register files, schedulers, etc.). So they are separate execution units within the SM, not an abstraction on top of it. Volta was the first microarchitecture whose SMs included purpose-built tensor cores. Before that, SMs contained only the more general CUDA cores.
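
To make the difference in granularity concrete, here is a rough Python sketch. It is only a conceptual model, not how the hardware is programmed: the function names `cuda_core_fma` and `tensor_core_mma` are made up for illustration, and the 4x4x4 tile is the size NVIDIA describes for a Volta tensor core (other generations differ). A CUDA core handles one scalar multiply-add at a time, while a tensor core handles a whole small matrix multiply-accumulate as a single operation.

```python
# Conceptual model only. Real tensor cores are fixed-function units that take
# FP16 inputs and accumulate in FP16 or FP32; this just counts the work.

def cuda_core_fma(a, b, c):
    """One scalar fused multiply-add: the unit of work for a CUDA core."""
    return a * b + c


def tensor_core_mma(A, B, C):
    """One 4x4x4 matrix multiply-accumulate, D = A @ B + C.

    Assumption: 4x4 tiles, as described for Volta. Returns the result
    and the number of scalar FMAs it is equivalent to.
    """
    n = 4
    D = [[C[i][j] for j in range(n)] for i in range(n)]
    fma_count = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # The same multiply-add a CUDA core does, repeated 64 times.
                D[i][j] = cuda_core_fma(A[i][k], B[k][j], D[i][j])
                fma_count += 1
    return D, fma_count


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
ones = [[1.0] * 4 for _ in range(4)]

D, fma_count = tensor_core_mma(identity, ones, ones)
print(fma_count)  # 64 scalar FMAs folded into one tensor-core operation
print(D[0])       # [2.0, 2.0, 2.0, 2.0]
```

The loop nest is what a CUDA core would have to step through one multiply-add at a time. Doing it as one operation in dedicated hardware is a large part of why the matrix-multiply figure in the quote is so much higher than the "everything else" figure.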