by bacon_blood
2009 days ago
Most machine learning runs on _NVIDIA_ GPUs, which have themselves included a neural engine (tensor cores) for the last two generations. An NVIDIA A100 has around 19 teraflops of general-purpose throughput but 156 "tensor flops" (312 if you use sparse matrices). In addition to being useful for training and inference, the consumer cards use tensor cores for things like mic filtering (RTX Voice) and neural upscaling (DLSS) in games. General-purpose GPU hardware is far more wasteful than tensor cores for matrix math: maybe more than 10x worse on power, and similarly worse on performance.
I didn't realize there was that distinction; I thought GPUs were just optimized for vector arithmetic across the board. What is the difference between general-purpose GPU hardware and tensor cores? What does general-purpose GPU hardware do that tensor cores do not?
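
For a sense of scale, here is a quick back-of-envelope check on the figures quoted above. It only divides the numbers from the parent comment; the variable names are mine, and these are peak figures as quoted there, not measurements.

```python
# Peak throughput figures as quoted in the parent comment (TFLOPS).
# Variable names are illustrative, not from any NVIDIA API.
fp32_tflops = 19             # general-purpose throughput
tensor_tflops = 156          # tensor cores, dense matrices
tensor_sparse_tflops = 312   # tensor cores, sparse matrices

# Ratio of tensor-core peak to general-purpose peak.
dense_speedup = tensor_tflops / fp32_tflops
sparse_speedup = tensor_sparse_tflops / fp32_tflops

print(f"dense:  {dense_speedup:.1f}x")   # prints "dense:  8.2x"
print(f"sparse: {sparse_speedup:.1f}x")  # prints "sparse: 16.4x"
```

So on paper the gap is roughly 8x for dense matrix math and 16x with sparsity, which is in the same ballpark as the ">10x" estimate in the quote.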