by musicale
475 days ago
> so you had to have lots of operand re-use to not be memory-bound

Looking at Nvidia's spec sheet, an H100 SXM can do 989 tf32 teraflops (or 67 non-tensor core fp32 teraflops?) and 3.35 TB/s memory (HBM) bandwidth, so ... similar problem?
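To put a rough number on "similar problem", here is a back-of-the-envelope sketch using only the figures quoted above. The function name and the 4-bytes-per-operand figure are my own assumptions for illustration (tf32 and fp32 values both occupy 32 bits in memory), and the peak numbers are taken at face value from the comment rather than re-checked against the spec sheet.

```python
def flops_per_byte(peak_tflops: float, bandwidth_tb_s: float) -> float:
    """FLOPs the chip can execute per byte fetched from memory, at peak rates."""
    return (peak_tflops * 1e12) / (bandwidth_tb_s * 1e12)


# Figures as quoted in the comment above.
HBM_TB_S = 3.35        # HBM bandwidth, TB/s
TF32_TFLOPS = 989.0    # tensor core tf32
FP32_TFLOPS = 67.0     # non-tensor core fp32

# Assumption: each operand occupies 4 bytes in memory.
BYTES_PER_OPERAND = 4

tf32_ratio = flops_per_byte(TF32_TFLOPS, HBM_TB_S)
fp32_ratio = flops_per_byte(FP32_TFLOPS, HBM_TB_S)

for name, ratio in (("tf32 tensor core", tf32_ratio), ("fp32", fp32_ratio)):
    per_operand = ratio * BYTES_PER_OPERAND
    print(f"{name}: ~{ratio:.0f} FLOPs per byte, "
          f"~{per_operand:.0f} FLOPs per 4-byte operand loaded")
```

Under those assumptions the tensor core path needs on the order of 300 FLOPs for every byte pulled from HBM (about 20 for plain fp32) before it stops being memory-bound, so each loaded operand has to be reused many times from registers, shared memory or cache.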