Hacker News
by musicale 475 days ago
> so you had to have lots of operand re-use to not be memory-bound

Looking at Nvidia's spec sheet, an H100 SXM can do 989 TF32 teraflops (or 67 non-tensor-core FP32 teraflops?) and has 3.35 TB/s of memory (HBM) bandwidth, so ... similar problem?
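To put rough numbers on that, here is a back-of-the-envelope sketch using only the figures quoted above. The constant and function names are made up for illustration, and it assumes 4-byte operands and that the quoted peak numbers are actually reachable, which is optimistic.

```python
# Peak figures as quoted in the comment (H100 SXM spec sheet).
TF32_TENSOR_FLOPS = 989e12   # FLOP/s, tensor cores
FP32_FLOPS = 67e12           # FLOP/s, non-tensor-core FP32
HBM_BYTES_PER_S = 3.35e12    # bytes/s of HBM bandwidth
BYTES_PER_OPERAND = 4        # assumption: operands are stored as 32-bit values


def flops_per_byte(peak_flops, bandwidth):
    """FLOPs the chip can issue in the time it takes to load one byte."""
    return peak_flops / bandwidth


def flops_per_operand(peak_flops, bandwidth, operand_bytes=BYTES_PER_OPERAND):
    """FLOPs needed per loaded operand to avoid being memory-bound."""
    return flops_per_byte(peak_flops, bandwidth) * operand_bytes


tensor_intensity = flops_per_byte(TF32_TENSOR_FLOPS, HBM_BYTES_PER_S)
fp32_intensity = flops_per_byte(FP32_FLOPS, HBM_BYTES_PER_S)
tensor_per_operand = flops_per_operand(TF32_TENSOR_FLOPS, HBM_BYTES_PER_S)

print(f"tensor cores: ~{tensor_intensity:.0f} FLOP/byte, "
      f"~{tensor_per_operand:.0f} FLOP per 4-byte operand")
print(f"plain FP32:   ~{fp32_intensity:.0f} FLOP/byte")
```

So each operand fetched from HBM would have to be reused on the order of a thousand times to keep the tensor cores busy, versus roughly 80 times (20 FLOP/byte times 4 bytes) for plain FP32.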

1 comment

There is caching today, but the cache hit rate is effectively 0 for LLMs since the datasets are so huge.
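A toy simulation of why that happens, as a sketch only: it assumes an LRU cache and a working set that is streamed front to back on every pass, and the block counts and the `lru_hit_rate` helper are hypothetical, not measurements of any real GPU.

```python
from collections import OrderedDict


def lru_hit_rate(num_blocks, cache_blocks, passes):
    """Hit rate of an LRU cache when blocks 0..num_blocks-1 are read in order, repeatedly."""
    cache = OrderedDict()
    hits = 0
    accesses = 0
    for _ in range(passes):
        for block in range(num_blocks):
            accesses += 1
            if block in cache:
                hits += 1
                cache.move_to_end(block)       # mark as most recently used
            else:
                cache[block] = True
                if len(cache) > cache_blocks:
                    cache.popitem(last=False)  # evict the least recently used block
    return hits / accesses


# Working set 20x larger than the cache: every block is evicted before it is reused.
streaming = lru_hit_rate(num_blocks=1000, cache_blocks=50, passes=4)

# Working set that fits: only the first pass misses.
fits = lru_hit_rate(num_blocks=50, cache_blocks=50, passes=4)

print(streaming, fits)  # 0.0 0.75
```

With a cyclic sweep over more data than the cache holds, LRU evicts each block just before it is needed again, so the hit rate is exactly zero no matter how many passes are made.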