| HN Mirror



	by musicale 475 days ago
	> so you had to have lots of operand re-use to not be memory-bound

	Looking at Nvidia's spec sheet, an H100 SXM can do 989 TF32 teraflops (or 67 non-tensor-core FP32 teraflops?) and has 3.35 TB/s of memory (HBM) bandwidth, so ... similar problem?
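To put a rough number on "similar problem", here is a back-of-the-envelope sketch that takes the figures exactly as quoted in the comment and divides peak compute by memory bandwidth. The constant and function names are made up for illustration, and the 4 bytes per operand is an assumption (one 32-bit value per load).

```python
# Peak figures as quoted in the comment (H100 SXM), not independently verified.
TF32_TENSOR_FLOPS = 989e12   # FLOP/s, tensor-core TF32
FP32_FLOPS = 67e12           # FLOP/s, non-tensor-core FP32
HBM_BYTES_PER_S = 3.35e12    # bytes/s, HBM bandwidth

BYTES_PER_OPERAND = 4        # assumption: each operand is a 32-bit value


def flops_per_byte(peak_flops: float, bandwidth: float) -> float:
    """FLOPs the chip can execute in the time it takes to stream one byte."""
    return peak_flops / bandwidth


tensor_intensity = flops_per_byte(TF32_TENSOR_FLOPS, HBM_BYTES_PER_S)
fp32_intensity = flops_per_byte(FP32_FLOPS, HBM_BYTES_PER_S)

# Re-use needed per operand fetched from HBM to keep the math units busy.
tensor_flops_per_operand = tensor_intensity * BYTES_PER_OPERAND
fp32_flops_per_operand = fp32_intensity * BYTES_PER_OPERAND

print(f"tensor TF32: {tensor_intensity:.0f} FLOPs/byte, "
      f"{tensor_flops_per_operand:.0f} FLOPs per operand loaded")
print(f"plain FP32:  {fp32_intensity:.0f} FLOPs/byte, "
      f"{fp32_flops_per_operand:.0f} FLOPs per operand loaded")
```

With those inputs the ratio comes out to roughly 295 FLOPs per byte on the tensor cores and about 20 on plain FP32, so every value pulled from HBM has to be re-used many times before the chip stops being memory-bound.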

1 comment

There is caching today.

The cache hit rate is effectively 0 for LLMs, since the datasets are so huge.