by rfoo 475 days ago

> Still looks compute-bound to me.

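On paper, H100 has 3.3 TB/s of HBM bandwidth and ~1000 TFLOPS of bf16 compute. That's a bytes-to-FLOPs ratio of about 1:300. 0.6 GB vs ~2 GFLOPs is 1:3, so the workload does roughly 100x fewer FLOPs per byte than the chip needs to keep its compute units busy. Tell me how this is compute bound?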
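To make the ratio argument concrete, here is a rough roofline-style sketch in Python. It uses only the on-paper figures quoted above (3.3 TB/s, ~1000 TFLOPS) and the 0.6 GB / ~2 GFLOPs workload numbers from this thread; the function names are made up for illustration, and real kernels will not hit either peak.

```python
# Roofline-style sanity check. All numbers are the on-paper figures
# quoted in the comment, not measurements.
HBM_BANDWIDTH = 3.3e12   # bytes/s, ~3.3 TB/s
PEAK_BF16 = 1000e12      # FLOP/s, ~1000 TFLOPS bf16


def machine_balance(peak_flops=PEAK_BF16, bandwidth=HBM_BANDWIDTH):
    """FLOPs the chip can execute per byte it can read from HBM."""
    return peak_flops / bandwidth


def arithmetic_intensity(flops, bytes_moved):
    """FLOPs the workload performs per byte it has to move."""
    return flops / bytes_moved


def bound(flops, bytes_moved, peak_flops=PEAK_BF16, bandwidth=HBM_BANDWIDTH):
    """Compare ideal compute time with ideal memory time."""
    t_compute = flops / peak_flops
    t_memory = bytes_moved / bandwidth
    return "compute-bound" if t_compute > t_memory else "memory-bound"


# Workload figures from the thread: ~2 GFLOPs over 0.6 GB of traffic.
workload_flops = 2e9
workload_bytes = 0.6e9

balance = machine_balance()
intensity = arithmetic_intensity(workload_flops, workload_bytes)
verdict = bound(workload_flops, workload_bytes)

print(f"machine balance:      {balance:.0f} FLOPs/byte")
print(f"workload intensity:   {intensity:.1f} FLOPs/byte")
print(f"verdict:              {verdict}")
```

The workload only flips to compute-bound once its FLOPs per byte exceed the machine balance, which with these numbers means roughly a 100x increase in arithmetic intensity.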

(Also, your number is still off, even after accounting for GQA. You usually can't store the KV cache in fp8.)