Hacker News
by EvgeniyZh 106 days ago
There are two generations and 4.5 years between A100 and B200.

A100 has 312 TFLOPS of FP16 for 250W, i.e., 1.25 TFLOPS/W.

B200 has 2250 TFLOPS of FP16 compute for 1000W, i.e., 2.25 TFLOPS/W.

This is ~34% growth per generation and ~14% per year. It's hard to believe it will be 400% per generation this time.
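A quick check of the arithmetic above: a minimal sketch that assumes only the FP16 throughput and power figures quoted in this comment, and treats the A100 to B200 gain as compounding evenly over two generations and over 4.5 years.

```python
# Figures quoted in the comment (FP16 TFLOPS, watts).
A100_TFLOPS, A100_WATTS = 312, 250
B200_TFLOPS, B200_WATTS = 2250, 1000
GENERATIONS = 2
YEARS = 4.5

def efficiency(tflops: float, watts: float) -> float:
    """Return throughput per watt in TFLOPS/W."""
    return tflops / watts

def compound_rate(total_ratio: float, periods: float) -> float:
    """Per-period growth rate r such that (1 + r) ** periods == total_ratio."""
    return total_ratio ** (1 / periods) - 1

a100 = efficiency(A100_TFLOPS, A100_WATTS)  # ~1.25 TFLOPS/W
b200 = efficiency(B200_TFLOPS, B200_WATTS)  # 2.25 TFLOPS/W
ratio = b200 / a100                          # ~1.8x over the whole span

print(f"A100: {a100:.3f} TFLOPS/W, B200: {b200:.3f} TFLOPS/W, ratio {ratio:.3f}")
print(f"per generation: {compound_rate(ratio, GENERATIONS):.1%}")
print(f"per year:       {compound_rate(ratio, YEARS):.1%}")
```

The compound rate is a geometric mean, which is why ~1.8x over two generations works out to roughly 34% each rather than 40%.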

It might be 400% in the one thing everyone is interested in.
You're thinking in FP16. Nobody uses FP16 for inference anymore. The 400% is probably for FP4/INT4 computation.
Tensor core performance is inversely proportional to bit width across all generations (i.e., halving the precision doubles the OPS). So comparing at 8-bit precision gives you the same generation-over-generation improvement ratio as at 16-bit. A100/H100 didn't support FP4, if I remember correctly (A100 did have INT4).
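To see why the precision cancels out, here is a sketch of the idealized rule from this comment (throughput scales as 16/bits relative to FP16). It assumes the FP16 TFLOPS/W figures from the parent comment; `ideal_throughput` and `generation_ratio` are hypothetical helpers, and the model ignores that real chips may not support every format.

```python
from fractions import Fraction

def ideal_throughput(fp16_value: Fraction, bits: int) -> Fraction:
    """Scale an FP16 figure to another bit width under the idealized rule
    that halving the bit width doubles tensor-core throughput."""
    return fp16_value * Fraction(16, bits)

# FP16 efficiency (TFLOPS/W) from the parent comment, kept as exact fractions.
A100_FP16_EFF = Fraction(312, 250)
B200_FP16_EFF = Fraction(2250, 1000)

def generation_ratio(bits: int) -> Fraction:
    """B200/A100 efficiency ratio at a given bit width under the idealized rule."""
    return ideal_throughput(B200_FP16_EFF, bits) / ideal_throughput(A100_FP16_EFF, bits)

for bits in (16, 8, 4):
    # The factor 16/bits multiplies both chips, so it cancels in the ratio.
    print(f"{bits:>2}-bit ratio: {float(generation_ratio(bits)):.3f}")
```

Every line prints the same ~1.8x: lowering precision raises both chips' numbers by the same factor, so it can't by itself change the generation-over-generation comparison.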

So FP4/INT4 will likely improve by the same ~30% in OPS/W per generation. You could get a separate improvement by reducing precision further, but going from 4-bit to 1-bit for another 4x feels unlikely for now.
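A rough way to split a headline number into the two effects described here: a sketch assuming the ~34% per-generation OPS/W figure estimated earlier in the thread and the idealized 1/bits scaling. `headline_gain` is a hypothetical helper, not a vendor formula.

```python
def headline_gain(arch_gain: float, old_bits: int, new_bits: int) -> float:
    """Total OPS/W multiplier when a new generation is also quoted at a lower
    bit width, assuming throughput scales as 1/bits (idealized)."""
    return arch_gain * (old_bits / new_bits)

ARCH_GAIN = 1.34  # assumed ~34% OPS/W per generation (A100 -> B200 estimate)

for old_bits, new_bits in ((4, 4), (8, 4), (4, 2), (4, 1)):
    gain = headline_gain(ARCH_GAIN, old_bits, new_bits)
    print(f"{old_bits}-bit -> {new_bits}-bit: {gain:.2f}x")
```

Under these assumptions, a same-precision comparison stays near 1.34x, and only a drop from 4-bit to 1-bit would push the headline figure past 5x.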