by torginus 106 days ago
I really don't want to overrule your expertise in this regard, but a 5x efficiency gain in a single generation feels like it's too much, especially considering how newer process nodes have been yielding less and less improvement.

Just to compare and contrast:

https://www.videocardbenchmark.net/power_performance.html

Here's a synthetic benchmark page listing every GPU in recent memory. True, it's not AI, but if we look at the 1080 Ti, a 9-year-old card at this point, and compare it with the 5090, we see the gains were 190/74 ≈ 2.57x over a timespan that involved multiple die shrinks and uArch changes.
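As a rough sanity check on that arithmetic, here is a minimal sketch. It assumes the 190 and 74 figures are the two cards' performance-per-watt scores from the linked page, and it takes the 9-year gap from the comment itself; the function names are just for illustration.

```python
# Scores quoted in the comment (assumed to be performance-per-watt figures).
SCORE_5090 = 190.0
SCORE_1080_TI = 74.0
YEARS = 9  # gap between the two cards, as stated in the comment


def total_gain(new: float, old: float) -> float:
    """Ratio of the newer card's score to the older card's score."""
    return new / old


def annualized_gain(total: float, years: float) -> float:
    """Constant yearly multiplier that compounds to `total` over `years`."""
    return total ** (1.0 / years)


gain = total_gain(SCORE_5090, SCORE_1080_TI)
per_year = annualized_gain(gain, YEARS)

print(f"total gain over {YEARS} years: {gain:.2f}x")
print(f"equivalent yearly multiplier: {per_year:.3f}x")
```

Under those assumptions the compounded rate works out to roughly 11% per year, which is the scale a claimed 5x single-generation jump is being compared against.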

I think these numbers might not hold up on IRL workloads, and afaict older datacenter cards still hold up well and are being used in production.

3 comments

Newer process nodes are not the main avenue of improvement. What those transistors are used for is more important, and it's plausible that improvements between generations can increase performance by multiples on a specific task. Not all of the improvements are necessarily in the chip itself, either.

E.g. the next gen might have hardware support for lower-bit inference, more memory bandwidth, etc.

You could just give the TL;DR: by far the biggest improvement across the generations of nVidia chips is calculating faster at half the precision. For Blackwell vs. Hopper it was "double performance", by which they mean Blackwell can calculate with NVFP4 at twice the rate Hopper can calculate at FP8. Then go back through the generations all the way until you arrive at FP64, where we started. They even made a slight detour to "FP128".

Decide for yourself if this is a real improvement. You should probably consider that nVidia did not just ship the new chips, but also demonstrated training a neural net with NVFP4.

It's not the only improvement, but it is by far the biggest.

As for the future: nobody's gotten FP2 to work satisfactorily yet. But hey, maybe at nVidia's next conference. Then again, even NVFP4 is not actually 4 bits (meaning various parts of the computation don't actually happen at 4 bits), and neither was FP8 (you could use it like that, but people didn't).
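The "half the precision, double the rate" argument above can be sketched with a toy calculation. This is an idealized assumption, not a measurement: it supposes headline throughput scales exactly with the inverse of the bit width, and the format list is only illustrative.

```python
# Idealized assumption: headline throughput doubles each time the bit width halves.
# Real chips do not follow this exactly; the table is illustrative only.
FORMAT_BITS = {"FP64": 64, "FP32": 32, "FP16": 16, "FP8": 8, "FP4": 4}


def precision_speedup(from_bits: int, to_bits: int) -> float:
    """Headline speedup from narrowing the number format alone."""
    return from_bits / to_bits


for name, bits in FORMAT_BITS.items():
    print(f"{name}: {precision_speedup(64, bits):.0f}x relative to FP64")
```

Under that assumption, a single FP8 to FP4 step accounts for a 2x headline gain, and the whole path from FP64 down to 4-bit formats accounts for 16x without any other architectural change.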

Almost seems as if microchips are approaching their "B-52 age":

"Those things are still flying! Introduced in 1955!"

"But that was the B version, all those that are still flying are the H version, so many iterations between them!"

"Welcome to 1962"

> but a 5x efficiency gain in a single generation feels like it's too much, especially considering how newer process nodes have been yielding less and less improvement

The efficiency gain comes from other areas too, e.g. memory, networking, etc. It's the TOTAL.

> Here's a synthetic benchmark page listing every GPU in recent memory

The lack of GPU gains there isn't because of process nodes. Nvidia, and later AMD, stopped investing in that direction. They started optimizing for AI, not graphics.