by mechagodzilla 821 days ago
They might generate improvements, but I’m not sure why people think those improvements would be unbounded. Think of it like improvements to jet engines or internal combustion engines: rapid improvements followed by decades of very tiny improvements. We’ve gone from 32-bit LLM weights down to 16, then 8, then 4-bit weights, and then a lot of messy diminishing returns below that. Moore’s law is running on fumes for process improvements, so each new generation of chips that’s twice as fast manages to get there by nearly doubling the silicon area and nearly doubling the power consumption. There’s a lot of active research into pruning models down now, but mostly better models == bigger models, which is also hitting all kinds of practical limits. Really good engineering might get to the same endpoint a little faster than mediocre engineering, but they’ll both probably wind up at the same point eventually. A super smart LLM isn’t going to make sub-atomic transistors, or sub-bit weights, or eliminate power and cooling constraints, or eliminate any of the dozen other things that eventually limit you.
2 comments

Saying that AI hardware is near a dead end because Moore's law is running out of steam is silly. Even GPUs are very general purpose, so we can still make a lot of progress in the hardware space via extreme specialization, approximate computing and analog computing.
I'm mostly saying that unless a chip-designing AI model is an actual magical wizard, it's not going to have a lot of advantage over teams of even mediocre human engineers. All of the stuff you're talking about is Moore's Law limited after 1-2 generations of wacky architectural improvements.
Bro, Jensen Huang just unveiled a chip yesterday that does 20 petaflops. Intel's latest Raptor Lake CPU does 800 gigaflops. Can you really explain 25,000x progress by the 2x larger die size? I'm sure reactionary America wanted Moore's law to run out of steam but the Taiwanese betrayal made up for all the lost Moore's law progress and then some.
That speedup compared to Nvidia's previous generation came nearly entirely from: 1) a small process technology improvement from TSMC, 2) more silicon area, 3) more power consumption, and 4) moving to FP4 from FP8 (halving the precision). They aren't delivering the 'free lunch' between generations that we had for decades in terms of "the same operations faster and using less power." They're delivering increasingly exotic chips for increasingly crazy amounts of money.
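To make the arithmetic in this exchange concrete, here is a rough Python sketch. The 20 petaflops and 800 gigaflops figures are the ones quoted above; the generation-over-generation numbers (`headline`, `area_ratio`, `precision_factor`) are made-up placeholders, not real chip specs. The sketch also assumes a crude model where the headline speedup is just the product of an area factor, a precision factor, and whatever is left over for process and architecture.

```python
def headline_ratio(new_flops: float, old_flops: float) -> float:
    """Raw ratio of two peak-throughput figures, with no normalization."""
    return new_flops / old_flops


def residual_gain(headline: float, area_ratio: float, precision_factor: float) -> float:
    """Speedup left after dividing out extra silicon area and reduced precision.

    Assumes the simple multiplicative model
        headline = residual * area_ratio * precision_factor
    which is an illustrative simplification, not a real performance model.
    """
    return headline / (area_ratio * precision_factor)


# Figures quoted in the thread: 20 petaflops vs 800 gigaflops.
thread_ratio = headline_ratio(20e15, 800e9)

# Hypothetical placeholder numbers for one generation step:
# 5x headline speedup, 2x silicon area, 2x from halving precision (FP8 -> FP4).
leftover = residual_gain(headline=5.0, area_ratio=2.0, precision_factor=2.0)

print(f"headline ratio from the thread: {thread_ratio:,.0f}x")
print(f"residual gain with placeholder inputs: {leftover:.2f}x")
```

With those placeholder inputs most of the headline number is paid for with area and precision, which is the shape of the argument being made here. Real figures would need to come from the vendors' spec sheets.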
Pro tip: If you want to know who is the king of AI chips, compare FLOPS (or TOPS) per chip area, not FLOPS/chip.

As long as the bottleneck is fab capacity in wafers per hour, the number of operations per second per chip area determines who can produce the most compute at the best price. It's a good measure even across different technology nodes and superchips.

Nvidia is the leader for a reason.

If manufacturing capacity increases to match demand in the future, FLOPS or TOPS per watt may become relevant, but for now the limit is fab capacity.
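
A minimal Python sketch of the FLOPS-per-area argument above. Everything in it is hypothetical: the two chips, the `usable_mm2_per_wafer` figure, and the wafer rate are placeholders chosen for illustration, and yield, packaging, and memory are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    flops: float    # peak operations per second per chip
    area_mm2: int   # die area in mm^2

    @property
    def flops_per_mm2(self) -> float:
        """Compute density: the metric suggested in the comment above."""
        return self.flops / self.area_mm2


def compute_per_fab_hour(chip: Chip, wafers_per_hour: int, usable_mm2_per_wafer: int) -> float:
    """Peak FLOPS of all chips produced in one hour of fab time (perfect yield assumed)."""
    chips_per_wafer = usable_mm2_per_wafer // chip.area_mm2
    return wafers_per_hour * chips_per_wafer * chip.flops


# Hypothetical chips: "big" wins on FLOPS/chip, "small" wins on FLOPS/mm^2.
big = Chip("big", flops=4.0e15, area_mm2=800)
small = Chip("small", flops=1.5e15, area_mm2=200)

for chip in (big, small):
    total = compute_per_fab_hour(chip, wafers_per_hour=10, usable_mm2_per_wafer=60_000)
    print(f"{chip.name}: {chip.flops_per_mm2:.2e} FLOPS/mm^2, {total:.2e} FLOPS per fab-hour")
```

In this toy setup the chip with the lower per-chip number comes out ahead per hour of fab time, because the same wafers yield more total compute. That only holds while wafers are the bottleneck, as the comment says.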

Taiwanese betrayal? I’m not sure I understand the reference.
There's no reference. It's just a bad joke. What they did was actually very good.