by PaulHoule 1348 days ago
Moore’s law is alive but the benefits are diminishing.

Until 2005 or so, shrinking transistors automatically increased speed and reduced power consumption. When that ran out of steam, the industry went to multi core and massive parallelism with GPUs.

Until recently each shrink also lowered the cost per transistor, but that seems to have run out as well. That likely has something to do with why Intel was stuck at 14nm for so long, and why new GPU prices are so insane despite a collapse in demand and the resolution of the supply chain crisis for high-end chips.

Chiplets at best are neutral with regard to cost. If manufacturing overhead is low, two chiplets give you twice the transistors at twice the cost. The industry did not pursue chiplets with much vigor until now because, until recently, they were a less competitive approach to scaling than shrinking transistors.

8 comments

I've been jesting for years that it should be referred to as "Moore's business plan", except it isn't a joke.

That business plan is no longer functional for Intel. Translated, Moore's business plan was to shrink die size, and speed up clocks, so much as to obsolete their previous offering with an 18 month half life.

It just doesn't work that way anymore, and hasn't for quite some time.

Engelbart's Scaling Observation[0], on the other hand, remains quite interesting, and from what I see remains in force. Genetics in particular is still pulling exponential gains out of the luminiferous ether.

[0]: https://en.wikipedia.org/wiki/Engelbart%27s_law

Something to consider is that smaller chiplets have higher yields than monolithic dies given the same defect rate, which can certainly have an effect on price.
Right, more precisely cut out the defects from a wafer and save as much precious top-tier silicon as possible. Plus, another benefit of chiplets is that not every circuit needs the same performance level. Save the 5nm stuff for "hot paths" and use increasingly older processes for less performance/power critical applications, because while they might take up more space, they won't unnecessarily take up more next-gen fab time. Phones are literally the smallest and most power-limited devices we use so it makes sense to make every chip in them 5nm. Laptops and desktops and servers are not so constrained.
The problem is that GPUs are mostly underutilized outside games and machine learning, because the industry still hasn't moved away from the notion that only a select group of developers gets to enjoy the tooling to program them.

So everyone that works in other domains, without access to libraries written by the GPU druids, largely ignores their existence.

You can have compute shaders in WebGL2, and WebGPU is around the corner. GPU power is available but then you run into the thorny issue of specs...

Consumer machines vary wildly in their GPU capabilities, especially VRAM. So how do you know that your nice accelerated algorithm is going to work if the user has an old GPU? And what do you do if it doesn’t work? Run on the CPU? Tell the user their machine is too weak?

Here the advantage of GPUs (performance) is also the biggest disadvantage: a gigantic range of performance profiles. At least with CPUs the oldest CPU is only going to be a small integer factor slower than a new one in single thread.

What unites gamers and machine learning is an expectation that the user has a reasonably recent and capable GPU. But these are small, self-selecting populations.

On the server side the issue is cost. GPUs are expensive, and usually not necessary, so nobody is going to write code that requires one without a good reason.

Good luck letting the average JavaScript coder take advantage of them.

This is the problem, GPUs are still a very specialised skill.

Can you blame them? I just built a nice custom PC for my son, with a 6-core CPU (with graphics, a Ryzen -G class), 32GB of RAM, a 1TB NVMe drive, a nice case, etc.

That cost about the same as a single mid-range video card. (Nvidia RTX 3070)

Why on earth would you add a requirement to your software/workflow that doubles your cost, and is just about impossible to find in stock?

I remember building my own PCs back in the 90s and early 2000s. In 2022 I don’t think any of my kids has ever even seen a desktop computer. It is all laptops, tablets, and phones.
I thought most generic computation workloads are ill-suited for GPUs. A normal web SaaS application is full of if branches and JMP instructions. Running this on GPU would slow it down, not speed it up.
Exposing GPU programming to anyone besides C, C++ and Fortran developers would already help, even if that would take a speed bump, as proven by the few attempts targeting PTX.

I wasn't talking about Web apps.

PyTorch isn't just for ML; it can do normal signal processing or physics too. The Julia libraries for CUDA, ROCm, and oneAPI are also general enough for those uses, and approachable. Both can fall back to the CPU without much modification to the rest of your code.
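As a rough illustration of the CPU fallback mentioned above, here is a minimal sketch. The helper name `pick_device` is hypothetical; the only PyTorch call it relies on is `torch.cuda.is_available()`, and it assumes that a machine without PyTorch installed should simply be treated as CPU-only.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu".

    Hypothetical helper: if PyTorch is not installed at all, we assume
    the caller wants the plain CPU path.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


# Choose once, then pass the result to every tensor you create.
device = pick_device()
print(f"running on: {device}")
```

With PyTorch installed, something like `torch.fft.rfft(torch.randn(1024, device=device))` then runs unchanged on either device, which is the sense in which the rest of the code needs little modification.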

If you aren't doing signal processing, physics, or something else that would benefit from SIMD, then the GP is correct: a GPU won't do much for you.

That said people are always discovering algorithms that get better performance than you'd expect from new hardware.

For instance a frightening amount of CPU is spent in financial messaging systems on validating UTF-8, parsing XML and JSON, converting numbers written in decimal digits to binary and things like that. You'd think these are "embarrassingly serial" problems but with clever coding and advanced SIMD instructions such as AVX-512 they can be accelerated for throughput, latency, and economy.
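To give a flavor of how a "serial" task like decimal-to-binary conversion can be done in parallel, here is a sketch of a well-known SWAR trick ("SIMD within a register"): eight ASCII digits are packed into one 64-bit word and combined pairwise with a few multiplies. This is not AVX-512 and not code from any system mentioned here; Python integers stand in for a 64-bit register, and the function name is made up for the example.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # emulate 64-bit register wraparound


def parse_eight_digits(text: bytes) -> int:
    """Convert exactly 8 ASCII digits to an int without a per-digit loop."""
    if len(text) != 8 or not text.isdigit():
        raise ValueError("expected exactly 8 ASCII digits")

    # Load all 8 characters at once; the first character lands in the low byte.
    val = int.from_bytes(text, "little")

    # Turn '0'..'9' into 0..9 in every byte lane simultaneously.
    val -= 0x3030303030303030

    # Each even byte lane now holds a two-digit value: 10*d[i] + d[i+1].
    val = (val * 10 + (val >> 8)) & MASK64

    # Combine the four two-digit lanes with their decimal weights.
    pair_mask = 0x000000FF000000FF
    mul1 = 100 + (1_000_000 << 32)
    mul2 = 1 + (10_000 << 32)
    val = ((val & pair_mask) * mul1 + ((val >> 16) & pair_mask) * mul2) & MASK64

    # The finished number sits in the upper 32 bits.
    return val >> 32


print(parse_eight_digits(b"12345678"))
```

In C or Rust the same steps compile to a handful of integer instructions; wider SIMD applies the same idea to more digits at a time.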

The benefits of the GPU are great enough that you might do more "work" but get the job done faster because it can be done in parallel.

For instance the algorithms used by the old A.I. ("expert systems") parallelize better than you might think (though not as well as the Japanese hoped they would in the 1980s) despite being super-branchy. Currently fashionable neural networks (called "connectionist" back in the day) require only predicated branching (which side of the ReLU are you on?) but spend a lot of calculations on parts of the network which might not be meaningful for the current inference. It depends on the details, but you might be better doing many more operations if you can do them in parallel.

Given that GPUs are out there and that so many people are working on them, I think the range of what you can do with them is going to increase. Few people will write application logic on them directly, though; they will increasingly use libraries and frameworks. For instance, see

https://arxiv.org/pdf/1709.02520.pdf

PyTorch belongs to the "... libraries written by the GPU druids..." in my comment.

And it still requires specific skills to use, and is constrained to Python, C++ and Java-based languages.

GPUs need to be exposed like SIMD: something that the language runtime takes care of. Even if the result isn't perfect, it is better than not using them at all.

IME simd very rarely gets used by the compiler or runtime unless you make some slight changes in your data structures or flow, that require specific knowledge of the simd hardware. Asking a compiler to target unknown GPU architecture seems more likely to slow execution than speed it up. Even when writing my own cuda kernels I sometimes realize that something I am doing won't work well for a particular card and it is actually making me slower than the cpu. I'm sure we'll get there, but cards will have to converge a bit.
The point stands, the vast majority of workloads are unsuited for GPUs, either because they are full of divergent branches, or because the data transfer and synchronization overheads would cancel any performance gains.
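The transfer-overhead half of that argument can be put as a back-of-the-envelope check. The sketch below is only a toy model: the function name and every number in it are made-up illustrations, not measurements, and real workloads can overlap transfers with compute.

```python
def offload_pays_off(cpu_seconds, gpu_seconds, bytes_moved,
                     link_bytes_per_s, launch_overhead_s=0.0):
    """Toy model: offloading wins only if copy + launch + GPU time beats the CPU."""
    transfer_seconds = bytes_moved / link_bytes_per_s
    return transfer_seconds + launch_overhead_s + gpu_seconds < cpu_seconds


# Hypothetical numbers: 64 MB moved over a 16 GB/s link costs 4 ms of copying.
BYTES_MOVED = 64e6
LINK_BYTES_PER_S = 16e9

# A 1 ms CPU job cannot win, even if the GPU kernel itself is 10x faster.
small_job = offload_pays_off(0.001, 0.0001, BYTES_MOVED, LINK_BYTES_PER_S)

# A 1 s CPU job easily amortizes the same 4 ms copy.
large_job = offload_pays_off(1.0, 0.05, BYTES_MOVED, LINK_BYTES_PER_S)

print(small_job, large_job)
```

The same shape of inequality applies to synchronization: any fixed cost per offload has to be smaller than the compute time it saves.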
For one thing, most employers will refuse to issue a laptop with a real GPU to developers and other employees because they are afraid they will get used for games.
That’s clearly untrue. Employers source computers from a few selected vendors and generally issue computers with average specifications because they can buy them in bulk and they are good enough. You can get a laptop with a more powerful GPU at most places if you actually need one.

No one is scared of employees gaming. Employees can’t install applications themselves on their laptops at most places.

That seems like a stretch? Isn’t the more obvious explanation that laptops with a real GPU are much more expensive and that the weaker, integrated GPUs are more than good enough for the vast majority of business use?

Today’s iGPUs are fast enough to comfortably run plenty of games.

I have a work-provided high-end POS Dell Precision engineering laptop. It has an Nvidia discrete GPU, but I don’t think I’ve ever actually needed its power, and I’d gladly trade it for a laptop without…

The integrated one would already be quite good, if there was a more mainstream way to make use of it.
The tooling is getting better. Debuggers are a thing now. You can program them in freestanding C++ with a little determination. OpenMP target regions offer a friendlier syntax. Julia and a bunch of Python machine learning things have GPU backends. They're still niche but slowly we make progress.
Folding@Home
And the worst thing is that if execs etc. think they can sit back and breathe easy because they now have a path towards even more infinite money in their chip business, with Moore's law no longer constraining it, real R&D investment in actual advancements might not get the same attention anymore...
The reason Intel was "stuck" at 14 nm is that extreme ultraviolet lithography (EUV) took many years longer to become viable than was predicted. Prices may have more to do with the EUV market being dominated by ASML, which has serious trouble producing lithography machines fast enough to meet demand.
No, Intel was "stuck" at 14 nm because they believed they could scale transistor sizes down much further without using EUV, as Pat Gelsinger has just explained in a long interview with The Verge.

However, they failed to get good results from the methods they had hoped would work, while the others, i.e. TSMC and Samsung, had much more realistic roadmaps, which added EUV at the right moment.

Intel was not stalled by waiting for EUV; on the contrary, they were not prepared for the transition that was necessary when EUV was eventually ready.

https://www.theverge.com/2022/10/4/23385652/pat-gelsinger-in...

If it’s not one thing it’s another.

If progress continues then they will need some other expensive machine. Either that or they’ll try to stretch the life of EUV the same way Intel tried to delay EUV with extreme multiple patterning.

It seems to me though that the ASML machines ought to get some competition from something more like a free electron laser.

> If manufacturing overhead is low, two chiplets give you twice the transistors at twice the cost

That doesn't take into account yields. One chip with twice the transistors is physically larger than two chips with half as many, and more likely to have a defect during production.
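A quick way to see the size of the effect is the simple Poisson yield model, where the fraction of defect-free dies is exp(-area × defect density). The sketch below assumes that model and uses made-up numbers (0.1 defects/cm², a 6 cm² monolithic die versus two 3 cm² chiplets); it also assumes chiplets are tested before packaging and ignores packaging cost and wafer-edge losses.

```python
import math


def poisson_yield(area_cm2, defects_per_cm2):
    """Fraction of dies with zero defects under a simple Poisson model."""
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_die(area_cm2, defects_per_cm2):
    """Wafer area consumed, on average, to obtain one working die."""
    return area_cm2 / poisson_yield(area_cm2, defects_per_cm2)


# Hypothetical numbers, for illustration only.
DEFECTS_PER_CM2 = 0.1
MONO_AREA_CM2 = 6.0
CHIPLET_AREA_CM2 = MONO_AREA_CM2 / 2

mono_cost = silicon_per_good_die(MONO_AREA_CM2, DEFECTS_PER_CM2)
chiplet_cost = 2 * silicon_per_good_die(CHIPLET_AREA_CM2, DEFECTS_PER_CM2)

print(f"monolithic: {mono_cost:.2f} cm^2 of wafer per good product")
print(f"2 chiplets: {chiplet_cost:.2f} cm^2 of wafer per good product")
print(f"ratio: {chiplet_cost / mono_cost:.2f}")
```

Under these assumptions the two-chiplet product needs roughly three quarters of the silicon of the monolithic one, so "twice the transistors at twice the cost" is pessimistic once yield is counted.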

GPU prices are so insane because shareholders must make money, because NVDA miscalculated the crypto demand and doesn't want to be left holding the bag again (cf. GTX 1060), and because NVDA artificially limited the supply of RTX 30xx cards.
The nm, today, is a marketing term, divorced from the real manufacturing process.