Hacker News
by speedplane 1882 days ago
The M1 processor is a direct result of the death of Moore's law. It's an amazing processor, but a sad sign of things to come.

The performance gains from Moore's law have typically come from shrinking die size. That has ended, you can't juice more performance from general purpose CPUs. If general purpose processors no longer advance quickly enough, the only way to get performance gains is to build custom chips for common specific tasks. That's what we're seeing now with the M1. The M1 buys us a few more years of exponential-appearing performance gains, but it's a one-trick pony. You can turn code into an ASIC once, but after that, your performance is at the mercy of the foundry and physics.

The death of Moore's law has many consequences, the rise of ASICs and custom co-processor chips is just one of them.

8 comments

> The M1 processor is a direct result of the death of Moore's law.

I know most people misunderstand Moore's law, but this is HN, so I expect better:

https://en.wikipedia.org/wiki/Moore%27s_law#/media/File:Moor...

Moore's law is quite alive and showing no signs of problems.

> The performance gains from Moore's law have typically come from shrinking die size.

Moore's Law is about number of transistors. Not about their size, and not about performance.

And it's ESPECIALLY not about linear core performance.

> That has ended, you can't juice more performance from general purpose CPUs.

You don't need to, they're fast enough. Performance is expanding in other areas like GPU and ML.

> The death of Moore's law has many consequences, the rise of ASICs and custom co-processor chips is just one of them.

No, Moore's law is the very thing supporting them... You need extra transistors for those co-processors.

> You don't need to, they're fast enough.

You had me with everything except this. Any time someone claims the current state of computing hardware is "enough" I'm reminded of that fake 640K quote. There is no indication we are running out of applications for more compute power.

I'm not saying we're running out of applications for more compute power.

We're specifically running out of reasons to want faster linear (per core) general-purpose performance (in fact I'd say this happened some time ago). Everything else we get from here on in terms of smaller process etc. is just a bonus. We don't fundamentally need it to keep evolving our hardware for our ever-growing computation needs.

And that's because as our problems multiply and grow, parallel execution and heterogeneous cores tend to solve our problems much more efficiently per watt than asking for "more of the same, but faster".

There's this Ford quote "if I had asked what people want, they'd have said faster horses". Fake or not, it reflects our tendency to stare at the wrong variables and miss the forest for the trees. The industry is full of languages utilizing parallel/heterogeneous execution and you don't need a PhD to use one anymore.

CPUs are effectively turning into "controllers" that prepare command queues for various other types of processing units. As we keep evolving our GPU/ML/etc. processing units, CPUs will have less to do, not more. In fact, I expect CPUs will get simpler and slower as our bottlenecks move to the specialized vector units.

Production quality multiplatform software is much much harder and less fun to make for GPUs due to inferior DX, rampant driver stack bugs unique for each (gpu vendor, os) combination, sorry state of GPU programming languages, poor os integration, endemic security problems (eg memory safety not even recognised as a problem yet in gpu languages), fragmentation of proprietary sw stacks and APIs, etc. Creation of performance oriented software is often bottlenecked by sw engineering complexity, effort and cost, and targeting GPUs multiplies these problems.

tldr; we are not running out of reasons for wanting faster CPUs. GPUs are a crap, faustian bargain substitute for them.

https://scicomp.stackexchange.com/a/1395

Explains it better than I could in a HN reply.

Assuming our desired work to be done continues to grow, faster core speed* will eventually matter.

* technically, net instructions per second after prediction/compute ahead etc

Honestly that's a bit too abstract to make sense of.

As one programmer to another, I'd rather ask... what's one example of a problem we have today that needs faster linear performance than our best chips (not in a nice-to-have way but in a must-have way)?

I'd rule out all casual computing like home PCs, smartphones, and so on, because honestly we've been there for years already.

Also, due to decades of bias, we have serialized code in our programs that doesn't have to be serial, just because that's "normal" and because it's deemed easier. We also have huge untapped potential for better performance by being more data-oriented. None of this requires faster hardware. It doesn't even require new hardware.

But anyway, I'm open to examples.

> I'd rule out all casual computing like home PCs, smartphones, and so on, because honestly we've been there for years already.

Casual computing can definitely be a lot better than where we are today[0][1].

The software business has moved to a place where it’s not really practical to program bare metal silicon in assembly to get screaming fast performance. We write software on several layers of abstraction, each of which consumes a ton of compute capacity.

We have resigned ourselves to living with 100ms latencies in our daily computing. This is us giving up on the idea of fast computers. It should not be confused with actually having a computer where all interactions are sub 10ms (less than 1 frame refresh period at 90fps).

[0]. https://danluu.com/input-lag/

[1]. https://www.extremetech.com/computing/261148-modern-computer...

Linear compute is the ideal solution. Parallelization is a useful tool when we run up against the limitations of linear compute, but it is not ideal. Parallelization is simply not an option for some tasks. Nine mothers cannot make a baby in a month. It also adds overhead, regardless of the context.

Take businesses for example. Businesses don't want to hire employees to get the job done. They want as few workers as possible because each one comes with overhead. There's a good reason why startups can accomplish many tasks at a fraction of what it would cost a megacorp. Hiring, management, training, HR, etc... they are all costs a business has to swallow in order to hire more employees (ie parallelize).

This is not to say parallelization is bad. Given our current technological limitations, adding more cores and embracing parallelization where possible is the most economical solution. That doesn't mean faster linear compute is just a "nice to have".
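One way to put numbers on the "nine mothers" point is Amdahl's law. The comment above does not name it, so treat this as a swapped-in model rather than the commenter's own argument. A minimal sketch, assuming a fixed share of the work can be parallelized and ignoring the coordination overhead the comment mentions (real speedups would be lower); the function and parameter names are made up for illustration:

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup under Amdahl's law.

    parallel_fraction: share of the work that can run in parallel (0.0 to 1.0).
    workers: number of identical cores applied to the parallel part.
    Both names are illustrative, not taken from the thread.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part always takes its full time; only the rest is divided up.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical workload: 90% parallelizable, 10% strictly serial.
    for cores in (1, 2, 8, 64, 1024):
        print(cores, round(amdahl_speedup(0.9, cores), 2))
```

With a 10% serial share the speedup can never reach 10x however many cores are added, which is why a faster single core still helps even a mostly parallel workload.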

Virtual reality, being so latency-sensitive, is always going to be hungry for faster raw serial execution. It seems like something that ought to parallelize alright (one GPU per eye!) but my understanding is that there are many non-linearities in lighting, physics, rendering passes, and so on that create bottlenecks.

On his recent Lex Fridman podcast appearance, Jim Keller speaks to exactly this mindset. He says that they've been heralding the death of Moore's law since he started and that the "one trick ponies" just keep coming. He says he doesn't doubt that they will continue.

> they've been heralding the death of Moore's law since he started and that the "one trick ponies" just keep coming. He says he doesn't doubt that they will continue.

The situation is clearly far worse than what you suggest. Back in the 1990s and early 2000s, apparent computer performance was doubling roughly every two years. Your shiny new desktop was obsolete in 24 months.

Today, we're lucky to get a 15% gain in two years. The "one-trick ponies" help narrow the "apparent performance" gap, but by definition, are implemented out of desperation. They aren't enough to keep Moore's law alive (it's already dead), and their very existence is evidence of the death of Moore's law.

Moore's law is only about the number of transistors per chip doubling every 24 months, not about the performance. Seeing that the trend is still happening, Moore's law is not dead, as so many have claimed.

But what is it good for, if it does not improve performance? For example, an increasingly large share of the transistors on a chip sits unused at any given time, due to cooling issues.

It makes other workloads more economical.

And as long as there's something to gain from going smaller/denser/bigger, and as long as the cost-benefit is good, we'll have bigger chips with smaller and denser features.

Sure, cooling is a problem, but it's not like we're even seriously trying. It's still just air cooled. Maybe we'll integrate microfluidic heat-pump cooling into chips too.

And it seems there's a clear need for more and more computing. The "cloud" is growing at an enormous rate. Eventually it might make sense to build a well-integrated, datacenter-oriented system.

It obviously does improve performance, otherwise why would people be buying newer chips? :) It doesn't mean we'll see exponential performance increases though. In specialized scenarios, like video encoding and machine learning, we do see large jumps in performance.

I thought it was about the number of transistors per chip at the optimal cost level per transistor.

On the contrary, I think this is fantastic news:

- It means consumers won't have to keep buying new electronic crap every couple years. Maybe we can finally get hardware that's built to be modular and maintainable.

- It means performance gains will have to come from writing better software. Devs (and more importantly, the companies that pay devs) will be forced to care about efficiency again. Maybe we can kill Electron and the monstrosity of multi-MBs of garbage JS on every site.

The sooner we bury Moore's Law and the myth of "just throw more hardware at it" the better.

If you look beyond the struggling Intel, AMD has been delivering 15-20% improvements year on year.

>Today, we're lucky to get a 15% gain in two years.

The 2012 MacBook Pro 15-inch I'm typing this on is about 700 on Geekbench single-core, while the 2019 16-inch is about 950. 35% "improvement" in seven years!

M1 13-inch is 1700 on single-core, which is why I hope to upgrade once the 16-inch Apple Silicon version comes out.
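For a rough sense of what those Geekbench numbers mean as a yearly rate, here is a small sketch. It uses only the approximate single-core scores quoted above; the helper names are made up for illustration, and it assumes smooth compound growth, which real product generations do not follow.

```python
import math


def annualized_growth(old_score: float, new_score: float, years: float) -> float:
    """Compound annual growth rate between two benchmark scores."""
    return (new_score / old_score) ** (1.0 / years) - 1.0


def doubling_time_years(annual_rate: float) -> float:
    """Years needed to double at a constant compound annual rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


# Approximate single-core scores as quoted in the comment above.
MBP_2012 = 700
MBP_2019 = 950
M1_2020 = 1700

if __name__ == "__main__":
    rate = annualized_growth(MBP_2012, MBP_2019, years=7)
    print(f"2012 -> 2019: {rate:.1%} per year, "
          f"doubling every {doubling_time_years(rate):.1f} years")
    print(f"2019 Intel -> M1: {M1_2020 / MBP_2019:.2f}x in one step")
```

On these figures the Intel MacBook Pros improved by a bit under 5% a year, while the M1 is roughly 1.8x the 2019 score in a single generation.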

>The "one-trick ponies" help narrow the "apparent performance" gap, but by definition, are implemented out of desperation.

I don't think that's right. x86 hit an apparent performance barrier in the early 2000s, with the best available CPUs being Intel Pentium 4 and AMD Thunderbird, both horribly inefficient for the performance gains they eked out; those were very much one-trick ponies created from desperation. It took a skunkworks project by Intel Israel, which miraculously turned Pentium III into Core microarchitecture, to get out of the morass. Another meaningful leap occurred when going from Core Duo to Core i, but the PC industry has been stuck with Core i for almost a decade.

We've finally smashed past this with Apple Silicon, but it is certainly not a one-trick pony; Apple could sell it to the world tomorrow and have a line of customers going out the door, just like it could have sold the A-series mobile processors to rivals. AMD Ryzen isn't quite the breakthrough Apple Silicon is, but it is good enough for those who need x86.

Apple's M1 is a good processor, but the only reason it "smashed past" previous MacBook single-core results is that Apple was using older, lower-powered Intel processors.

It is not twice as fast as even mobile x86 stuff, as much as people seem to want to think otherwise.

Anecdata of one, but compiling our product at work on my three machines (a 2019 intel macbook pro, a 2020 10 core intel imac and an m1 mac mini), the macbook pro is the slowest, but the imac isn’t that much faster than the mini. it’s something like:

- macbook pro: 9 minutes

- mini: 5 minutes

- imac: 4 minutes

Where the M1 really blows any other CPU away is single-threaded performance; multi-threaded performance is just normal. So it's not surprising that it's not faster than your 10-core iMac when compiling (which I assume is using 100% of every core).

In fact, given that the M1 is an 8-core CPU and your iMac has a 10-core CPU, the fact that they take 5 and 4 minutes respectively to compile seems to indicate that they're fairly similar in multi-threaded performance (and the iMac wins only because it has more cores).
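The "fairly similar" claim can be checked with back-of-envelope arithmetic. This sketch assumes the build keeps every core fully busy and scales perfectly across cores, which is an idealization; it also treats all cores as equal, glossing over the M1's mix of performance and efficiency cores. The helper name is made up for illustration.

```python
def core_minutes(cores: int, wall_minutes: float) -> float:
    """Total core time consumed, assuming every core is busy for the whole build."""
    return cores * wall_minutes


# Core counts and compile times as given in the two comments above.
builds = {
    "m1 mini (8 cores)": core_minutes(8, 5),
    "imac (10 cores)": core_minutes(10, 4),
}

if __name__ == "__main__":
    for machine, total in builds.items():
        print(f"{machine}: {total:.0f} core-minutes")
```

Both machines come out at 40 core-minutes under these assumptions, consistent with the iMac winning only on core count.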

Is this a bad thing? This seems like a great outcome for consumers, and will reduce e-waste. I look forward to a future where people see less need to upgrade year after year.

Which is why Apple is also shifting its revenue streams toward services.

This is false: computer performance has been doubling nearly every year. See for example https://www.top500.org/statistics/perfdevel/

How is this calculated? It isn't very clear. Is this representative of individual devices or is it caused by more of the same devices?

Even then, Jim Keller is using a looser definition of Moore's law - i.e. he's saying there's a lot of scaling left rather than that the scaling will continue as it did in the past.

"Moore's law" was strictly about the average cost of a transistor, not performance in general.

I get your point but...

> The M1 processor is a direct result of the death of Moore's law.

It is a bit ironic since the M1 is a 5 nm processor, currently the finest process, and I think it plays no small part in its success. A very Moore's law-esque solution.

Death of Moore’s law? Hmm. Meanwhile I just got an R9 5950X and it is drastically faster than my 5 year old i7.

There must be some doubling of transistors in there, right?

Also maybe buying a 5950X at the birth of a new generation of ARM CPUs wasn’t the wisest choice.

Or maybe it is, idk.

Moore’s Law as originally stated said transistor density doubled every 18-24 months. Using larger CPUs, for example, lets you have more transistors, but has nothing to do with Moore’s Law.

Clearly density has kept increasing, but the law refers to a rate of increase that we haven’t been able to meet. The original 386, released in 1985, had 275,000 transistors. Using the slowest interpretation (doubling every 24 months), we would need 275,000 × 2^18 ≈ 72 billion transistors today, or 275,000 × 2^17 ≈ 36 billion in 2019. That is close, but the chip would also need to be the size of a 386, which they aren’t.

AMD Epyc Rome is 1008 mm^2 vs a 386 at 104 mm^2. The M1 is 119 mm^2, but it’s only 16 billion transistors. As such it’s safe to say Moore’s Law is dead.
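The arithmetic above is easy to reproduce. This is a sketch using only the figures quoted in the comment, plus one labeled assumption (the M1's 2020 launch year, which the comment does not state); the function names are made up for illustration.

```python
import math

# Figures quoted in the comment above.
I386_YEAR = 1985
I386_TRANSISTORS = 275_000
I386_AREA_MM2 = 104

M1_TRANSISTORS = 16_000_000_000
M1_AREA_MM2 = 119
M1_YEAR = 2020  # assumption: M1 launch year, not stated in the comment


def projected_transistors(year: int, doubling_years: float = 2.0) -> float:
    """Transistor count if the 386's count doubled every `doubling_years`."""
    doublings = (year - I386_YEAR) / doubling_years
    return I386_TRANSISTORS * 2.0 ** doublings


def observed_doubling_years() -> float:
    """Density doubling period implied by the 386 and M1 figures alone."""
    density_ratio = (M1_TRANSISTORS / M1_AREA_MM2) / (I386_TRANSISTORS / I386_AREA_MM2)
    doublings = math.log2(density_ratio)
    return (M1_YEAR - I386_YEAR) / doublings


if __name__ == "__main__":
    for year in (2019, 2021):
        print(year, f"{projected_transistors(year) / 1e9:.1f} billion projected")
    print(f"implied density doubling period: {observed_doubling_years():.2f} years")
```

On these two data points the implied density doubling period comes out somewhat longer than two years, in line with the claim that the original rate has not been met, though two chips are a thin basis for a trend.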

Did Moore’s Law take into account 3D density or was it just single layer compactness?

It’s per wafer area, which effectively compresses the full 3D nature of modern chips into a 2D structure.

Back in the old days you'd get that sort of improvement in 2-3 years, not 5. I used to expect at least a 4x improvement on my last machine every time I upgraded.

Yeah, I bought myself a new PC two years ago or so to replace a 5+ year old one, and the difference was... okay? If it was twice as fast (mostly for gaming) I'd already be impressed.

Whereas back when (thinking of the early '90s) you'd upgrade every three years and be taking massive leaps forward. 10x increase in disk space (40 MB to 500 MB), or going from diskettes (~1.5 MB? I don't even remember) to CDs (650MB). We went from Wolfenstein to Half-Life in just six years (it felt longer).

Maybe buying a 5950X at the birth of DDR5, PCIe Gen 5 and TSMC's 5nm wasn't the wisest choice. Ehhhh seems like all that new stuff would still take lots of time to actually get ready, and the 5950X is the best CPU now.

Ah well at least I can now run my test suite 3 times faster compared to my 16” i9 macbook pro so I’m happy.

From 60s to 20s every run is huge for me.

I expect the new 16" / 14" to have dual M1 CPUs. That would solve the limit on the number of external displays. It would also bump the RAM to 32GB.

Then the next step, a new Mac Pro, would have up to 4 M1 CPUs. Sounds very sweet to me.

No way the M1 supports "dual socket" configurations. Absolutely no way a configuration like that would "combine" the GPUs and display outputs. I'd bet money on Apple releasing a larger monolithic "M1X" or whatever for the large MBPs.

Is that stock, PBO or manual OC? It's quite wattage limited at stock, you might go significantly under 20s with PBO :D

The 5950X is the last and greatest chip on the AM4 platform. I think there will be enough demand for it in the future for the price to stay high.

Moore's law is safe for at least two generations of chips, for which there are processes developed.

As we speak people are putting together 3nm(TSMC) designs, which will ship once the infrastructure is there.

Scaling will continue for at least another 10 years.

The death of Moore's law made us wonder: there is so much effort spent optimising hardware, but far less emphasis on making software more efficient. Our view is that there is a lot to be done on software efficiency to mitigate the limits of hardware progress... See the company we founded in my profile; this was one of the drivers for building it.

Software optimization depends a lot on economic reasons, which is why it is so hard to prioritize over new features IMO.

Let’s not debate whether we really are at the end of Moore’s law (not a foregone conclusion, given that the M1 is the first CPU at 5nm).

Why do you find it sad that we now have a holistically designed system, rather than the glueing together of ever more powerful parts that desktop PCs have gotten away with for a few decades?