| HN Mirror


by shriver 2851 days ago

> a new architecture that could in one fell swoop kill off the general purpose processor as a concept and the X86 instruction set as the foundation of modern computing.

Do you want me to think you're a credulous idiot? Because this is how you achieve that.

Okay, so laying aside the bizarrely stereotypical tech journalism, from what I understand there are a number of problems with this that need addressing:

The first problem is that if you create a custom compute unit layout for a specific data flow diagram, it's very difficult to identify which layout is most efficient, and then when you want to optimize for higher performance it's almost impossible, because you don't know what you're targeting. It may be that your optimization pushes your design to a completely different layout, and all the cost functions are impossible to know. You end up with too many free variables to optimize for. We're very good at taking a fixed design like a CPU and then taking a program and jamming it into that paradigm.

The second problem is that either you need one architecture that will dynamically reconfigure to different graphs or you need lots of architectures. They seem to be going for the 'spin 100 designs' path. So firstly, how is a customer meant to know which of those designs to actually buy, and what happens if their design evolves from one design to another? Secondly, how is this cost effective? There's a good reason why Intel only spins a handful of designs per CPU generation.

The third problem is that if you have a custom compute unit layout and your program doesn't fit it well, it's not like a CPU. You can't re-order operations to maximally use the units; the bits that aren't useful are just dead silicon, and from history it seems like the killer is that dead silicon tends to be a LOT of silicon for any given program.
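To make the dead-silicon point concrete, here is a rough sketch of how one might estimate it. The layout, the unit names, and the one-unit-per-operation model are all assumptions for illustration, not anything from the article or from a real chip.

```python
from collections import Counter


def dead_silicon_fraction(layout, program_ops):
    """Fraction of compute units in a fixed layout that a program leaves idle.

    layout: dict of unit kind -> number of units on the die (hypothetical).
    program_ops: unit kinds the program's data flow graph needs at once.
    Simplifying assumption: each operation occupies exactly one unit of
    its kind, and every unit costs the same area.
    """
    total = sum(layout.values())
    if total == 0:
        raise ValueError("layout has no compute units")
    needed = Counter(program_ops)
    # A unit only counts as used if the program needs that kind of unit.
    used = sum(min(count, needed.get(kind, 0)) for kind, count in layout.items())
    return (total - used) / total


# Made-up layout tuned for some other workload.
layout = {"fma": 8, "load_store": 4, "shuffle": 4}
# Made-up program that needs only a few of those units.
program = ["fma", "fma", "load_store"]
frac = dead_silicon_fraction(layout, program)
print(f"dead silicon: {frac:.0%}")
```

With these invented numbers only 3 of the 16 units do any work. A CPU scheduler could re-order instructions to keep its units busy, whereas in this model the mismatch is fixed at design time.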

To be honest, this is a very well understood problem, and there are good reasons why it hasn't worked so far, and this article doesn't really give us any information on why it would work this time.

1 comments

saas_co_de 2851 days ago

> it's very difficult to identify which layout is most efficient

Don't worry. The compiler will figure it out. And this time the compiler will have AI™.

> the killer is that dead silicon tends to be a LOT of silicon for any given program

Part of the idea here is that the cost dynamic has changed. Silicon is cheap compared to power, so even if you have lots of chips not being used at any given time, as long as they can be fully powered off, total system cost (capex+opex) is still better.
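A back-of-the-envelope version of that capex+opex argument might look like the sketch below. Every number in it (chip prices, wattages, electricity price, lifetime) is a made-up placeholder, and the model ignores cooling, packaging, and idle leakage; it only shows the shape of the trade-off.

```python
def total_cost(chip_capex, n_chips, active_fraction,
               watts_per_active_chip, dollars_per_kwh, hours):
    """Capex plus energy opex, assuming idle chips are fully powered off.

    All parameters are hypothetical inputs for illustration.
    active_fraction is the average share of chips powered on at any time.
    """
    capex = chip_capex * n_chips
    energy_kwh = n_chips * active_fraction * watts_per_active_chip * hours / 1000
    return capex + energy_kwh * dollars_per_kwh


HOURS = 5 * 365 * 24  # assumed five-year service life

# One general-purpose chip that is always on (placeholder numbers).
general = total_cost(1000, 1, 1.0, 200, 0.10, HOURS)

# Four cheaper specialized chips, only one powered at a time (placeholder numbers).
specialized = total_cost(200, 4, 0.25, 50, 0.10, HOURS)

print(f"general: ${general:.0f}, specialized: ${specialized:.0f}")
```

Under these assumptions the mostly idle specialized package comes out cheaper over its lifetime, but the result flips easily if the specialized chips cost more to build or cannot be fully powered off.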

> why it would work this time

The difference is scale. If you are running millions of CPUs and adding hundreds of thousands per month, then something like this could work, assuming the AI™ magic that figures out which new chips to build materializes.

Intel is talking up their grandiose vision but practically this is the same as AMD's chiplets on active interposers (https://spectrum.ieee.org/tech-talk/semiconductors/design/am...).

The physical technology is real and probably coming soon, but until the magic compilers arrive it will be limited to incremental improvements to the existing CPU/GPU compute architecture from increasing bandwidth and decreasing latency.

shriver 2851 days ago

If I'm understanding you right, what you're suggesting is that the individual silicon modules will be fixed, but they will be connected in a single package in lots of different ways.

If that's correct I'd love to see the cost of making the trip between the modules. I've got to imagine the cost of that is just huge. The interconnect is also crazy: it's easy to do complex routing tasks on silicon at high performance, but I don't know how you achieve that in a scalable fashion between pieces of silicon.