Hacker News
by 323 1416 days ago
Wasn't this the Gentoo philosophy?

You downloaded the system source code and recompiled everything targeted to your exact CPU specs.

Well, it's a bit different when you design a CPU around the Gentoo philosophy, as opposed to an OS.

When a CPU is designed around that philosophy, everyone needs to recompile for each generation to get maximum performance. The theory, though, is that bytecode such as Java's can be recompiled efficiently. NVIDIA's PTX is a higher-performance example with more support from the hardware: a data-parallel bytecode that gets recompiled for each NVIDIA GPU generation.
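To make "ship bytecode, recompile on the target" concrete, here is a toy sketch in Python. The opcodes (`arg`, `push`, `add`, `mul`) and the `translate` helper are hypothetical, invented for illustration: the bytecode is decoded once on the target machine into a chain of closures, so later calls skip decoding entirely. A real JIT (HotSpot for Java, or the PTX compiler in NVIDIA's driver) emits native code tuned to the exact CPU or GPU rather than closures.

```python
from typing import Callable, List, Tuple

# A hypothetical portable bytecode: (opcode, operand) pairs for a tiny stack machine.
Instr = Tuple[str, int]
Step = Callable[[List[int], int], None]

def translate(program: List[Instr]) -> Callable[[int], int]:
    """Decode the bytecode once, on the target, into a chain of closures."""
    steps: List[Step] = []
    for op, arg in program:
        if op == "arg":      # push the function's input x
            steps.append(lambda st, x: st.append(x))
        elif op == "push":   # push a constant, bound now via a default argument
            steps.append(lambda st, x, a=arg: st.append(a))
        elif op == "add":
            steps.append(lambda st, x: st.append(st.pop() + st.pop()))
        elif op == "mul":
            steps.append(lambda st, x: st.append(st.pop() * st.pop()))
        else:
            raise ValueError(f"unknown opcode {op!r}")

    def compiled(x: int) -> int:
        # No decoding happens here any more: just run the prepared steps.
        stack: List[int] = []
        for step in steps:
            step(stack, x)
        return stack.pop()

    return compiled

# f(x) = 3 * x + 1, shipped as portable bytecode
PROGRAM: List[Instr] = [("arg", 0), ("push", 3), ("mul", 0), ("push", 1), ("add", 0)]

f = translate(PROGRAM)
print(f(5))  # 16
```

The same `PROGRAM` list could be shipped to any machine; only `translate` needs to know about the target.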

------

Intel's Itanium was a 2000s-era design that was meant to work this way, but AMD's x86-64 ended up being faster in practice.

NVIDIA's PTX really changed things, as did GPU programmers' habit of writing very, very small "programs" (called kernels) that are managed by a separate chip (i.e., the CPU manages the kernels, compiling and launching them as appropriate). It works out in GPU land, but may never work in CPU land, unless CPU land picks up this kernel-invocation abstraction. Intel ispc, for example, has the model, as does OpenMP target offload, and thread pools and Go arguably have it as well.
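A minimal sketch of the kernel-invoke abstraction on a CPU, assuming a hypothetical `launch` helper built on Python's standard `concurrent.futures.ThreadPoolExecutor`: the host splits an index space into chunks and hands them to a thread pool, while the kernel itself only ever sees one index, loosely like a GPU thread. In CPython the GIL means pure-Python kernels like this one demonstrate the programming model rather than a real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

def launch(kernel: Callable[[int], None], n: int, workers: int = 4) -> None:
    """Hypothetical host-side launch: run kernel(i) once for every i in range(n)."""
    chunk = max(1, (n + workers - 1) // workers)

    def run_chunk(start: int) -> None:
        # One worker thread plays the role of a block of GPU threads.
        for i in range(start, min(start + chunk, n)):
            kernel(i)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every chunk and re-raises any exception from a worker.
        list(pool.map(run_chunk, range(0, n, chunk)))

# Data the kernel operates on.
a = 2.0
x: List[float] = [float(i) for i in range(10)]
y: List[float] = [1.0] * len(x)

def saxpy(i: int) -> None:
    # The "kernel": a tiny function that only knows its own index, as on a GPU.
    y[i] = a * x[i] + y[i]

launch(saxpy, len(x))
print(y)
```

Each index writes a distinct element of `y`, so the kernel needs no locking; the host code decides how the work is divided.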

A CPU built around the Gentoo philosophy would look like https://github.com/SpinalHDL/VexRiscv ;). Don't want an MMU? Fine. Need a larger RAM interface? You got it. Barrel ALU for DSP? Sure.

Interpreted languages work by concentrating all of the optimization effort in the interpreter. This is similar to how CPUs work now: instead of extremely specific, hard-to-write optimizations spread across all the code we use, we rely on very general optimizations, pushing the limits of mathematics, that are centralized in the CPU.
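A minimal sketch of that centralization, using a hypothetical stack bytecode: every program runs through the same `run` loop, so decoding and dispatch costs live in one place, and any optimization of that loop speeds up every program at once. The opcode set and the `SUM_TO_N` program are invented for illustration; this is a plain switch-style interpreter, not how any particular production interpreter is built.

```python
from typing import Any, Dict, List, Tuple

# A hypothetical stack bytecode: (opcode, operand) pairs; operands are ints, names, or None.
Instr = Tuple[str, Any]

def run(program: List[Instr], env: Dict[str, int]) -> Dict[str, int]:
    """The single dispatch loop that every program shares."""
    stack: List[int] = []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "push":
            stack.append(arg)
        elif op == "load":
            stack.append(env[arg])
        elif op == "store":
            env[arg] = stack.pop()
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "sub":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif op == "jz":            # jump to arg if the popped value is zero
            if stack.pop() == 0:
                pc = arg
        elif op == "jmp":
            pc = arg
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return env

# total += n; n -= 1; repeat until n == 0. Comments give instruction indices for the jumps.
SUM_TO_N: List[Instr] = [
    ("load", "n"),                           # 0: loop head
    ("jz", 11),                              # 1: exit when n == 0
    ("load", "total"), ("load", "n"),        # 2-3
    ("add", None), ("store", "total"),       # 4-5: total = total + n
    ("load", "n"), ("push", 1),              # 6-7
    ("sub", None), ("store", "n"),           # 8-9: n = n - 1
    ("jmp", 0),                              # 10: back to the loop head
]                                            # 11: one past the end, i.e. halt

print(run(SUM_TO_N, {"n": 4, "total": 0})["total"])  # 10
```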

-----

Itanium had a lot of issues specific to its time that kept it from working. I would certainly blame Intel's business practices and reputation for a large part of that. There are likely niches for such processors: VLIW is useful for DSP or graphics. Indeed, the only extant VLIW processor that I know of is the Russian Elbrus. I suspect VLIW is included mainly to let them reuse much of the CPU's core logic to drive a DSP engine, which is useful for radar and scientific simulation, though scientific simulation would probably run on faster commercial hardware.
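For readers unfamiliar with VLIW, here is a rough Python model under stated assumptions: a hypothetical 3-slot machine where every operation in a bundle reads its registers at the start of the cycle and all results are written at the end. The compiler, not the hardware, is responsible for packing independent operations into bundles. Real VLIW ISAs define their own rules for dependencies inside a bundle, and some forbid such dependencies outright, so the one-bundle register swap below is legal only in this model.

```python
from typing import Dict, List, Tuple

# (opcode, destination, source1, source2); "mov" ignores source2.
Op = Tuple[str, str, str, str]

SLOTS = 3  # hypothetical issue width, fixed when the hardware is designed

def execute_bundle(regs: Dict[str, int], bundle: List[Op]) -> None:
    """Issue every operation in one bundle during the same simulated cycle.

    This model does no dependency checking: the compiler must already have
    packed the operations into at most SLOTS slots.
    """
    if len(bundle) > SLOTS:
        raise ValueError(f"bundle has {len(bundle)} ops but only {SLOTS} slots")
    # Phase 1: all slots read operands from the register state at the start of the cycle.
    pending = []
    for op, dst, src1, src2 in bundle:
        a, b = regs[src1], regs[src2]
        if op == "add":
            pending.append((dst, a + b))
        elif op == "mul":
            pending.append((dst, a * b))
        elif op == "mov":
            pending.append((dst, a))
        else:
            raise ValueError(f"unknown opcode {op!r}")
    # Phase 2: all results are written back together at the end of the cycle.
    for dst, value in pending:
        regs[dst] = value

regs = {"r1": 3, "r2": 5, "r3": 0, "r4": 0}
program = [
    # Cycle 1: swap r1/r2 and compute r3 = r1 + r2, all from the old values.
    [("mov", "r1", "r2", "r2"), ("mov", "r2", "r1", "r1"), ("add", "r3", "r1", "r2")],
    # Cycle 2: only one useful op, so two slots go unused.
    [("mul", "r4", "r1", "r3")],
]
for bundle in program:
    execute_bundle(regs, bundle)
print(regs)  # {'r1': 5, 'r2': 3, 'r3': 8, 'r4': 40}
```

In this model, code packed for three slots is rejected by a narrower machine and leaves slots idle on a wider one, which is one reason statically scheduled code tends to want recompiling for each hardware generation.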

It works on GPUs because they're basically doing DSP. We could have weirder topologies for GPUs, however, like a massive string of ALUs driven by an embedded core: you configure the ALU string, then try to "kachunk" all your data through it in a single clock domain.
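A rough sketch of the "ALU string" idea, assuming a hypothetical chain of three stages with a pipeline register after each: the stages are configured once, then a block of samples is clocked through. Once the chain is full, every stage works on a different sample in the same clock, so one result leaves per clock and the whole block takes `len(samples) + len(stages)` clocks in this model. The stage functions and `run_alu_string` are illustrative, not any real hardware's interface.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Stage = Callable[[int], int]

def run_alu_string(stages: Sequence[Stage], samples: Sequence[int]) -> Tuple[List[int], int]:
    """Clock a block of samples through a chain of ALU stages; return (results, clocks)."""
    if not stages:
        raise ValueError("need at least one stage")
    regs: List[Optional[int]] = [None] * len(stages)  # pipeline register after each stage
    results: List[int] = []
    clocks = 0
    # Feed every sample, then len(stages) empty slots (None) to drain the chain.
    for x in list(samples) + [None] * len(stages):
        if regs[-1] is not None:
            results.append(regs[-1])  # value leaving the last stage's register
        # Update back to front so each stage consumes its neighbour's old value,
        # as when all registers latch on the same clock edge.
        for k in range(len(stages) - 1, 0, -1):
            prev = regs[k - 1]
            regs[k] = stages[k](prev) if prev is not None else None
        regs[0] = stages[0](x) if x is not None else None
        clocks += 1
    return results, clocks

# Configure once (the embedded core's job), then stream a whole block through.
ALU_STRING = [lambda x: x * 3, lambda x: x + 1, lambda x: x & 0xFF]
results, clocks = run_alu_string(ALU_STRING, range(5))
print(results, clocks)  # [1, 4, 7, 10, 13] 8
```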