Hacker News
by dhruvdh 1996 days ago
I can't help but think most commenters haven't actually read the article or the patent. This isn't about having an FPGA embedded into the CPU or near the CPU; it's about having a programmable, FPGA-like execution unit that can be programmed to be, say, a 4-bit floating point adder, or any other weird execution unit one might need.

Why is this important? Have a program that does a lot of integer multiplications? Let's program all of these programmable execution units to multiply integers on the fly, etc. Now your integer multiply throughput is higher, as per the current program's needs.

Have lots of weird old x86 instructions you are forced to support but no one actually uses? Don't waste transistors on them; just program an execution unit to execute that instruction on the fly, etc.

I think it's great, and that most people are missing the point.

7 comments

> Have lots of weird old x86 instructions you are forced to support but no one actually uses? Don't waste transistors on them

That's been the role of microcode for like three decades now. Why does it matter if the instruction no one uses is implemented with FPGA gates or uops? No one uses them.

Theoretically, now you can create "micro-codes" in the CPU for your specific needs, e.g. scientists do a lot of calculations and would like a processor optimised for that. Now they can use the FPGA to do it. If you want a CPU instruction that is optimised for something else, you can program the FPGA for that.
Or maybe a compiler could recognize that optimization is possible and create it for you
More likely a JIT can do that.
The FPGA can't be reprogrammed fast enough for the JIT approach to work, unless you're running computations that take many minutes or hours. I suppose that does apply to some workloads, but it would be tough to ask your JIT to solve the halting problem and guess whether a workload will last 10 seconds or take longer than that.
For a JIT, the easy way to guess is to wait until it's already taken 10 seconds and if it hasn't stopped, assume it will take at least 10 more.
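To make that heuristic concrete, here is a minimal sketch. Everything in it is hypothetical: `HotRegion`, the 10-second default, and the injectable `clock` are illustration only, not anything from the article or the patent. It just answers "has this workload already run long enough that reprogramming is probably worth it?"

```python
import time


class HotRegion:
    """Hypothetical helper: tracks how long a workload has been running."""

    def __init__(self, threshold_s=10.0, clock=time.monotonic):
        # threshold_s is an assumed cutoff, not a measured FPGA reprogramming cost.
        self.threshold_s = threshold_s
        self.clock = clock
        self.started = clock()
        self.reconfigured = False

    def should_reconfigure(self):
        """Return True exactly once, the first time elapsed time reaches the threshold."""
        if self.reconfigured:
            return False
        if self.clock() - self.started >= self.threshold_s:
            # It has already run this long, so bet that it keeps running.
            self.reconfigured = True
            return True
        return False
```

A runtime would poll `should_reconfigure()` from its profiling loop and only pay the reprogramming cost once it returns True.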
I would say that programs that run and exit immediately are a minority of what gets written. A lot of web servers, services, and GUIs are written in JIT-compiled languages and run for multiple minutes or longer. AFAIK FPGA reprogramming time depends on the size of your edit and on your hardware. The article says that they expect to reprogram it on program load, so I don't think it will be that slow.
Why can't a JIT program the FPGA based upon previous runs through the same set of code? IIRC the JVM won't rest on its final optimizations until it runs a chunk of code hundreds (or thousands? I forget) of times.
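A rough sketch of that counting idea, under stated assumptions: `InvocationProfiler` and its `hot_threshold` are made-up names, and the default of 1000 is arbitrary rather than the JVM's real setting. The profiler counts executions per chunk of code and flags a chunk once, when it crosses the threshold, which is when a runtime could decide to build a custom execution unit for it.

```python
from collections import Counter


class InvocationProfiler:
    """Hypothetical counter-based hot-code detector."""

    def __init__(self, hot_threshold=1000):
        # Arbitrary illustrative threshold; real JITs tune this per tier.
        self.hot_threshold = hot_threshold
        self.counts = Counter()
        self.promoted = set()

    def record(self, code_id):
        """Count one execution; return True only when code_id first becomes hot."""
        self.counts[code_id] += 1
        if code_id not in self.promoted and self.counts[code_id] >= self.hot_threshold:
            self.promoted.add(code_id)
            return True
        return False
```

Counts from previous runs could be persisted and loaded at startup, so the decision is already made by the time the program is loaded.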
In scientific computing, that is the typical workload. That is why, say, Julia exists despite having a ridiculous JIT overhead.
Isn't that a general warm-up problem with JITs though?
I've been doing that for years. I have created custom microcoded CPUs in FPGAs for tasks where it would provide an advantage. One example I remember was a microcoded real-time image warping engine.
Six, actually, given that microcode was the approach taken on most mainframes starting around the Burroughs timeframe.
> Don't waste transistors on them; just program an execution unit to execute that instruction on the fly, etc.

Possible, but the "x86" part is already a big decoder in front of a murky processor underneath so this is already what the CPU does - if you removed the reference to an FPGA, rewriting old x86 instructions in terms of "new" ones is microcode.

Wonder how that'd work - in practical terms - with modern systems?

E.g., you could be running (say) 3 or 4 primary applications at the same time. Which one gets to use the FPGA pieces, or are they re-written every time, on every context switch? ;)

Re-writing them on every context switch sounds extremely unlikely, so it'd be more some kind of resource locking thing instead. Which could mean that FPGA-using applications at least start out being fairly niche, as only one could run "per core" or something.

Maybe dedicated cores per application instead or something?
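
The resource-locking idea could look something like the sketch below. This is purely a guess at one possible policy, not how the patent describes it: `FabricArbiter` is a hypothetical name, and it assumes one reconfigurable unit per core with at most one owning application at a time.

```python
class FabricArbiter:
    """Hypothetical per-core lock over reconfigurable execution units."""

    def __init__(self, num_cores):
        # Assumption: one programmable unit per core, owned by at most one app.
        self.owner = [None] * num_cores

    def acquire(self, app):
        """Give app the first free unit; return its core index, or None if all are taken."""
        for core, current in enumerate(self.owner):
            if current is None:
                self.owner[core] = app
                return core
        return None

    def release(self, app):
        """Free every unit currently held by app."""
        for core, current in enumerate(self.owner):
            if current == app:
                self.owner[core] = None
```

An application that gets `None` back would simply fall back to the normal execution units, which is why FPGA-using applications might stay niche at first.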

The work is already being put in with modern NUMA (non-uniform memory access) systems to pin apps to specific cores. This seems like it would overlap if this ended up being used in production.
How about having FPGA execution units in addition to the normal units, with the OS deciding how and by whom these new EUs get used, based on the most CPU-intensive apps currently running?
I think the point is that what you're describing has existed for years. Any Xilinx Zynq chip or Altera SoC chip can do this already. Just because the data doesn't travel through the AXI/AMBA bus does not make this novel.
Of course it does, because you get access to the CPU as well, so you can hop from an instruction you built on the FPGA to another “silicon” instruction with the same registers and processor state. This is extremely clever and doesn’t involve shuffling code from the main processor over a slow bus, executing some stuff entirely on the FPGA, and shuffling it back.
It sounds like that's exactly how another processor worked: https://news.ycombinator.com/item?id=25623763
This won't be a question of what the user wants to do with these parts. I bet it won't even be accessible to ordinary programmers. Applications will simply be racing each other constantly, reprogramming the field-programmable part of my CPU at every startup.
Sounds like a more general version of what SambaNova is doing with their Dataflow unit.