
by dhruvdh 1996 days ago

I can't help but think most commentators haven't actually read the article or the patent. This isn't about having an FPGA embedded into or near the CPU; it's about having a programmable, FPGA-like execution unit that can be programmed to be, say, a 4-bit floating-point adder, or any other weird execution unit one might need.

Why is this important? Have a program that does a lot of integer multiplications? Let's program all of these programmable execution units to multiply integers on the fly, etc. Now your integer multiply throughput is higher, as per the current program's needs.

Have lots of weird old x86 instructions you are forced to support but no one actually uses? Don't waste transistors on them; just program an execution unit to execute that instruction on the fly, etc.

I think it's great, and that most people are missing the point.

7 comments

ajross 1996 days ago

> Have lots of weird old x86 instructions you are forced to support but no one actually uses? Don't waste transistors on them

That's been the role of microcode for like three decades now. Why does it matter if the instruction no one uses is implemented with FPGA gates or uops? No one uses them.

webmobdev 1996 days ago

Theoretically, you can now create "microcode" in the CPU for your specific needs. E.g., scientists do a lot of calculations and would like a processor optimised for that; now they can use the FPGA to do it. If you want a CPU instruction that is optimised for something else, you can program the FPGA for that.

asimpletune 1996 days ago

Or maybe a compiler could recognize that optimization is possible and create it for you

Kuinox 1996 days ago

More likely a JIT can do that.

LeifCarrotson 1996 days ago

The FPGA can't be reprogrammed fast enough for the JIT approach to work, unless you're running computations that take many minutes or hours. I suppose that does apply to some workloads, but it would be tough to ask your JIT to solve the halting problem and guess whether a workload will last 10 seconds or take longer than that.

penteract 1996 days ago

For a JIT, the easy way to guess is to wait until it's already taken 10 seconds and if it hasn't stopped, assume it will take at least 10 more.
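A minimal sketch of that heuristic, assuming a hypothetical `should_reprogram` check and the 10-second figure used in this thread. Nothing here corresponds to a real JIT or FPGA API.

```python
def should_reprogram(elapsed_s: float, threshold_s: float = 10.0) -> bool:
    """Return True once the workload has already run for threshold_s seconds.

    The bet is that a workload which has survived this long will run at
    least that long again, so the reconfiguration cost can be amortized.
    threshold_s is an illustrative assumption, not a measured value.
    """
    return elapsed_s >= threshold_s


# Hypothetical usage: poll with the elapsed run time of the current workload.
for elapsed in (3.0, 12.0):
    print(elapsed, should_reprogram(elapsed))
```

This avoids predicting the run time up front; the cost is that the first `threshold_s` seconds always run unaccelerated.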

Kuinox 1995 days ago

I would say that programs that run and exit immediately are a minority of what is produced with these languages. A lot of web servers, services, and GUIs are written in JIT-compiled languages and run for multiple minutes or longer. AFAIK, FPGA reprogramming time depends on the size of your edit and on your hardware. The article says they expect to reprogram it on program load, so I don't think it will be that slow.

bcrosby95 1995 days ago

Why can't a JIT program the FPGA based upon previous runs through the same set of code? IIRC the JVM won't rest on its final optimizations until it runs a chunk of code hundreds (or thousands? I forget) of times.
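A rough sketch of the counting idea, assuming a hypothetical `HotBlockTracker` and an arbitrary threshold. The real thresholds used by any particular JIT are not stated in this thread, and the "offload" step is only a placeholder for whatever reprogramming would actually involve.

```python
from collections import Counter

HOT_THRESHOLD = 1000  # hypothetical value; real JIT thresholds vary


class HotBlockTracker:
    """Counts executions per code block and flags a block once it becomes hot."""

    def __init__(self, threshold: int = HOT_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()
        self.offloaded = set()

    def record_run(self, block_id: str) -> bool:
        """Count one execution; return True only the first time the block is hot."""
        self.counts[block_id] += 1
        if block_id not in self.offloaded and self.counts[block_id] >= self.threshold:
            # Placeholder: this is where a JIT might decide to program a unit.
            self.offloaded.add(block_id)
            return True
        return False


# Hypothetical usage with a small threshold for demonstration.
tracker = HotBlockTracker(threshold=3)
decisions = [tracker.record_run("inner_loop") for _ in range(5)]
print(decisions)
```

Because the decision is based on observed history rather than prediction, the reprogramming cost is only paid for code that has already proven to be hot.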

snicker7 1995 days ago

In scientific computing, that is the typical workload. That is why, say, Julia exists despite having a ridiculous JIT overhead.

chippiewill 1995 days ago

Isn't that a general warm-up problem with JITs though?

robomartin 1995 days ago

I've been doing that for years. I have created custom microcoded CPUs in FPGAs for tasks where it would provide an advantage. One example I remember was a microcoded real-time image warping engine.

pjmlp 1995 days ago

Six actually, given that microcode was the approach taken on most mainframes that started around Burroughs timeframe.

mhh__ 1996 days ago

> Don't waste transistors on them; just program an execution unit to execute that instruction on the fly, etc.

Possible, but the "x86" part is already a big decoder in front of a murky processor underneath so this is already what the CPU does - if you removed the reference to an FPGA, rewriting old x86 instructions in terms of "new" ones is microcode.

justinclift 1995 days ago

Wonder how that'd work - in practical terms - with modern systems?

E.g., you could be running (say) 3 or 4 primary applications at the same time. Which one gets to use the FPGA pieces, or are they rewritten every time, on every context switch? ;)

Re-writing them on every context switch sounds extremely unlikely, so it'd be more some kind of resource locking thing instead. Which could mean that FPGA-using applications at least start out being fairly niche, as only one could run "per core" or something.

Maybe dedicated cores per application instead or something?
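The "resource locking" option could look something like the sketch below. `ProgrammableUnit` and its methods are hypothetical names made up for illustration; the article and patent are not claimed to work this way.

```python
import threading
from typing import Optional


class ProgrammableUnit:
    """Hypothetical per-core programmable unit that has one owner at a time."""

    def __init__(self, core_id: int):
        self.core_id = core_id
        self._lock = threading.Lock()
        self.owner: Optional[str] = None
        self.config: Optional[str] = None

    def try_acquire(self, app: str, config: str) -> bool:
        """Claim the unit without blocking; return False if it is already held."""
        if not self._lock.acquire(blocking=False):
            return False
        self.owner = app
        self.config = config  # stands in for loading a configuration
        return True

    def release(self, app: str) -> None:
        """Give the unit back; only the current owner may release it."""
        if self.owner != app:
            raise RuntimeError(f"{app} does not own the unit on core {self.core_id}")
        self.owner = None
        self.config = None
        self._lock.release()


# Hypothetical usage: the second application is refused until the first releases.
unit = ProgrammableUnit(core_id=0)
print(unit.try_acquire("app_a", "int_mul"))   # first claim
print(unit.try_acquire("app_b", "fp4_add"))   # refused while app_a holds it
unit.release("app_a")
print(unit.try_acquire("app_b", "fp4_add"))   # succeeds after release
```

Under this scheme nothing is rewritten on a context switch; an application that fails to acquire the unit would simply fall back to the ordinary execution units.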

freeqaz 1995 days ago

The work is already being put in with modern NUMA (non-uniform memory access) systems to pin apps to specific cores. This seems like it would overlap if this ended up being used in production.

apsient 1995 days ago

How about having FPGA execution units in addition to the normal units, with the OS deciding how they are used, and by whom, based on which running apps are currently the most CPU-intensive?
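As a toy illustration of that policy, assuming the OS already has a per-application CPU share to rank by. `assign_units` and the sample numbers are hypothetical.

```python
from typing import Dict, List


def assign_units(cpu_share: Dict[str, float], num_units: int) -> List[str]:
    """Return the apps that get a programmable unit, heaviest CPU users first.

    cpu_share maps an application name to its recent CPU share (0.0 to 1.0).
    num_units is how many programmable execution units are available.
    """
    ranked = sorted(cpu_share, key=lambda app: cpu_share[app], reverse=True)
    return ranked[:max(num_units, 0)]


# Hypothetical usage with made-up CPU shares.
shares = {"browser": 0.20, "encoder": 0.60, "editor": 0.05}
print(assign_units(shares, num_units=2))
```

A real scheduler would also need to weigh the reprogramming cost before moving a unit from one application to another; this sketch ignores that.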

laydn 1995 days ago

I think the point is that what you're describing has existed for years. Any Xilinx Zynq chip or Altera SoC chip can do this already. Just because the data doesn't travel through the AXI/AMBA bus does not make this novel.

andy_ppp 1995 days ago

Of course it does, because you get access to the CPU as well, so you can hop from an instruction you built on the FPGA to another “silicon” instruction with the same registers and processor state. This is extremely clever and doesn’t involve shuffling code from the main processor over a slow bus, executing some stuff entirely on the FPGA, and shuffling it back.

nitrogen 1996 days ago

It sounds like that's exactly how another processor worked: https://news.ycombinator.com/item?id=25623763

shuringai 1995 days ago

This won't be a question of what the user wants to do with these parts. I bet it won't even be accessible to ordinary programmers. Applications will simply be constantly racing each other to reprogram the field-programmable part of my CPU at every startup.

av3csr 1996 days ago

Sounds like a more general version of what SambaNova is doing with their dataflow unit.