by Traster 1996 days ago
I hate to be that bucket of cold water, but there are multiple reasons FPGAs haven't been successful in package with CPUs. Firstly, the costs of embedding the FPGA - FPGAs are relatively large and power hungry (for what they can do), so if you're sticking one on a CPU die, you're seriously talking about trading that against other extremely useful logic. You really need to make a judgement at purchase time whether you want that dark piece of silicon instead of CPU cores for day-to-day use.

Secondly, whilst they're reconfigurable, they're not reconfigurable on the time scale it takes to spawn a thread; it's closer to the time it takes to compile a program (this is getting a little better over time). That makes it a difficult system design problem to make sure your FPGA is programmed with the right image for the software you want to run. If you're at that level of optimization, why not just design your system to use a PCI-E board? It'll give you more CPU and way more FPGA compute, and both will be cheaper because you get a stock CPU and a stock FPGA, not some super custom FPGA-CPU hybrid chip.

Thirdly, the programming model for FPGAs is fundamentally very different to that of CPUs: it's dataflow, and generally the FPGA is completely deterministic. We really don't have a good answer for writing FPGA logic that copes with the sort of cache hierarchy and out-of-order execution that CPUs have, so you're not getting the advantage you'd expect from that data locality. It's very difficult to write CPU/FPGA programs that run concurrently; almost all solutions today run in parallel - you package up your work, send it off to the FPGA and wait for it to finish.

Finally, as others have said - the tools are bad. That's relatively solvable.

For me, it boils down to this: if you have an application that you think would be good on the same package as a CPU, it's probably worth hardening it into an ASIC (see: error correction, Apple's AI stuff). If you have an application that isn't, then a PCI-E card is probably a better bet - you get more FPGA, more CPU and you're not trading the two off.


ASICs only make sense if you have high volume. PCI-e takes a lot of resources/space. The sweet spot for FPGA-CPU hybrid chips is embedded devices that are latency sensitive. For example, time-of-flight sensors and specialty cameras.
I guess to overcome the reconfiguration latency others have mentioned, the use case would be systems that configure their custom instructions once on boot, after which the software just sees a CPU as normal, just with these custom instructions. I.e. not intended for reconfiguration on context switch.
I definitely agree that a PCI-E card is preferable. Hell, even if you have it in the CPU, you probably want it sat on the PCI-E bus anyways so it can P2P DMA with other hardware.

Also (not disagreeing but I'm curious), last time I checked FPGAs could pull off some level of partial reconfiguration in the millisecond and sub-millisecond ranges. I may be a bit off on these times but I saw them in a research paper a few years back. What sort of speed would be necessary for CPUs to actually benefit from a small FPGA onboard (rather than on an expansion card), given all the context switching?

High-end FPGAs are theoretically capable of millisecond-fast partial reconfigurations, but doing so requires making a lot of tradeoffs that just highlight the impedance mismatch between the generic nature of CPUs and the purgatory that is FPGAs. The more of the FPGA you want to reconfigure, the longer it takes (stop the world, depending on which parts it touches), and unless the reconfigured portion is limited to a standard bus, the reconfiguration won't work (or you have to reconfigure more of the design to deal with different interfaces, timings, etc., blowing up reconfiguration time and defeating the purpose). All of the bitstreams have to be compiled ahead of time as well.
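
To put rough numbers on "the more you reconfigure, the longer it takes", here is a back-of-envelope sketch. It assumes reconfiguration time is dominated by streaming the partial bitstream through a fixed-bandwidth configuration port; the function names, the 1 MB bitstream, the 400 MB/s port and the 2 us saved per call are all illustrative assumptions, not figures for any particular device.

```python
import math

def reconfig_time_us(bitstream_bytes, port_mb_per_s):
    """Time to stream a partial bitstream through the config port, in microseconds.

    bytes / (MB/s) comes out in microseconds when 1 MB = 10**6 bytes.
    """
    return bitstream_bytes / port_mb_per_s

def calls_to_amortize(reconfig_us, saved_us_per_call):
    """Accelerated calls needed before the reconfiguration cost is paid back."""
    if saved_us_per_call <= 0:
        return None  # the accelerator never pays for itself
    return math.ceil(reconfig_us / saved_us_per_call)

# Hypothetical numbers: 1 MB partial bitstream, 400 MB/s configuration port,
# and 2 us saved per accelerated call.
t = reconfig_time_us(1_000_000, 400)
print(t, calls_to_amortize(t, 2))
```

Under those assumptions a swap costs milliseconds, so it only makes sense if the new image then gets used on the order of a thousand times before the next swap.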

Unless latency is so critical that the speed of light is the limiting factor, partial reconfiguration just replaces PCIe with an AXI interconnect that is much harder to work with (or similar, but it always ends up being AXI...).

It's easier to provide "custom instructions" and only accelerate CPU bottlenecks if you don't have PCIe as a massive bottleneck. If you are using an accelerator behind a bus you always have to make sure there is enough work for the accelerator to justify a data transfer. GPUs are built around the idea of batching a lot of work and running it in parallel. You can make an FPGA work like that but you are throwing away the low latency benefits of FPGAs.
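
The "enough work to justify a data transfer" point can be sketched as a simple break-even model. This assumes a fixed per-batch overhead (driver call, DMA setup) plus per-item transfer and compute costs; all names and numbers are made up for illustration, and real buses and drivers are messier than this.

```python
def offload_pays_off(n_items, cpu_ns, accel_ns, transfer_ns, overhead_ns):
    """True if sending n_items to the accelerator beats doing them on the CPU."""
    cpu_time = n_items * cpu_ns
    accel_time = overhead_ns + n_items * (transfer_ns + accel_ns)
    return accel_time < cpu_time

def break_even_items(cpu_ns, accel_ns, transfer_ns, overhead_ns):
    """Smallest batch size for which offloading wins, or None if it never does."""
    saving_per_item = cpu_ns - accel_ns - transfer_ns
    if saving_per_item <= 0:
        return None  # the per-item transfer cost eats the whole speedup
    return overhead_ns // saving_per_item + 1

# Hypothetical costs in nanoseconds: 100 per item on the CPU, 10 on the
# accelerator, 40 to move each item, 2000 of fixed overhead per batch.
n = break_even_items(cpu_ns=100, accel_ns=10, transfer_ns=40, overhead_ns=2000)
print(n, offload_pays_off(n, 100, 10, 40, 2000))
```

Shrinking the fixed overhead is what lets small batches, or single "custom instructions", pay off, which is the argument for getting the bus out of the way.
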
Even the best-case scenarios for integrating an FPGA onto the same die as CPU cores would still have the FPGA separate from the CPU cores. It's really not possible to make an open-ended high bandwidth low latency interface to a huge chunk of FPGA silicon part of the regular CPU core's tightly-optimized pipeline, without drastically slowing down that CPU. The sane way to use an FPGA is as a coprocessor, not grafted onto the processor core itself. Then, you're interacting with the FPGA through interfaces like memory-mapped IO whether it's on-die, on-package, or on an add-in card.
> It's really not possible to make an open-ended high bandwidth low latency interface to a huge chunk of FPGA silicon part of the regular CPU core's tightly-optimized pipeline

That's what's interesting about the article, because that's what the patent is about: "implementing as part of a processor pipeline a reprogrammable execution unit capable of executing specialized instructions".

Yeah, worth mentioning that highly optimized FPGA designs run at up to 600MHz (or to put it another way, 400MHz lower than what Intel advertised 4 years ago). So at a minimum, you're going to clock cross, have a >10 cycle pipeline at CPU speeds (variable clock) and clock cross back.
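
A rough way to see where the ">10 cycle" figure comes from is sketched below. It assumes a 3GHz CPU clock and a two-stage synchronizer on each clock crossing, and it counts every stage as one FPGA-clock cycle, which is a simplification; only the 600MHz figure comes from the comment above.

```python
def round_trip_cpu_cycles(cpu_mhz, fpga_mhz, fpga_pipeline_stages,
                          sync_stages_each_way=2):
    """CPU cycles spent crossing into the FPGA clock domain, computing, and crossing back.

    Every synchronizer and pipeline stage is modelled as one FPGA clock
    cycle, then converted to CPU cycles and rounded up.
    """
    fpga_cycles = 2 * sync_stages_each_way + fpga_pipeline_stages
    # Integer ceiling division, so a partial CPU cycle counts as a full one.
    return -(-(fpga_cycles * cpu_mhz) // fpga_mhz)

# Assumed 3GHz CPU, 600MHz fabric, and a single-stage custom operation.
print(round_trip_cpu_cycles(3000, 600, 1))
```

Even a one-stage operation in the fabric costs tens of CPU cycles under these assumptions, before accounting for a variable CPU clock.
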
Replace PCIe with AXI and that seems to be pretty close to what Zynq/Cyclone V SoC have today on the same package?
The downside of PCIe is that it's very complex, and the tools make interfacing with it bewildering. I really want a PCIe FPGA that looks to me like data magically appears on an AXI bus.