Hacker News
by imtringued 1999 days ago
It's easier to provide "custom instructions" and only accelerate CPU bottlenecks if you don't have PCIe as a massive bottleneck. If you are using an accelerator behind a bus you always have to make sure there is enough work for the accelerator to justify a data transfer. GPUs are built around the idea of batching a lot of work and running it in parallel. You can make an FPGA work like that but you are throwing away the low latency benefits of FPGAs.
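To put rough numbers on the "enough work to justify a data transfer" point, here is a minimal sketch of the break-even batch size. It assumes a fixed per-transfer overhead and constant per-item costs; the function name, parameters, and the example figures are all hypothetical, not measurements of any real bus or device.

```python
def min_batch_for_offload(transfer_overhead_ns, cpu_ns_per_item, accel_ns_per_item):
    """Smallest batch size n for which offloading beats staying on the CPU.

    Simplified model (an assumption, not a measurement):
      CPU time     = n * cpu_ns_per_item
      offload time = transfer_overhead_ns + n * accel_ns_per_item
    Returns None if the accelerator is not faster per item.
    All arguments are integers in nanoseconds.
    """
    saving_per_item = cpu_ns_per_item - accel_ns_per_item
    if saving_per_item <= 0:
        return None  # offload never pays off in this model
    # Need n * saving_per_item > transfer_overhead_ns (strictly greater).
    return transfer_overhead_ns // saving_per_item + 1


if __name__ == "__main__":
    # Hypothetical figures: 10 us round-trip overhead, 1 us/item on the CPU,
    # 0.5 us/item on the accelerator.
    print(min_batch_for_offload(10_000, 1_000, 500))
```

The larger the fixed overhead relative to the per-item saving, the bigger the batch has to be, which is the pressure toward GPU-style batching and away from low-latency use.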

Even the best-case scenarios for integrating an FPGA onto the same die as the CPU cores would still have the FPGA separate from the CPU cores. It's really not possible to make an open-ended high bandwidth low latency interface to a huge chunk of FPGA silicon part of the regular CPU core's tightly-optimized pipeline, without drastically slowing down that CPU. The sane way to use an FPGA is as a coprocessor, not grafted onto the processor core itself. Then, you're interacting with the FPGA through interfaces like memory-mapped IO whether it's on-die, on-package, or on an add-in card.
> It's really not possible to make an open-ended high bandwidth low latency interface to a huge chunk of FPGA silicon part of the regular CPU core's tightly-optimized pipeline

That's what's interesting about the article, because that's what the patent is about: "implementing as part of a processor pipeline a reprogrammable execution unit capable of executing specialized instructions".

Yeah, worth mentioning highly optimized FPGA designs run at up to 600MHz (or to put it another way, 400MHz lower than what Intel advertised 4 years ago). So at a minimum, you're going to clock cross, have a >10 cycle pipeline at CPU speeds (variable clock) and clock cross back.
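A back-of-the-envelope sketch of that round trip, counted in CPU cycles. The 3 GHz CPU clock, the two-cycle synchronizer cost per crossing, and the single FPGA pipeline stage are assumptions chosen for illustration; only the 600MHz figure comes from the comment above.

```python
def round_trip_cpu_cycles(cpu_hz, fpga_hz, fpga_stages=1, sync_cycles=2):
    """Estimate CPU cycles for CPU -> FPGA -> CPU under a simple model.

    Assumed model (hypothetical, not taken from any datasheet):
      - crossing into the FPGA domain costs `sync_cycles` FPGA clock cycles
      - the FPGA logic takes `fpga_stages` FPGA clock cycles
      - crossing back costs `sync_cycles` CPU clock cycles
    Clock rates are integers in Hz.
    """
    fpga_cycles = sync_cycles + fpga_stages
    # Convert FPGA cycles to CPU cycles, rounding up (integer ceiling division).
    fpga_side = (fpga_cycles * cpu_hz + fpga_hz - 1) // fpga_hz
    return fpga_side + sync_cycles


if __name__ == "__main__":
    # 600 MHz FPGA fabric, assumed 3 GHz CPU core.
    print(round_trip_cpu_cycles(3_000_000_000, 600_000_000))
```

Even with a single stage of FPGA logic, this model lands above 10 CPU cycles, and a variable CPU clock only makes the ratio harder to pin down.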