| GPUs work great for accelerating many applications, and it's true that this reduces interest in FPGAs. For applications that map well to GPUs, you're absolutely correct that the higher clock speeds (and greater effective logic area) make GPUs superior as accelerators. However, some applications do not map well to GPUs. In particular, applications with a great deal of bit-level parallelism can achieve enormous speedups with bespoke hardware. For those applications where it doesn't make sense to tape out an ASIC, FPGAs are beautiful--even if they only operate at a few hundred MHz. I think the "programming model" is actually the biggest barrier to wider adoption. Your comment is suffused with what I believe is the source of this disagreement: the idea that one programs an FPGA. One does not; one designs hardware that is implemented on an FPGA. The difference may sound pedantic, but it really is not. There is a massive difference between software programming and hardware design, and hardware design is downright unnatural for software developers. They are completely different skill sets. On top of that, add all the headaches that come with implementing a physical device with physical constraints (the article complains about place-and-route (P&R) times, but this is far from the only burden) and it becomes clear that FPGAs are quite frankly a massive pain in the ass compared to software running on CPUs or GPUs. |
(Also, in general, FPGA tools are just some of the lowest quality garbage out there... and that is saying something. They're that bad. This is a completely unnecessary speedbump.)
The rebuttal to your objection is always tools like "HLS" (High-Level Synthesis), or in plain English, "C to HDL". FPGAs are 'programmed' in one of the two Hardware Description Languages: VHDL (bad) or Verilog (worse, but manageable if you learn VHDL first). These are not programming languages; they are hardware description languages. That means things like "everything in a block always executes in parallel". (Take that, Erlang?) In fact, everything on the chip always executes in parallel, all the time, no exceptions; you "just" select which output is valid. That's because this is how hardware works.
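For the software folks, here is a rough Python sketch of what "everything runs, you just select which output is valid" means in practice. This is only a toy model of the semantics, not HDL and not the output of any tool; `alu_cycle`, `clock_edge`, the 8-bit width, and the opcode encoding are all made up for illustration.

```python
MASK = 0xFF  # hypothetical 8-bit datapath


def alu_cycle(a: int, b: int, op: int) -> int:
    """Model one cycle of a tiny, made-up ALU.

    In hardware the adder, subtractor, AND gates and XOR gates all exist
    physically, so all four results are produced on every cycle. Nothing
    is "skipped" the way an untaken if-branch is skipped in software.
    """
    results = (
        (a + b) & MASK,  # adder output
        (a - b) & MASK,  # subtractor output
        a & b,           # AND output
        a ^ b,           # XOR output
    )
    # The multiplexer: the opcode only picks which already-computed
    # result is driven onto the output wires.
    return results[op & 0b11]


def clock_edge(regs: tuple[int, int]) -> tuple[int, int]:
    """Model two registers that each load the other's value.

    Every register samples the OLD state and they all update together on
    the clock edge, so a swap needs no temporary variable.
    """
    x, y = regs
    return (y, x)


if __name__ == "__main__":
    print(alu_cycle(200, 100, 0))  # adder selected, wraps at 8 bits
    print(clock_edge((1, 2)))      # both registers update at once
```

Note that the Python still evaluates the four results one after another; that ordering is an artifact of the model, and is precisely the part that does not exist in the real circuit.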
This model maps very, very poorly to traditional programming languages. This makes FPGAs hard to learn for engineers and hard to target for HLS tools. The tools can give you decent enough output to meet low- to mid-performance needs, but if you need high performance -- and if not, why are you going through this masochism? -- you're going to need to write some HDL yourself, which is hard and makes you use the industry's worst tools.
Thus, FPGAs languish.