by kyboren 2163 days ago

GPUs work great for accelerating many applications, and it's true that that reduces interest in FPGAs. For applications that map well to GPUs, you're absolutely correct that the higher clock speeds (and greater effective logic area) make GPUs superior as accelerators.

However, some applications do not map well to GPUs. Particularly those applications with a great deal of bit-level parallelism can achieve enormous speedups with bespoke hardware. For those applications where it doesn't make sense to tape out an ASIC, FPGAs are beautiful--even if they only operate at a few hundred MHz.
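As a rough illustration of what bit-level parallelism means here (this sketch is not from the comment above, and the name `reverse_bits32` is made up for the example): reversing the bit order of a 32-bit word is nothing but crossed wires in hardware, so all 32 bits move at once, whereas straightforward software handles one bit per loop iteration.

```python
def reverse_bits32(x: int) -> int:
    """Reverse the bit order of a 32-bit word, one bit per loop iteration.

    In hardware the same function needs no logic at all: output bit i is
    simply wired to input bit 31 - i.
    """
    result = 0
    for _ in range(32):
        result = (result << 1) | (x & 1)  # append the lowest remaining input bit
        x >>= 1
    return result
```

Software can do better than this loop with shift-and-mask tricks, but it still spends several instructions on something that costs only routing in bespoke hardware.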

I think the "programming model" is actually the biggest barrier to wider adoption. Your comment is suffused with what I believe is the source of this disagreement: The idea that one programs an FPGA. One designs hardware that is implemented on an FPGA. The difference may sound pedantic, but it really is not. There is a massively huge difference between software programming and hardware design, and hardware design is downright unnatural for software developers. They are completely different skill sets.

On top of that add all the headaches that come with implementing a physical device with physical constraints (the article complains about P&R times but this is far from the only burden) and it becomes clear that FPGAs are quite frankly a massive pain in the ass compared to software running on CPUs or GPUs.

exmadscientist 2163 days ago

Very much this.

(Also, in general, FPGA tools are just some of the lowest quality garbage out there... and that is saying something. They're that bad. This is a completely unnecessary speedbump.)

The rebuttal to your objection is always tools like "HLS" (High-Level Synthesis), or in plain English, "C to HDL". (FPGAs are 'programmed' in one of the two Hardware Description Languages: VHDL (bad) or Verilog (worse, but manageable if you learn VHDL first).) These are not programming languages, they are hardware description languages. That means things like "everything in a block always executes in parallel". (Take that, Erlang?) In fact, everything on the chip always executes in parallel, all the time, no exceptions; you "just" select which output is valid. That's because this is how hardware works.
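A minimal software model of that idea, assuming a made-up 8-bit ALU (the function `alu` and its opcode encoding are invented for illustration): every result is computed unconditionally, and the opcode only drives a multiplexer that picks which finished result counts as the output.

```python
def alu(a: int, b: int, op: int) -> int:
    """Model of a tiny 8-bit ALU: all units compute, a mux picks one result."""
    # In hardware these four circuits all exist side by side and all react
    # to every change of a and b, whether or not their result is used.
    sum_out = (a + b) & 0xFF
    and_out = a & b
    or_out = a | b
    xor_out = a ^ b
    # What would be an if/else in software is a multiplexer over the
    # already-computed results.
    outputs = (sum_out, and_out, or_out, xor_out)
    return outputs[op & 0b11]
```

Python still evaluates the four lines one after another; the point is only that nothing is skipped, which is the part that maps poorly onto ordinary control flow.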

This model maps very, very poorly to traditional programming languages. This makes FPGAs hard to learn for engineers and hard to target for HLS tools. The tools can give you decent enough output to meet low- to mid-performance needs, but if you need high performance -- and if not, why are you going through this masochism? -- you're going to need to write some HDL yourself, which is hard and makes you use the industry's worst tools.

Thus, FPGAs languish.

jhj 2163 days ago

The biggest problem with HLS is that the HLS vendors still want to pretend it's "C++ / OpenCL / whatever to gates". What you get is a language that pretends there is no such thing as a clock, even though you know it is always there and you care about it, and the language you are really writing consists mostly of all the crazy pragmas that you have to sprinkle over everything. It ends up failing on both counts: it isn't C++ to gates, and it is an exceedingly difficult HDL to use because it always tries to hide the clock from you, even when you really need to do something with it (e.g., a handshake).

A weak spot of high-end commercial HLS tools (Catapult, Stratus) is interfacing with the rest of the hardware world, and in particular how the clock is handled: either you handle it yourself (SystemC) or it is handled kind of vaguely (Catapult's ac_channel). Getting HLS to deal with pipeline scheduling is great, but sometimes you want to break through and do something with the clock. Want to write a memory DMA in HLS? Talk AXI? Build a NoC in HLS? Build even something like a CPU in HLS? Interface with "legacy" RTL blocks, whether combinational or straight pipeline or with ready/valid interfaces or whatever? These things are sort of/just feasible at present with these commercial HLS tools, but very very hard (I've tried it).
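To show why a handshake is inherently about the clock, here is a small cycle-by-cycle sketch of a ready/valid link. The function name `simulate_handshake` and its arguments are hypothetical, and this is a simplified model rather than any particular tool's or bus's exact protocol: a transfer happens only in a cycle where the producer's valid and the consumer's ready are both high.

```python
def simulate_handshake(items, ready_pattern):
    """Cycle-by-cycle model of a ready/valid link.

    items: values the producer wants to send, in order.
    ready_pattern: one bool per clock cycle, True when the consumer can accept.
    Returns a list of (cycle, value) pairs, one per completed transfer.
    """
    transfers = []
    index = 0
    for cycle, ready in enumerate(ready_pattern):
        valid = index < len(items)   # producer keeps valid high while it has data
        if valid and ready:          # transfer only when both are high this cycle
            transfers.append((cycle, items[index]))
            index += 1               # next item is presented on the next cycle
        # If valid is high but ready is low, the producer holds the same item.
    return transfers
```

For example, `simulate_handshake([10, 20, 30], [False, True, True, False, True])` completes transfers in cycles 1, 2 and 4. Which cycle each transfer lands in is exactly the information an untimed C++ description has no way to express.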

If they want to stick with it, I think C++11 could provide a superior type-safe metaprogramming facility for building hardware, compared to the extremely primitive metaprogramming and lack of type-safety notions in SystemVerilog, or to generators such as Chisel or the hand-written Perl/Python/TCL/whatever ones in use at most companies. But sometimes you need to break down and do something with the clock, or interface with things that care about a clock, much in the same way that one would put inline asm statements in code. I want to be able to do that, but not have to deal with the clock the 95% of the time when I don't really need to (let the tool determine the schedule most of the time), which is where the generators fail. HLS needs to sit between the two: not a generator (glorified RTL), but not "pretend you write untimed C++ all the time" (not hardware at all).

jcranmer 2163 days ago

Again, a counterpoint:

I worked on hardware for something akin to an FPGA at a much coarser granularity (kind of like coarse-grained reconfigurable arrays)--close enough that you have to adapt tools like place-and-route to compile to the hardware. The programming for this was mostly done in pretty vanilla C++, with some extra intrinsics thrown in. This C++ was close enough to hand-coded performance that many people didn't even bother trying to tune their applications by resorting to hand-coding in the assembly-ish syntax.

This helped bolster my opinion that FPGAs aren't really the answer that most people are looking for, and that there are useful nearby technologies that can leverage the benefits of FPGAs while having programming models that are on par with (say) GPGPU.

kyboren 2163 days ago

For sure. FPGAs are probably not the answer that most people are looking for. FPGAs are but one point in the trade-off space, and they're not one you jump to "just because".

> [...] there are useful nearby technologies that can leverage the benefits of FPGAs while having programming models that are on par with (say) GPGPU

I think CGRAs are really cool but they're even more niche, and I suspect your original point about GPUs eating everyone's lunch applies particularly strongly to CGRAs. The point is well taken, though, and I don't necessarily disagree.

panpanna 2163 days ago

> FPGA tools are just some of the lowest quality garbage out there

I think things are about to change thanks to yosys and other open source tools.

> VHDL (bad) or Verilog (worse,

VHDL (and its software counterpart Ada) are very well thought out and great to use once you get to know them (and understand why they are the way they are). Yeah, they are a bit verbose but I prefer a strong base to syntactic sugar.

adwn 2163 days ago

> VHDL (and its software counterpart Ada) are very well thought out and great to use once you get to know them (and understand why they are the way they are). Yeah, they are a bit verbose but I prefer a strong base to syntactic sugar.

As a professional FPGA developer: VHDL (and Verilog even more so) are bad [1] at what they're used for today: implementing and verifying digital hardware designs. In fact, they're at most moderately tolerable at what they were originally intended for: describing hardware.

[1] They're not completely terrible – a completely terrible idea would be to start with C and try to bend it so that you can design FPGAs with it...

tinus_hn 2162 days ago

So what’s better?

fanf2 2160 days ago

I've heard good things about Bluespec. It is used for Cambridge's CHERI capability architecture extensions, for example - https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/

roastsquirrel 2163 days ago

Parts of VHDL leave a little to be desired but overall I find it to be a really great language. To the extent I bought Ada 2012 by John Barnes and I kind of like that too after coding in C/C++ etc, but maybe I'm now biased after many years of VHDL coding :) It's not uncommon to see "VHDL is bad" and such like, and I do wonder what the reasons are for those comments.

adwn 2162 days ago

> It's not uncommon to see "VHDL is bad" and such like, and I do wonder what the reasons are for those comments.

VHDL is bad because it's bad at prototyping and implementing digital hardware [1]. One reason why it's bad at that task is the mismatch between the hardware you want and the way you have to describe it in the language. For example: You want a 32-bit register x which is assigned the value of a plus b whenever c is 0, and you want its reset value to be 25. VHDL code:

    signal x: unsigned(31 downto 0);
    ...
    process (clk, rst)
    begin
        if rst then
            x <= to_unsigned(25, x'length);
        elsif rising_edge(clk) then
            if c = '0' then
                x <= a + b;
            end if;
        end if;
    end;

The synthesis software has to interpret the constructs you use according to some quasi-standard conventions, and will hopefully emit those hardware primitives you intended. I say "hopefully", because of the many, many footguns arising from those two translation steps.

[1] Okay, I concede that in theory, there might be a use case for which VHDL is perfectly suited, which would make VHDL a not-bad language. But designing digital hardware is not such a use case.
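For comparison, the intended behaviour of that register fits in a few lines as a cycle-level software model. This is only a sketch: it assumes `a` and `b` are 32-bit unsigned values, and it folds the reset into the per-edge update, whereas the VHDL process above describes a reset that acts independently of the clock. The names `next_x`, `MASK32` and `RESET_VALUE` are invented for the example.

```python
MASK32 = 0xFFFFFFFF   # x is a 32-bit register
RESET_VALUE = 25      # value x takes while reset is asserted

def next_x(x: int, a: int, b: int, c: int, rst: bool) -> int:
    """Value held by register x after one rising clock edge."""
    if rst:
        return RESET_VALUE
    if c == 0:
        return (a + b) & MASK32  # 32-bit wraparound, assuming 32-bit unsigned operands
    return x                     # c is not 0: the register keeps its old value
```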

panpanna 2162 days ago

Writing this with good intentions, not trying to start a fight...

---

There are some minor issues with your code that show you are probably a Verilog/SV guy and not an experienced VHDL guy.

Please read Andrew Rushton's "VHDL for Logic Synthesis". I also recommend you read up on VHDL's 9-valued logic, why it was designed this way, and how it differs from Verilog's bit.

exmadscientist 2162 days ago

As someone who just said that exact thing upthread, half of it is general curmudgeonry. VHDL is not a terrible language, though it does have terrible tools. The IDE side of things is a big opportunity to improve the language. Making refactoring easier by not needing to manually touch up three different files to fix one name is a huge help. (And the IDEs have probably improved in recent times; I've done mostly hardware recently.) The compilers/synthesizers... those are vendor crud and so dragons lie there. VHDL-2008 support would go a long way to improving life....

froh 2162 days ago

If IDE support for basics like consistent renaming is an issue, then language server protocol support will help:

https://github.com/ghdl/ghdl-language-server

Edit: typo in url

kyboren 2163 days ago

> The rebuttal to your objection is always tools like "HLS"

Yup. I know HLS has gotten a lot better recently but my impression is that, somewhat like fusion, HLS as a first-class design paradigm is always a decade away.

> FPGA tools are just some of the lowest quality garbage out there

Absolutely. I think the problem is vendors see FPGA tooling as a cost center and a necessary evil in order to use their real products, the chips themselves. Users are also highly technical and traditionally have no alternative, so (mostly) working but poor-quality software is simply pushed out the door. "They'll figure it out".

Finally, to expand on the difficulties imposed by physical constraints, I think another huge blocker to wide adoption is that FPGAs are physically incompatible. I cannot take a bitstream compiled for one FPGA and program it to any other FPGA. Hell, I can't even take a bitstream compiled for one FPGA and use that bitstream for any other device in the same device family. Without some kind of standardized portability, FPGAs will remain niche devices used only for very specific applications.

s_gourichon 2163 days ago

> cannot take a bitstream compiled for one FPGA and program it to any other FPGA.

Like dumping the memory content of one PC, reinjecting it on another with a different RAM layout and devices, and complaining that the OS and programs can't continue running? Is that a sane expectation?

There are upstream formats targeting FPGAs that can be shared, although yes redoing place and route is slow.

Should manufacturers provide new formats that are closer to final form yet still allow binaries to be adjusted, kind of like .a, .so or even LLVM?

Alternatively, would building whole images for many families of FPGA make sense? Feels like programs distributed as binaries for p OS variants times q hardware architectures, each producing a different binary... random example https://github.com/krallin/tini/releases/tag/v0.19.0 has 114 assets.

ianhowson 2163 days ago

> bitstream ... Is that a sane expectation?

No. Bitstream formats are not in any way compatible across devices. Because timing is a factor, even if you had the same physical layout of LUTs and routing, it's unlikely that your design would work.

(From parent)

> use that bitstream for any other device in the same device family

Not at the bitstream level. However, you can take a placed-and-routed chunk of logic and treat it as a unit. You can replicate it (without repeating P&R), move it around, and copy it onto other devices in the same family. This is super useful, as most FPGA applications have large repeating structures, but P&R doesn't know that such a structure is a factorable unit. It'll repeat P&R for each instance and you'll get unpredictable timing characteristics.

> Should manufacturers provide new formats that are closer to final form yet still allow binaries to be adjusted, kind of like .a, .so or even LLVM?

> would building whole images for many families of FPGA make sense

You can license libraries that are a P&R'd blob and drop them into your design. There's no easy way to make this generalizable across devices without shipping the original RTL, and conversion from RTL->bitstream is where most of the pain lies.

kyboren 2163 days ago

> Like dumping the memory content of one PC, reinjecting it on another with a different RAM layout and devices, and complaining that the OS and programs can't continue running? Is that a sane expectation?

Even worse; it's more like that plus extracting the raw microarchitectural state of a CPU, serializing it in a somewhat arbitrary way, trying to shove that blob into a different CPU and still expecting everything to continue running.

I'm not necessarily complaining, just pointing out this significant difference WRT software programs running on CPUs.

> There are upstream formats targeting FPGAs that can be shared, although yes redoing place and route is slow.

Can you show me an example? I'd like to see this. You do not mean FPGA overlays, correct?

> Should manufacturers provide new formats that are closer to final form yet still allow binaries to be adjusted, kind of like .a, .so or even LLVM?

Like you say, at the very least you will need to re-do place and route. But actually the problem is much worse than this. Different FPGAs have different physical resources. Not just differing amounts of logic area, but different amounts of block RAM, different DSP blocks and in varying numbers, high-speed transceivers, etc. This necessitates making different design trade-offs. Simply shoehorning the same design into different FPGAs, even if it were kind of possible, will not work well.

> Alternatively, would building whole images for many families of FPGA make sense?

Currently I think that's the only real option. But the extreme overhead, duplication of effort and maintenance burden make it very unattractive.

My napkin sketch is some sort of generalized array of partial reconfiguration regions with standardized resources in each region. Accelerator applications can distribute versions targeting different numbers of regions (e.g. one version for FPGAs supporting up to 8 regions, one for FPGAs supporting up to 16 regions, etc.). The FPGA gets loaded with a bitstream supporting a PCIe endpoint and management engine, and some sort of crossbar between regions. At accelerator load time, previously mapped, placed, and routed logical regions used in the application are placed onto actual partial reconfiguration regions and connections between regions are routed appropriately. The idea is to pre-compute as much of the work as possible, leaving a lower dimension problem to solve for final implementation. Timing closure and clock management are left as exercises for the reader :P.
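The load-time part of that napkin sketch could look roughly like the following. Everything here is hypothetical: `pick_version` and `place` are invented names, a version is assumed to be identified by the number of regions it occupies, and the placement is a naive first-fit that ignores the crossbar routing, timing closure and clock management mentioned above.

```python
def pick_version(available_versions, free_regions):
    """Pick the largest shipped version that fits in the free regions.

    available_versions: region counts the application was pre-built for, e.g. [8, 16].
    free_regions: number of partial reconfiguration regions currently free.
    Returns the chosen region count, or None if no version fits.
    """
    fitting = [n for n in available_versions if n <= free_regions]
    return max(fitting) if fitting else None

def place(logical_regions, free_physical):
    """Map each pre-built logical region onto a free physical region (first fit)."""
    if len(logical_regions) > len(free_physical):
        raise ValueError("not enough free regions for this version")
    return dict(zip(logical_regions, free_physical))
```

For instance, `pick_version([8, 16], 12)` selects the 8-region build, and `place(["dma", "fft"], [3, 5, 7])` assigns the two logical regions to physical regions 3 and 5. The hard part, routing the connections between the chosen regions, is deliberately left out.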

monocasa 2163 days ago

> Can you show me an example? I'd like to see this. You do not mean FPGA overlays, correct?

Some of the coolest work to come out of the Chisel project is their intermediate representation FIRRTL.

phkahler 2163 days ago

Not sure why they think chip details and bitstreams need to be kept secret. If they would open up, people would make better tools for them.

imtringued 2162 days ago

Because competitors could make compatible chips.

vzidex 2163 days ago

> I think the problem is vendors see FPGA tooling as a cost center and a necessary evil

Yes to a degree, but another part of the problem is the "physical constraints" you mention. FPGA tooling has to solve multiple hard problems, on the fly, at large scale (some of the latest chips are edging up to 10M logic elements). Unfortunately for the FPGA industry, I think that this is unavoidable - though a lot of interesting work is being done around partial reconfiguration, which should allow for users to work with smaller designs on a large chip.

kyboren 2163 days ago

Well, that's an explanation for why FPGA compilation flows take so much time, but it's not a good explanation for why the software is so crap.

I think partial reconfiguration is really sexy, but it's been around for a long time. What's new and exciting there? Genuinely curious.

qppo 2163 days ago

> HLS as a first-class design paradigm is always a decade away.

What about Chisel?

henrikeh 2163 days ago

Chisel is not an HLS tool. Chisel is much closer to VHDL and Verilog, since the hardware is directly described.

qppo 2163 days ago

Chisel would allow me to write, say, a codec algorithm and compile it into hardware, correct? As well as specify the hardware that is necessary to describe it?

I'm a casual in that space but I thought Chisel was an HDL that could be used to support HLS.

henrikeh 2162 days ago

And you do the same in VHDL and Verilog. And like in Chisel, you have to manually pipeline it and you can exactly control where registers are used and how resources are reused.

You could build something HLS-like using Scala/JVM and Chisel, but Chisel itself is much closer to traditional HDLs.

https://en.m.wikipedia.org/wiki/High-level_synthesis

seldridge 2162 days ago

> These are not programming languages, they are hardware description languages.

There's a subtle point in that Verilog/SystemVerilog and VHDL are also just not powerful languages. While parametric, they lack polymorphism, object oriented programming (excluding SV simulation-only constructs), functional programming, etc.

Your point about the abstraction being different is well taken---hardware description languages describe circuits and programming languages describe programs. However, it's exceedingly unfortunate that the industry is stuck in a rut of such weak languages and trying to explain that weakness to hardware engineers, who haven't seen anything else, runs into the "Blub paradox" (e.g., a programmer who only knows assembly can't evaluate the benefits of C++). [^1]

[^1]: http://www.paulgraham.com/avg.html

mikevin 2161 days ago

While there's plenty of room to improve a language like Verilog I fail to see how these paradigms would help me in RTL. What would polymorphism even look like in an environment without a concept of runtime? Can you elaborate and enlighten me?

Edit: Disclaimer, I'm well aware of the pros and cons of these paradigms in software development and use them plenty

seldridge 2151 days ago

(Sorry! Just saw this!)

Polymorphism makes it way easier to build hardware that can handle any possible data type. Things like queues and arbiters beg for type parameters (you should be able to enqueue any data). Without polymorphism you can make something parameterized by data width (and then flatten/reconstruct the data), but it's janky and you lose any concept of type safety (as you're "casting" to a collection of bits and then back).
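A software analogy of the two approaches, as a sketch only: `Fifo`, `Pixel`, `flatten` and `reconstruct` are invented names, and Python's type hints are checked by external tools such as mypy rather than at run time. The generic queue carries its element type with it, while the width-parameterized style has to squash a structured value into a bag of bits and rebuild it on the other side.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Generic, TypeVar

T = TypeVar("T")

class Fifo(Generic[T]):
    """Type-parameterized queue: the element type travels with the queue."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self._items: Deque[T] = deque()

    def enqueue(self, item: T) -> bool:
        if len(self._items) >= self.depth:
            return False          # queue full, comparable to deasserting ready
        self._items.append(item)
        return True

    def dequeue(self) -> T:
        return self._items.popleft()

@dataclass(frozen=True)
class Pixel:
    red: int    # 8 bits
    green: int  # 8 bits
    blue: int   # 8 bits

def flatten(p: Pixel) -> int:
    """Width-parameterized style: pack the fields into an untyped 24-bit value."""
    return (p.red << 16) | (p.green << 8) | p.blue

def reconstruct(bits: int) -> Pixel:
    """Rebuild the structure; nothing checks that bits ever came from a Pixel."""
    return Pixel((bits >> 16) & 0xFF, (bits >> 8) & 0xFF, bits & 0xFF)
```

With `Fifo[Pixel]`, a static checker can reject an attempt to enqueue anything that is not a `Pixel`. With `flatten` and `reconstruct`, any 24-bit integer is accepted, which is the loss of type safety described above.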

There was some interesting work out of the University of Washington [^1] to build a "standard template library" using SystemVerilog. Polymorphism was identified as one of the shortcomings that made this difficult (Section 5: "A Wishlist for SystemVerilog"). [^2]

[^1]: https://github.com/bespoke-silicon-group/basejump_stl

[^2]: http://cseweb.ucsd.edu/~mbtaylor/papers/BaseJump_STL_DAC_Sli...

imtringued 2162 days ago

Just let those programmers play around with Redstone in Minecraft before you hand them an FPGA. They'll understand it very quickly.

Stubb 2163 days ago

Another big advantage of FPGAs is low latency and the ability to hit precise timing deadlines. When working with radio hardware, you still need an FPGA for automatic gain control calculations and recording/playing out samples. Similarly, you need to do your CRC and other calculations in an FPGA if you need to immediately respond to incoming signals, such as the RTS->CTS->DATA->ACK exchange in 802.11.
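For a sense of what that CRC work looks like, here is a bit-serial CRC-32 sketch using the standard reflected polynomial 0xEDB88320, which is the same generator used for the 802.11 frame check sequence. The function name `crc32_bitwise` is made up, and framing details such as which fields are covered and the order in which the result is transmitted are not modelled.

```python
def crc32_bitwise(data: bytes) -> int:
    """Bit-serial CRC-32 (reflected form, polynomial 0xEDB88320)."""
    crc = 0xFFFFFFFF                  # standard initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):            # one shift-register step per input bit
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF           # standard final inversion
```

The inner loop is a shift register with a handful of XOR taps, so hardware can advance it as each bit arrives (or unroll it to take a whole byte or word per clock) and have the checksum ready the moment the frame ends. The result should agree with Python's `zlib.crc32` for the same input.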

daxfohl 2163 days ago

I think that's the big advantage of FPGA. If you need acceleration to hit a 10 microsecond latency target, FPGA is what you need. If your latency target is like a millisecond or longer, then GPU can handle a lot more throughput. But GPU can't typically give you a 10-us guarantee.

Okay, bit-banging is another advantage of FPGA that GPU doesn't do as well. There are a few things.

inaccel 2162 days ago

Regarding DNN inference, FPGAs can provide low latency AND higher throughput than GPUs.

If you want to compare apples-to-apples, we have done a comparison with realistic (and not synthetic) data regarding the performance of GPUs and FPGAs.

https://medium.com/@inaccel/faster-inference-real-benchmarks...

daxfohl 2162 days ago

Ugh, ad spam taking over HN.