Hacker News new | ask | show | jobs
by exmadscientist 2163 days ago
Very much this.

(Also, in general, FPGA tools are just some of the lowest quality garbage out there... and that is saying something. They're that bad. This is a completely unnecessary speedbump.)

The rebuttal to your objection is always tools like "HLS" (High-Level Synthesis), or in English it's "C to HDL" (FPGAs are 'programmed' in the two Hardware Definition Languages VHDL (bad) or Verilog (worse, but manageable if you learn VHDL first).) These are not programming languages, they are hardware definition languages. That means things like "everything in a block always executes in parallel". (Take that, Erlang?) In fact, everything on the chip always executes in parallel, all the time, no exceptions; you "just" select which output is valid. That's because this is how hardware works.

This model maps very, very poorly to traditional programming languages. This makes FPGAs hard to learn for engineers and hard to target for HLS tools. The tools can give you decent enough output to meet low- to mid-performance needs, but if you need high performance -- and if not, why are you going through this masochism? -- you're going to need to write some HDL yourself, which is hard and makes you use the industry's worst tools.

Thus, FPGAs languish.

6 comments

The biggest problem with HLS is that the HLS vendors still want to pretend it's "C++ / OpenCL / whatever to gates". What you get is pretending that there is no such concept of a clock even though you know it is always there and you care about it, and the language you are really writing consists mostly of all the crazy pragmas that you have to sprinkle over everything. It ends up failing on both counts: it isn't C++ to gates, and it is an exceedingly difficult HDL to use because it tries to hide the clock from you always even when you really need to do something with it (e.g., a handshake).

A weak spot of high-end commercial HLS tools (Catapult, Stratus) is in interfacing with the rest of the hardware world, and how the clock is handled (SystemC, you handle it yourself) or kind of vaguely (Catapult's ac_channel). Getting HLS to deal with pipeline scheduling is great, but sometimes you want to break through and do something with the clock. Want to write a memory DMA in HLS? Talk AXI? Build a NoC in HLS? Build even something like a CPU in HLS? Interface with "legacy" RTL blocks, whether combinational or straight pipeline or with ready/valid interfaces or whatever? These things are sort of/just feasible at present with these commercial HLS tools, but very very hard (I've tried it).

If they want to stick with it, I think C++11 could provide a superior type-safe metaprogramming facility for building hardware (compared to the extremely primitive metaprogramming and lack of type safety notions in SystemVerilog) or generators such as Chisel or the hand-written Perl/Python/TCL/whatever ones in use at most companies, but sometimes you need to break down and do something with the clock or interface with things that care about a clock, much in the same way that one would put inline asm statements in code. I want to do that, but not have to deal with the clock 95% of the time when I don't really need to, which is where the generators fail (let the tool determine the schedule most of the time). HLS needs to sit between the two: not a generator (glorified RTL), but not "pretend you write untimed C++ all the time" (not hardware at all).

Again, a counterpoint:

I worked on hardware for something akin to a FPGA on a much coarser granularity (kind of like coarse-grained reconfigurable arrays)--close enough that you have to adapt tools like place-and-route to compile to the hardware. The programming for this was mostly driven in pretty vanilla C++, with some extra intrinsics thrown in. This C++ was close enough to handcoded performance that many people didn't even bother trying to tune their applications by resorting to hand-coding in the assembly-ish syntax.

This helped bolster my opinion that FPGAs aren't really the answer that most people are looking for, and that there are useful nearby technologies that can leverage the benefits of FPGAs while having programming models that are on par with (say) GPGPU.

For sure. FPGAs are probably not the answer that most people are looking for. FPGAs are but one point in the trade-off space, and they're not one you jump to "just because".

> [...] there are useful nearby technologies that can leverage the benefits of FPGAs while having programming models that are on par with (say) GPGPU

I think CGRAs are really cool but they're even more niche, and I suspect your original point about GPUs eating everyone's lunch applies particularly strongly to CGRAs. The point is well taken, though, and I don't necessarily disagree.

> FPGA tools are just some of the lowest quality garbage out there

I think things are about to change thanks to yosys and other open source tools.

> VHDL (bad) or Verilog (worse,

VHDL (and its software counterpart Ada) are very well thought and great to use once you get to know them (and understand why they are the way they are). Yeah, they are a bit verbose but I prefer a strong base to syntactic sugar.

> VHDL (and its software counterpart Ada) are very well thought and great to use once you get to know them (and understand why they are the way they are). Yeah, they are a bit verbose but I prefer a strong base to syntactic sugar.

As a professional FPGA developer: VHDL (and Verilog even moreso) are bad [1] at what they're used for today: implementing and verifying digital hardware designs. In fact, they're at most moderately tolerable at what they were originally intended for: describing hardware.

[1] They're not completely terrible – a completely terrible idea would be to start with C and try to bend it so that you can design FPGAs with it...

So what’s better?
I've heard good things about Bluespec. It is used for Cambridge's CHERI capability architecture extensions, for example - https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/
Parts of VHDL leave a little to be desired but overall I find it to be a really great language. To the extent I bought Ada 2012 by John Barnes and I kind of like that too after coding in C/C++ etc, but maybe I'm now biased after many years of VHDL coding :) It's not uncommon to see "VHDL is bad" and such like, and I do wonder what the reasons are for those comments.
> It's not uncommon to see "VHDL is bad" and such like, and I do wonder what the reasons are for those comments.

VHDL is bad because it's bad at prototyping and implementing digital hardware [1]. One reason why it's bad at that task is the mismatch between the hardware you want and the way you have to describe it in the language. For example: You want a 32-bit register x which is assigned the value of a plus b whenever c is 0, and you want its reset value to be 25. VHDL code:

    signal x: unsigned(31 downto 0);
    ...
    process (clk, rst)
    begin
        if rst then
            x <= to_unsigned(25, x'length);
        elsif rising_edge(clk) then
            if c = '0' then
                x <= a + b;
            end if;
        end if;
    end;
The synthesis software has to interpret the constructs you use according to some quasi-standard conventions, and will hopefully emit those hardware primitives you intended. I say "hopefully", because of the many, many footguns arising from those two translation steps.

[1] Okay, I concede that in theory, there might be a use case where VHDL is perfectly suited for, which would make VHDL a not-bad language. But designing digital hardware is not such a use case.

Writing this with good intentions, not trying to start a fight...

---

There are some minor issues with your code that shows you are probably a verilog/SV guy and not an experienced VHDL guy.

Please read Andrew Rushtons "VHDL for Logic Synthesis". I also recommend you read on VHDLs 9-valued logic and why it was designed this way and how it differs from verilogs Bit.

> you are probably a verilog/SV guy and not an experienced VHDL guy

Wrong on both counts.

Please, enlighten me, what's wrong with my code? Note that it's in VHDL-2008, and the async. reset is intentional.

> I also recommend you read on VHDLs 9-valued logic and why it was designed this way

My main issue with VHDL is not the IEEE 1164 std_(u)logic, although it really doesn't help that this de-facto standard type for bitvectors and numbers (via the signed/unsigned types) is just a second-class citizen in the language – as opposed to bit and integer, which are fully supported syntactically and semantically, but which have serious shortcomings.

As someone who just said that exact thing upthread, half of it is general curmudgeonry. VHDL is not a terrible language, though it does have terrible tools. The IDE side of things is a big opportunity to improve the language. Making refactoring easier by not needing to manually touch up three different files to fix one name is a huge help. (And the IDEs have probably improved in recent times; I've done mostly hardware recently.) The compilers/synthesizers... those are vendor crud and so dragons lie there. VHDL-2008 support would go a long way to improving life....
If IDE support for basics is an issue,like consistent renaming, then language server protocol support will help:

https://github.com/ghdl/ghdl-language-server

Edit: typo in url

> The rebuttal to your objection is always tools like "HLS"

Yup. I know HLS has gotten a lot better recently but my impression is that, somewhat like fusion, HLS as a first-class design paradigm is always a decade away.

> FPGA tools are just some of the lowest quality garbage out there

Absolutely. I think the problem is vendors see FPGA tooling as a cost center and a necessary evil in order to use their real products, the chips themselves. Users are also highly technical and traditionally have no alternative, so (mostly) working but poor-quality software is simply pushed out the door. "They'll figure it out".

Finally, to expand on the difficulties imposed by physical constraints, I think another huge blocker to wide adoption is that FPGAs are physically incompatible. I cannot take a bitstream compiled for one FPGA and program it to any other FPGA. Hell, I can't even take a bitstream compiled for one FPGA and use that bitstream for any other device in the same device family. Without some kind of standardized portability, FPGAs will remain niche devices used only for very specific applications.

> cannot take a bitstream compiled for one FPGA and program it to any other FPGA.

Like considering dumping memory content on a PC and reinject it on another with different RAM layout and devices and complaining the OS and programs can't continue running? Is that a sane expectation?

There are upstream formats targeting FPGAs that can be shared, although yes redoing place and route is slow.

Should manufacturers provide new formats closer to final form yet would allow binaries that can be adjusted, kind of like .a .so or even llvm?

Alternatively, would building whole images for many families of FPGA make sense? Feels like programs distributed as binaries for p OS variants times q hardware architectures, each producing a different binary... random example https://github.com/krallin/tini/releases/tag/v0.19.0 has 114 assets.

> bitstream ... Is that a sane expectation?

No. Bitstream formats are not in any way compatible across devices. Because timing is a factor, even if you had the same physical layout of LUTs and routing, it's unlikely that your design would work.

(From parent)

> use that bitstream for any other device in the same device family

Not at the bitstream level. However, you can take a place&routed chunk of logic and treat it as a unit. You can replicate it (without repeating P&R), move it around, copy it onto other devices in the same family. This is super useful as most FPGA applications have large repeating structures, but P&R doesn't know that it's a factorable unit. It'll repeat P&R for each instance and you'll get unpredictable timing characteristics.

> Should manufacturers provide new formats closer to final form yet would allow binaries that can be adjusted, kind of like .a .so or even llvm?

> would building whole images for many families of FPGA make sense

You can license libraries that are a P&R'd blob and drop them into your design. There's no easy way to make this generalizable across devices without shipping the original RTL, and conversion from RTL->bitstream is where most of the pain lies.

> Like considering dumping memory content on a PC and reinject it on another with different RAM layout and devices and complaining the OS and programs can't continue running? Is that a sane expectation?

Even worse; it's more like that plus extracting the raw microarchitectural state of a CPU, serializing it in a somewhat arbitrary way, trying to shove that blob into a different CPU and still expecting everything to continue running.

I'm not necessarily complaining, just pointing out this significant difference WRT software programs running on CPUs.

> There are upstream formats targeting FPGAs that can be shared, although yes redoing place and route is slow.

Can you show me an example? I'd like to see this. You do not mean FPGA overlays, correct?

> Should manufacturers provide new formats closer to final form yet would allow binaries that can be adjusted, kind of like .a .so or even llvm?

Like you say, at the very least you will need to re-do place and route. But actually the problem is much worse than this. Different FPGAs have different physical resources. Not just differing amounts of logic area, but different amounts of block RAM, different DSP blocks and in varying numbers, high-speed transceivers, etc. This necessitates making different design trade-offs. Simply shoehorning the same design into different FPGAs, even if it were kind of possible, will not work well.

> Alternatively, would building whole images for many families of FPGA make sense?

Currently I think that's the only real option. But the extreme overhead, duplication of effort and maintenance burden make it very unattractive.

My napkin sketch is some sort of generalized array of partial reconfiguration regions with standardized resources in each region. Accelerator applications can distribute versions targeting different numbers of regions (e.g. one version for FPGAs supporting up to 8 regions, one for FPGAs supporting up to 16 regions, etc.). The FPGA gets loaded with a bitstream supporting a PCIe endpoint and management engine, and some sort of crossbar between regions. At accelerator load time, previously mapped, placed, and routed logical regions used in the application are placed onto actual partial reconfiguration regions and connections between regions are routed appropriately. The idea is to pre-compute as much of the work as possible, leaving a lower dimension problem to solve for final implementation. Timing closure and clock management are left as exercises for the reader :P.

> Can you show me an example? I'd like to see this. You do not mean FPGA overlays, correct?

Some of the coolest work to come out of the Chisel project is their intermediate representation FIRRTL.

Not sure why they think chip details and bitstreams need to be kept secret. If they would open up, people would make better tools for them.
Because competitors could make compatible chips.
>I think the problem is vendors see FPGA tooling as a cost center and a necessary evil

Yes to a degree, but another part of the problem is the "physical constraints" you mention. FPGA tooling has to solve multiple hard problems, on the fly, at large scale (some of the latest chips are edging up to 10M logic elements). Unfortunately for the FPGA industry, I think that this is unavoidable - though a lot of interesting work is being done around partial reconfiguration, which should allow for users to work with smaller designs on a large chip.

Well, that's an explanation for why FPGA compilation flows take so much time, but it's not a good explanation for why the software is so crap.

I think partial reconfiguration is really sexy, but it's been around for a long time. What's new and exciting there? Genuinely curious.

> HLS as a first-class design paradigm is always a decade away.

What about Chisel?

Chisel is not a HSL. Chisel is much closer to VHDL and Verilog, since the hardware is directly described.
Chisel would allow me to write say, a codec algorithm and compile it into hardware, correct? As well as specify the hardware that is necessary to describe it?

I'm a casual in that space but I thought Chisel was an HDL that could be used to support HLS.

And you do the same in VHDL and Verilog. And like in Chisel, you have to manually pipeline it and you can exactly control where registers are used and how resources are reused.

You could build something HLS like using Scala/JVM and Chisel, but Chisel itself is much closer to traditional HDLs.

https://en.m.wikipedia.org/wiki/High-level_synthesis

> These are not programming languages, they are hardware definition languages.

There's a subtle point in that Verilog/SystemVerilog and VHDL are also just not powerful languages. While parametric, they lack polymorphism, object oriented programming (excluding SV simulation-only constructs), functional programming, etc.

Your point about the abstraction being different is well taken---hardware description languages describe circuits and programming languages describe programs. However, it's exceedingly unfortunate that the industry is stuck in a rut of such weak languages and trying to explain that weakness to hardware engineers, who haven't seen anything else, runs into the "Blub paradox" (e.g., a programmer who only knows assembly can't evaluate the benefits of C++). [^1]

[^1]: http://www.paulgraham.com/avg.html

While there's plenty of room to improve a language like Verilog I fail to see how these paradigms would help me in RTL. What would polymorphism even look like in an environment without a concept of runtime? Can you elaborate and enlighten me?

Edit: Disclaimer, I'm well aware of the pros and cons of these paradigms in software development and use them plenty

(Sorry! Just saw this!)

Polymorphism makes it way easier to build hardware that can handle any possible data type. Things like queues and arbiters beg for type parameters (you should be able to enqueue any data). Without polymorphism you can make something parameterized by data width (and then flatten/reconstruct the data), but it's janky and you lose any concept of type safety (as you're "casting" to a collection of bits and then back).

There was some interesting work out of the University of Washington [^1] to build a "standard template library" using SystemVerilog. Polymorphism was identified as one of the shortcomings that made this difficult (Section 5: "A Wishlist for SystemVerilog"). [^2]

[^1]: https://github.com/bespoke-silicon-group/basejump_stl [^2]: http://cseweb.ucsd.edu/~mbtaylor/papers/BaseJump_STL_DAC_Sli...

Just let those programmers play around with Redstone in Minecraft before you hand them an FPGA. They'll understand it very quickly.