I've been working with FPGAs for years (as a hobby; at work I'm a mere "user" of them), and it has always baffled me how poorly matched the chosen imperative paradigm of Verilog and VHDL is to them.
I think the idea was to make it look "familiar" to engineers by looking like C (Verilog) or Ada (VHDL). But FPGAs are nothing like CPUs, and what you end up with instead is an unfitting language where you use a whole lot of "common constructs", knowing how they will be synthesized into hardware. And worse: Practically no good way to do abstraction.
Functional languages are a much, much better match, because that's what FPGAs are: Combining functions together. This works on higher orders as well, and it works well with polymorphism!
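To make the "combining functions together" point concrete, here is a rough sketch in plain Python (not Clash, and not any real HDL library). All names (`full_adder`, `full_subtractor`, `ripple`, `to_bits`, `from_bits`) are made up for illustration. A one-bit cell is an ordinary function, and a higher-order function wires copies of it into a chain whose width is just the length of the input.

```python
from typing import Callable, List, Sequence, Tuple

Bit = int  # 0 or 1
Cell = Callable[[Bit, Bit, Bit], Tuple[Bit, Bit]]


def full_adder(a: Bit, b: Bit, carry_in: Bit) -> Tuple[Bit, Bit]:
    """One-bit full adder: returns (sum, carry_out)."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def full_subtractor(a: Bit, b: Bit, borrow_in: Bit) -> Tuple[Bit, Bit]:
    """One-bit full subtractor: returns (difference, borrow_out)."""
    diff = a ^ b ^ borrow_in
    borrow_out = ((a ^ 1) & b) | (((a ^ b) ^ 1) & borrow_in)
    return diff, borrow_out


def ripple(cell: Cell, xs: Sequence[Bit], ys: Sequence[Bit],
           chain_in: Bit = 0) -> Tuple[List[Bit], Bit]:
    """Higher-order wiring: chain `cell` along two bit vectors (LSB first).

    The chained value (carry or borrow) flows from each cell to the next.
    """
    outputs = []
    chained = chain_in
    for a, b in zip(xs, ys):
        out, chained = cell(a, b, chained)
        outputs.append(out)
    return outputs, chained


def to_bits(value: int, width: int) -> List[Bit]:
    """Little-endian bit vector of the given width."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: Sequence[Bit]) -> int:
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    total, carry = ripple(full_adder, to_bits(5, 4), to_bits(6, 4))
    print(from_bits(total), carry)
    diff, borrow = ripple(full_subtractor, to_bits(6, 4), to_bits(5, 4))
    print(from_bits(diff), borrow)
```

The same `ripple` wiring serves both the adder and the subtractor, which is the kind of reuse that is awkward to express when the wiring is tangled up with the cell logic.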
So privately at least, for anything substantial I've since been using Clash, which is essentially a Haskell subset translated to Verilog or VHDL: https://clash-lang.org
The learning curve is steep, and it definitely helped that I was already proficient in Haskell. But then the code is so enormously concise and modular, and I now have a small library of abstractions that I can just reuse (for example, adding AXI4 to my designs). It's a joy.
There are other options as well: nMigen (Python), Chisel (Scala), SpinalHDL (Scala), Bluespec (Haskell).
It's worth taking a quick glance at all of them (maybe making a toy or two) to see if any of them strike your fancy. Being able to do metaprogramming and parameterization in an actual programming language rather than a not-quite C-preprocessor pass makes a lot of things easier.
Just about anything feels "nicer" to me than VHDL or Verilog, but I particularly like nMigen, and find SpinalHDL to be at least somewhat readable.
Thanks, I will give them a try. Right now I've done enough with Clash to feel very comfortable with it. Having already done a lot of Haskell before helps, because I tend to know what's available in general. You can use almost the entirety of Haskell's functional toolbox, including Monads and Arrows. But it certainly won't hurt checking all the other options out.
I agree, metaprogramming and parameterization (both enabled by polymorphism) are the key, and also what enables me to just "plug in" a function that essentially implements e.g. an AXI4-enabled register, or a whole burst transfer.
I shudder having to do that by hand every time in Verilog/VHDL now, tightly coupled to the rest of the logic.
Fully agreed. As an amateur just trying to get into the FPGA world, my experience was very similar. I've recently discovered nMigen [1] (in a very nice series on building a Motorola 6800 [2]), which approaches this by using Python and overloading type operators, something like the TF dataflow model, and that feels more natural.
However, even for nMigen, and doubly so for Verilog/VHDL, it feels very 90s when it comes to engineering practices. The tooling is often lacking (module management, automated testing, CI and so on), cryptic acronyms are used everywhere as if each source code byte cost a LUT, you have to keep various things (e.g. signal bit widths) in sync in multiple places, and there are many more things that make C99 feel modern by comparison.
I'll have to introduce Clash, it seems like a big step in the right direction - thanks for mentioning it.
I think you're conflating two different things, and although you've identified a real problem, that conflation is causing you to blame the wrong thing for it.
Verilog has a vaguely C-ish syntax, and VHDL has a vaguely Ada-ish syntax, but in my experience neither of them feels "imperative" in any real way once you get a handle on them. The issue with the syntax appearing imperative is superficial: imagine adding a C-like syntax to Haskell, perhaps to spur adoption the way Reason did for OCaml. It wouldn't make it any less functional.
The real issue isn't functional vs. imperative - it's just that VHDL and Verilog are still painfully primitive in terms of their capacity for abstraction.
How do you add C syntax to Haskell without resorting to monadic syntax? (But then your expression is, well, in a Monad, and does not have the same type anymore.) There isn't even any inherent sequencing (very much unlike OCaml), so would a function just be some long C statement?
No, I stand by what I said: the syntax itself is essentially imperative, and it is a bad fit. A purely functional language (like Haskell, not like OCaml) with a C-like, and therefore imperative, syntax would be a bad fit as well. And FPGA designs are naturally pure functions within an applicative functor, which languages like Clash demonstrate well.
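For anyone wondering what "pure functions within an applicative functor" looks like in practice, here is a minimal sketch in plain Python, assuming a signal is modelled as a stream with one value per clock cycle. This is not Clash's API; the names `pure`, `lift`, `register` and `moore` are only loosely borrowed from that style and are otherwise hypothetical.

```python
from itertools import chain, islice
from operator import add
from typing import Callable, Iterable, Iterator, TypeVar

S = TypeVar("S")
T = TypeVar("T")


def pure(value: T) -> Iterator[T]:
    """A constant signal: the same value on every clock cycle."""
    while True:
        yield value


def lift(fn: Callable, *signals: Iterable) -> Iterator:
    """Apply a pure combinational function to signals, cycle by cycle."""
    return map(fn, *signals)


def register(reset_value: T, signal: Iterable[T]) -> Iterator[T]:
    """A D flip-flop: the reset value first, then the input delayed one cycle."""
    return chain([reset_value], signal)


def moore(step: Callable[[S, T], S], initial: S,
          inputs: Iterable[T]) -> Iterator[S]:
    """A state machine: output the current state, then advance it with `step`."""
    state = initial
    for value in inputs:
        yield state
        state = step(state, value)


if __name__ == "__main__":
    # Combinational logic (add) applied to a constant and a registered input.
    delayed_plus_one = lift(add, pure(1), register(0, iter([1, 2, 3])))
    print(list(islice(delayed_plus_one, 4)))

    # A counter with an enable input, expressed as a pure step function.
    print(list(moore(lambda count, enable: count + enable, 0, [1, 1, 0, 1])))
```

The combinational parts (`add`, the counter's step function) stay ordinary pure functions; only `register` and `moore` know about time.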
> I think the idea was to make it look "familiar" to engineers
I'm not so sure. I'm fairly certain that at least VHDL existed before FPGAs were invented. I think they are programming language-like so that they can be somewhat easily verified for syntax errors and undergo some static analysis.
You are 100% correct that these languages don't match up with the type of work you use these languages for. I find C# and Go (and other languages) to be pretty intuitive in a lot of ways, and HDLs are the exact opposite of intuitive, to me. I am sure it would be easier for me to learn to speak, write, and read Icelandic than it would be for me to learn VHDL or Verilog.
Languages don't have to look like C or Ada in order to do syntax checking and static analysis on them. In fact, the latter arguably works better on languages like Haskell...
The Bluespec Language is another Haskell-like that compiles to hardware, but I think it's a bit more mature, too. It also has a second non-Haskell-like syntax to avoid scaring those used to Verilog.
I disagree that the learning curve is steep. I had no functional programming experience and no FPGA/hardware design experience and jumped straight into it. The doc is really nice and within a week I could make my own basic circuits.
Dataflow languages might be an even better fit. But I don’t know if there are any HDLs based on dataflow, or whether they provide useful means of abstraction.
Funny, it looks like the practical work we had to do in our CS master class "advanced digital systems design", something like 20 years ago, on a nowadays archaic XC4013 FPGA... (including the VGA part).
We had a vaguely similar lab at Cambridge. We were given a mostly-implemented MIPS machine in Verilog (IIRC we just had to write the code that visualised the memory on screen), and had to implement GoL to run on the MIPS core.
Very nice! I would quibble with this bit: Verilog initially focused on describing the hardware–very close to what could be expressed by conventional schematic–and later added general-purpose programming elements to create more complex components.
The concept here is 'inference' or 'synthesis' and it is the fundamental difference between an HDL and an imperative programming language. When you write general purpose statements in an HDL, the tools have to infer what sort of hardware would give you that behavior, and in a funny twist, the more you lean on high level language features the more likely you are to run into something that cannot be synthesized into hardware gates. Perfectly valid HDL with no valid solution!
I implemented Conway’s game of life in VHDL to run on a Xilinx FPGA board in my digital electronics class in college - interestingly, I think the hardest part was actually the HDMI driving.
Both of these approaches lose to a CPU. The state-of-the-art algorithm is Hashlife [1], which compresses both time and space, and can evaluate billions of generations on a grid of trillions of cells in milliseconds.
The FPGA approach is really efficient at what it does but ultimately it doesn't scale. For one, it needs 1 bit per cell in the 2D torus, but FPGAs have kilobytes or low-megabytes amounts of memory. That makes it hard to simulate a 10,000 x 10,000 grid, let alone a 1,000,000 x 1,000,000 grid. For two, the FPGA explicitly calculates each iteration one-by-one. This is pretty fast in the beginning, and it means you can use it to calculate a billion iterations in a few seconds or a trillion iterations in a few hours, but you can't scale past that.
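To put numbers on the memory argument, here is a small Python sketch of the brute-force approach: one explicit generation step on a 2D torus, plus the storage needed at 1 bit per cell. The function names `life_step` and `torus_bytes` are made up for illustration, and the step function is a software model, not the FPGA design being discussed.

```python
from typing import List

Grid = List[List[int]]  # 0 = dead, 1 = alive


def life_step(grid: Grid) -> Grid:
    """Compute one generation of Conway's Game of Life on a torus."""
    rows, cols = len(grid), len(grid[0])

    def live_neighbours(r: int, c: int) -> int:
        # Indices wrap around the edges, which is what makes it a torus.
        return sum(
            grid[(r + dr) % rows][(c + dc) % cols]
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
        )

    new_grid = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            n = live_neighbours(r, c)
            # Birth on exactly 3 neighbours, survival on 2 or 3.
            alive = n == 3 or (grid[r][c] == 1 and n == 2)
            new_row.append(1 if alive else 0)
        new_grid.append(new_row)
    return new_grid


def torus_bytes(side: int) -> int:
    """Storage for a side x side grid at 1 bit per cell, in bytes."""
    return side * side // 8


if __name__ == "__main__":
    print(torus_bytes(10_000))     # 12_500_000 bytes, i.e. 12.5 MB
    print(torus_bytes(1_000_000))  # 125_000_000_000 bytes, i.e. 125 GB
```

So the 10,000 x 10,000 grid already needs 12.5 MB for a single copy of the state, and the 1,000,000 x 1,000,000 grid needs 125 GB, before counting any double buffering.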
Hashlife can probably be sped up by GPUs a bit, but it processes a symbolic representation and consequently is quite suited to CPUs. It spends a lot of its time doing hash table lookups (hence the name) which is not a good fit for GPUs and a terrible fit for FPGAs.
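As a rough illustration of why hash table lookups dominate, here is a Python sketch of just the space-compression half of the idea: quadtree nodes are hash-consed, so identical subtrees are stored once. It deliberately leaves out the time-compression half (memoizing the result of advancing a node), so it is not a working Hashlife. The `Node` and `Universe` classes and their methods are hypothetical names for this sketch.

```python
from typing import Dict, Optional, Tuple


class Node:
    """A quadtree node covering a 2**level by 2**level square."""
    __slots__ = ("level", "nw", "ne", "sw", "se", "population")

    def __init__(self, level: int, nw: Optional["Node"], ne: Optional["Node"],
                 sw: Optional["Node"], se: Optional["Node"],
                 population: int) -> None:
        self.level = level
        self.nw, self.ne, self.sw, self.se = nw, ne, sw, se
        self.population = population


class Universe:
    """Owns the hash table that makes every distinct subtree unique."""

    def __init__(self) -> None:
        self._nodes: Dict[Tuple[int, int, int, int], Node] = {}
        self.dead = Node(0, None, None, None, None, 0)
        self.alive = Node(0, None, None, None, None, 1)

    def join(self, nw: Node, ne: Node, sw: Node, se: Node) -> Node:
        """Return the canonical node with these four children."""
        # Children are already canonical, so identity is a valid hash key.
        # Nodes are never freed while the table holds them, so ids stay unique.
        key = (id(nw), id(ne), id(sw), id(se))
        node = self._nodes.get(key)
        if node is None:
            population = (nw.population + ne.population
                          + sw.population + se.population)
            node = Node(nw.level + 1, nw, ne, sw, se, population)
            self._nodes[key] = node
        return node

    def empty(self, level: int) -> Node:
        """An all-dead square of side 2**level."""
        node = self.dead
        for _ in range(level):
            node = self.join(node, node, node, node)
        return node

    def node_count(self) -> int:
        """Number of distinct interior nodes stored so far."""
        return len(self._nodes)


if __name__ == "__main__":
    universe = Universe()
    big = universe.empty(20)  # side 2**20 = 1,048,576 cells
    print(2 ** big.level, universe.node_count())
```

An empty grid of roughly a million cells per side costs 20 interior nodes here instead of a terabit of raw state. Every `join` is a hash lookup keyed on pointers, which is cheap on a CPU with caches and awkward to map onto fixed FPGA logic.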
This reminds me of how I was fascinated by N-body simulations and fractals in high school, and then later found out there are much better ways of calculating both gravity and the Mandelbrot set than the obvious ones.
(i.e. tree methods like Barnes-Hut for gravity and perturbation theory for Mandelbrot)
GPU L0 cache latency IIUC is ~20x higher than CPU. In fact in this case I think GPU would have to use L2 cache since the data is shared across so many cores, so now you're talking ~50x. So even if you get full parallelism of cell computation you can plug in the numbers and find it would be far slower than FPGA (but still faster than CPU).
I'm not an expert though. Maybe GPUs have some way of mitigating the high cache latencies.
The main trick GPUs use is having a massive amount of hardware threads per actual core. If an instruction in one thread is stalled on a load operation, the core will just switch to another thread. If you have more runnable threads at all times than your memory latency in cycles, the latency will not affect your throughput anymore.
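Here is a toy Python model of that latency-hiding argument, under simplifying assumptions that are mine rather than a description of any real GPU: a single core issues at most one instruction per cycle, every instruction is a load, and a thread that issues a load is stalled for `load_latency` cycles. The function name `core_utilization` is made up for this sketch.

```python
def core_utilization(num_threads: int, load_latency: int,
                     cycles: int = 10_000) -> float:
    """Fraction of cycles in which the core manages to issue an instruction."""
    # ready_at[t] is the first cycle in which thread t may issue again.
    ready_at = [0] * num_threads
    issued = 0
    for cycle in range(cycles):
        for thread in range(num_threads):
            if ready_at[thread] <= cycle:
                # Issue this thread's load; it stalls until the data arrives.
                ready_at[thread] = cycle + load_latency
                issued += 1
                break  # only one instruction issues per cycle
    return issued / cycles


if __name__ == "__main__":
    for threads in (1, 10, 20, 40):
        print(threads, core_utilization(threads, load_latency=20))
```

In this model, utilization grows linearly with the thread count and saturates at 1.0 once there are at least as many threads as cycles of latency, after which the latency no longer limits throughput.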
Considering that the cell of each next generation can be individually calculated in parallel, I don't think a GPU implementation would be able to beat it. A GPU can have many pipelines and quickly process many "pixels" simultaneously but it will only be able to parallelize all of them for very small screen sizes.
I love that Conway published a request for proof that infinite games existed, or something to that effect. And the proof turned out to be what is known as a Gun structure (iirc). A Gun is essentially a starting condition that leads to new projectiles continuously getting launched into the world, thus creating infinite (and non-repeating?) games.