by rthomas6 3292 days ago
Another HW FPGA guy here, albeit one who has never used HLS. My concern with the whole idea of HLS is that it fails to take advantage of the parallelization capability of FPGAs, which in my opinion is one of the main reasons to use an FPGA in the first place. It sounds great for designs that are linear in nature, that is, designs that put data through a bunch of sequential processing blocks and then output some result. But for most of those cases, why not just use a processor + DSP SoC? Or even something like a Zynq? It will probably be faster.

Since FPGAs do not operate in a linear way, the way software does on a processor, why are we trying to make them work that way? It would make more sense to me to design a high-level synthesis language around a paradigm that is also not imperative: functional programming. For example, how would this kind of C code even be synthesized in hardware?

  A = 5;
  B_out = A + 3;
  A = 6;
  C_out = A;
"A" is used as two different things, which is totally fine when the code is run sequentially, which must be what is happening when code like this is synthesized, but that's wasteful on an FPGA, because B_out and C_out don't actually have dependence on each other and could be computed concurrently, which is what would happen if we used VHDL to do something similar. We need a high-level synthesis language that describes a system which solves the algorithm we want, the same way VHDL does, except with more abstraction capabilities. In my opinion this could be a functional language.
1 comment

I agree about the parallelism but you have to understand the design methodology.

Your example is somewhat pointless. The code is written to create the HW, not the other way around. I can't feed it just any crap.

If you want parallelism, you have to code it.

Zynq would actually be what I use! You start with SW. The ARM core is not that quick, so you use the FPGA to accelerate the tough parts. You may think you will have throughput issues, but you have options via the high-performance AXI ports. Your FPGA modules access the data in memory via DMA.

Once you KNOW which part of the algorithm needs accelerating, and that it actually suits an FPGA, you grab the HLS tool and start coding.

You have to look at some of the libraries to understand what abstraction level you are working at: https://www.xilinx.com/products/design-tools/vivado/integrat...

Video, matrices, linear algebra, encoders/decoders, etc. I can string them together in the same way I would string HDL IP.

The advantage is I can run the algorithm in C++ first and test it all, under the assumption that the HLS library has the equivalent HW version for synthesis.
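
As a rough illustration of that "test it in C++ first" flow, here is a small sketch. It uses a hand-written FIR filter as a stand-in for a library block; the names `fir_kernel`, `fir_impulse_matches`, `run_fir_testbench`, `kTaps` and `kSamples` are all hypothetical, and nothing here is a vendor API. The kernel sticks to fixed-size arrays and static loop bounds on the assumption that this is the kind of code an HLS flow is happy with:

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kTaps = 4;     // assumed filter length
constexpr std::size_t kSamples = 8;  // assumed block size

// Hypothetical kernel: direct-form FIR over one fixed-size block.
// No dynamic allocation, no recursion, compile-time loop bounds.
void fir_kernel(const int (&in)[kSamples],
                const int (&coeff)[kTaps],
                int (&out)[kSamples]) {
    for (std::size_t n = 0; n < kSamples; ++n) {
        int acc = 0;
        for (std::size_t k = 0; k < kTaps; ++k) {
            if (n >= k) {
                acc += coeff[k] * in[n - k];
            }
        }
        out[n] = acc;
    }
}

// Software-only check: feeding a unit impulse must reproduce the
// coefficients, followed by zeros.
bool fir_impulse_matches() {
    const int in[kSamples] = {1, 0, 0, 0, 0, 0, 0, 0};
    const int coeff[kTaps] = {3, -1, 4, 2};
    int out[kSamples] = {};
    fir_kernel(in, coeff, out);
    for (std::size_t n = 0; n < kSamples; ++n) {
        const int expected = (n < kTaps) ? coeff[n] : 0;
        if (out[n] != expected) {
            return false;
        }
    }
    return true;
}

// Testbench entry point, run on the desktop before any synthesis.
void run_fir_testbench() {
    assert(fir_impulse_matches());
}
```

The point is only that the algorithm gets verified as ordinary C++ before any HW is generated. Whether the synthesized result matches still rests on the assumption stated above about the library's HW equivalent.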

There is still a lot of HW work involved. Take your example with A used twice: one module would calculate B_out by reading A before its value changes, and then you would have to start the C_out module. You would need a way to coordinate the two modules so they can share the same memory at A. But they would be running in parallel, just not started at the same time.
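
A toy cycle-by-cycle model of that coordination, written as plain C++ rather than anything synthesizable. The names `ScheduleResult` and `run_two_modules` and the one-flag-per-step handshake are assumptions made for the sketch, not how any particular tool schedules it:

```cpp
#include <cassert>

// Hypothetical result record for the toy schedule.
struct ScheduleResult {
    int b_out;
    int c_out;
    int cycles;
};

// Both "modules" exist for the whole run and are evaluated every cycle.
// Each one only acts when its start condition, sampled at the beginning
// of the cycle, is true. That sampling mimics registered signals.
ScheduleResult run_two_modules() {
    int a_reg = 5;           // shared storage for A
    bool b_done = false;     // B module has latched the old A
    bool a_updated = false;  // A has been overwritten with 6
    bool c_done = false;
    int b_out = 0;
    int c_out = 0;
    int cycle = 0;

    while (!c_done) {
        assert(cycle < 16);  // guard against a broken handshake

        // Sample state as it was at the start of this cycle.
        const int a_now = a_reg;
        const bool b_done_now = b_done;
        const bool a_updated_now = a_updated;

        // B module: reads A before it changes.
        if (!b_done_now) {
            b_out = a_now + 3;
            b_done = true;
        }
        // Writer: may overwrite A only after B has read it.
        if (b_done_now && !a_updated_now) {
            a_reg = 6;
            a_updated = true;
        }
        // C module: started once the new A is in place.
        if (a_updated_now) {
            c_out = a_now;
            c_done = true;
        }
        ++cycle;
    }
    return {b_out, c_out, cycle};
}
```

With this handshake the run takes three cycles and produces B_out = 8 and C_out = 6. The flags are the "way to coordinate" part: without them the C module could read A before or after the update.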