
by gluggymug 3289 days ago

As a long-time HW FPGA guy, I think you might want to take a look at the C projects again. I don't know whether Go has any advantages, but the concept of a higher-level language for development is being used by the major FPGA companies.

Both Xilinx and Altera have High Level Synthesis (HLS) tools. These use C or C++. If you know how FPGA work is generally done, you can separate the hype from the reality and you can understand how to use it for a real application.

The vendors have lots of libraries for IP. You don't write RTL from scratch. It would take too long to verify. You tie IP together. It can be DSP or generic maths or a video codec thing. The VHDL is done for you.

You write your algorithm in C++ in a particular format using compatible data types and calling HLS libraries. You run it all in C++ first and make sure it does exactly what you want in SW. This is where the algorithm is developed.
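
A minimal sketch of what "run it all in C++ first" can look like, assuming a hypothetical 4-tap FIR kernel. The names `fir4` and `fir4_impulse_test` are made up for illustration, and plain `<cstdint>` types stand in for whatever vendor data types and HLS libraries a real project would use. The point is only the workflow: fixed-width types, statically sized arrays, and a software check that runs before any synthesis.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical sizes, chosen only for this illustration.
constexpr std::size_t kTaps = 4;
constexpr std::size_t kSamples = 8;

// 4-tap FIR filter written with fixed-width integers and fixed-size arrays,
// the kind of restricted C++ that is usually easier to map to hardware.
void fir4(const std::array<std::int16_t, kSamples>& in,
          const std::array<std::int16_t, kTaps>& coeff,
          std::array<std::int32_t, kSamples>& out) {
    std::array<std::int16_t, kTaps> delay{};  // shift register, starts at zero
    for (std::size_t n = 0; n < kSamples; ++n) {
        // Shift the delay line by one position and insert the new sample.
        for (std::size_t k = kTaps - 1; k > 0; --k) {
            delay[k] = delay[k - 1];
        }
        delay[0] = in[n];

        // Multiply-accumulate over all taps in a wider accumulator.
        std::int32_t acc = 0;
        for (std::size_t k = 0; k < kTaps; ++k) {
            acc += static_cast<std::int32_t>(coeff[k]) * delay[k];
        }
        out[n] = acc;
    }
}

// Software-only check: a unit impulse in must reproduce the coefficients.
bool fir4_impulse_test() {
    std::array<std::int16_t, kSamples> in{};
    in[0] = 1;
    const std::array<std::int16_t, kTaps> coeff{3, -1, 4, 2};
    std::array<std::int32_t, kSamples> out{};
    fir4(in, coeff, out);

    for (std::size_t k = 0; k < kTaps; ++k) {
        if (out[k] != coeff[k]) return false;
    }
    for (std::size_t k = kTaps; k < kSamples; ++k) {
        if (out[k] != 0) return false;
    }
    return true;
}
```

Calling `fir4_impulse_test()` from an ordinary test program exercises the algorithm entirely in software; the same kernel source would then be what gets handed to the HLS tool.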

THEN you fire up the HLS tool and a couple of hours of synthesizing later (lol) you get to load a bitstream onto a FPGA to verify it.

Of course there can be problems in that translation. It takes good engineering to dive down into the design and find the issues.

My current work does not touch any HLS. I am doing the VHDL stuff. But I know the algorithms all started from SW first. They always do. For the bulk of the work, verification, it is somewhat irrelevant whether the algorithm is manually converted to RTL or converted via tools.


rthomas6 3288 days ago

Another HW FPGA guy here, albeit one who has never used HLS. My concern with the whole idea of HLS is that it fails to take advantage of the parallelization capability of FPGAs, which in my opinion is one of the main reasons to use an FPGA in the first place. It sounds great for designs that are linear in nature, that is, putting data through a bunch of sequential processing blocks and then outputting some result. But for most of those cases, why not just use a processor + DSP SoC? Or even something like a Zynq? It will probably be faster.

Seeing how FPGAs do not operate in a linear way, the way that software does on a processor, why are we trying to make them work that way? It would make more sense to me to design a high-level synthesis language with a paradigm that is also not imperative: functional programming. For example, how would this kind of C code even be synthesized in hardware?

  A = 5;
  B_out = A + 3;
  A = 6;
  C_out = A;

"A" is used as two different things. That is totally fine when the code is run sequentially, which must be what is happening when code like this is synthesized. But it is wasteful on an FPGA, because B_out and C_out don't actually depend on each other and could be computed concurrently, which is what would happen if we used VHDL to do something similar. We need a high-level synthesis language that describes a system which solves the algorithm we want, the same way VHDL does, except with more abstraction capabilities. In my opinion this could be a functional language.

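The dependence argument above can be made concrete with a small sketch. It assumes nothing about how any particular HLS tool treats this code; it only shows that giving each assignment to A its own name (the static single assignment idea used in compilers) leaves two expressions with no data dependence between them. The names `Outputs`, `sequential_form`, and `renamed_form` are hypothetical.

```cpp
// Pair of results, standing in for the two output ports.
struct Outputs {
    int b_out;
    int c_out;
};

// The snippet as written: A is reused, so the statements must be read in order.
Outputs sequential_form() {
    int a = 5;
    int b_out = a + 3;
    a = 6;
    int c_out = a;
    return {b_out, c_out};
}

// Same computation with each assignment to A given its own name.
// b_out depends only on a0 and c_out depends only on a1, so nothing
// in the data flow forces one to be computed before the other.
Outputs renamed_form() {
    const int a0 = 5;
    const int a1 = 6;
    const int b_out = a0 + 3;
    const int c_out = a1;
    return {b_out, c_out};
}
```

Both functions return the same pair of values; only the second makes the independence of B_out and C_out visible in the source.
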
gluggymug 3288 days ago

I agree about the parallelism but you have to understand the design methodology.

Your example is somewhat pointless. The code is written to create the HW not the other way around. I can't feed it just any crap.

If you want parallelism, you have to code it.
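
As a rough illustration of coding the parallelism explicitly, here is a sketch of the same 8-element sum written two ways. The function names and the size `kLanes` are hypothetical, and whether a given HLS tool would rebalance the first form on its own is an assumption this sketch does not rely on. The second form simply states the independent additions in the source.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 8;  // hypothetical fixed input width

// Serial chain: every addition depends on the previous result,
// so the data flow is 8 dependent steps long.
std::int32_t sum_chain(const std::array<std::int32_t, kLanes>& x) {
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < kLanes; ++i) {
        acc += x[i];
    }
    return acc;
}

// Adder tree: the four first-level sums are independent of each other,
// as are the two second-level sums, so the dependency depth is 3.
std::int32_t sum_tree(const std::array<std::int32_t, kLanes>& x) {
    const std::int32_t s0 = x[0] + x[1];
    const std::int32_t s1 = x[2] + x[3];
    const std::int32_t s2 = x[4] + x[5];
    const std::int32_t s3 = x[6] + x[7];
    const std::int32_t t0 = s0 + s1;
    const std::int32_t t1 = s2 + s3;
    return t0 + t1;
}
```

Both functions compute the same value in software; the difference is only in how much independent work the source exposes.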

Zynq would actually be what I use! You start with SW. The ARM core is not that quick. You will use the FPGA to accelerate the tough parts. You may think you will have throughput issues, but you have options via the high-performance AXI ports. Your FPGA modules access the data in memory via DMAs.

Once you KNOW which part of the algorithm needs accelerating and actually suits an FPGA, you grab the HLS tool and start coding.

You have to look at some of the libraries to understand what abstraction level you are working at: https://www.xilinx.com/products/design-tools/vivado/integrat...

Video, matrices, linear algebra, encoders/decoders, etc. I can string them together in the same way I would string HDL IP.

The advantage is I can run the algorithm in C++ first and test it all, under the assumption that the HLS library has the equivalent HW version for synthesis.

There is still a lot of HW work involved. For instance, in your example with A used twice, one module would calculate B_out by reading A before its value changes, and then you would have to start the C_out module. You would need a way to coordinate the two modules so that they share the same memory at A. They would still be running in parallel, just not started at the same time.