| HN Mirror

I agree about the parallelism but you have to understand the design methodology.

Your example is somewhat pointless. The code is written to create the HW not the other way around. I can't feed it just any crap.

You want parallelism you have to code it.

Zynq would actually be what I use! You start with SW. The ARM core is not that quick. You will use the FPGA to accelerate the tough parts. You may think you will have throughput issues but you have options via the high performance AXI ports. Your FPGA modules access the data in memory via DMAs.

KNOWING what part of the algorithm you need to accelerate actually suits FPGAs, you grab the HLS and start coding.

You have to look at some of the libraries to understand what abstraction level you are working at: https://www.xilinx.com/products/design-tools/vivado/integrat...

Video, matrices, linear algebra, encoders/decoders. Etc. I can string them together in the same way I would string HDL IP.

The advantage is I can run the algorithm in C++ first and test it all, under the assumption that the HLS library has the equivalent HW version for synthesis.

There is still a lot of HW work involved. For instance in your example with A used twice. One module would calculate B_out by reading A prior to changing its value then you would have to start the C_out module. You would need a way to coordinate the two modules to share the same memory at A. But they would be running in parallel, just not started at the same time.