Hacker News
by odmkSeijin 3333 days ago
I programmed FPGAs using both VHDL and Verilog for many years. Recently I started at a start-up where we predominantly program using C++ HLS. I never want to go back to full-time HDL again. We have found it is possible to get the same performance as carefully written RTL, but you still have to write with the underlying device architecture in mind. HLS has real advantages: simulation is vastly faster, and C++ templates can be used. This makes it easy to try many iterations and find clever optimizations. Trying to do the same in HDL would be a nightmare with a large design. More people should move to HLS and push for the tools to improve. The world would be a better place.
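To illustrate the point about templates, here is a minimal sketch of a FIR filter whose tap count and sample type are compile-time parameters. The class name `Fir` and its interface are hypothetical, not taken from any vendor library; the assumption is that an HLS tool can fully unroll the fixed-bound loops, which is the usual reason to write them this way.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical example: an N-tap FIR filter. Changing N or T gives a new
// hardware variant without touching the body of the design.
template <typename T, std::size_t N>
class Fir {
    static_assert(N > 0, "a FIR filter needs at least one tap");

public:
    explicit Fir(const std::array<T, N>& coeffs) : coeffs_(coeffs), delay_{} {}

    // Consume one input sample and return one output sample.
    T step(T sample) {
        // Shift the delay line by one position (fixed trip count).
        for (std::size_t i = N - 1; i > 0; --i) {
            delay_[i] = delay_[i - 1];
        }
        delay_[0] = sample;

        // Multiply-accumulate over all taps (fixed trip count).
        T acc = T{};
        for (std::size_t i = 0; i < N; ++i) {
            acc += coeffs_[i] * delay_[i];
        }
        return acc;
    }

private:
    std::array<T, N> coeffs_;
    std::array<T, N> delay_;
};

// Host-side check: feeding an impulse must return the coefficients in order.
inline void fir_impulse_check() {
    const std::array<int, 3> coeffs{{1, 2, 3}};
    Fir<int, 3> fir(coeffs);
    assert(fir.step(1) == 1);
    assert(fir.step(0) == 2);
    assert(fir.step(0) == 3);
}
```

Trying a 5-tap or 64-tap variant is then a one-line change at the instantiation site, which is the kind of iteration that gets painful in hand-written RTL.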
2 comments

Be honest though, HLS works well for DSP-like applications, not for anything else. Not every digital design is image processing, and even if it works well you're still sacrificing ~20% of your LUTs.
HLS lets the compiler optimize your layouts for you more, since you're expressing your algorithm's goals at a much higher level (and have left more actual implementation details unspecified). I've implemented the same algorithm in both HLS and Verilog, and with a few pragmas in the HLS code the result was far more tweakable on the space/speed gradient - I was pretty much able to use the maximal space I could to achieve maximal execution speed - and when limited to the same area I had used for the Verilog implementation, it had almost identical performance.

It's also incredibly time saving in that with HLS you can actually rapidly unit test your designs (outside of a simulator), and you can usually get a codegen for bindings for the SoC you're designing for so you don't have to write any headers/interface code for the onboard processor yourself.
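A minimal sketch of what the pragma tuning and the simulator-free unit test can look like. The kernel `dot8`, the length `kLen` and the unroll factor are hypothetical; the pragma is written in Xilinx HLS syntax, and the assumption is that an ordinary host compiler ignores it (possibly with an unknown-pragma warning), so the same source runs as plain C++.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kLen = 8;  // hypothetical vector length

// Hypothetical kernel: dot product of two fixed-length vectors.
// The unroll factor is the space/speed knob: a larger factor asks for more
// parallel multipliers (more area) in exchange for fewer cycles.
std::int32_t dot8(const std::int16_t a[kLen], const std::int16_t b[kLen]) {
    std::int32_t acc = 0;
    for (int i = 0; i < kLen; ++i) {
#pragma HLS UNROLL factor = 4
        acc += static_cast<std::int32_t>(a[i]) * b[i];
    }
    return acc;
}

// Plain host-side unit test: no RTL simulator involved.
void dot8_self_test() {
    const std::int16_t a[kLen] = {1, 2, 3, 4, 5, 6, 7, 8};
    const std::int16_t ones[kLen] = {1, 1, 1, 1, 1, 1, 1, 1};
    assert(dot8(a, ones) == 36);  // 1 + 2 + ... + 8
    assert(dot8(a, a) == 204);    // 1 + 4 + 9 + ... + 64
}
```

Changing `factor` alters only the generated hardware, not the result, so the same test keeps passing while you explore the area/throughput trade-off.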

IMO, people who don't think HLS is a win either have years of experience invested in older toolchains and are more productive with them (and don't see HLS as worth their time), or haven't actually tried it.

I've mostly used a slightly hamstrung C++ as an HLS language with a Xilinx SoC. C++ is far nicer than an HDL, but I can't help but feel that Rust's safety model more closely matches how an HLS wants to structure its invariants, so I feel like it could be an excellent HLS language in the future, given the chance.

Having worked with a few variations that promised gold and delivered very little, allow me to disagree.

For complex designs I always came crawling back to VHDL and SystemVerilog.

(Jury is still out on Chisel, but it doesn't look good right now. Looks more like it was designed by CS folks who just didn't get modern hardware design)

I don't agree with this, but I think I see what you are trying to say. Typically a full design would include some interface portion, with maybe DMA or PCIe or whatever, that would be done in HDL, and maybe a processor or not. HLS works for the processing that is done inside the FPGA. If this is what you mean by DSP-like, then sure, but it does not have to be image processing; it could be anything done in fabric. It is possible to write a specific truth table and similar basic elements in C++, so why would you be forced into any specific type of application? Where did you get the 20% number? At least for Xilinx HLS, I don't think that has anything to do with anything (maybe for some other compiler?). If you take some generic C or C++ code and try to put it in an FPGA, the number will be more like 80%. The results will be horrible. On the other hand, if you write code in a way that naturally maps to the hardware you are using, then the results can be every bit as good as RTL. But this is not necessarily easy to do. I think that this has more to do with the quality of the current compilers, not some inherent limitation with the concept. I like HDL, but I would much, much rather program in C++. It is a more sophisticated language. I think Intel/Altera is supposed to release some HLS tool, and I know there are others that I haven't tried. What I am saying is that it would be nice if enough effort was put into these tools to not have to worry about whether there are limitations. Even more so since I think the newer C++ standards are moving towards multi-threading/concurrency.
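As a sketch of the truth-table remark, a 3-input function can be written as a constant lookup table indexed by the packed inputs. The names `kMajority` and `majority3` are hypothetical, and whether a particular HLS tool maps this onto a single LUT is an assumption, not something the thread confirms.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 3-input truth table: output is 1 when at least two inputs
// are 1. The index is the packed input {a, b, c}, with a as the top bit.
constexpr std::uint8_t kMajority[8] = {0, 0, 0, 1, 0, 1, 1, 1};

inline bool majority3(bool a, bool b, bool c) {
    const unsigned idx = (static_cast<unsigned>(a) << 2) |
                         (static_cast<unsigned>(b) << 1) |
                         static_cast<unsigned>(c);
    return kMajority[idx] != 0;
}

// Exhaustive host-side check of the table against the boolean expression.
inline void majority3_check() {
    for (unsigned i = 0; i < 8; ++i) {
        const bool a = (i >> 2) & 1u;
        const bool b = (i >> 1) & 1u;
        const bool c = i & 1u;
        assert(majority3(a, b, c) == ((a && b) || (a && c) || (b && c)));
    }
}
```

The same pattern covers any small combinational function: change the table contents and the index width, and the checking loop stays the same.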
> Where did you get %20 percent number?

I get the 20% number from a real-world case: guys who converted a huge existing VHDL design into HLS with the help of several Xilinx FAEs. The application was ideal for HLS.

> On the other hand, if you write code in a way that naturally maps to the hardware you are using, then the results can be every bit as good as RTL

You only believe this if you're deep in the Xilinx marketing bubble. HLS covers maybe ~20% of the use cases of FPGAs. Even the guys who teach HLS will not tell you it's a general solution.

> I think that this has more to do with the quality of the current compilers, not some inherent limitation with the concept.

This concept has been researched for more than 25 years, and C to FPGA has failed except for the aforementioned case. Btw, I'm not saying that a general high-level synthesis solution isn't possible; I'm saying that it should never be based on C or C++.

Forget about Xilinx marketing. I am curious: what did you find causes a design to fall outside the '20%' use case situation? Are you talking about asynchronous clocks? Feedback? IO configurations? Or what? Why do you say HLS should not be based on C++? Is this related to concurrency or something else? I am not disagreeing with you necessarily. I would say that C++ has the good/bad quality that it is possible to express the same thing 20 different ways. Also, the behavioral simulation is vastly faster. Given that a design realistically has to be done in a limited amount of time, there is an advantage to being able to iterate rapidly and make many structural changes to optimize a large design for both area and clock frequency. Only trivial designs or very specific blocks would be hand placed. I want the compiler to do register re-balancing and other optimizations, the same way that almost no one could beat the performance of a modern C compiler by typing in machine code. Definitely Vivado is not there yet, but it should be.
There are many examples: a PCI Express bus, an application-optimized DDR controller, a full TCP/IP stack, a caching/prefetch system, or any advanced processor with feedback. These kinds of systems require precise control.

It can be done but all the advantages of HLS are gone. The code is filled with a ton of pragmas that make the code unreadable and a lot longer than the VHDL or SV equivalent.

Register rebalancing (other companies call it retiming) is a very old technique. You can do it with SV and VHDL: just add delays and the synthesizer will know what to do. Vivado has caught up with the solutions from Altera, but there are better (more expensive) synthesizers that easily beat both; they have supported this feature for at least 15 years.

Uh, yes, that is all true (except maybe the processor with feedback bit is debatable..). I agree with all this, and yet my arguments for why C++ HLS is a good thing remain the same.
What tool do you use for this? Also, are you targeting ASICs or FPGAs? I think for ASICs there are probably a ton of custom non-public tools that use C++; one is described, for example, in http://scale.eecs.berkeley.edu/papers/krashinsky-phd.pdf. The Mill Architecture people seem to plan the same, but I haven't seen any in the open so far.