by anfilt 3062 days ago
So an FPGA has an area-to-speed trade-off: if you use a lot of area, things are generally quicker. The main reason is that an FPGA has really fine-grained parallelism. A GPU might have ~2000 simple cores. By comparison, one of these UltraScale+ FPGAs likely has over a million logic cells; the one in this post has 2852K logic cells. So if we were computing just basic logic, we would have over 1000 times more parallelism than the GPU. However, most problems are not basic logic, and the FPGA does not have enough I/O pins for that unless we combined the results. That count also excludes built-ins like adders, DSP cells, etc. Still, that's the general idea: fine-grained parallelism.
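To put a rough number on that "over 1000 times" figure, here is a small Python sketch using only the two counts quoted above. The function name is just for illustration, and the result is an upper bound on raw parallel units, not a predicted speedup.

```python
# Figures taken from the comment; the GPU core count is a rough round number.
fpga_logic_cells = 2_852_000   # "2852K logic cells"
gpu_cores = 2_000              # "~2000 simple cores"

def parallelism_ratio(cells: int, cores: int) -> float:
    """Logic cells per GPU core: an upper bound on extra parallelism, not a speedup."""
    return cells / cores

ratio = parallelism_ratio(fpga_logic_cells, gpu_cores)
print(f"{ratio:.0f}x")  # 1426x
```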

So if you're looking for a solution that will perform faster on an FPGA, you are going to want something that is simple but that you need to compute often. That way you can duplicate it 1000s of times on the FPGA. Another place an FPGA excels is data that is quite long. Compare a 32-bit number to a 1024-bit number. You could do whatever you're doing to the 1024-bit number in one pass with an FPGA. However, the GPU's native int size is probably 32 bits, so to perform just one operation on that number the GPU has to perform at least 32 operations (1024 / 32 = 32 words). That overhead has to be carried around for every operation a GPU core performs. The HBM added to this FPGA makes it even better in cases like this. That's just the general idea though.
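As a rough illustration of where the "at least 32 operations" comes from, here is a Python sketch of adding two 1024-bit numbers on a machine that only has 32-bit words. The helper names (to_limbs, from_limbs, add_wide) are made up for this example, and real GPU code would look different; the point is only the count of word-sized steps and the carry that has to ripple through them.

```python
LIMB_BITS = 32                      # assumed native word size
MASK = (1 << LIMB_BITS) - 1

def to_limbs(value: int, total_bits: int) -> list[int]:
    """Split an integer into 32-bit words, least significant first."""
    count = total_bits // LIMB_BITS
    return [(value >> (LIMB_BITS * i)) & MASK for i in range(count)]

def from_limbs(limbs: list[int]) -> int:
    """Reassemble an integer from 32-bit words."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))

def add_wide(a_limbs: list[int], b_limbs: list[int]):
    """Add word by word with carry; returns (result limbs, carry out, word adds)."""
    result, carry, word_adds = [], 0, 0
    for a, b in zip(a_limbs, b_limbs):
        total = a + b + carry
        result.append(total & MASK)   # keep the low 32 bits
        carry = total >> LIMB_BITS    # pass the overflow to the next word
        word_adds += 1
    return result, carry, word_adds

a = (1 << 1000) + 12345
b = (1 << 1024) - 1
limbs, carry, word_adds = add_wide(to_limbs(a, 1024), to_limbs(b, 1024))
assert from_limbs(limbs) + (carry << 1024) == a + b
print(word_adds)  # 32
```

Each step also depends on the carry from the previous one, so the 32 word adds cannot simply be run side by side. An FPGA can instead build a single 1024-bit-wide adder.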

So if what you are doing can take advantage of the FPGA's strengths, you could come out with a much faster solution. There is also power usage: if you're going to be building clusters to perform whatever processing you need, and the FPGA performs about the same as a GPU, you will use less electricity.