Hacker News new | ask | show | jobs
by fenollp 3231 days ago
This looks like the slowest routines are FFT and GEMM (CPU bound). I wonder if one can find DSPs easily for racked servers. Maybe hardware h264 encoders can be repurposed that way? I obviously don't know what I am talking about! Would an FPGA implementation accelerate execution?
1 comments

Yes, FPGA can help, as can GPU.

The real problem tends to be the (CPU to other thing and back again) latency, not the how fast can the other thing do the computation.