by fenollp
3231 days ago
It looks like the slowest routines are the FFT and GEMM, both CPU bound.
I wonder whether DSPs are easy to find for racked servers. Maybe hardware H.264 encoders could be repurposed that way?
I obviously don't know what I am talking about! Would an FPGA implementation speed up execution?
|
The real problem tends to be the round-trip latency (CPU to the other device and back again), not how fast the other device can do the computation.
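
To put rough numbers on that, here is a back-of-the-envelope sketch. The function name and all the timings are made up for illustration, not measurements from any real DSP, FPGA, or encoder block. The only point is that the round trip is added to the device's compute time before you compare against just staying on the CPU.

```python
def offload_pays_off(cpu_time_s, device_time_s, round_trip_s):
    """Return True if offloading beats running the work on the CPU.

    cpu_time_s:    time to do the computation on the CPU
    device_time_s: time the accelerator spends computing
    round_trip_s:  time to ship data to the accelerator and get results back
    All values are hypothetical inputs supplied by the caller.
    """
    return device_time_s + round_trip_s < cpu_time_s


# Hypothetical large transform: 1 ms on the CPU, 10 us on the device,
# 50 us round trip. The transfer cost is small next to the work saved.
print(offload_pays_off(1e-3, 1e-5, 5e-5))

# Hypothetical small transform: 20 us on the CPU, 1 us on the device,
# same 50 us round trip. The device is 20x faster and still loses.
print(offload_pays_off(2e-5, 1e-6, 5e-5))
```

So with these invented figures, a faster device only helps once each call carries enough work to amortize the trip, which is why batching many small FFTs or GEMMs per transfer tends to matter more than raw device speed.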