by tdba
1564 days ago
Thanks! The general answer is that it depends on your model and on which FPGA platform you're targeting, but in a head-to-head benchmark you'll find results in the ballpark of 2-10x CPU performance and 0.5-2x GPU performance. As you point out, power and cost are big differentiators. The other thing to consider is that (as another commenter mentioned) inference on a CPU or GPU will usually require some model quantization or compression, which can degrade model accuracy. Tensil can give you a way around that dilemma, so that you can have great performance without sacrificing accuracy.
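To make the quantization point concrete, here is a rough sketch of where the accuracy loss comes from. It is not Tensil code and not any particular framework's scheme: it assumes plain symmetric per-tensor int8 quantization, and the function names and weight values are made up for illustration. Each weight is rounded to one of 255 integer levels, so after converting back it can be off by up to half a quantization step.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization (assumed scheme): returns (ints, scale)."""
    max_abs = max(abs(v) for v in values)
    # One step of the integer grid, chosen so the largest weight maps to +/-127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    ints = [max(-127, min(127, round(v / scale))) for v in values]
    return ints, scale


def dequantize(ints, scale):
    """Map the integers back to approximate floating-point values."""
    return [q * scale for q in ints]


def max_roundtrip_error(values):
    """Largest absolute difference between a weight and its quantized version."""
    ints, scale = quantize_int8(values)
    restored = dequantize(ints, scale)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical weights, chosen only to show the effect.
weights = [0.8123, -0.0041, 0.2507, -0.9999, 0.0333]
print(max_roundtrip_error(weights))
```

The per-weight error is bounded by `scale / 2`, which is small for one weight, but it accumulates across layers, and that is the usual reason quantized models lose some accuracy.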
|
The only time I had to reach for quantized (integer) networks to do anything at all was when running inference on FPGAs. Are you targeting DSP slices by default, or implementing full IEEE 754 floating point by default?
Are you saying that with Tensil you can run single-precision, non-quantized models with up to 2x GPU performance?
I probably misunderstood your last sentence, sorry.
Genuinely curious!