by CthulhuOvermind
4065 days ago
Interesting to see this here. I did my master's thesis on this SDK this past September. We compared a neural network in native C against a CPU OpenCL implementation and an FPGA implementation. The FPGA had about 8-10 times the kernel performance of an i7-2600K for the task. Interestingly enough, what caused the jump in performance was the ability to keep memory close to the kernel, with enough capacity to handle the kernel's demands. The CPU was capped by the RAM-to-CPU bandwidth, around 21 GB/s. The FPGA, despite sitting behind a slower PCIe link, did not suffer, because memory implemented on the FPGA could hold the necessary data close at hand. So I sent the data to the kernel asynchronously, then a kernel with around 120 parallel implementations would operate on it and feed the results back over PCIe. Having OpenCL certainly reduced dev time by around 85%, I'd say. And that's from someone fluent in Verilog who didn't know OpenCL before doing this.
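
A rough way to picture the bandwidth cap is the usual roofline-style bound: throughput is limited by whichever is smaller, peak compute or memory bandwidth times the work done per byte fetched. This is only a back-of-the-envelope sketch, not the thesis code; the function name `attainable_gflops` and any FLOP-per-byte figure plugged into it are assumptions for illustration.

```c
#include <assert.h>

/* Roofline-style upper bound on kernel throughput (hypothetical helper).
 * peak_gflops:     peak compute rate of the device, in GFLOP/s
 * bandwidth_gb_s:  bandwidth of the memory feeding the kernel, in GB/s
 * flops_per_byte:  arithmetic intensity of the kernel (assumed, not measured)
 * Returns the smaller of the compute bound and the memory bound. */
double attainable_gflops(double peak_gflops, double bandwidth_gb_s,
                         double flops_per_byte)
{
    assert(peak_gflops >= 0.0);
    assert(bandwidth_gb_s >= 0.0);
    assert(flops_per_byte >= 0.0);

    /* GB/s * FLOP/byte = GFLOP/s that the memory system can sustain. */
    double memory_bound = bandwidth_gb_s * flops_per_byte;

    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}
```

With 21 GB/s and a made-up intensity of 0.5 FLOP/byte, the bound is 10.5 GFLOP/s no matter how much faster the cores could go. Keeping the working set in on-chip memory raises the effective bandwidth term, which is the effect described above.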