OpenCL vs CUDA is a pretty boring debate: where both are available they run on the same hardware, so performance is similar. The difference is in the tooling and ecosystems. For example, you can run OpenCL on FPGAs.
Out of curiosity, has anyone successfully deployed some OpenCL code across very different platforms?
It seems neat that it'll compile and run on a GPU, CPU or FPGA, but it seems like code written for one style of architecture would be appallingly slow on the others.
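To make the "one style of architecture" worry concrete, here is a rough sketch in plain C (not real OpenCL; the function names are made up for illustration) of the two ways a kernel might map work-items to array elements. The assumption is the usual rule of thumb: GPUs tend to like the interleaved mapping, because neighbouring work-items read neighbouring addresses at each step, while CPUs tend to like the blocked mapping, because each thread walks its own contiguous chunk. Both give the same answer; only the memory access pattern differs, and that is typically what has to be retuned per device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: element read by work-item `wi` at step `k` when
   `n_items` work-items stride through the array together.
   Assumed GPU-friendly: at a given step, adjacent work-items touch
   adjacent addresses. */
size_t interleaved_index(size_t wi, size_t k, size_t n_items) {
    return k * n_items + wi;
}

/* Hypothetical helper: element read by work-item `wi` at step `k` when
   each work-item owns a contiguous chunk of `chunk` elements.
   Assumed CPU-friendly: each thread walks sequential memory. */
size_t blocked_index(size_t wi, size_t k, size_t chunk) {
    return wi * chunk + k;
}

/* Partial sum computed by one work-item under either mapping.
   The array is assumed to hold exactly n_items * chunk elements. */
float partial_sum(const float *data, size_t wi, size_t n_items,
                  size_t chunk, int interleaved) {
    assert(data != NULL);
    float acc = 0.0f;
    for (size_t k = 0; k < chunk; ++k) {
        size_t idx = interleaved ? interleaved_index(wi, k, n_items)
                                 : blocked_index(wi, k, chunk);
        acc += data[idx];
    }
    return acc;
}
```

In a real OpenCL kernel, `wi` would come from `get_global_id(0)`, and the choice between the two mappings is the kind of thing that usually ends up behind a per-device switch.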