by hkhall
3619 days ago
As this thread is filled with people who know way, way more about CUDA and OpenCL than I do, I hope you will indulge me a serious question: I get that graphics cards are great for floating-point operations and that bitwise binary operations are supported by these libraries, but are they similarly efficient at them? Some background: I occasionally find myself doing FPGA design for my doctoral work, and I am realizing that the job market when I finish may be better for me if I were fluent in GPGPU programming, as it is easier to build, manage, and deploy a cluster of such machines than to do the same with FPGAs. My current problem involves huge numbers of XOR operations on large vectors, and if OpenCL or CUDA could be learned and spun up quickly (I have a CS background), I may be inclined to jump aboard this train rather than buy another FPGA for my problem.
Throughput of integer operations ranges between 25% and 100% of floating point FMA performance. 32-bit bitwise AND, OR, XOR throughput is equal to 32-bit FMA throughput.
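To make the 32-bit point concrete, here is a minimal CPU-side sketch in plain C++ (not CUDA or OpenCL) of XOR over bit vectors packed into 32-bit words, so that each word-level XOR covers 32 bits of the original vectors. The function name `xor_packed` and the packing layout are assumptions for illustration, not anything from the question. In a GPU kernel, the loop body would typically be the per-thread work, with the thread index taking the place of `i`.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: XOR two bit vectors stored as packed 32-bit words.
// Each iteration performs one 32-bit XOR, i.e. 32 single-bit XORs at once.
std::vector<std::uint32_t> xor_packed(const std::vector<std::uint32_t>& a,
                                      const std::vector<std::uint32_t>& b) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("input vectors must have the same length");
    }
    std::vector<std::uint32_t> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        // On a GPU, this single statement would be one thread's work for index i.
        out[i] = a[i] ^ b[i];
    }
    return out;
}
```

Calling `xor_packed` on two vectors of N words XORs 32 * N bits. A kernel this simple does very little arithmetic per word loaded, so it is worth measuring whether memory traffic, rather than XOR throughput, ends up being the limit for your problem.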