by liscio 6056 days ago
It's probably the algorithm in question that's to blame, in conjunction with the slow OpenCL implementation for the 8800GT, as you found.

On my machine (I'm the article's author), even Apple's GPU-tuned version of Galaxies runs much faster on the Mac Pro's CPUs than on the GPU. So, something's up. I think only the GTX 285 for the Mac Pro beats out the CPUs on that test, but I could be wrong...

The 1-2 seconds of overhead could also come partly from compiling the OpenCL program for the GPU, since I compile the .cl kernel on every run of the program.
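
One way to check that guess is to time the build step and the execution step separately. This is only a rough sketch, not the code from the article: `time_phases`, `build_fn`, and `run_fn` are hypothetical names, and the two callables stand in for whatever wraps the real OpenCL build and kernel launch.

```python
import time


def time_phases(build_fn, run_fn, source):
    """Time the build and run steps separately.

    build_fn and run_fn are hypothetical stand-ins: build_fn takes the
    kernel source and returns a built program, run_fn takes that program
    and returns the result of executing it.
    """
    t0 = time.perf_counter()
    program = build_fn(source)   # e.g. the per-run compile of the .cl kernel
    t1 = time.perf_counter()
    result = run_fn(program)     # e.g. enqueue the kernel and read results back
    t2 = time.perf_counter()
    return {"build_s": t1 - t0, "run_s": t2 - t1, "result": result}


if __name__ == "__main__":
    # Fake build/run steps, only to show how the report is read.
    report = time_phases(lambda src: src.upper(), lambda prog: len(prog), "kernel")
    print(f"build: {report['build_s']:.6f}s  run: {report['run_s']:.6f}s")
```

If `build_s` accounts for most of the 1-2 seconds, the compile is the overhead and the kernel itself is not the slow part.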

Furthermore, I wasn't very scientific about the GPU case, because I wasn't planning to ship a GPU-tuned algorithm. To actually pull this off for a consumer app is easier said than done.

For instance, I'd prefer not to ship the .cl kernel source in the application, and would rather provide precompiled kernel binaries. Doing this for more than one flavor of GPU is nontrivial, from what I gather, as I'd have to actually own the GPUs in question to produce binaries for the different targets (from my own collection of hardware I could only cover the GeForce 9400M and the 8800GT).
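
A possible middle ground, sketched here purely as an assumption: build the kernel once on the user's own GPU and cache the resulting binary, keyed by device name and kernel source. That removes the per-run compile and the need to own every GPU, but the catch is that the source still has to ship with the app. In OpenCL terms the binary would be fetched with `clGetProgramInfo` (`CL_PROGRAM_BINARIES`) and reloaded with `clCreateProgramWithBinary`; in the sketch below `compile_fn` is a hypothetical stand-in for that build step, and `cache_key` and `load_or_build` are made-up names.

```python
import hashlib
from pathlib import Path


def cache_key(device_name: str, source: str) -> str:
    """Derive a cache key from the device name and the kernel source."""
    digest = hashlib.sha256()
    digest.update(device_name.encode("utf-8"))
    digest.update(b"\0")  # separator so the two fields cannot run together
    digest.update(source.encode("utf-8"))
    return digest.hexdigest()


def load_or_build(cache_dir: Path, device_name: str, source: str, compile_fn) -> bytes:
    """Return a cached kernel binary, building and storing it on a miss.

    compile_fn(device_name, source) -> bytes is a hypothetical stand-in
    for the real OpenCL build; it is only called when no cached binary
    exists for this device/source pair.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / (cache_key(device_name, source) + ".bin")
    if path.exists():
        return path.read_bytes()
    binary = compile_fn(device_name, source)
    path.write_bytes(binary)
    return binary
```

Changing either the kernel source or the GPU yields a different key, so a stale binary is never handed to the wrong device. A driver update can also invalidate a cached binary, so a real version would need to fall back to rebuilding from source when loading the binary fails.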

That said, I still want to stay open to the idea in the future as I play around with the algorithm and understand it better.

Thanks for the nudge, though. I really should dig deeper.