Hacker News
by groups 4593 days ago
I genuinely don't understand what "'at best' 'rather' 'unexpected'" means.

I'm not sure either! Either way, implementing a clustering algorithm on a GPU was thinking out of the box for us :)
What it means is I'm hesitant to believe an 18000x speedup.
This was my reaction as well. I think there should be a whitepaper explaining the comparison, as many GPU/FPGA application acceleration companies tend to provide. I would say a typical GPU speedup is in the 10-20x range, with 100x being possible for highly "regular" data parallelism. I have no doubt the GPU wall-clock time is correct, so I am guessing the CPU implementation is just very poor.
We used parallel Python to implement the first version of the clustering algorithm. Although the CPUs weren't particularly beefy, the implementation was by no means poor. It's worth mentioning that the laptop GPU wasn't powerful either.
In that case, how much of your 18000x speedup is due to the GPU and how much of it is due to re-implementing the algorithm in a compiled language? Going from Python to C/C++/etc. is a 100x-1000x speedup right off the bat.

That's not a fair CPU/GPU comparison at all.
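One back-of-the-envelope way to frame the question: if the gains from switching language and switching hardware multiply, dividing the reported 18000x by the language speedup leaves roughly what the GPU itself contributed. The sketch below is a minimal Python illustration under that assumption. The 100x-1000x language factor is the rough estimate quoted above, not a measurement, and the function name `residual_gpu_speedup` is hypothetical.

```python
def residual_gpu_speedup(observed_speedup: float, language_speedup: float) -> float:
    """Estimate the hardware-only speedup.

    Assumes (idealized) that speedups compose multiplicatively:
    observed = language_factor * gpu_factor, so gpu_factor = observed / language_factor.
    """
    if language_speedup <= 0:
        raise ValueError("language_speedup must be positive")
    return observed_speedup / language_speedup


OBSERVED = 18000.0  # speedup reported in the thread

# Assumed bounds for the Python -> compiled-language gain (commenter's estimate).
for lang_factor in (100.0, 1000.0):
    gpu_only = residual_gpu_speedup(OBSERVED, lang_factor)
    print(f"language {lang_factor:.0f}x -> GPU-only {gpu_only:.0f}x")
```

With those bounds the GPU-only factor lands between 18x and 180x, which is much closer to the 10-20x to 100x range mentioned earlier in the thread. A fair comparison would measure both numbers directly instead of assuming them.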

You're right, it isn't a perfectly fair comparison. However, even if we had coded it in C++, the run time difference would still have been significant.

We're just sharing our story at this point. Next, we'll be sharing our whitepaper and data. We'd rather start gathering feedback now than after writing 100k LoC for the compiler.

Ah, python... That explains it.