by jandrese 3057 days ago
From what I understand, you can see a decent speed improvement by going 64-bit on ARM, because they used the transition to dump a fair bit of legacy braindamage from the instruction set that was preventing certain optimizations.

On the other hand, I look at the 3,000-core figure and think that it's roughly on par with high-end GPUs. The clock rates aren't terribly different either. The range of applications where this beats GPU solutions is probably fairly narrow, especially given the terrible I/O bottleneck on the RPis.

For comparison, a $1,000 TITAN X has 3,072 CUDA cores clocked at 1 GHz. This cluster has 3,000 CPU cores clocked at 1.2 GHz. On the TITAN card, all of those cores share the same 12 GB of memory with 336.5 GB/s of memory bandwidth. On the cluster, every 4 cores share 1 GB of memory with (I think) 3.6 GB/s of bandwidth. Of course, communication outside of those 4 cores is restricted to 0.0125 GB/s at best.
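
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It only uses the figures quoted above; the 3.6 GB/s per-node memory bandwidth is my own guess, and reading 0.0125 GB/s as 100 Mbit/s Ethernet is an assumption too. The function and constant names are made up for illustration.

```python
# Figures as quoted in the comment; NODE_BW_GBS is a guess, not a measured value.
TITAN_CORES = 3072
TITAN_MEM_GB = 12
TITAN_BW_GBS = 336.5

CLUSTER_CORES = 3000
CORES_PER_NODE = 4
NODE_MEM_GB = 1
NODE_BW_GBS = 3.6       # assumed local memory bandwidth per 4-core node
NODE_NET_GBS = 0.0125   # assumed to be 100 Mbit/s Ethernet (100 / 8 / 1000)


def cluster_totals():
    """Aggregate the per-node figures across the whole cluster."""
    nodes = CLUSTER_CORES // CORES_PER_NODE
    return {
        "nodes": nodes,
        "total_mem_gb": nodes * NODE_MEM_GB,
        "aggregate_local_bw_gbs": nodes * NODE_BW_GBS,
        "aggregate_net_bw_gbs": nodes * NODE_NET_GBS,
    }


def local_to_network_ratio():
    """How much faster a node can read its own memory than talk to a neighbor."""
    return NODE_BW_GBS / NODE_NET_GBS


if __name__ == "__main__":
    totals = cluster_totals()
    for key, value in totals.items():
        print(f"{key}: {value}")
    print(f"TITAN bandwidth per core: {TITAN_BW_GBS / TITAN_CORES:.3f} GB/s")
    print(f"Cluster local bandwidth per core: {NODE_BW_GBS / CORES_PER_NODE:.3f} GB/s")
    print(f"Local-to-network bandwidth ratio: {local_to_network_ratio():.0f}x")
```

On paper the cluster wins on total memory and even on summed local bandwidth, but that only helps if the job splits into chunks of 1 GB or less that almost never need to talk to each other. Anything that has to cross the network runs a couple of hundred times slower than a local memory access, which is the narrow-range-of-applications point.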