by ballen 3057 days ago
Also consider the ecosystem for these boards. Do you want your researchers dealing with the non-upstreamed support around Allwinner chips? Additionally, consider time to solution: this machine is based on BitScope's Blade (http://bitscope.com/product/blade/).

32-bit vs 64-bit is largely inconsequential when the nodes only have 1-2GB of RAM. The ecosystem I'm referring to means you can run a vanilla Linux on a RPi with no extra work. You can google a problem and have a reasonable chance of finding a solution around the RPi and so on.
From what I understand, you can see a decent improvement in speed by going 64-bit on ARM, because they used the crossover to dump a fair bit of legacy braindamage in the instruction set that was preventing them from making certain optimizations.

On the other hand, I look at the 3000 core figure and think that it's roughly on par with high-end GPUs. The clock rates aren't terribly different either. The range of applications where this beats out GPU solutions is probably fairly narrow, especially given the terrible I/O bottleneck on the RPis.

For comparison, a $7,500 TITAN X has 3072 CUDA cores clocked at 1 GHz. This cluster has 3,000 CPU cores clocked at 1.2 GHz. On the TITAN card all of those cores share the same 12 GB of memory with 336.5 GB/s of memory bandwidth. On the cluster every 4 cores share 1 GB of memory with (I think) 3.6 GB/s of bandwidth. Of course, communication outside of those 4 cores is restricted to 0.0125 GB/s at best.
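
To put the numbers above side by side, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted in this comment; the 3.6 GB/s per-node memory bandwidth is the commenter's own guess, the node count is derived as 3,000 cores / 4 cores per node, and the constant and function names are made up for illustration.

```python
# Figures quoted in the comment above. NODE_MEM_BW_GBS is the commenter's
# guess ("I think"), and the node count is derived, not stated in the thread.
GPU_CORES = 3072
GPU_MEM_BW_GBS = 336.5      # shared by all GPU cores
CLUSTER_CORES = 3000
CORES_PER_NODE = 4
NODE_MEM_BW_GBS = 3.6       # shared by the 4 cores on one node (assumed)
NODE_NET_BW_GBS = 0.0125    # per-node link; equals 100 Mbit/s


def compare():
    """Return per-core and aggregate bandwidth figures in GB/s."""
    nodes = CLUSTER_CORES // CORES_PER_NODE
    return {
        "nodes": nodes,
        "gpu_mem_bw_per_core": GPU_MEM_BW_GBS / GPU_CORES,
        "node_mem_bw_per_core": NODE_MEM_BW_GBS / CORES_PER_NODE,
        "aggregate_node_mem_bw": nodes * NODE_MEM_BW_GBS,
        "aggregate_net_bw": nodes * NODE_NET_BW_GBS,
        # How much slower crossing the network is than staying on one node.
        "local_to_network_ratio": NODE_MEM_BW_GBS / NODE_NET_BW_GBS,
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.4g}")
```

Under these assumptions the cluster actually has more local memory bandwidth per core than the GPU (about 0.9 GB/s versus about 0.11 GB/s), but anything that leaves a node moves roughly 288 times slower than local memory. That gap, not the raw core count, is what narrows the range of workloads where the cluster could compete.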

no it's not inconsequential

for one, why are _researchers_ using largely obsolete technology? for another, many high performance computing tasks perform significantly faster on 64-bit (e.g., lmdb)

Performance isn't actually the objective of the Pi cluster; the people using it have a real supercomputer next door. It's a testbed so they can validate programs before transferring them to the expensive supercomputer.
is their real supercomputer 32-bit? because if it's not then i'm not sure how they are validating anything
I would imagine going from a 10-node to a 100-node system is more complicated overall than going from 32-bit to 64-bit. Sure, the instructions change, but that should basically all be abstracted away by the toolchain. However, job management, allocation, data logistics, queues, cache invalidation, bottlenecks, etc. are all key issues that compound non-linearly with scale.
Chemists still use thin-layer chromatography, a technique 100 years old, day in and day out, in every lab in the world, even though HPLC and NMR exist. Why? It's cheap, fast, and works well enough.
if we are 'considering the ecosystem' then that means these are running a 32-bit OS, as mentioned in my original comment