for one, why are _researchers_ using largely obsolete technology; for another, many high-performance computing tasks run significantly faster on 64-bit (e.g., LMDB).
Performance isn't actually the objective of the Pi cluster; the people using it have a real supercomputer next door. It's a testbed so they can validate programs before transferring them to the expensive supercomputer.
I would imagine going from a 10-node to a 100-node system is more complicated overall than going from 32-bit to 64-bit. Sure, the instructions change, but that should basically all be abstracted away by the toolchain. However, job management, allocation, data logistics, queues, cache invalidation, bottlenecks, etc., are all key issues that compound non-linearly with scale.
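To put a rough number on "non-linearly": a back-of-the-envelope sketch, assuming the worst case where any node may need to talk to any other node. Real clusters use switched networks and schedulers, so treat this as an upper bound on coordination paths, not a model of any particular system; `pairwise_links` is just an illustrative helper name.

```python
def pairwise_links(nodes: int) -> int:
    """Count the distinct node pairs in a fully connected cluster."""
    return nodes * (nodes - 1) // 2


# Compare the small testbed size with the larger one mentioned above.
for n in (10, 100):
    print(f"{n} nodes -> {pairwise_links(n)} possible node-to-node links")
```

Ten times the nodes gives 110 times the possible pairs (45 vs. 4950), which is the kind of growth a 32-bit to 64-bit switch never exposes.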
Chemists still use thin-layer chromatography, a technique roughly 100 years old, day in and day out, in every lab in the world, even though HPLC and NMR exist. Why? It's cheap, fast, and works well enough.