This is probably going to mirror the transition from single-threaded to multi-threaded compute. It took a while before application architectures that actually utilize multi-core took hold among the general populace.
Probably not. Multicore has been a thing for 30 years (we had a 32-core Sequent Systems machine and a 64-core KSR-1 at UW CS&E in the early 1990s). Everything about these models has been developed in a multicore computing context, and thus far they still aren't massively-parallel-distributable. An algorithm can be massively parallel without being sensibly distributable. Changing the latency between compute nodes is not always neutral, and the resulting decrease in performance is not always just linear.
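To put rough shape on that last point, here is a toy model, not a measurement of any real system. It assumes a fixed amount of perfectly divisible work per step plus a fixed number of blocking synchronizations per step, each costing one inter-node latency. The names `step_time` and `speedup`, and every number below, are made up for illustration.

```python
def step_time(work_s: float, nodes: int, sync_latency_s: float, syncs_per_step: int) -> float:
    """Time for one step: the compute divides across nodes, the sync cost does not."""
    return work_s / nodes + syncs_per_step * sync_latency_s


def speedup(work_s: float, nodes: int, sync_latency_s: float, syncs_per_step: int) -> float:
    """Speedup relative to one node doing all the work with no sync cost."""
    return work_s / step_time(work_s, nodes, sync_latency_s, syncs_per_step)


if __name__ == "__main__":
    WORK_S = 1.0           # hypothetical: 1 second of compute per step on one node
    SYNCS_PER_STEP = 100   # hypothetical: 100 blocking synchronizations per step

    # Compare an on-package-like latency with a network-like latency (both assumed).
    for latency_s in (1e-6, 1e-3):
        for nodes in (8, 64, 1024):
            s = speedup(WORK_S, nodes, latency_s, SYNCS_PER_STEP)
            print(f"latency={latency_s:g}s nodes={nodes:5d} speedup={s:8.2f}")
```

In this model the same 1000x increase in latency costs less than half the speedup at 8 nodes, but nearly all of it at 1024 nodes, so the penalty is not a constant factor you can just budget for. Real workloads can overlap communication with compute, which this sketch deliberately ignores.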