|
Even years after Amazon started selling a lot more than just books, if people were asked "what does Amazon sell?", the answer was often "books." I've looked at YARN now. I hadn't heard of it before. It doesn't look like it has anything to do with the topic at hand. How would one build an explicit solver for a 1D diffusion equation, corresponding to the examples given in the "HPC is dying, ..." article, using YARN? How do you do checkpointing so you can restart your 10 million atom simulation should there be a system fault after 2 weeks of run-time? (Checkpoints need about 220 MB; each atom has an x,y,z position as well as a vx,vy,vz velocity vector. Also, the checkpoint needs to be at the same timestep across the entire distributed machine.) Instead, it looks like YARN is designed for service-based components, where the components are relatively independent of each other, and where failure recovery is mostly a matter of starting a new service and resending the request. If my understanding is correct, then it's certainly more capable than map-reduce, but not in a direction that's relevant for most current HPC. |
It is analogous to a set of Docker containers distributed across nodes. The same methods you would use to synchronize state in that situation, you could use with YARN. For example, you could use a persistent distributed system such as Hazelcast to handle system failures and checkpointing.
I am not saying this is some amazing solution to every HPC problem, only that Hadoop is far, far more flexible than many people give it credit for.
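
To make the parent's question concrete, here is a minimal single-process sketch in Python of an explicit solver for the 1D diffusion equation with checkpoint and restart. It is only an illustration, not anything taken from YARN or Hazelcast: the function names (`step`, `save_checkpoint`, `load_checkpoint`, `run`), the pickle file format, and the fixed-value boundaries are all assumptions made for the example. It also sidesteps the hard part raised above, since on a distributed machine every node would have to checkpoint at the same timestep.

```python
import os
import pickle
import tempfile


def step(u, r):
    """One explicit (forward-time, centred-space) update of u_t = D * u_xx.

    r = D * dt / dx**2 and must be <= 0.5 for this scheme to be stable.
    """
    new = u[:]  # end values are copied unchanged (fixed boundaries)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def save_checkpoint(path, timestep, u):
    """Write to a temp file, then rename over the old checkpoint.

    A crash mid-write then leaves the previous checkpoint intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump({"timestep": timestep, "u": u}, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return (timestep, u) from the last completed checkpoint."""
    with open(path, "rb") as f:
        state = pickle.load(f)
    return state["timestep"], state["u"]


def run(u, r, n_steps, path, every=100, start=0):
    """Advance from timestep `start` to `n_steps`, checkpointing every `every` steps."""
    for t in range(start + 1, n_steps + 1):
        u = step(u, r)
        if t % every == 0:
            save_checkpoint(path, t, u)
    return u
```

After a fault, a restart would call `load_checkpoint(path)` and pass the returned timestep and state back into `run(..., start=t)`. Because the update is deterministic, the resumed run reproduces the uninterrupted one.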