by threeseed
4086 days ago
YARN is just a resource manager on top of which Hadoop jobs (e.g. Hive, Pig) are run. It is analogous to a set of Docker containers distributed across nodes. The same methods you would use to synchronize state in that situation you could use with YARN, for example using a persistent distributed system such as Hazelcast to handle system failures and checkpointing. I am not saying this is some amazing solution to every HPC problem, only that Hadoop is far, far more flexible than many people give it credit for.
Parts of my simulation are out of phase. I need a gather step that collects the data from the individual nodes once a given timestep is reached, and then saves the state. A simple solution is to do a barrier every ~30 minutes, send the data to the master node, and have it save the data.
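A minimal sketch of that barrier-then-gather checkpoint, assuming threads stand in for compute nodes and a local JSON file stands in for the master's checkpoint store. On a real cluster this would more likely be an MPI barrier plus a gather to rank 0; the names `CHECKPOINT_STEP`, `run_node`, `gather_and_save`, and `load_checkpoint` are hypothetical, and the "timestep" is a toy update rule. Each node starts at a different step (out of phase), advances to the checkpoint step, then blocks at the barrier until the master has everyone's state.

```python
import json
import os
import tempfile
import threading

CHECKPOINT_STEP = 10  # hypothetical timestep at which every node stops to checkpoint


def run_node(rank, start_step, barrier, gathered):
    """Advance one (toy) node from its own start step to the checkpoint step."""
    state = float(rank)
    step = start_step
    while step < CHECKPOINT_STEP:
        state = state * 1.01 + 1.0  # stand-in for one simulation timestep
        step += 1
    gathered[rank] = {"step": step, "state": state}  # "send" to the master
    barrier.wait()  # nobody proceeds until every node and the master arrive


def gather_and_save(path, start_steps):
    """Master: wait for all nodes at the barrier, then write one checkpoint file."""
    n = len(start_steps)
    barrier = threading.Barrier(n + 1)  # n worker nodes + the master
    gathered = [None] * n
    threads = [
        threading.Thread(target=run_node, args=(rank, s, barrier, gathered))
        for rank, s in enumerate(start_steps)
    ]
    for t in threads:
        t.start()
    barrier.wait()  # returns only once every node has reached CHECKPOINT_STEP
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(gathered, f)
    os.replace(tmp, path)  # swap in the new file so a crash never leaves a half-written checkpoint
    for t in threads:
        t.join()
    return gathered


def load_checkpoint(path):
    """Restart path: if any node dies, everything restarts from this state."""
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.gettempdir(), "sim_checkpoint.json")
    saved = gather_and_save(ckpt, start_steps=[0, 3, 7, 9])
    assert load_checkpoint(ckpt) == saved
```

The point of the design is that there is no per-entry replication: the only fault-tolerance mechanism is "stop, gather, save, and restart everything from the last file", which matches the failure model described here.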
When I look at Hazelcast I see what looks like a different sort of clustering: using clusters for redundancy, not for CPU power. E.g., I see "Hazelcast keeps the backup of each data entry on multiple nodes", and I think "I don't care." If a node goes down, the system goes down, and I restart from a checkpoint. It's much more likely that one of the 512 compute nodes will go down than some database node.
I'll withdraw the "map-reduce" part of my original statement ("A map-reduce system like Hadoop ...") and say simply "a system like Hadoop isn't a good fit for HPC problems".
Here's a lovely essay which agrees with me ;) http://glennklockwood.blogspot.com.au/2014/05/hadoops-uncomf... . It considers the question:
> Why does Hadoop remain at the fringe of high-performance computing, and what will it take for it to be a serious solution in HPC?