I understand your last paragraph. Looking this time at Hazelcast, what I see is layers of code to understand before being able to do something simple. It really does look like all of the technology you are pointing to is solving a different problem. It's not related to any of the HPC needs I've heard of.

Parts of my simulation are out of phase. I need some gather step to collect the data from the individual nodes when a given timestep is reached, and save the state. A simple solution is to do a barrier every ~30 minutes, send the data to the master node, and have it save the data.

When I look at Hazelcast I see what looks to be a different sort of clustering - using clusters for redundancy, and not for CPU power. E.g., I see "Hazelcast keeps the backup of each data entry on multiple nodes", and I think "I don't care." If a node goes down, the system goes down, and I restart from a checkpoint. It's much more likely that one of the 512 compute nodes will go down than some database node.

I'll withdraw my original statement that "A map-reduce system like Hadoop" and say simply "a system like Hadoop isn't a good fit for HPC problems".

Here's a lovely essay which agrees with me ;) http://glennklockwood.blogspot.com.au/2014/05/hadoops-uncomf... It considers the questions:

> Why does Hadoop remain at the fringe of high-performance computing, and what will it take for it to be a serious solution in HPC?
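For anyone following along, the barrier-then-gather checkpoint described above can be sketched in a few lines. This is only a toy: threads stand in for compute nodes, a shared list stands in for the gather to the master node, and the names (`run_simulation`, `checkpoint_every`, and so on) are made up for illustration. On a real cluster the same shape would more likely be built on MPI collectives such as `MPI_Barrier` and `MPI_Gather`, with the interval measured in wall-clock time rather than steps.

```python
import json
import threading


def run_simulation(num_workers=4, total_steps=6, checkpoint_every=3):
    """Toy barrier + gather checkpoint; returns the list of saved checkpoints."""
    gathered = [None] * num_workers  # one slot per worker (the "gather" buffer)
    checkpoints = []                 # what the "master" has saved so far

    def save_checkpoint():
        # Called by exactly one thread once every worker has reached the
        # barrier, and before any of them is released.
        checkpoints.append(json.dumps(gathered))

    barrier = threading.Barrier(num_workers, action=save_checkpoint)

    def worker(rank):
        state = 0
        for step in range(1, total_steps + 1):
            state += rank + 1  # stand-in for one timestep of real work
            if step % checkpoint_every == 0:
                # Publish local state, then wait for everyone else.
                gathered[rank] = {"rank": rank, "step": step, "state": state}
                barrier.wait()

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return checkpoints


if __name__ == "__main__":
    for saved in run_simulation():
        print(saved)
```

Note there is no redundancy here at all, which is the point: if a worker dies, the whole run stops and is restarted from the last saved checkpoint.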
Sorry, I'm not involved in HPC at all. I know a little bit about Hadoop. I'm mostly interested in building online message processing and blended real-time/historical analytics. Our problem domain wouldn't want to lose all capacity if part of the system became unavailable.