by vonmoltke 3670 days ago
> The only advantage of clustered systems like Spark, Hadoop, and others is aggregate bandwidth to disk and memory.

No it isn't. There are plenty of CPU-bound tasks that run much faster if the work is distributed in parallel across multiple machines. We use Hadoop at my company primarily for that reason.

1 comment

Can you say something more about your workload? Ever since dipping my feet in the MPI pool, I've wondered what kind of problems really lend themselves to running in parallel across multiple machines these days - assuming a single machine has at least 32 hardware threads.

I know there are modelling workloads, like weather forecasting and analysing seismic data - but I'm curious what kind of work you are doing?