by cornellphds
4047 days ago
This is a classic case of "Algorithm/Problem Selection": if your algorithm or problem is tailored to a single task such as PageRank, highly optimized single-threaded code will surely beat a cluster designed for ETL tasks. But in real organizations, where there are multiple workflows and algorithms, distributed systems always win out. Systems like Hadoop take care of administration, redundancy, monitoring and scheduling in a way that a single machine cannot. Sure, you can "grep" faster on a laptop than on AWS EMR with 4 Medium instances, but in reality, when you have 12 types of jobs run by a team of 6 people, you are much better off with a distributed system.

There's no single simple answer, but sure, whenever fewer computers are enough, fewer should be used.
The problem lately is that some people love "clouds" so much that they push work there that could really be done locally.