by fat0wl 3668 days ago

This is true, but keep in mind that you have to set up a pipeline for collecting the data into HDFS (Storm? batch loading?), and you have to pay for the machines.

So while your analysis is valid, there are more "costs" at play, like developer time, cluster maintenance, and hardware. I like to play with Spark's ML libraries, but I'm wary of designing projects specifically around them because of this overhead, especially when trying to distribute some API/tech that you'd like others to use.

Not trying to be a downer; I actually wish the choice to go distributed were more of a no-brainer, hah. I would love for some APIs to emerge that could be used locally or distributed transparently, without actually having to run a dummy cluster and a data migration just to run locally.