by dxbydt 4839 days ago
Building distributed systems is not a precondition for big-data ML. Most of those systems have already been built and commoditized, to such an extent that the difference between having one and not boils down to a command-line flag. I routinely run ML algos in local mode on my Mac on a small dataset. Once it's up to snuff, I turn off the --local flag, and it runs on giant MapReduce clusters over terabytes of data. I personally have not made any changes other than turning off the local flag.
1 comment

Sure, lots of existing ML algorithms have efficient big-data implementations. But for new algorithm development, my (admittedly limited) experience is that the Matlab-prototyping stage usually comes well before the implement-at-scale stage. You're right that modern tools effectively abstract out a lot of the difficulty of implementing at scale, but IMHO it's still generally not the first thing you'd want to do.