by jamesmishra 3209 days ago

I think blog posts like these are an interesting way to show off what goes on in a large company like Uber.

If you're a tiny startup, then Spark + MLlib is more than enough. Even that would be overkill if your data fits on a single machine.

But if you're at a young but quickly growing company with:

- terabytes of data

- tens of thousands of features extracted from the data

- dozens or hundreds of unique machine learning models being tweaked over time

then hopefully a blog post like this is helpful. It shows off various effective patterns for solving machine learning problems at scale. Presumably, you'll want to build your own internal system with its own set of hooks, but the best practices and lessons learned should be roughly the same.