by jamesmishra 3210 days ago
Former Uber engineer here.

I don't think you would get much use out of an open-sourced Michelangelo. Its biggest advantage for Uber is that it integrates easily with all of Uber's other tools.

Depending on what your machine learning needs are, you could get pretty far with just Spark + MLlib and wouldn't need any of the customization that Michelangelo adds on top.


This is the sense that I got. I am a one-person data science team for my startup, and over the course of a few months I cobbled together most of the automation described in Michelangelo. It mostly comes down to spinning off Spark ML jobs on EMR and saving metadata to a database.
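For readers wondering what "saving metadata to a database" might look like in practice, here is a minimal sketch using Python's built-in `sqlite3`. It is not the commenter's code or anything from Michelangelo: the `training_runs` table, its columns, and the helpers `open_registry`, `record_run`, and `best_run` are all hypothetical, and a real setup would likely use a shared database rather than SQLite.

```python
import json
import sqlite3
import time
import uuid

# Hypothetical schema: one row per training run. Column names are
# illustrative only, not taken from Michelangelo or any real system.
SCHEMA = """
CREATE TABLE IF NOT EXISTS training_runs (
    run_id       TEXT PRIMARY KEY,
    model_name   TEXT NOT NULL,
    started_at   REAL NOT NULL,
    params_json  TEXT NOT NULL,
    metrics_json TEXT NOT NULL,
    artifact_uri TEXT
)
"""


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the metadata database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_run(conn, model_name, params, metrics, artifact_uri=None) -> str:
    """Store one run's hyperparameters and metrics as JSON; return its new id."""
    run_id = uuid.uuid4().hex
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT INTO training_runs VALUES (?, ?, ?, ?, ?, ?)",
            (
                run_id,
                model_name,
                time.time(),
                json.dumps(params, sort_keys=True),
                json.dumps(metrics, sort_keys=True),
                artifact_uri,
            ),
        )
    return run_id


def best_run(conn, model_name, metric):
    """Return (run_id, value) for the run with the highest `metric`, or None."""
    rows = conn.execute(
        "SELECT run_id, metrics_json FROM training_runs WHERE model_name = ?",
        (model_name,),
    ).fetchall()
    best = None
    for run_id, metrics_json in rows:
        value = json.loads(metrics_json).get(metric)
        if value is not None and (best is None or value > best[1]):
            best = (run_id, value)
    return best
```

Storing parameters and metrics as JSON text keeps the schema stable while models and features change; the trade-off is that filtering on individual metrics happens in Python rather than in SQL.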
Why make all that noise with a detailed blog post, then? If it's a custom-fit internal tool, good for you; the rest of the world doesn't care. Every company has its own internal tools.
There's value in sharing ideas. Maybe they couldn't open-source it but were given permission to publish about it. Google never open-sourced some of its greatest contributions; it published only the ideas behind them.
I think blog posts like these are an interesting way to show off what goes on in a large company like Uber.

If you're a tiny startup, then Spark + MLlib is more than enough. Even that would be overkill if your data fits on a single machine.

But if you're at a young but quickly growing company with:

- terabytes of data

- tens of thousands of features extracted from the data

- dozens or hundreds of unique machine learning models being tweaked over time

then hopefully a blog post like this is helpful. It shows off various effective patterns for solving machine learning problems at scale. Presumably, you'll want to build your own internal system with its own set of hooks, but the best practices and lessons learned should be roughly the same.

Recruitment, internal PR