by bdod6 2598 days ago
I had this exact same thought when I read the headline. It seems like MS and others are viewing ML as an opportunity similar to Big Data/BI ten years ago. Back then you saw the "democratization of data," as people with little technical skill could suddenly create analytics dashboards in tools like Tableau.

In my opinion, it's far too easy to make a critical mistake during design/implementation of ML to follow this same path. And what's more, if you mess up making an analytics dashboard, it's usually fairly obvious. In ML, there are MANY ways to mess up a model and you have no easy way to tell.

If someone doesn't have the technical experience behind creating these models, I would not trust any output they give me from using one of these tools. And if they do have the experience, they would certainly not be choosing to use one of these tools either.


Can you please elaborate more on what kind of critical mistakes a machine can make, while someone with math background would not make.

I am building a competing tool, so I am not affiliated with MS, but I do think that auto ML has value.

Machine learning differs from imperative programming in that most of the "programming" is done through experiments rather than by writing an actual program, so there is an opportunity to replace programming with compute. I.e., an automl platform can create hundreds of models/pipelines and just try them all.
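To make "just try them all" concrete, here is a minimal sketch of that kind of search. Everything in it is an assumption for illustration: the toy data, the candidate family (one-feature threshold rules), and names like `fit_stump` and `search` are hypothetical, and a real AutoML system would search a far larger space.

```python
from itertools import product

# Toy data, invented for illustration: ((feature_0, feature_1), label).
TRAIN = [((1.0, 5.0), 0), ((2.0, 1.0), 0), ((3.0, 4.0), 0),
         ((6.0, 2.0), 1), ((7.0, 5.0), 1), ((8.0, 1.0), 1)]
VALID = [((2.5, 3.0), 0), ((6.5, 3.0), 1), ((1.5, 4.5), 0), ((7.5, 0.5), 1)]


def accuracy(model, rows):
    """Fraction of rows the model labels correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def fit_stump(rows, feature, sign):
    """Fit one candidate: pick the threshold with the best training accuracy."""
    best = None
    for threshold in sorted({x[feature] for x, _ in rows}):
        # Bind the threshold as a default argument so each lambda keeps its own.
        model = lambda x, t=threshold: int(sign * (x[feature] - t) > 0)
        score = accuracy(model, rows)
        if best is None or score > best[0]:
            best = (score, threshold, model)
    return best[1], best[2]


def search(train, valid):
    """Fit every candidate configuration and rank by held-out accuracy."""
    leaderboard = []
    for feature, sign in product((0, 1), (1, -1)):
        threshold, model = fit_stump(train, feature, sign)
        leaderboard.append({
            "feature": feature,
            "sign": sign,
            "threshold": threshold,
            "valid_accuracy": accuracy(model, valid),
        })
    return sorted(leaderboard, key=lambda r: r["valid_accuracy"], reverse=True)


leaderboard = search(TRAIN, VALID)
best = leaderboard[0]
print(best)
```

The human effort goes into defining the candidate space and the validation split; picking the winner is pure compute.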

Also, why would you trust a model which was created manually and not a model which was auto created.

When a model is created by auto ML, it passes the same validation process as a manually created model, so in both cases the quality of the model should be judged independently of how it was created.

In addition, all models (regardless of how they were created, human or not) should be monitored for predictive performance. I.e., I will not "trust" any model without continuous verification.
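As a rough sketch of what continuous verification could look like, the snippet below tracks accuracy over a sliding window of recent predictions and flags the model when it drops below a floor. The class name `AccuracyMonitor`, the window size, and the threshold are all hypothetical choices for illustration, not taken from any particular tool.

```python
from collections import deque


class AccuracyMonitor:
    """Rolling accuracy over the most recent predictions (illustrative only)."""

    def __init__(self, window=50, min_accuracy=0.8):
        self.outcomes = deque(maxlen=window)  # True where prediction == actual
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        """Flag only once the window is full, to avoid noisy early alarms."""
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.min_accuracy


monitor = AccuracyMonitor(window=4, min_accuracy=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 1), (1, 1)]:
    monitor.record(predicted, actual)
healthy_flag = monitor.needs_attention()

# Two misses arrive; the window now holds two hits and two misses.
for predicted, actual in [(1, 0), (0, 1)]:
    monitor.record(predicted, actual)
degraded_flag = monitor.needs_attention()
```

The same check applies whether a person or a search procedure produced the model, since it only looks at predictions and outcomes.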

A common error is target leakage, where a feature encodes information about the target that won't be available at prediction time. An AutoML system will likely consider this a "strong feature". This is where having someone who actually understands the business domain is critical.
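A small sketch of how this can play out, using an invented churn dataset: `refund_issued` is a hypothetical column that only gets filled in after a customer has already churned, so it tracks the target perfectly. A naive automated ranking by correlation puts it first; only the knowledge of which columns exist at prediction time (the hypothetical `AVAILABLE_AT_PREDICTION_TIME` set) removes it.

```python
from math import sqrt

# Hypothetical churn data, invented for illustration.
COLUMNS = {
    "tenure_months":   [2, 30, 5, 24, 3, 36, 8, 18],
    "support_tickets": [4, 1, 3, 0, 5, 1, 2, 2],
    # Leak: in this made-up business, refunds are only issued after churn.
    "refund_issued":   [1, 0, 1, 0, 1, 0, 1, 0],
}
CHURNED = [1, 0, 1, 0, 1, 0, 1, 0]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def rank_features(columns, target):
    """Rank features by absolute correlation with the target, strongest first."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_features(COLUMNS, CHURNED)

# Domain knowledge, not math: which columns exist when we must predict?
AVAILABLE_AT_PREDICTION_TIME = {"tenure_months", "support_tickets"}
safe_ranking = [(name, score) for name, score in ranking
                if name in AVAILABLE_AT_PREDICTION_TIME]

print(ranking[0][0], safe_ranking[0][0])
```

Nothing in the numbers distinguishes the leaked column from a genuinely predictive one; the filter comes from knowing how the data is produced.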

There's no question that there's value in AutoML systems, yet most production ML systems I've worked on or seen were way more complex than feature vector -> model -> prediction. You likely have multiple models, pipelines, normalizations, and plain old conditionals. It's hard to automate all of this.

Right. I am aiming at the group of companies that have zero data scientists and would like to avoid hiring one. I assume that their use cases are simple/common and can be automated.

Note that automation is not only building the model, but automating the full life cycle: preprocessing, hyperparameter optimization, pipeline deployment, and monitoring/retraining.

> "Can you please elaborate more on what kind of critical mistakes a machine can make, while someone with math background would not make. I am building a competing tool"

the short answer is, go study stats and fundamentals of ML instead of asking hn to build your product for you.

> "why would you trust a model which was created manually and not a model which was auto created."

one of many reasons: domain knowledge is important, and math alone can't tell you things are muffed up. contrived example: you build a linear regression model to predict home price and square footage has a negative coefficient. math conclusion: bigger house = lower price. domain knowledge: oh, we are missing a feature and the model can't tell the difference between city homes and rural ones.
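The contrived example can be reproduced with a few made-up numbers. In the sketch below, the listings, prices, and the `is_city` flag are all invented: within each area, price rises by 100 per square foot, but city homes are small and expensive while rural homes are large and cheap. Regressing price on square footage alone gives a negative slope; removing each area's average first (which for a single yes/no variable gives the same square-footage coefficient as adding it to the regression) recovers the positive one.

```python
# Invented listings: (square_feet, price, is_city).
HOMES = [
    (600, 560_000, True), (800, 580_000, True),
    (1000, 600_000, True), (1200, 620_000, True),
    (2000, 250_000, False), (2400, 290_000, False),
    (2800, 330_000, False), (3200, 370_000, False),
]


def slope(xs, ys):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def demean_by_group(values, groups):
    """Subtract each group's mean, which removes the city/rural level shift."""
    means = {}
    for g in set(groups):
        members = [v for v, gg in zip(values, groups) if gg == g]
        means[g] = sum(members) / len(members)
    return [v - means[g] for v, g in zip(values, groups)]


sqft = [h[0] for h in HOMES]
price = [h[1] for h in HOMES]
city = [h[2] for h in HOMES]

# Model that is missing the location feature.
naive_slope = slope(sqft, price)

# Same regression after accounting for city vs. rural.
adjusted_slope = slope(demean_by_group(sqft, city), demean_by_group(price, city))

print(naive_slope, adjusted_slope)
```

Both fits are valid least-squares solutions for the data they were given; only knowing that location matters tells you which one to distrust.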

there is value to auto ml but there is a lot of room to go horribly wrong

Again, my point is that for a given data set, an auto ml system is much more efficient and radically cheaper than a human modeler.

You are pointing to an area outside the realm of automl (feature engineering/generation), which is domain specific. But this was not my original question.

this has nothing to do with feature engineering and generation. I never added or changed any features in the example. It is exactly in the realm of automl: you run a model and, -because- you are missing data, your model makes wrong assumptions.

You could argue (which you didn't) that this would fall under model interpretation, but the model in this example would probably fail to generalize and make bad predictions in the future, i.e., slamming home values because they have large square footage.

>In ML, there are MANY ways to mess up a model and you have no easy way to tell.

What about all those businesspeople who only hire analysts to tell them (and their peers) what they want to hear? Now they can tell themselves what they want to hear, having laundered it through a computer.